HtmlUnit does not percent-encode non-ASCII characters in request URLs as UTF-8. This causes failures when servers strictly validate those URLs.
Minimal test case:
A Spring controller that calls HttpServletRequest.getParameterMap(), such as:
    @RequestMapping(value = "/", method = {RequestMethod.GET, RequestMethod.HEAD})
    public String welcome(HttpServletRequest request) {
        request.getParameterMap();
        return "index";
    }
Run on Jetty 9.4.x (9.4.9.v20180320 and 9.4.11.v20180605 are confirmed), e.g. using the Jetty Maven plugin.
Expected: The page loads and ignores the parameter
Actual: A server error occurs, reporting:
org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte A0 in state 0
at org.eclipse.jetty.util.Utf8Appendable.appendByte (Utf8Appendable.java:253)
at org.eclipse.jetty.util.Utf8Appendable.append (Utf8Appendable.java:158)
at org.eclipse.jetty.util.UrlEncoded.decodeUtf8To (UrlEncoded.java:354)
at org.eclipse.jetty.util.UrlEncoded.decodeUtf8To (UrlEncoded.java:296)
at org.eclipse.jetty.http.HttpURI.decodeQueryTo (HttpURI.java:615)
at org.eclipse.jetty.server.Request.extractQueryParameters (Request.java:437)
at org.eclipse.jetty.server.Request.getParameters (Request.java:401)
at org.eclipse.jetty.server.Request.getParameterMap (Request.java:1035)
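To illustrate why a lone byte A0 is rejected, here is a minimal sketch using the JDK's own strict UTF-8 decoder. It is not Jetty's Utf8Appendable; the class and method names (StrictUtf8Check, isValidUtf8) are hypothetical and exist only for this example. In UTF-8, 0xA0 is a continuation byte and cannot start a character, while the two-byte sequence C2 A0 is the valid encoding of a non-breaking space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // Report errors instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare continuation byte, as sent in the failing request.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xA0}));
        // The UTF-8 encoding of U+00A0, as Firefox would send it.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC2, (byte) 0xA0}));
    }
}
```

The first call prints false and the second prints true, which is consistent with the "byte A0 in state 0" message: the decoder sees A0 where it expects the first byte of a character.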
Wire logging from com.gargoylesoftware.htmlunit.WebClient indicates that the URL is being encoded to /?param=Publisher%60s?%A0International%E9?Pty%A9Ltd
Firefox would instead encode this URL as /?param=Publisher`s%E2%80%93%C2%A0International%C3%A9%E2%80%94Pty%C2%A9Ltd
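The difference between the two encodings can be reproduced with a small sketch. It assumes the parameter value is what you get by decoding the Firefox form above as UTF-8, and it uses a hypothetical helper (percentEncode) that only escapes bytes outside printable ASCII. It is not HtmlUnit's encoder and not a full RFC 3986 implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryEncodingSketch {

    // Hypothetical helper: converts the value to bytes in the given charset and
    // percent-encodes every byte that is not printable ASCII. Reserved ASCII
    // characters are deliberately left untouched to keep the sketch short.
    static String percentEncode(String value, Charset charset) {
        StringBuilder out = new StringBuilder();
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement byte, which is '?' for ISO-8859-1.
        for (byte b : value.getBytes(charset)) {
            int unsigned = b & 0xFF;
            if (unsigned > 0x20 && unsigned < 0x7F) {
                out.append((char) unsigned);
            } else {
                out.append(String.format("%%%02X", unsigned));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The Firefox form quoted in this report, decoded back to the raw value.
        String firefoxForm =
                "Publisher`s%E2%80%93%C2%A0International%C3%A9%E2%80%94Pty%C2%A9Ltd";
        String raw = URLDecoder.decode(firefoxForm, "UTF-8");

        System.out.println(percentEncode(raw, StandardCharsets.UTF_8));
        System.out.println(percentEncode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The UTF-8 line reproduces the Firefox form. The ISO-8859-1 line yields Publisher`s?%A0International%E9?Pty%A9Ltd, which matches the wire log apart from the backtick (HtmlUnit additionally escapes it as %60). That suggests, without proving it, that the value is being converted through a single-byte charset before percent-encoding, which would explain the two "?" characters where the dashes cannot be represented.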
Note: The unencoded URL is
Last edit: Thrawn 2018-07-12
Thanks for the report. Hopefully I have addressed the issue in the right way; it was not that simple, and many of our test expectations were outdated. Please try the latest snapshot build and report whether it now works for you.
Thanks for using HtmlUnit
2.32 is out, will close this