Hello. Currently HtmlUnit sends HTTP request headers in random order. This is OK according to the RFC, but browsers such as IE, FF and Chrome send them in a specific order, placing Host on the second line, then User-Agent, Accept and so on. This may cause differences between the results from real browsers and from HtmlUnit. Could you please add a way to set the preferred order in which headers must appear? Presetting that order based on the BrowserVersion would be great too.
I knew this would come one day :-(
This makes sense to me, but I fear that it won't be easy. We will have to go deeper into the details of HttpClient, and we may need to take control of some parts of it to be sure we determine the header sequence.
Can you provide some examples of requests with the headers sent by HtmlUnit and those sent by real browsers?
Sure, for google.com, here are the headers:
HtmlUnit 2.7 (BrowserVersion.FIREFOX_3)
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept-Language: en-us
Accept: */*
Host: google.com
Chrome 5.0.375.99
GET / HTTP/1.1
Host: google.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,uk-UA;q=0.6,uk;q=0.4,ru;q=0.2
Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.3
Cookie: ***
Firefox 3.6.6
GET / HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ***
IE 8.0.7600
GET / HTTP/1.1
Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Host: google.com
Connection: Keep-Alive
Cookie: ***
Actually, the problem can be seen better if some headers are set with addRequestHeader. I've added a bunch of those, and with every request they are sent in random order.
Do you think there is a quick-and-dirty way to reorder headers until a proper solution comes out? Is it possible to intercept the outgoing headers, parse them apart and join together in another order?
I don't think that it is possible to intercept the headers and re-sort them. I had a quick look at the implementation in HttpClient-4 yesterday, and it seems that headers are sent in FIFO order. I can imagine that a workaround would be to set some header fields yourself (for example Host) early and in the right order, instead of letting HttpClient do it later. This could easily be done by subclassing HttpWebConnection and modifying some methods there.
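The "set the headers yourself, in the right order" idea can be sketched roughly as below. This is only an illustration of the ordering step on a plain map of headers. HeaderOrder and reorder are hypothetical names, not part of HtmlUnit or HttpClient, and the sketch does not show how the result would be wired into a subclass of HttpWebConnection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HtmlUnit or HttpClient.
public class HeaderOrder {

    /**
     * Returns a copy of the headers in which the names listed in
     * preferredOrder come first, in that order. Headers that are not
     * listed keep their original relative order and follow afterwards.
     */
    public static Map<String, String> reorder(Map<String, String> headers,
                                               List<String> preferredOrder) {
        Map<String, String> remaining = new LinkedHashMap<>(headers);
        Map<String, String> ordered = new LinkedHashMap<>();
        for (String wanted : preferredOrder) {
            // Header names are case-insensitive, so compare ignoring case.
            for (String name : new ArrayList<>(remaining.keySet())) {
                if (name.equalsIgnoreCase(wanted)) {
                    ordered.put(name, remaining.remove(name));
                }
            }
        }
        // Append everything that was not mentioned in preferredOrder.
        ordered.putAll(remaining);
        return ordered;
    }

    public static void main(String[] args) {
        // Same headers as in the HtmlUnit 2.7 capture above.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1");
        headers.put("Accept-Language", "en-us");
        headers.put("Accept", "*/*");
        headers.put("Host", "google.com");

        // Assumed Firefox-like order, taken from the Firefox 3.6.6 capture.
        List<String> firefoxLike = Arrays.asList(
                "Host", "User-Agent", "Accept", "Accept-Language");

        for (Map.Entry<String, String> e : reorder(headers, firefoxLike).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Since HttpClient-4 appears to keep headers in FIFO order, adding them to the request in the order of the returned map should preserve that order on the wire, as long as HttpClient does not add or replace any of them itself later.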
Hi James,
As you know, HtmlUnit currently focuses on FF and IE.
Would putting "Host" first and "User-Agent" second be sufficient for you, in FF simulation?
Thanks for reporting, fixed in SVN, by making "Host" the first header, followed by "User-Agent" in FF simulation only.
Hi Ahmed,
thanks. Is it correct that to give it a try I have to run svn co, mvn eclipse:eclipse and mvn package? BTW, I've tried the latter, but some tests failed...
There was an issue in the code that was not related to this bug. It is now resolved.
Please use the trunk, or get the latest snapshot from http://build.canoo.com/htmlunit/artifacts/