#179 HTTP header order

closed
None
5
2012-10-21
2010-07-07
James
No

Hello. Currently HtmlUnit sends HTTP request headers in random order. This is OK according to the RFC, but browsers such as IE, FF and Chrome send them in a specific order, placing Host on the second line, then User-Agent, Accept and so on. This may cause differences between the results from real browsers and from HtmlUnit. Could you please add a way to set the preferred order in which headers must appear? Presetting that order based on the BrowserVersion would be great too.

Discussion

  • Marc Guillemot

    Marc Guillemot - 2010-07-07

    I knew this would come one day :-(

    This makes sense to me, but I fear that it won't be easy. We will have to go deeper into the details of HttpClient; maybe we will need to take control of some parts to be sure we determine the header sequence.

    Can you provide some examples of requests with the headers sent by HtmlUnit and the ones sent by real browsers?

     
  • James

    James - 2010-07-07

    Sure, for google.com, here are the headers:

    HtmlUnit 2.7 (BrowserVersion.FIREFOX_3)

    GET / HTTP/1.1
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
    Accept-Language: en-us
    Accept: */*
    Host: google.com

    Chrome 5.0.375.99

    GET / HTTP/1.1
    Host: google.com
    Connection: keep-alive
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4
    Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8,uk-UA;q=0.6,uk;q=0.4,ru;q=0.2
    Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.3
    Cookie: ***

    Firefox 3.6.6

    GET / HTTP/1.1
    Host: google.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 115
    Connection: keep-alive
    Cookie: ***

    IE 8.0.7600

    GET / HTTP/1.1
    Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
    Accept-Language: en-US
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)
    Accept-Encoding: gzip, deflate
    Host: google.com
    Connection: Keep-Alive
    Cookie: ***

    Actually, the problem can be seen better if some headers are set with addRequestHeader. I've added a bunch of those, and with every request they are sent in random order.

     
  • James

    James - 2010-07-07

    Do you think there is a quick-and-dirty way to reorder headers until a proper solution comes out? Is it possible to intercept the outgoing headers, split them apart and join them back together in another order?

     
  • Marc Guillemot

    Marc Guillemot - 2010-07-08

    I don't think that it is possible to intercept the headers and re-sort them. I had a quick look at the implementation in HttpClient-4 yesterday and it seems that it uses FIFO order. I can imagine that a workaround would be to set some header fields yourself (for example Host) early and in the right order, instead of letting HttpClient do it later. This could easily be done by subclassing HttpWebConnection and modifying some methods there.
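
    The idea of putting headers into the desired order before they are handed to a FIFO container can be sketched as below. This is only an illustration in plain Java, not HtmlUnit or HttpClient code: the class HeaderOrderSketch and the method reorder are hypothetical names, and how the result would be wired into a subclass of HttpWebConnection is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HtmlUnit or HttpClient.
public class HeaderOrderSketch {

    /**
     * Returns a copy of the headers in which the names listed in
     * preferredOrder come first (matched case-insensitively), followed by
     * all other headers in their original relative order.
     */
    static Map<String, String> reorder(Map<String, String> headers,
                                       List<String> preferredOrder) {
        Map<String, String> remaining = new LinkedHashMap<>(headers);
        Map<String, String> ordered = new LinkedHashMap<>();
        for (String preferred : preferredOrder) {
            // Iterate over a snapshot of the keys so entries can be removed safely.
            for (String name : new ArrayList<>(remaining.keySet())) {
                if (name.equalsIgnoreCase(preferred)) {
                    ordered.put(name, remaining.remove(name));
                }
            }
        }
        // Headers without a preferred position keep their original order.
        ordered.putAll(remaining);
        return ordered;
    }

    public static void main(String[] args) {
        // Header order similar to what HtmlUnit 2.7 sent in the example above,
        // plus one custom header (placeholder values).
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "example-agent");
        headers.put("X-Custom", "1");
        headers.put("Accept-Language", "en-us");
        headers.put("Accept", "*/*");
        headers.put("Host", "google.com");

        // Beginning of the Firefox 3.6 order shown above.
        List<String> firefoxOrder =
                Arrays.asList("Host", "User-Agent", "Accept", "Accept-Language");

        // Prints [Host, User-Agent, Accept, Accept-Language, X-Custom]
        System.out.println(reorder(headers, firefoxOrder).keySet());
    }
}
```

    If the headers are then added to the request in the iteration order of the returned map, a FIFO container should preserve that order, provided nothing else adds Host or similar fields later.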

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-07-09

    Hi James,

    As you know, HtmlUnit currently focuses on FF and IE.

    Would putting "Host" first and "User-Agent" second be sufficient for you, in FF simulation?

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-07-09

    Thanks for reporting, fixed in SVN, by making "Host" the first header, followed by "User-Agent" in FF simulation only.

     
  • James

    James - 2010-07-09

    Hi Ahmed,

    Thanks. Is it correct that to give it a try I have to svn co, mvn eclipse:eclipse and mvn package? BTW, I've tried the latter, but some tests failed...

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-07-13

    There was an issue in the code not related to this bug. It is now resolved.

    Please use the trunk, or get the latest snapshot from http://build.canoo.com/htmlunit/artifacts/

     
