Menu

#157 Migrate to Apache HTTP Client 4.0

closed
None
5
2012-10-21
2010-02-12
No

Following this discussion : http://markmail.org/message/zxojka3zwxnylqov

I am submitting a patch that migrates HTMLUnit to HttpClient 4.0. The patch currently is only for the main source code. The patch for all the jUnit tests is almost complete, but I will wait for your comments before I submit it.

A few remarks :
- My experience with HTMLUnit is still a bit limited
- HttpWebConnection and WebClient contain the core changes
- WebClient now encapsulate an HttpClient instance. Seems a more logically fit than HttpWebConnection for HttpClient 4.0.

Well, share your comments !

Regards,

Nicolas

Discussion

1 2 > >> (Page 1 of 2)
  • Daniel Gredler

    Daniel Gredler - 2010-02-12

    Thanks for the patch! I'll have a look over the weekend, but in the meantime, can you make sure all of the code uses spaces for indentation? The tabs probably won't pass our formatting checks (mvn checkstyle:checkstyle) and they make the diff larger than necessary.

     
  • Daniel Gredler

    Daniel Gredler - 2010-02-12

    Also, while I have a look through the code, can you elaborate on your comment that WebClient is a better place for the HttpClient instance? From a design perspective, the various WebConnection implementations are supposed to abstract away the connection details (and make things like the MockWebConnection possible).

     
  • Anonymous

    Anonymous - 2010-02-17

    In HttpClient 4, http method objects are merely messages. They do not hold the response body and connection status. The HttpResponse object now encapsulate these things.

    A quick comparison:

    In HttpClient 3 :
    GetMethod get = new GetMethod(url);
    httpclient.executeMethod(method);
    byte[] responseBody = method.getResponseBody();

    HttpClient 4
    HttpGet httpget = new HttpGet("http://localhost/");
    HttpResponse response = httpclient.execute(httpget);
    HttpEntity entity = response.getEntity();
    InputStream content = entity.getContent();

    Also, a lot of configuration options, like cookies, are now managed in httpClient.
    So, in my humble opinion, HttpClient and WebClient are now closer in their purpose and WebClient should encapsulate an instance of HttpClient.
    (The execution of methods still happends in HttpWebConnection.)

     
  • Marc Guillemot

    Marc Guillemot - 2010-02-17

    @nicolasbelisle
    the comparison doesn't help me to understand why it would be valuable to have an HttpClient instance in WebClient :-(
    Do you have other arguments?

     
  • Anonymous

    Anonymous - 2010-02-17

    In WebClient (from the patch), the HttpClient instance is used in mostly two places :

    getCookieManager()
    In HttpClient 4, the cookie policy is managed by the HttpClient object, not the method objects.

    setUseInsecureSSL()
    It now updates the schemeRegistry for the HttpClient instance. This replaces the Protocol.registerProtocol/unregisterProtocol in HttpClient 3.1.

    AbstractHttpClient has more configuration options in HtpClient 4.0, similar to the configuration of WebClient.

    Is it possible to manage the HttpClient instance in HttpWebConnection. Some information would have to be passed to HttpWebConnection, but nothing complicated.

    If you are still not sure… I see no problem in updating my patch.

     
  • Anonymous

    Anonymous - 2010-02-23

    Hello again,
    I just added an updated patch and the unit tests. Also added my pom.xml file.
    The patch still uses an httpclient instance in webclient. See previous comment(s) for explanations.

     
  • Daniel Gredler

    Daniel Gredler - 2010-02-23

    Cool! I'll have a look in the next couple of days and post feedback.

    BTW, I've opened an issue with the HttpClient guys regarding serialization of Credentials instances; if this makes it into HC 4.1, it would get rid of some of the serialization-related hacks in the HU code.

    https://issues.apache.org/jira/browse/HTTPCLIENT-916

     
  • Daniel Gredler

    Daniel Gredler - 2010-02-25

    I had a quick look at v2 of the patch, but it's very hard to see what has really changed; there are lots of these sorts of changes:

    • }
    • else {
    • } else {

      • Creates a clone of this instance, and clears cached state
      • to be not shared with the original.
    • *
      • Creates a clone of this instance, and clears cached state to be not
      • shared with the original.

    This is just noise that keeps the patch from being readable... can you get rid of all of these pure formatting changes and resubmit? Thanks!

     
  • Anonymous

    Anonymous - 2010-02-26

    Added 2 new patches. Should look better !
    Quick question: the ClientConnectionManager of HttpClient (v.4.0) should be shutdown() after usage. Would WebClient.closeAllWindows() be the proper place to do this ?
    Thanks

     
  • Anonymous

    Anonymous - 2010-05-05

    Just added the shutdown call in WebClient.closeAllWindows().
    See patch_main_v4.txt.
    The patch is getting old now. I'm using it in Constellio (our open-source crawl/search engine).
    Any interest in getting this added ?
    Regards

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-06

    Hi Nicolas,

    Thanks for updating the patch, you can remove old versions.

    I have the following comments:
    - You can add @author of your name to all significantly changed files
    - WebClient should never know about httpClient, so make the latter part of HttpWebConnection instead.
    - I am inclined to make closing httpClient after each request, not in WebClient.closeAllWindows()

    @Daniel: your feedback please?

     
  • Anonymous

    Anonymous - 2010-05-07

    Hi,

    -- WebClient should never know about httpClient, so make the latter part of HttpWebConnection instead. --

    From HttpClient documentation : http://hc.apache.org/httpcomponents-client/tutorial/html/connmgmt.html
    "HttpClient has a complete control over the process of connection initialization and termination as well as I/O operations on active connections."
    Creating and closing an HttpClient instance for each HttpWebConnection is not using HttpClient appropriately.

    In the patch, connection opening/closing is still done in HttpWebConnection, but it can now benefit from HttpClient's connection pool and centralized configuraion.

    HttpClient has changed and I think it's important to re-consider how it now fits into HtmlUnit.
    By the way, HttpClient initialization is still lazy.

    Also, if there's some design aspects you want to discuss, I'm available on Skype (nicolas.belisle).

    • If can get me agree my design is just bad, I offer a pint next time you come in Canada :)

    Cheers !

    Nicolas

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-07

    Hi Nicolas,

    I have no issue of delgating whatever supported by HttpClient (cookie management, open connections, etc).

    By "WebClient should never know about httpClient", I meant that HttpClient is specific to HttpWebConnection, not to other types of connections (e.g. FalsifyingWebConnection), and when HttpClient is upgraded to future versions, only HttpWebConnection should be modified
    When I said ""

     
  • Anonymous

    Anonymous - 2010-05-09

    I get it.

    I've modified the patches.

    A few things;
    - I added code to implement a maximum content length for WebClient and a corresponding test case (WebClient4Test).
    - There's a few tests that fail. I would need input on those. My knowledge of HtmlUnit lacks on certain aspects.
    - I've added a few FIXME comments in the code.

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-09

    Hi Nicolas,

    The patch is not updated, I uploaded one against SVN trunk (httpclient4.patch).

    - Where is WebClient4Test?
    - Please read http://htmlunit.sourceforge.net/submittingPatches.html, code should pass 'mvn checkstyle:checkstyle'
    - You have to investigate about the failure of GWT20Test, HtmlSubmitInputTest, WebClient3Test, HTMLImageElementTest and GAESupportTest.
    - What input you need for those tests? You can use the dev-list
    - I wonder why WebClient.setMaxContentLength() was added? Should the user be allow to restrict it?
    
     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-09

    Against r5703

     
  • Anonymous

    Anonymous - 2010-05-11

    src/main/java

     
  • Anonymous

    Anonymous - 2010-05-11

    src/test/java

     
  • Anonymous

    Anonymous - 2010-05-11

    maven project file

     
  • Anonymous

    Anonymous - 2010-05-11

    Hi,

    I updated my patch files (see main_patch.txt and test_patch.txt below). Ran checkstyle and unit tests. Did some important changes.

    WebClient3Test still fails because of a bug in HttpClient 4. I just filled an issue in their JIRA
    https://issues.apache.org/jira/browse/HTTPCLIENT-939

    WebClient.setMaxContentLength() what added so that users can restrict the maximum length for any content fetched by WebClient. It will throw an IOException in two cases:
    1. The header specifies a larger than allowed content length, or
    2. The content Inputstream reads farther the allowed content length.

    I added "WebClient4Test.java" in the patch to test the new feature.

    I added an updated "pom.xml" file with proper formatting.

    There are few key changes:
    - NTLM support in HttpClient 4 is not officially provided. See http://hc.apache.org/httpcomponents-client/ntlm.html
    - WebClient. setUseInsecureSSL() cannot be statically registered, as with HttpClient 3. I’m using the "instanceOf" operator as I workaround which is a fragile approach.
    - WebClient.closeAllWindows() has a similar problem for HttpWebConnection shutdown.

    It's a lot of work so far, hope it's helpful.

    Regards

    • And my word for the pint :)
     
  • Jakub Liska

    Jakub Liska - 2010-05-14

    Hi
    I checked out the trunk now, installed core-js and when I'm installing htmlunit itself, I'm getting a lot of compilation errors like this:
    HtmlPage.java:[33,49] package net.sourceforge.htmlunit.corejs.javascript does not exist
    HTMLElement.java:[30,49] package net.sourceforge.htmlunit.corejs.javascript does not exist

    I really need to use httpclient 4.0 together with htmlunit, could please anybody explain to me what to do?

    Thank you, Joseph

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-14

    Please use the user-list for such questions, this is not related to this Feature Request

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-19

    Nicolas, your work speaks for itself, thanks so much for the effort.

    Most of the patch is committed in SVN, with the following comments:
    1- Huge amount of HttpClient debugging is automatically printed, to be switched off
    2- NTLM authentication to be available (with allowing the possibility to use HttpClient unofficial mechanism)
    3- Insecure SSL to be tested

    Those three points must be verified/resolved before closing this issue

    The following changes was made to pass all tests:
    4- Custom httpClient_.setRedirectHandler() was used to resolve the HttpClient 'bug', see HttpWebConnection.getHttpClient()
    5- HttpWebConnection.setUseInsecureSSL() code was moved to a package private class to allow GAE tests to pass.

    Others:
    6- setMaxContentLength() was removed as I don't find it useful, please use the dev-list for further discussion
    7- XMLHttpRequest: there was an incorrectly hardcoded "username"/"password"
    8- You can put your name in test cases if you wish, please provide a patch

    Thanks again

     
  • Ahmed Ashour

    Ahmed Ashour - 2010-05-20

    Hi again,

    1 HttpClient debugging is now resolved

    3 Insecure SSL is not working, please verify

     
  • Anonymous

    Anonymous - 2010-05-22

    SSLInsecure Patch

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.