Suggested fixes to WH

newbee
2011-05-09
2012-09-04
  • newbee

    newbee - 2011-05-09

    I just came across a few interesting cases of uncommon HTTP requests that I
    think warrant improvements in WH. Here they are:

    1) POST redirect requests. Right now, when WH receives a redirect it assumes
    that the follow-up request uses GET. That is true in 98% of the cases, but
    nothing prevents a site from sending a POST redirect, which obviously causes
    issues in WH. Browsers detect the type of the redirect request correctly. I
    did not spend too much time on it and just put in a simple fix by adding an
    attribute to the http definition. However, the correct way would probably be
    to detect the type of the redirect (just like browsers do).

    2) Multiple redirects. One HTTP call might trigger multiple redirects.
    Current WH only supports one redirect, which again is correct for the
    majority of cases. However, nothing prohibits site designers from having
    multiple redirects after a call. Therefore I would suggest fixing this
    behavior in WH.
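
For context on point 1, the redirect status code largely determines which method the follow-up request uses. The sketch below is not WH or HttpClient code; `redirect_method` is a hypothetical helper that mirrors how browsers commonly behave: 307 and 308 keep the original method, 303 switches to a retrieval request, and 301/302 rewrite POST to GET.

```python
from typing import Optional

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_method(status: int, method: str) -> Optional[str]:
    """Return the method for the follow-up request, or None if not a redirect."""
    method = method.upper()
    if status not in REDIRECT_CODES:
        return None
    if status in (307, 308):
        # The method (and body) must be preserved, so a POST stays a POST.
        return method
    if status == 303:
        # "See Other": the follow-up is a retrieval request.
        return "HEAD" if method == "HEAD" else "GET"
    # 301/302: browsers rewrite POST to GET and leave other methods alone.
    return "GET" if method == "POST" else method
```

Under these assumptions, a site that answers a POST with 307 expects the POST to be repeated at the new location, which is the case a GET-only redirect handler would miss.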

    On a different topic: are you moving to http-client 4 in the next version? I
    think that should probably happen at some point...

     
  • Alex Wajda

    Alex Wajda - 2011-05-09

    Post redirect requests

    Redirects can ONLY be made with GET.

    It is forbidden by the HTTP protocol to use anything other than GET or HEAD
    on redirects. It is clearly stated in the RFC, and it's done that way mainly
    for security reasons.

    I don't really believe browsers behave as you described, but even if some of
    them do, it is a bug rather than a feature.

    Current WH only supports one redirect,

    That's not true. WH uses Apache HttpClient underneath, which handles all the
    redirects, and I am not aware of any issue here so far. Can you give me an
    example where subsequent redirects are not handled properly?

    P.S. I hope you have tried it in WH 2.1, as WH 2.0 does have some issues in
    the <http> processor.
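
To make the multiple-redirect question concrete, here is a minimal sketch of following a redirect chain with a hop limit and loop detection. It does not reproduce HttpClient's implementation; `follow_redirects` and the `fetch` callable (which returns a status code and an optional Location value) are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# (status code, Location header value or None)
Response = Tuple[int, Optional[str]]


def follow_redirects(
    url: str,
    fetch: Callable[[str], Response],
    max_hops: int = 10,
) -> Tuple[str, int]:
    """Follow redirects from `url`; return the final URL and its status."""
    seen = {url}
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            return url, status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            raise RuntimeError("redirect loop at " + url)
        seen.add(url)
    raise RuntimeError("more than %d redirects" % max_hops)
```

A scraper that reproduces the problem could log each hop in the same way, which would show whether the second redirect is dropped or simply answered with an unexpected method.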

     
  • Alex Wajda

    Alex Wajda - 2011-05-09

    No, WH 2.1 still uses HttpClient 3.x. I was trying to upgrade it, but in some
    places the refactoring required many more changes to the codebase than I
    expected. And since I didn't have any strong enough reason to upgrade (except
    the wish to be up to date), I reconsidered and decided to stick with 3.x for
    a while.

     
  • newbee

    newbee - 2011-05-09

    I have to say that I did not realize that POSTs are not allowed in redirects.
    Though I use WH 2.0 (actually, to be precise, I use a modified version of
    1.0, which I changed quite a bit for performance reasons), I cannot enable
    redirect support in the HTTP client because of a "bug" in the site, which
    requires processing of POST redirects. Again, I cannot upgrade to the latest
    WH; I was just telling you about some of the issues I found in case you want
    to fix them in the main repo.

     
  • Alex Wajda

    Alex Wajda - 2011-05-10

    Which site is it that requires a POST redirect? Can you give me a sample
    scraper covering this case?

     
  • jacob

    jacob - 2012-05-30

    Some IO errors happen often.

    HttpClient may be inflexible.

     
