I just came across a few interesting cases of uncommon HTTP requests
that I think warrant improvements in WH. Here they are:
1) POST redirect requests. Right now, if WH receives a redirect, it
assumes that the follow-up request is a GET. That is true in 98% of cases, but
nothing stops a site from sending a POST redirect, which obviously causes
issues in WH. Browsers detect the type of the redirect correctly. I did not
spend too much time on it and just put in a simple fix by adding an attribute
to the http definition. However, the correct way would probably be to detect
the type of the redirect (just like browsers do).
2) Multiple redirects. One HTTP call might trigger multiple redirects. Current
WH only supports one redirect, which again is correct for the majority of
cases. However, nothing prohibits site designers from having multiple
redirects after a call. Therefore I would suggest fixing this behavior in WH.
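For illustration, here is a minimal sketch of the status-based detection suggested in point 1. It assumes the rules in RFC 7231 and RFC 7538 as browsers commonly apply them: 303 switches to GET, 307 and 308 keep the original method, and 301/302 are rewritten to GET only when the original request was a POST. The function name `redirect_method` is hypothetical; it is not part of WH or HttpClient.

```python
def redirect_method(status: int, original_method: str) -> str:
    """Pick the method for the follow-up request after a redirect.

    Hypothetical helper, not a WH or HttpClient API.
    """
    method = original_method.upper()
    if status == 303:
        # "See Other": retrieve the target with GET (HEAD stays HEAD).
        return method if method == "HEAD" else "GET"
    if status in (301, 302):
        # Historical behaviour: POST is commonly rewritten to GET.
        return "GET" if method == "POST" else method
    if status in (307, 308):
        # These statuses require the method to be preserved.
        return method
    raise ValueError(f"{status} is not a redirect status handled here")
```

Under these assumptions, a POST answered with 302 is followed up with GET, while a POST answered with 307 is repeated as a POST.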
On a different topic: are you moving to HttpClient 4 in the next version? I
think that should probably happen at some point...
It is forbidden by the HTTP protocol to use anything other than GET or HEAD on
redirects. It is clearly stated in the RFC, and it's done that way mainly for
security reasons.
I don't really believe browsers behave as you described, but even if some of
them do, it is a bug rather than a feature.
"Current WH only supports one redirect"
That's not true. WH uses Apache HttpClient underneath, which handles all the
redirects, and I am not aware of any issue here so far. Can you give me an
example where subsequent redirects are not handled properly?
P.S. I hope you have tried it in WH 2.1, as WH 2.0 does have some issues in
the <http> processor.
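To make the multiple-redirect case concrete, here is a small sketch of following a chain of redirects with a hop limit. It does no network I/O: `fetch` is a hypothetical callable returning a `(status, location)` pair and stands in for whatever HTTP layer is used. It is not the HttpClient API and says nothing about how HttpClient implements redirects internally.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}

# Hypothetical signature: takes a URL, returns (status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def follow_redirects(url: str, fetch: Fetch,
                     max_redirects: int = 10) -> Tuple[int, List[str]]:
    """Follow redirects until a non-redirect response or the limit is hit."""
    chain = [url]
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or not location:
            return status, chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError(f"more than {max_redirects} redirects")


# Usage with an in-memory stand-in for a site that redirects twice.
pages = {
    "http://example.com/a": (302, "/b"),
    "http://example.com/b": (301, "http://example.com/c"),
    "http://example.com/c": (200, None),
}
final_status, visited = follow_redirects("http://example.com/a",
                                         pages.__getitem__)
```

The hop limit is there so that a redirect loop fails with an error instead of running forever.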
No, WH 2.1 still uses HttpClient 3.x. I was trying to upgrade it, but in some
places the refactoring required many more changes to the codebase than
I expected. And since I didn't have any strong enough reason to upgrade
(except the wish to be up to date), I reconsidered and decided to stick with
3.x for a while.
I have to say that I did not realize that POSTs are not allowed in redirects.
Though I use WH 2.0 (actually, to be precise, I use a modified version of 1.0
that I changed quite a bit for performance reasons), I cannot enable redirect
support in the HTTP client because of the "bug" in the site, which requires
processing of POST redirects. Again, I cannot upgrade to the latest WH, and
was just telling you about some of the issues I find in case you want to fix
them in the main repo.
Redirects can ONLY be made with GET.
What is the site that requires a POST redirect? Can you give me a sample
scraper covering this case?
Some IO errors often happened.
HttpClient may be inflexible.