Thread: [Httplib2-discuss] Optional non-'redirect following'
From: Rene S. <re...@pt...> - 2007-04-19 09:07:41
Hello,

I use the library in a proxy to retrieve pages from the target servers.

In this context, the automatic redirect following is no good, as the browser will not know about the redirect. Consequently, further requests will be sent to the wrong host, resulting in errors.

My solution to the problem is to add a flag to the Http class, in the constructor:

    self.ignore_redirects = False

and to change this line:

    if (self.follow_all_redirects or method in ["GET", "HEAD"]) or response.status == 303:

to:

    if (not self.ignore_redirects) and (self.follow_all_redirects or method in ["GET", "HEAD"]) or response.status == 303:

Thus, the default behavior of the library remains unchanged, but the calling code gets a chance to retrieve the 'raw' reply from the server.

Does it make sense to add this feature to the library? What about caching for those replies?

René Schmit
From: Rene S. <re...@pt...> - 2007-04-19 09:36:47
Hello again,

sorry for the error in my post. It should read: change this line:

    if response.status in [300, 301, 302, 303, 307]:

to:

    if (not self.ignore_redirects) and response.status in [300, 301, 302, 303, 307]:
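The difference between the two proposed conditions can be checked in isolation. The sketch below is not httplib2 code: the function names and the `REDIRECT_STATUSES` tuple are made up for illustration, and only the two boolean expressions are taken from the messages above. It shows that in the first version `and` binds tighter than `or`, so a 303 would still be followed even with the flag set, while guarding the outer status check blocks every redirect status.

```python
# Illustrative only: these helpers are hypothetical, not part of httplib2.
REDIRECT_STATUSES = (300, 301, 302, 303, 307)


def first_attempt(ignore_redirects, follow_all_redirects, method, status):
    """Condition from the first message.

    Python parses it as ((not ignore) and (...)) or (status == 303).
    """
    return ((not ignore_redirects)
            and (follow_all_redirects or method in ["GET", "HEAD"])
            or status == 303)


def corrected(ignore_redirects, status):
    """Condition from the follow-up: the flag guards the outer status check."""
    return (not ignore_redirects) and status in REDIRECT_STATUSES


# With the flag set, the first attempt still lets a 303 through.
print(first_attempt(True, False, "POST", 303))  # True
# The corrected check blocks it.
print(corrected(True, 303))  # False
# With the flag left at its default, redirects are handled as before.
print(corrected(False, 302))  # True
```

Whether the first version was withdrawn for this reason is not stated in the thread; the sketch only shows what the two expressions evaluate to.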
From: Joe G. <jo...@bi...> - 2007-04-19 19:00:59
Rene,

The 0.3.0 release of httplib2 adds a 'content-location' header to every response. That 'content-location' header contains the last URI in the redirect chain, which should be used on subsequent requests. Is that content-location not getting through back to the requester?

-joe

--
Joe Gregorio http://bitworking.org
From: Rene S. <re...@pt...> - 2007-04-20 14:48:11
Joe,

I am probably completely wrong, but here is how I understand things:

- redirects are made when the return status is 3xx
- the redirect is made to the URI contained in the 'location' header

When retrieving a page with httplib2, I get the final page of the chain, thus:

    status = 200
    headers: content-location=<<<new url, as you said>>>
    content of final page

There is NO location field (which is correct for 200, no?). So the browser (which is a client of my program) gets neither a 3xx nor a 'location', and so does not (and cannot) redirect based on 3xx/location. If I interpret RFC 2616 (14.14) correctly, the content-location MAY (not MUST) be used by the client (uppercase here not used to shout, but RFC-style :-), and it seems that the three browsers I used to test (Firefox, Opera and Konqueror) consistently ignore it, and consequently fetch page content (images etc.) from the wrong site. Switching off redirections solves the problem...

Here is another thing I saw while testing: when redirecting to another host name, the host header in the request is not modified. This causes a redirection loop. Example:

    import httplib2
    t = 'cnn.com'
    h = httplib2.Http(cache='/tmp/cache')
    h.force_exception_to_status_code = False
    r, o = h.request('http://%s/' % t, headers={'Host': t})

Here, a redirection should be made to www.cnn.com, but the host header is never updated (inserting a header print into the lib shows this). Inserting the following:

    headers['host'] = authority

right before the redirection call in _request(...):

    (response, content) = self.request(location, redirect_method ...

solves this problem (which is quite common: many sites have 'alias' virtual host names that redirect to the 'primary' site name).

Thanks for your assistance,
René
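The redirect loop described in the message above can be reproduced without a network in a toy model. Everything in this sketch is an assumption made for illustration: `fake_server`, `fetch`, the `update_host` switch and the `example.com` names are hypothetical and are not httplib2 code. The model only captures the idea that a virtual-host server decides from the Host header, so a caller-supplied Host that survives the redirect keeps triggering the same 301.

```python
from urllib.parse import urlsplit


def fake_server(host_header):
    """Toy virtual-host server: the bare name redirects to the www. name."""
    if host_header == "www.example.com":
        return 200, {}
    return 301, {"location": "http://www.example.com/"}


def fetch(url, headers=None, update_host=False, max_requests=5):
    """Follow 301s in the toy model; return (last status, requests made)."""
    # Normalise header names so 'Host' and 'host' are treated alike.
    headers = {name.lower(): value for name, value in (headers or {}).items()}
    status = None
    for attempt in range(1, max_requests + 1):
        # A caller-supplied Host wins; otherwise derive it from the URL.
        host = headers.get("host", urlsplit(url).netloc)
        status, response_headers = fake_server(host)
        if status != 301:
            break
        url = response_headers["location"]
        if update_host and "host" in headers:
            # The change proposed in the thread: reset Host to the new authority.
            headers["host"] = urlsplit(url).netloc
    return status, attempt


# Sticky Host header: every request is answered with another redirect.
print(fetch("http://example.com/", {"Host": "example.com"}))  # (301, 5)
# Host reset before following the redirect: the second request succeeds.
print(fetch("http://example.com/", {"Host": "example.com"}, update_host=True))  # (200, 2)
# No Host supplied at all: it is derived from each URL, so no loop either.
print(fetch("http://example.com/"))  # (200, 2)
```

The last call corresponds to simply not passing a Host header and letting the client derive it from the request URI.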
From: Joe G. <jo...@bi...> - 2007-05-03 13:30:34
On 4/20/07, Rene Schmit <re...@pt...> wrote:
> Joe,
>
> I am probably completely wrong, but here is how I understand things:
>
> - redirects are made when the return status is 3xx
> - the redirect is made to the URI contained in the 'location' header
>
> When retrieving a page with httplib2, I get the final page of the chain,
> thus:
> status = 200
> headers: content-location=<<<new url, as you said>>>
> content of final page
>
> There is NO location field (which is correct for 200, no?). So, the
> browser (which is a client of my program) does not get a 3xx, and no
> 'location', so does not (and cannot) redirect based on 3xx/location. If
> I interpret RFC 2616 (14.14) correctly, the content-location MAY (not
> MUST) be used by the client, (uppercase here not used to shout, but
> RFC-style:-), and it seems that the three browsers I used to test
> (Firefox, Opera and Konqueror) consistently ignore it, and consequently
> fetch page content (images etc) from the wrong site. Switching of
> redirections solves the problem...

Ok, now that makes perfect sense. I have added a 'follow_redirects' attribute to Http() and updated the documentation and unit tests, all available on trunk.

> Here is another thing I saw while testing:
> when redirecting to another host name, the host header in the request is
> not modified. This causes a redirection loop. Example:
>
> import httplib2
> t='cnn.com'
> h=httplib2.Http(cache='/tmp/cache')
> h.force_exception_to_status_code = False
> r,o=h.request('http://%s/' % t,headers={'Host':t },)

Is there a particular reason you are manually setting the Host: header? Httplib2 does that for you automatically by pulling it out of the request URI.

In general, I like to allow the user to set any header and have that override the default behavior, following the basic principle that the user knows best, but I'm open to the idea that that principle might break down on redirects.

Thanks,
-joe

--
Joe Gregorio http://bitworking.org
From: Rene S. <re...@pt...> - 2007-05-03 17:16:55
Right, not setting the header is the way to go. My proxy simply reused the headers it got from the browser, so normally I would have to remove 'Host: some.target.site' ... except that now I can turn redirects off :-), and this becomes a non-issue!

Thanks for your support,
René
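For a proxy that does reuse the browser's headers, dropping the Host entry before forwarding could look roughly like the sketch below. The helper name `headers_to_forward` and its `drop` parameter are hypothetical, chosen for illustration; nothing here is taken from the proxy discussed in the thread or from httplib2.

```python
def headers_to_forward(browser_headers, drop=("host",)):
    """Return a copy of the browser's headers without the names in `drop`.

    Header names are compared case-insensitively; the input is not modified.
    """
    blocked = {name.lower() for name in drop}
    return {name: value for name, value in browser_headers.items()
            if name.lower() not in blocked}


incoming = {
    "Host": "some.target.site",
    "Accept": "text/html",
    "User-Agent": "demo",
}
outgoing = headers_to_forward(incoming)
print(outgoing)
```

With the Host entry removed, the HTTP client is free to derive the header from each request URI.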