Re: [Httplib2-discuss] Optional non-'redirect following'
From: Rene S. <re...@pt...> - 2007-04-20 14:48:11
Joe,

I am probably completely wrong, but here is how I understand things:

- redirects are made when the return status is 3xx
- the redirect is made to the URI contained in the 'location' header

When retrieving a page with httplib2, I get the final page of the chain,
thus:

    status = 200
    headers: content-location=<<<new url, as you said>>>
    content of final page

There is NO location field (which is correct for 200, no?). So the
browser (which is a client of my program) does not get a 3xx, and no
'location', so it does not (and cannot) redirect based on 3xx/location.

If I interpret RFC 2616 (14.14) correctly, the content-location MAY (not
MUST) be used by the client (uppercase here not used to shout, but
RFC-style :-), and it seems that the three browsers I used to test
(Firefox, Opera and Konqueror) consistently ignore it, and consequently
fetch page content (images etc.) from the wrong site.

Switching off redirections solves the problem...

Here is another thing I saw while testing: when redirecting to another
host name, the host header in the request is not modified. This causes a
redirection loop.

Example:

    import httplib2
    t = 'cnn.com'
    h = httplib2.Http(cache='/tmp/cache')
    h.force_exception_to_status_code = False
    r, o = h.request('http://%s/' % t, headers={'Host': t})

Here, a redirection should be made to www.cnn.com, but the host header
is never updated (inserting a header print into the lib shows this).

Inserting the following:

> headers['host'] = authority

right before the redirection call in _request(...)

> (response, content) = self.request(location, redirect_method ...

solves this problem (which is quite common; there are many sites that
have 'alias' virtual host names that redirect to the 'primary' site
name).

Thanks for your assistance,

René

Joe Gregorio wrote:
> Rene,
> The 0.3.0 release of httplib2 adds a 'content-location' header
> to every response.
> That 'content-location' header contains
> the last URI in the redirect chain, which should be used
> on subsequent requests. Is that content-location not getting
> through back to the requester?
>
> -joe
>
>
> On 4/19/07, Rene Schmit <re...@pt...> wrote:
>> Hello,
>>
>> I use the library in a proxy to retrieve pages from the target servers.
>>
>> In this context, the automatic redirect following is no good, as the
>> browser will not know about the redirect. Consequently, further requests
>> will be sent to the wrong host, resulting in errors.
>>
>> My solution to the problem is to add a flag to the Http class, in the
>> constructor:
>>
>> self.ignore_redirects = False
>>
>> and to change this line
>> > if (self.follow_all_redirects or method in ["GET", "HEAD"]) or
>> response.status == 303:
>>
>> to:
>> < if (not self.ignore_redirects) and (self.follow_all_redirects or
>> method in ["GET", "HEAD"]) or response.status == 303:
>>
>>
>> Thus, the default behavior of the library remains unchanged, but gives
>> the calling code a chance to retrieve the 'raw' reply from the server.
>>
>> Does it make sense to add this feature to the library? What about
>> caching for those replies?
>>
>> René Schmit
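
A note on the proposed ignore_redirects condition, as a minimal sketch only: the two functions below are hypothetical standalone helpers (not httplib2 code) that evaluate the condition as quoted and a variant with one extra pair of parentheses. Because `and` binds tighter than `or` in Python, the quoted form appears to keep following a 303 even when ignore_redirects is set. The thread does not say whether that is intended.

```python
# Hypothetical standalone helpers; they mirror the condition quoted in the
# thread and are not part of httplib2.

def follows_as_quoted(ignore_redirects, follow_all_redirects, method, status):
    # Python parses this as: ((not ignore) and (...)) or (status == 303)
    return (not ignore_redirects) and (
        follow_all_redirects or method in ["GET", "HEAD"]
    ) or status == 303


def follows_with_flag_first(ignore_redirects, follow_all_redirects, method, status):
    # Moving the 303 test inside the parentheses lets ignore_redirects
    # veto every redirect, 303 included.
    return (not ignore_redirects) and (
        follow_all_redirects or method in ["GET", "HEAD"] or status == 303
    )


if __name__ == "__main__":
    # Compare both forms with ignore_redirects=True for a 301 and a 303.
    for status in (301, 303):
        print(
            status,
            follows_as_quoted(True, False, "GET", status),
            follows_with_flag_first(True, False, "GET", status),
        )
```

With ignore_redirects left at False, both forms agree for every combination tried here, so the library's default behavior is unchanged either way.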
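
The Host header fix described in the reply (headers['host'] = authority before the redirected request) can be sketched outside the library as follows. This is an illustration under assumptions, not the httplib2 implementation: headers_for_redirect is a hypothetical helper, and it uses only the standard library to derive the authority from the Location value.

```python
from urllib.parse import urljoin, urlsplit


def headers_for_redirect(headers, request_uri, location):
    """Return a copy of headers whose Host matches the redirect target.

    Hypothetical helper for illustration; not part of httplib2.
    """
    # Resolve a relative Location against the URI that was requested.
    target = urljoin(request_uri, location)
    authority = urlsplit(target).netloc
    # Drop any existing Host header, whatever its capitalisation.
    updated = {k: v for k, v in headers.items() if k.lower() != "host"}
    updated["host"] = authority
    return updated


if __name__ == "__main__":
    original = {"Host": "cnn.com", "Accept": "*/*"}
    print(headers_for_redirect(original, "http://cnn.com/", "http://www.cnn.com/"))
```

Returning a copy rather than mutating the caller's dictionary keeps the original request headers available if the redirect chain has to be retried.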