Re: [Httplib2-discuss] Optional non-'redirect following'

Joe,

I am probably completely wrong, but here is how I understand things:

- redirects are made when the return status is 3xx
- the redirect is made to the URI contained in the 'location' header
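
To make those two rules concrete, here is a rough sketch of the check as I understand it. The function name redirect_target is made up for illustration and is not part of httplib2; resolving a relative 'location' against the request URI is my assumption about what a client would do.

```python
from urllib.parse import urljoin


def redirect_target(status, headers, request_uri):
    """Return the absolute URI to redirect to, or None if no redirect applies.

    Hypothetical helper for illustration only; not an httplib2 function.
    """
    # Header names are compared case-insensitively.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get('location')
    if 300 <= status < 400 and location:
        # A relative Location value is resolved against the request URI.
        return urljoin(request_uri, location)
    # No 3xx status or no 'location' header: nothing to follow.
    return None
```

A 200 reply that only carries 'content-location' gives None here, which is the situation described next.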

When retrieving a page with httplib2, I get the final page of the chain, 
thus:
status = 200
headers: content-location=<<<new url, as you said>>>
content of final page

There is NO location field (which is correct for 200, no?). So, the 
browser (which is a client of my program) does not get a 3xx, and no 
'location', so does not (and cannot) redirect based on 3xx/location. If 
I interpret RFC 2616 (14.14) correctly, the content-location MAY (not 
MUST) be used by the client, (uppercase here not used to shout, but 
RFC-style:-), and it seems that the three browsers I used to test 
(Firefox, Opera and Konqueror) consistently ignore it, and consequently 
fetch page content (images etc) from the wrong site. Switching off 
redirections solves the problem...
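
For what it is worth, the switch can be sketched as a standalone function. The name should_follow is made up for illustration; the arguments mirror the flags from my earlier message (quoted below). One thing I noticed while writing it down: in the condition as quoted, 'and' binds tighter than 'or', so a 303 would still be followed even with the flag set. The sketch assumes that is not wanted and checks the flag first.

```python
def should_follow(status, method, follow_all_redirects=False,
                  ignore_redirects=False):
    """Decide whether a redirect reply should be followed.

    Hypothetical helper for illustration only; not an httplib2 function.
    """
    # Assumption: when redirects are switched off, nothing is followed,
    # including 303 replies.
    if ignore_redirects:
        return False
    # Otherwise keep the behavior of the original condition.
    return (follow_all_redirects
            or method in ("GET", "HEAD")
            or status == 303)
```

With the flag left at False the result is the same as the original condition, so the default behavior stays unchanged.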

Here is another thing I saw while testing:
when redirecting to another host name, the host header in the request is 
not modified. This causes a redirection loop. Example:

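import httplib2

target = 'cnn.com'
h = httplib2.Http(cache='/tmp/cache')
# Raise exceptions instead of turning them into status codes.
h.force_exception_to_status_code = False
# The explicit Host header is what gets reused on the redirect.
response, content = h.request('http://%s/' % target, headers={'Host': target})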

Here, a redirection should be made to www.cnn.com, but the host header 
is never updated (inserting a header print into the lib shows this). 
Inserting the following:

 > headers['host'] = authority

right before the redirection call in _request(...)

 > (response, content) = self.request(location, redirect_method ...

solves this problem. It is quite common: many sites have 'alias' 
virtual host names that redirect to the 'primary' site name.
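
Outside the library, the same idea can be sketched like this. The helper name headers_for_redirect is made up for illustration and is not part of httplib2; taking the new host from the authority part of the redirect URI is my assumption about where the value should come from.

```python
from urllib.parse import urlsplit


def headers_for_redirect(headers, location):
    """Return a copy of the request headers with Host set for the new URI.

    Hypothetical helper for illustration only; not an httplib2 function.
    """
    # Drop any existing Host header, whatever its capitalization.
    updated = {name: value for name, value in headers.items()
               if name.lower() != 'host'}
    # The authority (host[:port]) of the redirect target becomes the new Host.
    updated['host'] = urlsplit(location).netloc
    return updated
```

For the cnn.com example, the 'Host' value would change from cnn.com to www.cnn.com before the second request goes out, while all other headers are left alone.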

Thanks for your assistance,
René

Joe Gregorio wrote:
> Rene,
>    The 0.3.0 release of httplib2 adds a 'content-location' header
> to every response. That 'content-location' header contains
> the last URI in the redirect chain, which should be used
> on subsequent requests. Is that content-location not getting
> through back to the requester?
>
>   -joe
>
>
> On 4/19/07, Rene Schmit <re...@pt...> wrote:
>> Hello,
>>
>> I use the library in a proxy to retrieve pages from the target servers.
>>
>> In this context, the automatic redirect following is no good, as the
>> browser will not know about the redirect. Consequently, further requests
>> will be sent to the wrong host, resulting in errors.
>>
>> My solution to the problem is to add a flag to the Http class, in the
>> constructor:
>>
>> self.ignore_redirects = False
>>
>> and to change this line
>>  >       if  (self.follow_all_redirects or method in ["GET", "HEAD"]) or
>> response.status == 303:
>>
>> to:
>> <     if  (not self.ignore_redirects) and (self.follow_all_redirects or
>> method in ["GET", "HEAD"]) or response.status == 303:
>>
>>
>> Thus, the default behavior of the library remains unchanged, but gives
>> the calling code a chance to retrieve the 'raw' reply from the server.
>>
>> Does it make sense to add this feature to the library? What about
>> caching for those replies?
>>
>> René Schmit