Thread: [Httplib2-discuss] Optional non-'redirect following'

From: Rene S. <re...@pt...> - 2007-04-19 09:07:41

Hello,

I use the library in a proxy to retrieve pages from the target servers.

In this context, the automatic redirect following is no good, as the 
browser will not know about the redirect. Consequently, further requests 
will be sent to the wrong host, resulting in errors.

My solution to the problem is to add a flag to the Http class, in the 
constructor:

self.ignore_redirects = False

and to change this line:

    if (self.follow_all_redirects or method in ["GET", "HEAD"]) or response.status == 303:

to:

    if (not self.ignore_redirects) and (self.follow_all_redirects or method in ["GET", "HEAD"]) or response.status == 303:


Thus, the default behavior of the library remains unchanged, but gives 
the calling code a chance to retrieve the 'raw' reply from the server.

Does it make sense to add this feature to the library? What about 
caching for those replies?

René Schmit

Re: [Httplib2-discuss] Optional non-'redirect following'

From: Rene S. <re...@pt...> - 2007-04-19 09:36:47

Hello again,

Sorry for the error in my post. It should read:

change this line:

    if response.status in [300, 301, 302, 303, 307]:

to:

    if (not self.ignore_redirects) and response.status in [300, 301, 302, 303, 307]:
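
The corrected guard can be read as a small predicate. The sketch below is not httplib2's code: `should_follow`, `first_attempt` and their parameters are hypothetical names chosen for illustration. It also shows why the condition from the first post, as written, would not have been enough: Python parses `a and b or c` as `(a and b) or c`, so a 303 would still have been followed with the flag set.

```python
REDIRECT_STATUSES = (300, 301, 302, 303, 307)


def should_follow(status, ignore_redirects=False):
    """Corrected guard: never follow a redirect when ignore_redirects is set."""
    return (not ignore_redirects) and status in REDIRECT_STATUSES


def first_attempt(status, method, ignore_redirects=False,
                  follow_all_redirects=False):
    """Same shape as the condition in the first post, with plain arguments.

    Because `and` binds tighter than `or`, the `status == 303` test is
    not covered by the ignore_redirects flag.
    """
    return ((not ignore_redirects)
            and (follow_all_redirects or method in ["GET", "HEAD"])
            or status == 303)
```

With `ignore_redirects=True`, `should_follow` is false for every status, while `first_attempt` still returns true for a 303.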



Re: [Httplib2-discuss] Optional non-'redirect following'

From: Joe G. <jo...@bi...> - 2007-04-19 19:00:59

Rene,
    The 0.3.0 release of httplib2 adds a 'content-location' header
to every response. That 'content-location' header contains
the last URI in the redirect chain, which should be used
on subsequent requests. Is that content-location not getting
through back to the requester?

   -joe




-- 
Joe Gregorio        http://bitworking.org

Re: [Httplib2-discuss] Optional non-'redirect following'

From: Rene S. <re...@pt...> - 2007-04-20 14:48:11

Joe,

I am probably completely wrong, but here is how I understand things:

- redirects are made when the return status is 3xx
- the redirect is made to the URI contained in the 'location' header

When retrieving a page with httplib2, I get the final page of the chain, 
thus:
status = 200
headers: content-location=<<<new url, as you said>>>
content of final page

There is NO location field (which is correct for a 200, no?). So the 
browser (which is a client of my program) gets neither a 3xx nor a 
'location', and therefore does not (and cannot) redirect based on 
3xx/location. If I interpret RFC 2616 (14.14) correctly, the 
content-location MAY (not MUST) be used by the client (uppercase here is 
RFC style, not shouting :-), and the three browsers I tested (Firefox, 
Opera and Konqueror) consistently ignore it, and consequently fetch page 
content (images etc.) from the wrong site. Switching off redirections 
solves the problem...
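
To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch: the in-memory `SITE` table, the example.com URLs and the `fetch` function are made up for illustration and do not reproduce httplib2's internals. When redirects are followed, the caller sees a 200 with a content-location header; when they are not, it sees the 3xx and its location header, which is what a browser behind a proxy needs.

```python
# Toy in-memory "web": URL -> (status, headers, body). All URLs are made up.
SITE = {
    "http://example.com/": (301, {"location": "http://www.example.com/"}, b""),
    "http://www.example.com/": (200, {}, b"<html>home</html>"),
}

REDIRECT_STATUSES = (300, 301, 302, 303, 307)


def fetch(url, follow_redirects=True, max_hops=5):
    """Return (status, headers, body) as a proxy's HTTP client might see it."""
    for _ in range(max_hops):
        status, headers, body = SITE[url]
        if follow_redirects and status in REDIRECT_STATUSES:
            url = headers["location"]  # hop to the redirect target
            continue
        headers = dict(headers)
        if status == 200:
            # Mirrors the behaviour described in the thread: the final URI
            # is reported in content-location, not in location.
            headers["content-location"] = url
        return status, headers, body
    raise RuntimeError("too many redirects")
```

Calling `fetch("http://example.com/")` yields the final page with no location header, whereas `fetch("http://example.com/", follow_redirects=False)` yields the 301 that can be relayed to the browser unchanged.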


Here is another thing I saw while testing:
when redirecting to another host name, the host header in the request is 
not modified. This causes a redirection loop. Example:

import httplib2

t = 'cnn.com'
h = httplib2.Http(cache='/tmp/cache')
h.force_exception_to_status_code = False
# The Host header is set explicitly here, in addition to the URI.
r, o = h.request('http://%s/' % t, headers={'Host': t})

Here, a redirection should be made to www.cnn.com, but the host header 
is never updated (inserting a header print into the lib shows this). 
Inserting the following:

    headers['host'] = authority

right before the redirection call in _request(...)

    (response, content) = self.request(location, redirect_method ...

solves this problem (which is quite common, there are many sites that 
have 'alias' virtual host names that redirect to the 'primary' site name)

Thanks for your assistance,
René
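
The redirect loop described in the message above can be reproduced with a toy model. This is a sketch under stated assumptions, not httplib2 code: the `server` function stands in for a virtual-host setup that routes on the Host header only, and `PRIMARY`, `follow` and `refresh_host` are hypothetical names. It contrasts reusing a caller-supplied Host header on every hop with deriving it from the current URL.

```python
from urllib.parse import urlsplit

PRIMARY = "www.example.com"  # made-up primary virtual host


def server(headers):
    """Toy virtual-host server: routes on the Host header only."""
    if headers.get("Host") != PRIMARY:
        return 301, "http://%s/" % PRIMARY  # alias redirects to primary
    return 200, None


def follow(url, headers, refresh_host, max_hops=5):
    """Follow redirects; return (final status, redirects followed).

    The status is None when max_hops is exhausted, i.e. a redirect loop.
    """
    headers = dict(headers)
    for hops in range(max_hops):
        if refresh_host:
            # Derive Host from the URL actually being requested.
            headers["Host"] = urlsplit(url).netloc
        status, location = server(headers)
        if status != 301:
            return status, hops
        url = location
    return None, max_hops
```

With a stale `Host: example.com` and `refresh_host=False`, every hop is answered with another 301 until the hop limit is reached; with `refresh_host=True` the second request carries the primary host name and succeeds.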





Re: [Httplib2-discuss] Optional non-'redirect following'

From: Joe G. <jo...@bi...> - 2007-05-03 13:30:34

On 4/20/07, Rene Schmit <re...@pt...> wrote:
> Joe,
>
> I am probably completely wrong, but here is how I understand things:
>
> - redirects are made when the return status is 3xx
> - the redirect is made to the URI contained in the 'location' header
>
> When retrieving a page with httplib2, I get the final page of the chain,
> thus:
> status = 200
> headers: content-location=<<<new url, as you said>>>
> content of final page
>
> There is NO location field (which is correct for 200, no?). So, the
> browser (which is a client of my program) does not get a 3xx, and no
> 'location', so does not (and cannot) redirect based on 3xx/location. If
> I interpret RFC 2616 (14.14) correctly, the content-location MAY (not
> MUST) be used by the client, (uppercase here not used to shout, but
> RFC-style:-), and it seems that the three browsers I used to test
> (Firefox, Opera and Konqueror) consistently ignore it, and consequently
> fetch page content (images etc) from the wrong site. Switching off
> redirections solves the problem...

Ok, now that makes perfect sense. I have added a 'follow_redirects'
attribute to Http() and updated the documentation and unit tests, all
available on trunk.
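
A proxy could use the new attribute roughly as follows. This is a minimal sketch, not code from the library: `fetch_for_browser` is a hypothetical helper, and `http` is assumed to behave like an `httplib2.Http` instance from a version that includes `follow_redirects`, with `request()` returning a `(response, content)` pair whose response is dict-like and has a `.status` attribute.

```python
def fetch_for_browser(http, url, method="GET", headers=None):
    """Fetch url once and return the origin server's reply unmodified.

    `http` is assumed to behave like httplib2.Http: it has a
    follow_redirects attribute, and request() returns (response, content).
    """
    http.follow_redirects = False  # hand 3xx replies back to the caller
    response, content = http.request(url, method, headers=headers or {})
    # The proxy can relay status, headers (including 'location') and body
    # to the browser, which then performs the redirect itself.
    return response.status, dict(response), content
```

With the real library this would presumably be called as `fetch_for_browser(httplib2.Http(), url)`.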

> Here is another thing I saw while testing:
> when redirecting to another host name, the host header in the request is
> not modified. This causes a redirection loop. Example:
>
> import httplib2
> t='cnn.com'
> h=httplib2.Http(cache='/tmp/cache')
> h.force_exception_to_status_code = False
> r,o=h.request('http://%s/' % t,headers={'Host':t },)

Is there a particular reason you are manually setting the
Host: header? Httplib2 does that for you automatically
by pulling it out of the request URI.

In general, I like to allow the user to set any header
and have that override the default behavior, following
the basic principle that the user knows best, but I'm open
to the idea that that principle might break down
on redirects.

   Thanks,
   -joe

-- 
Joe Gregorio        http://bitworking.org

Re: [Httplib2-discuss] Optional non-'redirect following'

From: Rene S. <re...@pt...> - 2007-05-03 17:16:55

Right, not setting the header is the way to go.

My proxy simply reused the headers it got from the browser, so normally 
I would have to remove 'Host: some.target.site' ... except that now I 
can turn redirects off :-), and this becomes a non-issue!

Thanks for your support,
René
