httplib2-discuss Mailing List for httplib2 (Page 3)
Status: Beta
Brought to you by:
jcgregorio
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
---|---|---|---|---|---|---|---|---|---|---|---|---
2006 | | | 4 | 11 | | | 2 | | | 7 | 8 | 8
2007 | 9 | 1 | | 4 | 4 | 1 | 5 | | | | | 3
2008 | | | | | | | | | | 1 | |
2009 | | 1 | | | | | | 3 | 2 | | | 8
2010 | 6 | 3 | 3 | 4 | 4 | 7 | 1 | 1 | 2 | 4 | |
From: Joe G. <jo...@bi...> - 2006-11-15 15:01:30

On 11/8/06, Blair Zajac <bl...@or...> wrote:
> Hi,
>
> We're building a PyQt app around httplib2 and don't want to block the GUI while
> httplib2 is making a request of our database backed REST API.
>
> Besides just doing your own threading around httplib2, is there another way to
> do this? What about Twisted or asyncore? I haven't looked into this in great
> detail, so if anybody has any directions and/or experience, that would be
> appreciated.

Right now that's not possible given how httplib2 (and httplib upon which it depends) is built. Given the number of requests I am getting it seems like a good idea to add a selector/generator style of interaction. But that looks like it will take a major re-architecting, so it may be a while.

Thanks,
-joe

--
Joe Gregorio http://bitworking.org
From: Joe G. <jo...@bi...> - 2006-11-15 06:04:29

On 11/14/06, Sam Ruby <ru...@in...> wrote:
> It looks like the current implementation takes the md5 of a somewhat
> normalized URI and passes that as a key to the cache.
>
> For debug-ability and to increase the potential for integration with
> other subsystems, I'd like to suggest that this be changed to pass
> either the original URI unaltered or a normalized URI with the logic to
> do the normalization refactored out into a separate function.

This is a great idea, I was frankly jealous when I looked at the Venus cache and saw that it was filled with non-opaque named files.

> Either way, the current FileCache could do the remaining
> normalization/hashing of the key. Other storage systems could either
> use the key as is, or could employ a different hash mechanism.

That works for me, shouldn't take long to implement.

Thanks,
-joe

--
Joe Gregorio http://bitworking.org
From: Sam R. <ru...@in...> - 2006-11-15 02:38:30

It looks like the current implementation takes the md5 of a somewhat normalized URI and passes that as a key to the cache.

For debug-ability and to increase the potential for integration with other subsystems, I'd like to suggest that this be changed to pass either the original URI unaltered or a normalized URI, with the normalization logic refactored out into a separate function.

Either way, the current FileCache could do the remaining normalization/hashing of the key. Other storage systems could either use the key as is, or could employ a different hash mechanism.

Potential use cases, based on Planet Venus:

1) Occasionally, I find it useful to force a re-fetch/re-parse, and this is most easily done if I can delete an individual file. This is easier to do reliably if the file name is less opaque. Venus already has a function which will compute a readable name in most cases.

2) Like Robert Leftwich, I have a time-consuming process that I would like to optimize away whenever possible. And there are a lot of broken servers out there which support neither ETag nor Last-Modified (for feeds, this has been estimated at about 30%). If I can retrieve the feed from the cache before the fetch, and compare it to the value after the fetch, I can treat the response as if it were a 304.

3) Not all systems are CPU constrained. Others are memory constrained. The threading logic that you created for Venus (which is much appreciated) builds up a queue of feeds. Given that this data is out on disk, it need not be in memory while in the queue. For this scenario, it would be ideal if there were an option so that httplib2 doesn't return the content in the first place, as it does now even for a 304.

For use cases #2 and #3, it would be desirable to be able to access the cache without involving HTTP, and for that there needs to be, at a minimum, a predictable algorithm for computing the cache key.

I can imagine other hypothetical scenarios involving sharing this cache with other applications, but hopefully these illustrate the requirement sufficiently.

- Sam Ruby

P.S. Other normalization ideas can be found here: http://www.intertwingly.net/blog/2004/08/04/Urlnorm
For example, scheme is supposed to be case insensitive.
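The "predictable algorithm for computing the cache key" could look roughly like the sketch below. It is only an illustration under stated assumptions: `normalize_uri` and `safe_filename` are hypothetical names, not functions from httplib2 or Venus, and the normalization shown (lower-case scheme and host, default path, no fragment) is just one plausible choice.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_uri(uri):
    """Hypothetical helper: lower-case scheme and host, default the path."""
    parts = urlsplit(uri)
    # Only the host part is case-insensitive; leave any userinfo untouched.
    userinfo, at, hostport = parts.netloc.rpartition('@')
    netloc = userinfo + at + hostport.lower()
    # The fragment is never sent to the server, so it is dropped from the key.
    return urlunsplit((parts.scheme.lower(), netloc,
                       parts.path or '/', parts.query, ''))


def safe_filename(key):
    """Hypothetical helper: readable prefix plus an md5 suffix."""
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    # Drop the scheme and replace anything unsafe in a file name.
    readable = re.sub(r'[^A-Za-z0-9.-]+', '_', key.split('://', 1)[-1])
    return readable.strip('_')[:80] + ',' + digest
```

With this split, a file-based cache would call `safe_filename(normalize_uri(uri))`, while another storage system could use the normalized URI directly as its key.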
From: Blair Z. <bl...@or...> - 2006-11-08 23:29:13

Hi,

We're building a PyQt app around httplib2 and don't want to block the GUI while httplib2 is making a request of our database backed REST API.

Besides just doing your own threading around httplib2, is there another way to do this? What about Twisted or asyncore? I haven't looked into this in great detail, so if anybody has any directions and/or experience, that would be appreciated.

Regards,
Blair

--
Blair Zajac, Ph.D.
http://www.orcaware.com/svn/
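The "do your own threading" option mentioned above might be sketched as follows. This is a minimal illustration, not a recommended design: `fetch_in_background` is a hypothetical helper, and `fetch` stands for any blocking callable (for example a small function wrapping an httplib2 request); the sketch does not assume that a single Http instance can be shared between threads.

```python
import queue
import threading


def fetch_in_background(fetch, uri, results):
    """Run the blocking `fetch(uri)` on a worker thread.

    The outcome is put on `results` as (uri, value, error), so the GUI
    thread can poll the queue from a timer instead of blocking.
    """
    def worker():
        try:
            results.put((uri, fetch(uri), None))
        except Exception as exc:  # hand the failure over to the GUI thread
            results.put((uri, None, exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The GUI side would then call `results.get_nowait()` periodically and ignore `queue.Empty` until a result arrives.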
From: Robert L. <ro...@le...> - 2006-10-28 06:01:45

Joe Gregorio wrote:
> Ok, I added in the ignore_etag attribute with a unit test
> and updated the documentation. Try the latest
> version from svn trunk and see if that works for you.
>

All looks good. I have not yet been able to verify that after an etag change the cache is not invalidated if ignore_etag is True, as the etag has been unchanged for a while now. But as soon as it does change I'll let you know as a final, real life confirmation :-) (not that I'm concerned at all, the tests cover it all).

Thanks a lot!

Robert
From: Joe G. <jo...@bi...> - 2006-10-28 05:14:59

Ok, I added in the ignore_etag attribute with a unit test and updated the documentation. Try the latest version from svn trunk and see if that works for you.

Thanks,
-joe

On 10/27/06, Robert Leftwich <ro...@le...> wrote:
> Joe Gregorio wrote:
> > Actually I would prefer to see something like the
> > Http.follow_all_redirects attribute:
> >
> > http://bitworking.org/projects/httplib2/ref/http-objects.html
> >
> > I worry about cluttering up request() with even more
> > parameters than it already has.
> >
>
> Yep, good point.
>
> How's this look:
>
> $ diff -n __init__orig.py __init__.py
> a549 1
> self.ignore_etag = False
> d698 1
> a698 1
> if method in ["PUT"] and self.cache and info.has_key('etag') and not self.ignore_etag:
> d733 1
> a733 1
> if info.has_key('etag') and not self.ignore_etag:
>
> with the documentation along the lines of:
>
> ignore_etag
> Defaults to false. If true, then any etags present in the cached response
> are ignored when processing the current request, i.e. httplib2 does *not* use
> 'if-match' for PUT or 'if-none-match' when GET or HEAD requests are made. This
> is mainly to deal with broken servers, such as IIS, which change the etag after
> a server configuration change or reboot, even if the content is unchanged.
>
> Robert

--
Joe Gregorio http://bitworking.org
From: Robert L. <ro...@le...> - 2006-10-28 03:42:49

Joe Gregorio wrote:
> Actually I would prefer to see something like the
> Http.follow_all_redirects attribute:
>
> http://bitworking.org/projects/httplib2/ref/http-objects.html
>
> I worry about cluttering up request() with even more
> parameters than it already has.
>

Yep, good point.

How's this look:

$ diff -n __init__orig.py __init__.py
a549 1
self.ignore_etag = False
d698 1
a698 1
if method in ["PUT"] and self.cache and info.has_key('etag') and not self.ignore_etag:
d733 1
a733 1
if info.has_key('etag') and not self.ignore_etag:

with the documentation along the lines of:

ignore_etag
Defaults to false. If true, then any etags present in the cached response are ignored when processing the current request, i.e. httplib2 does *not* use 'if-match' for PUT or 'if-none-match' when GET or HEAD requests are made. This is mainly to deal with broken servers, such as IIS, which change the etag after a server configuration change or reboot, even if the content is unchanged.

Robert
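The behaviour described in that documentation text can be illustrated with a small standalone sketch. It is not the httplib2 code itself: `conditional_headers` is a hypothetical function, and the fallback to 'if-modified-since' from a cached 'last-modified' value is an assumption based on the original request to "rely on the last-modified header".

```python
def conditional_headers(method, cached_info, ignore_etag=False):
    """Build validator headers from the headers of a cached response."""
    headers = {}
    etag = cached_info.get('etag')
    if etag and not ignore_etag:
        if method == 'PUT':
            # Guard against overwriting a changed resource.
            headers['if-match'] = etag
        elif method in ('GET', 'HEAD'):
            headers['if-none-match'] = etag
    # Assumption: last-modified validation is unaffected by ignore_etag.
    if method in ('GET', 'HEAD') and 'last-modified' in cached_info:
        headers['if-modified-since'] = cached_info['last-modified']
    return headers
```

With `ignore_etag=True`, a server whose etag changes after a reboot can still answer 304 as long as its Last-Modified value is stable.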
From: Joe G. <jo...@bi...> - 2006-10-28 03:07:42

Actually I would prefer to see something like the Http.follow_all_redirects attribute:

http://bitworking.org/projects/httplib2/ref/http-objects.html

I worry about cluttering up request() with even more parameters than it already has.

-joe

On 10/27/06, Robert Leftwich <ro...@le...> wrote:
> Joe Gregorio wrote:
> >
> > Yeah, I certainly wouldn't want it to be the default
> > behaviour, but a switch should be easy to add.
> >
>
> So, something like?
>
> $ diff -n __init__orig.py __init__.py
> d650 1
> a650 1
> def request(self, uri, method="GET", body=None, headers=None, redirections=DEFAULT_MAX_REDIRECTS, ignore_etag=False):
> a653 1
> ignore_etag - set to True to not use etag even if present (e.g. IIS can change etags on configuration change or reboot w/o change to content)
> d698 1
> a698 1
> if method in ["PUT"] and self.cache and info.has_key('etag') and not ignore_etag:
> d733 1
> a733 1
> if info.has_key('etag') and not ignore_etag:
>
> Robert

--
Joe Gregorio http://bitworking.org
From: Robert L. <ro...@le...> - 2006-10-28 02:56:18

Joe Gregorio wrote:
>
> Yeah, I certainly wouldn't want it to be the default
> behaviour, but a switch should be easy to add.
>

So, something like?

$ diff -n __init__orig.py __init__.py
d650 1
a650 1
def request(self, uri, method="GET", body=None, headers=None, redirections=DEFAULT_MAX_REDIRECTS, ignore_etag=False):
a653 1
ignore_etag - set to True to not use etag even if present (e.g. IIS can change etags on configuration change or reboot w/o change to content)
d698 1
a698 1
if method in ["PUT"] and self.cache and info.has_key('etag') and not ignore_etag:
d733 1
a733 1
if info.has_key('etag') and not ignore_etag:

Robert
From: Joe G. <jo...@bi...> - 2006-10-28 02:35:00

On 10/27/06, Robert Leftwich <ro...@le...> wrote:
> I'm attempting to use httplib2 on a site served by IIS, owned by a government
> department. An issue I'm seeing is that the etag is changing reasonably often,
> even though the documents I want are unchanged and this triggers off a time
> consuming process that I'd rather not do if nothing has changed (hence the use
> of httplib2). A quick google turns up the fact that on IIS, by default etags will
> change when a configuration change is made or the server is rebooted. There are
> fixes available for this but getting the department to apply them would be
> difficult, if not impossible. It would be handy if I could tell httplib2 *not*
> to use the etag, even if it were available and instead rely on the last-modified
> header.
>
> Is this something that is likely to be allowed into httplib2?

Yeah, I certainly wouldn't want it to be the default behaviour, but a switch should be easy to add.

-joe

--
Joe Gregorio http://bitworking.org
From: Robert L. <ro...@le...> - 2006-10-28 01:38:45

I'm attempting to use httplib2 on a site served by IIS, owned by a government department. An issue I'm seeing is that the etag is changing reasonably often, even though the documents I want are unchanged, and this triggers off a time-consuming process that I'd rather not do if nothing has changed (hence the use of httplib2). A quick google turns up the fact that on IIS, by default, etags will change when a configuration change is made or the server is rebooted. There are fixes available for this, but getting the department to apply them would be difficult, if not impossible. It would be handy if I could tell httplib2 *not* to use the etag, even if it were available, and instead rely on the last-modified header.

Is this something that is likely to be allowed into httplib2?

Robert
From: Joe G. <joe...@gm...> - 2006-07-27 17:16:10

Simon,
Sorry for the slow response, I was out getting educated on what it means to be an IBMer :) More comments inline:

On 7/25/06, Simon Willison <swi...@gm...> wrote:
> I've been playing with httplib2 and so far I like it - it sucks an
> awful lot less than the stuff in the standard library. That said,
> here are a few observations.
>
> 1. If you give it a URL to a site that is down / doesn't exist you
> get a low level socket error. It would be nice if this was a
> documented httplib2 exception.
>
> >>> import httplib2
> >>> httplib2.Http().request('http://non-existant-domain-oeu.com')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "httplib2/__init__.py", line 781, in request
>     (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
>   File "httplib2/__init__.py", line 603, in _request
>     (response, content) = self._conn_request(conn, request_uri, method, body, headers)
>   File "httplib2/__init__.py", line 581, in _conn_request
>     conn.connect()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/httplib.py", line 535, in connect
>     socket.SOCK_STREAM):
> socket.gaierror: (7, 'No address associated with nodename')
> >>>

Yup, that's a bug. Now logged.

> 2. I'm a little uncomfortable with the way authentication works - it
> seems like it would be very easy to add some credentials and then
> forget to clear them, potentially passing them to other sites as you
> reuse the Http() instance. I would prefer something like this:
>
> http = Http()
> http.add_credentials('https://del.icio.us/', 'username', 'password')
>
> That way the class knows to only send that username and password to
> URLs that start with https://del.icio.us/. Is there a reason this
> isn't done at the moment?

For the most part I thought that you would only be connecting to one server, but I agree, this does have the possibility of leaking names and passwords to other servers unintentionally. I like the solution you suggested but would put the domain name as a last, and optional, parameter. Logged as a feature request.

> 3. Finally, I ran into a gotcha when doing a POST request: the
> service I was talking to required me to include a Content-Type:
> application/x-www-form-urlencoded header. It would be nice if one of
> the examples in the documentation showed this - it would have saved
> me quite a bit of debugging.
>
> On that last note, has any consideration been given to adding a
> convenience method for the common case where you wish to POST a
> dictionary of name/value pairs? Something like the following:
>
> http = Http()
> http.post_form(url, {'name': 'value'})
>
> The post_form method would run urllib.urlencode on the argument
> dictionary and automatically add the application/x-www-form-urlencoded
> header.

Good idea, at the very least the docs need to be updated. I will need to poke around at what url encoding functions already exist before deciding what to add to httplib2. Logged as an enhancement request.

Thanks,
-joe

--
Joe Gregorio http://bitworking.org
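The idea of scoping credentials, with the domain as a last and optional parameter, might be sketched as below. This is a hypothetical standalone class written for illustration, not httplib2's implementation; matching on the exact host name (rather than on a URL prefix, as in the original suggestion) is an assumption of the sketch.

```python
from urllib.parse import urlsplit


class CredentialStore:
    """Hypothetical store that scopes each credential to a host name."""

    def __init__(self):
        self._entries = []

    def add(self, name, password, domain=''):
        # An empty domain means "offer these credentials to any host".
        self._entries.append((domain.lower(), name, password))

    def for_uri(self, uri):
        """Return the (name, password) pairs that may be sent to `uri`."""
        host = (urlsplit(uri).hostname or '').lower()
        return [(name, password)
                for domain, name, password in self._entries
                if domain == '' or domain == host]
```

Keeping the domain optional preserves the simple single-server case, while a caller that reuses one client for several sites can restrict each credential explicitly.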
From: Simon W. <swi...@gm...> - 2006-07-25 12:42:19

I've been playing with httplib2 and so far I like it - it sucks an awful lot less than the stuff in the standard library. That said, here are a few observations.

1. If you give it a URL to a site that is down / doesn't exist you get a low level socket error. It would be nice if this was a documented httplib2 exception.

>>> import httplib2
>>> httplib2.Http().request('http://non-existant-domain-oeu.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "httplib2/__init__.py", line 781, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "httplib2/__init__.py", line 603, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "httplib2/__init__.py", line 581, in _conn_request
    conn.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/httplib.py", line 535, in connect
    socket.SOCK_STREAM):
socket.gaierror: (7, 'No address associated with nodename')
>>>

2. I'm a little uncomfortable with the way authentication works - it seems like it would be very easy to add some credentials and then forget to clear them, potentially passing them to other sites as you reuse the Http() instance. I would prefer something like this:

http = Http()
http.add_credentials('https://del.icio.us/', 'username', 'password')

That way the class knows to only send that username and password to URLs that start with https://del.icio.us/. Is there a reason this isn't done at the moment?

3. Finally, I ran into a gotcha when doing a POST request: the service I was talking to required me to include a Content-Type: application/x-www-form-urlencoded header. It would be nice if one of the examples in the documentation showed this - it would have saved me quite a bit of debugging.

On that last note, has any consideration been given to adding a convenience method for the common case where you wish to POST a dictionary of name/value pairs? Something like the following:

http = Http()
http.post_form(url, {'name': 'value'})

The post_form method would run urllib.urlencode on the argument dictionary and automatically add the application/x-www-form-urlencoded header.

Thanks,

Simon Willison
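What the proposed post_form would do internally can be sketched without touching httplib2 at all. `build_form_request` is a hypothetical helper name; the sketch uses Python 3's `urllib.parse.urlencode`, the successor of the `urllib.urlencode` function named in the message.

```python
from urllib.parse import urlencode


def build_form_request(fields, headers=None):
    """Return (body, headers) for a form-encoded POST of `fields`."""
    merged = dict(headers or {})
    # The header the remote service insisted on in the gotcha above.
    merged['Content-Type'] = 'application/x-www-form-urlencoded'
    return urlencode(fields), merged
```

The returned pair could then be passed along as `http.request(url, 'POST', body=body, headers=headers)`, which matches the request() signature quoted elsewhere in this thread.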
From: Dan C. <con...@w3...> - 2006-04-27 12:33:29

On Apr 26, 2006, at 10:16 PM, Joe Gregorio wrote:
> Dan,
> I'm not quite sure I understand what you are trying to
> accomplish with that patch. The idea of how it works
> now is if _entry_disposition returns FRESH but there
> is no cache entry then a 504 is returned. Do you have
> a test case or site where that fails?

I neglected to capture the test details; maybe I can reconstruct it. But what actually happens on a cache miss is not a 504 but a python stacktrace.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
From: Joe G. <joe...@gm...> - 2006-04-27 02:16:54

Dan,
I'm not quite sure I understand what you are trying to accomplish with that patch. The idea of how it works now is if _entry_disposition returns FRESH but there is no cache entry then a 504 is returned. Do you have a test case or site where that fails?

Thanks,
-joe

On 4/19/06, Dan Connolly <con...@w3...> wrote:
> On Wed, 2006-04-19 at 11:03 -0500, Dan Connolly wrote:
> > I re-discovered http://bitworking.org/projects/httplib2/ .
> > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
> >
> > Then I see max-stale is what I want, but then in httplib2.py, I see:
> >
> > We will never return a stale document as
> > fresh as a design decision, and thus the non-implementation
> > of 'max-stale'.
> >
> > So I'm getting no help from either side. Sigh.
>
> I'm having some luck with only-if-cached. But I think the case
> of a cache miss is buggy. Patch attached.
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E

--
Joe Gregorio http://bitworking.org
Sorry, that took waaay too long for me to get back to. I have applied the patch, thanks!

-joe

On 4/7/06, Thomas Broyer <t.b...@gm...> wrote:
> 2006/4/3, Thomas Broyer <t.b...@gm...>:
> > There are two options here:
> > - unfold and normalize white space in _normalize_headers
> > - unfold and normalize white space only in _parse_www_authenticate,
> > only for www-authenticate or authentication-info headers before
> > processing
> [...]
> > And as I was investigating in _parse_www_authenticate, I also
> > noticed quoted pairs (in quoted strings) are never "unquoted", so the
> > following (new) unit test fails:
> > res = httplib2._parse_www_authenticate({ 'www-authenticate':
> > 'Test realm="a \\"test\\" realm"'})
> > self.assertEqual(res['test']['realm'], 'a "test" realm')
> > as res['test']['realm'] contains 'a \\"test\\" realm'.
> > This (unquoting) can be done using either a regex and the "sub"
> > method, or splitting and joining the string. I personally have no
> > preference.
> >
> > Also (and finally), as strict WWW-Authenticate "parsing" might cause
> > unrecoverable errors (I mean, a parameter treated as an auth-scheme,
> > or consuming the following challenge, instead of exceptions), I tend
> > to go for the simpler regex ("strict send/lax receive") I provided in
> > my previous mail (it still needs some testing though). Or how about
> > putting both regexes in the code and providing a switch for the one to
> > use (e.g. "httplib2.USE_STRICT_WWW_AUTHENTICATE_PARSING = true",
> > defaulting to false)?
>
> The attached patch fixes bug #1461941 and:
> - unfold and normalize spaces in _normalize_headers
> (_parse_www_authenticate assumes unfolded header values; regex fixed
> to accept \t as well, just in case, so doesn't assume fully-normalized
> spaces, only unfolded header value)
> - unquote "quoted-pairs" (done with a regex, split/join version
> available in comments)
> - use relaxed parsing by default but can be switched to strict
> parsing using a global variable
> - adds unit tests exercising both regexes
>
> --
> Thomas Broyer

--
Joe Gregorio http://bitworking.org
From: Joe G. <joe...@gm...> - 2006-04-19 19:13:10

On 4/19/06, Dan Connolly <con...@w3...> wrote:
> If I implement max-stale, any chance you'll reconsider that
> design decision? Or will I have to maintain a fork?

I will accept a patch to incorporate max-stale. Your timing is impeccable since I just today discovered a situation where I may need max-stale myself :)

> Any suggestions on getting wikipedia to change their caching
> policy? Seems to me that no cache-control header at all
> is The Right Thing for them, no?

Given how rapidly I've seen their pages change when a topic is hot I'm not so sure what their caching policy should be.

-joe

--
Joe Gregorio http://bitworking.org
From: Dan C. <con...@w3...> - 2006-04-19 16:24:15

On Wed, 2006-04-19 at 11:03 -0500, Dan Connolly wrote:
> I re-discovered http://bitworking.org/projects/httplib2/ .
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
>
> Then I see max-stale is what I want, but then in httplib2.py, I see:
>
> We will never return a stale document as
> fresh as a design decision, and thus the non-implementation
> of 'max-stale'.
>
> So I'm getting no help from either side. Sigh.

I'm having some luck with only-if-cached. But I think the case of a cache miss is buggy. Patch attached.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
From: Dan C. <con...@w3...> - 2006-04-19 16:03:49

I'm writing a little thingy to get airport lat/long info out of wikipedia. Wikipedia suffered an outage today, so I'm working on caching and offline access.

I re-discovered http://bitworking.org/projects/httplib2/ . I integrated that into my little aptdata.py thingy; it seems to work. Then I try again, expecting the program to work out of the local disk cache. Nope. So I add max-age=3600 to my requests... still no joy... I look in the cache, and... no wonder:

cache-control: private, s-maxage=0, max-age=0, must-revalidate

That seems like a "don't bother to help me with my load; just melt down my servers, please" caching policy. Grumble.

At first I thought setting max-age in a request would override the server; but I see:

freshness_lifetime = min(freshness_lifetime, int(cc['max-age']))

I thought maybe that should be max, but then I read up...

"If both the new request and the cached entry include "max-age" directives, then the lesser of the two values is used for determining the freshness of the cached entry for that request."
-- http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3

Then I see max-stale is what I want, but then in httplib2.py, I see:

We will never return a stale document as fresh as a design decision, and thus the non-implementation of 'max-stale'.

So I'm getting no help from either side. Sigh.

If I implement max-stale, any chance you'll reconsider that design decision? Or will I have to maintain a fork?

Any suggestions on getting wikipedia to change their caching policy? Seems to me that no cache-control header at all is The Right Thing for them, no?

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
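The "lesser of the two values" rule quoted from RFC 2616 can be made concrete with a small sketch. It is deliberately simplified and is not httplib2's freshness code: `parse_cache_control` and `freshness_lifetime` here are hypothetical helpers, and Expires, Date, and age calculations are ignored.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(','):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition('=')
        # Directives without a value (e.g. must-revalidate) map to True.
        directives[name.strip().lower()] = arg.strip() if sep else True
    return directives


def freshness_lifetime(response_cc, request_cc):
    """Return the lifetime in seconds, honoring max-age from both sides."""
    lifetime = 0
    if 'max-age' in response_cc:
        lifetime = int(response_cc['max-age'])
    if 'max-age' in request_cc:
        # RFC 2616 14.9.3: the lesser of the two values wins.
        lifetime = min(lifetime, int(request_cc['max-age']))
    return lifetime
```

With the server header quoted above (max-age=0), a request-side max-age=3600 still yields a lifetime of 0, which is why the request directive alone cannot make the cached entry usable.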
From: Thomas B. <t.b...@gm...> - 2006-04-07 10:11:28

2006/4/3, Thomas Broyer <t.b...@gm...>:
> There are two options here:
> - unfold and normalize white space in _normalize_headers
> - unfold and normalize white space only in _parse_www_authenticate,
> only for www-authenticate or authentication-info headers before
> processing
[...]
> And as I was investigating in _parse_www_authenticate, I also
> noticed quoted pairs (in quoted strings) are never "unquoted", so the
> following (new) unit test fails:
> res = httplib2._parse_www_authenticate({ 'www-authenticate':
> 'Test realm="a \\"test\\" realm"'})
> self.assertEqual(res['test']['realm'], 'a "test" realm')
> as res['test']['realm'] contains 'a \\"test\\" realm'.
> This (unquoting) can be done using either a regex and the "sub"
> method, or splitting and joining the string. I personally have no
> preference.
>
> Also (and finally), as strict WWW-Authenticate "parsing" might cause
> unrecoverable errors (I mean, a parameter treated as an auth-scheme,
> or consuming the following challenge, instead of exceptions), I tend
> to go for the simpler regex ("strict send/lax receive") I provided in
> my previous mail (it still needs some testing though). Or how about
> putting both regexes in the code and providing a switch for the one to
> use (e.g. "httplib2.USE_STRICT_WWW_AUTHENTICATE_PARSING = true",
> defaulting to false)?

The attached patch fixes bug #1461941 and:
- unfold and normalize spaces in _normalize_headers (_parse_www_authenticate assumes unfolded header values; regex fixed to accept \t as well, just in case, so doesn't assume fully-normalized spaces, only unfolded header value)
- unquote "quoted-pairs" (done with a regex, split/join version available in comments)
- use relaxed parsing by default but can be switched to strict parsing using a global variable
- adds unit tests exercising both regexes

--
Thomas Broyer
From: Joe G. <joe...@gm...> - 2006-04-03 13:50:36

On 4/3/06, Thomas Broyer <t.b...@gm...> wrote:
> If you don't like my:
> if headers.has_key('connection'):
>     hopbyhop = HOP_BY_HOP + [x.strip() for x in response.get('connection', '').split(',')]
> else:
>     hopbyhop = HOP_BY_HOP
> you can just do:
> hopbyhop = list(HOP_BY_HOP)
> hopbyhop.extend([x.strip() for x in response.get('connection', '').split(',')])
>
> By using "list(HOP_BY_HOP)", we're making a copy of HOP_BY_HOP that we
> can then extend without modifying the global variable.

Thanks, I went with the list(HOP_BY_HOP) solution.

-joe

--
Joe Gregorio http://bitworking.org
From: Thomas B. <t.b...@gm...> - 2006-04-03 13:25:10

2006/3/31, Joe Gregorio <joe...@gm...>:
> Thomas,
> That's really great work on the regex, can you also add some
> unit tests that exercise the regex?

When running the existing unit tests with my modified regex, the test "HttpPrivateTest.testParseWWWAuthenticateMultiple4" fails: the value of the "qop" auth-param contains a tab (\t). Until then, I had assumed the headers had been normalized, while actually _normalize_headers only normalizes field names. Precisely, HTTP/1.1 says that "A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream." I assumed it was done while actually it's not.

There are two options here:
- unfold and normalize white space in _normalize_headers
- unfold and normalize white space only in _parse_www_authenticate, only for www-authenticate or authentication-info headers before processing

I reject the option of modifying the regex to accommodate folded field values: it's much easier to normalize the values before processing and it's totally HTTP/1.1-compliant.

And as I was investigating in _parse_www_authenticate, I also noticed quoted pairs (in quoted strings) are never "unquoted", so the following (new) unit test fails:

res = httplib2._parse_www_authenticate({ 'www-authenticate': 'Test realm="a \\"test\\" realm"'})
self.assertEqual(res['test']['realm'], 'a "test" realm')

as res['test']['realm'] contains 'a \\"test\\" realm'. This (unquoting) can be done using either a regex and the "sub" method, or splitting and joining the string. I personally have no preference.

Also (and finally), as strict WWW-Authenticate "parsing" might cause unrecoverable errors (I mean, a parameter treated as an auth-scheme, or consuming the following challenge, instead of exceptions), I tend to go for the simpler regex ("strict send/lax receive") I provided in my previous mail (it still needs some testing though).

Or how about putting both regexes in the code and providing a switch for the one to use (e.g. "httplib2.USE_STRICT_WWW_AUTHENTICATE_PARSING = true", defaulting to false)?

--
Thomas Broyer
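The two normalization steps discussed above, unfolding linear white space and undoing quoted-pairs, could be sketched as follows. These are hypothetical helpers written for illustration, not the code from the attached patch; note that this simple unfolding also collapses white space inside quoted strings, which a stricter implementation might want to avoid.

```python
import re

# Linear white space: an optional CRLF followed by spaces or tabs.
_LWS = re.compile(r'(?:\r\n)?[ \t]+')
# A quoted-pair is a backslash followed by any single character.
_QUOTED_PAIR = re.compile(r'\\(.)')


def normalize_field_value(value):
    """Unfold a header value and collapse each run of LWS to one space."""
    return _LWS.sub(' ', value).strip()


def unquote(quoted):
    """Strip the quotes of a quoted-string and undo its quoted-pairs."""
    if len(quoted) >= 2 and quoted[0] == quoted[-1] == '"':
        quoted = quoted[1:-1]
    return _QUOTED_PAIR.sub(r'\1', quoted)
```

Running the unfolding once, before any parsing, keeps the challenge regex itself free of folding concerns, which is the argument made in the message above.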
From: Thomas B. <t.b...@gm...> - 2006-04-03 07:56:29

My patch has been incorporated but the HOP_BY_HOP.extend() part of the previous code has been kept as-is, which means HOP_BY_HOP is modified at each request.

As an example, I've added "print httplib2.HOP_BY_HOP" at the beginning and end of httplib2test.HttpPrivateTest.testEnd2End as well as between each test in the method. This is the result:

['connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization', 'te', 'trailers', 'transfer-encoding', 'upgrade']
['connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization', 'te', 'trailers', 'transfer-encoding', 'upgrade', '']
['connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization', 'te', 'trailers', 'transfer-encoding', 'upgrade', '', 'content-type']
['connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization', 'te', 'trailers', 'transfer-encoding', 'upgrade', '', 'content-type', '']
['connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization', 'te', 'trailers', 'transfer-encoding', 'upgrade', '', 'content-type', '', 'content-type']

As you can see, the global variable HOP_BY_HOP is extended between each call to httplib2._get_end2end_headers. This means that if I get a first response with:

    Connection: foo, bar

then a second with:

    Connection: bar
    Foo: this should be end-to-end

the "foo" header in the second response will be considered hop-by-hop.

If you don't like my:

if headers.has_key('connection'):
    hopbyhop = HOP_BY_HOP + [x.strip() for x in response.get('connection', '').split(',')]
else:
    hopbyhop = HOP_BY_HOP

you can just do:

hopbyhop = list(HOP_BY_HOP)
hopbyhop.extend([x.strip() for x in response.get('connection', '').split(',')])

By using "list(HOP_BY_HOP)", we're making a copy of HOP_BY_HOP that we can then extend without modifying the global variable.

--
Thomas Broyer
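The copy-before-extend fix can be demonstrated in isolation. The sketch below is a standalone illustration rather than the httplib2 function: `get_end2end_headers` takes a plain dict of lower-cased header names, and skipping empty names and lower-casing the Connection tokens are small additions that are not part of the original suggestion.

```python
HOP_BY_HOP = ['connection', 'keep-alive', 'proxy-authenticate',
              'proxy-authorization', 'te', 'trailers',
              'transfer-encoding', 'upgrade']


def get_end2end_headers(response):
    """Return the names of the headers that are end-to-end."""
    # Copy the global list so per-response additions never leak into it.
    hopbyhop = list(HOP_BY_HOP)
    # Headers named in Connection are hop-by-hop for this message only.
    hopbyhop.extend(x.strip().lower()
                    for x in response.get('connection', '').split(',')
                    if x.strip())
    return [name for name in response if name not in hopbyhop]
```

Because each call starts from a fresh copy, a "Connection: foo, bar" on one response no longer affects how "foo" is classified on the next one.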
From: Joe G. <joe...@gm...> - 2006-04-02 03:09:47

On 3/30/06, Thomas Broyer <t.b...@gm...> wrote:
> I think this could be solved in many ways:
> - either line 606, by initializing "info" to a "Status: 504" message,
> but we must then make sure this message doesn't have an "etag", or
> other things that could break "cache freshness" computation
> - or line 647, looking for an empty "info" (or an "info" lacking a
> "status", or catching the KeyError, or –better– looking for an empty
> cacheFullPath) and then returning a "Status: 504" message
> - or line 721, using int(self.get('status', 504)) instead of
> int(self['status']) --or catching the KeyError exception and then
> defaulting to a "Status: 504" message.
>
> I'd rather go for the second choice, replacing lines 646 to 649 with
> something like:
> if entry_disposition == "FRESH":
>     if not os.path.exists(cacheFullPath):
>         # This should be the case only for a "Cache-Control: only-if-cached" request
>         return CACHED_VERSION_UNAVAILABLE
>     else:
>         response = Response(info)
>         response.fromcache = True
>         return (response, content)
>
> after having defined a global variable:
> CACHED_VERSION_UNAVAILABLE = (
>     Response(rfc822.Message(StringIO.StringIO("""\
> Status: 504
> Content-Type: text/plain
> """))),
>     "You asked for a cached version only, and no cached version is available."
> )

Yes, I prefer this last option, and that's the form of the fix I put in place.

Thanks,
-joe

--
Joe Gregorio http://bitworking.org
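The behaviour being fixed, answering 504 rather than failing when an only-if-cached request misses the cache, can be sketched independently of httplib2's internals. `lookup` is a hypothetical function and `cache` is assumed to be any mapping from key to cached content.

```python
def lookup(cache, key, request_cache_control):
    """Return (status, content) for a cache lookup.

    `request_cache_control` is a dict of the request's Cache-Control
    directives.
    """
    if key in cache:
        return 200, cache[key]
    if 'only-if-cached' in request_cache_control:
        # RFC 2616 14.9.4: answer 504 instead of going to the network.
        return 504, ('You asked for a cached version only, '
                     'and no cached version is available.')
    return None, None  # the caller should fetch from the origin server
```

Returning a well-formed 504 response keeps the miss on the normal return path, so callers never see a KeyError or a traceback for this case.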
From: Joe G. <joe...@gm...> - 2006-03-31 15:22:08

Thomas,
That's really great work on the regex, can you also add some unit tests that exercise the regex?

Thanks,
-joe

On 3/31/06, Thomas Broyer <t.b...@gm...> wrote:
> 2006/3/31, Thomas Broyer <t.b...@gm...>:
> > Moreover, matching tokens and quoted-strings can be done within a
> > single regex, using (?<=…), (?=…), (?<!…) and (?!…) constructs.
> […]
> > […] a small bug preventing commas from being prefixed with spaces
> > (which is explicitly allowed by the definition of #-lists in HTTP).
> […]
> > Back to the [a-zA-Z0-9_-] vs. \w problem, I've done some more research
> > and actually, the exact regex for a quoted string (without the <">s)
> > is […]
> > The exact regex for a token is […]
>
> I've created a bug report [1] (1461941 – Bugs in
> _parse_www_authenticate's regex + use a single regex) with attached
> patch, and an alternative "lax" regex (far more readable)
>
> [1] http://sourceforge.net/tracker/index.php?func=detail&aid=1461941&group_id=161082&atid=818434
>
> --
> Thomas Broyer

--
Joe Gregorio http://bitworking.org