httplib2-discuss Mailing List for httplib2 (Page 4)
Status: Beta
Brought to you by:
jcgregorio
From: Thomas B. <t.b...@gm...> - 2006-03-31 08:29:28
2006/3/31, Thomas Broyer <t.b...@gm...>:
> Moreover, matching tokens and quoted-strings can be done within a
> single regex, using (?<=…), (?=…), (?<!…) and (?!…) constructs.
[…]
> […] a small bug preventing commas from being prefixed with spaces
> (which is explicitly allowed by the definition of #-lists in HTTP).
[…]
> Back to the [a-zA-Z0-9_-] vs. \w problem, I've done some more research
> and actually, the exact regex for a quoted string (without the <">s)
> is […]
> The exact regex for a token is […]

I've created a bug report [1] (1461941 – Bugs in _parse_www_authenticate's regex + use a single regex) with an attached patch and an alternative "lax" regex (far more readable).

[1] http://sourceforge.net/tracker/index.php?func=detail&aid=1461941&group_id=161082&atid=818434

--
Thomas Broyer
From: Thomas B. <t.b...@gm...> - 2006-03-31 08:01:52
Here's a copy of my followup to bug #1455955 (Support for HMACDigest authentication) [1], about the use of "[a-zA-Z0-9_-]" vs. "\w" to match "tokens" as defined by HTTP [2], so that it can be discussed (if anybody is subscribed to this list, of course! :-P )

Moreover, matching tokens and quoted-strings can be done within a single regex, using the (?<=…), (?=…), (?<!…) and (?!…) constructs. Here's a regex matching both tokens and quoted-strings, at the expense of being a bit harder to read:

    WWW_AUTH = re.compile(r"^(?:\s*(?:,\s*)?([a-zA-Z0-9_-]+)\s*=\s*\"?((?<=\")(?:[^\\\"]|\\.)*?(?=\")|(?<!\")[a-zA-Z0-9_-]+(?!\"))\"?)(.*)$")

You then just have to remove every reference to WWW_AUTH2 and match2 from _parse_www_authenticate.

This regex also fixes a small bug that prevented commas from being preceded by spaces (which is explicitly allowed by the definition of #-lists in HTTP). I just replaced "^,?\s*" with "^\s*(?:,\s*)?", i.e., match any spaces, optionally followed by a comma and further spaces. I could have written "^\s*,?\s*", but I guess the non-capturing group is a bit more efficient as it doesn't try to match "\s*" twice.

Back to the [a-zA-Z0-9_-] vs. \w problem: I've done some more research, and the exact regex for a quoted string (without the <">s) is actually

    (?:[^\0-\x1f\x7f-\xff\\\"]|\\[\0-\x7f]|\r\n[ \t]+)*?

but given that LWS has already been replaced with a single space, it can be simplified to:

    (?:[^\0-\x1f\x7f-\xff\\"]|\\[\0-\x7f])*?

The only difference from what's currently in httplib2, ([^\\\"]|\\.), is that the regex above excludes CTLs from the first part and any octet with value >= 128 (\x80) from both parts.

The exact regex for a token is:

    [^\0-\x1f\x7f-\xff()<>@,;:\\\"/[\]?={} \t]+

Such an (unreadable, I admit) regex, compared to \w or [a-zA-Z0-9_-]+, would match tokens such as to#en, to%en, to*en, to!en, etc., which are valid tokens even if they are probably never used.
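To make the single-regex idea concrete, here is a minimal Python 3 sketch that applies the WWW_AUTH pattern from this message repeatedly to split a parameter list into key/value pairs. The helper name parse_auth_params and the sample header are assumptions made for illustration; this is not httplib2's actual _parse_www_authenticate, and it leaves backslash escapes inside quoted strings untouched.

```python
import re

# The combined pattern from the message: group 1 is the parameter name,
# group 2 its value (token or quoted-string content), group 3 the remainder.
WWW_AUTH = re.compile(
    r"^(?:\s*(?:,\s*)?([a-zA-Z0-9_-]+)\s*=\s*\"?"
    r"((?<=\")(?:[^\\\"]|\\.)*?(?=\")|(?<!\")[a-zA-Z0-9_-]+(?!\"))\"?)(.*)$"
)


def parse_auth_params(params):
    """Split a comma-separated auth-param list into a dict (hypothetical helper)."""
    result = {}
    rest = params
    while rest.strip():
        match = WWW_AUTH.match(rest)
        if match is None:
            break  # stop at the first fragment the pattern cannot parse
        key, value, rest = match.groups()
        result[key.lower()] = value
    return result


# Note the space before the second comma, which the old "^,?\s*" prefix rejected.
parsed = parse_auth_params('realm="my realm", qop=auth , algorithm=MD5')
print(parsed)
```

Each iteration consumes one name=value pair, so the loop always terminates; anything the pattern cannot match is silently left unparsed in this sketch.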
[1] https://sourceforge.net/tracker/index.php?func=detail&aid=1455955&group_id=161082&atid=818437
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2

Er, Python doc for "re" says:

    \w
    When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_].

So \w is equivalent here to [a-zA-Z0-9_] (because httplib2 uses neither the LOCALE nor the UNICODE flag), which is even more restrictive than my proposed [a-zA-Z0-9_-]. Compare re.match(r"^\w+$", "f-o-o") and re.match(r"^\w+$", "foo").

HTTP defines "token" as "1*<any CHAR except CTLs or separators>", and both "/" and ":", which are present in almost every absoluteURI, are separators, so an absoluteURI is not a token (and as such must be quoted).

[a-zA-Z0-9_-] is far from perfect, but at least a bit better than \w.

--
Thomas Broyer
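The comparison suggested in the message above can be tried with a short Python 3 sketch. Two assumptions apply: the re.ASCII flag is used to reproduce the [a-zA-Z0-9_] meaning of \w that the quoted (Python 2) documentation describes, and the names WORD, LAX_TOKEN, EXACT_TOKEN and classify are made up for this illustration. The "exact" class is the token pattern quoted in this thread and is only meaningful here for ASCII input.

```python
import re

# re.ASCII limits \w to [a-zA-Z0-9_] for str patterns in Python 3.
WORD = re.compile(r"^\w+$", re.ASCII)
# The "lax" class proposed in the message: \w plus the hyphen.
LAX_TOKEN = re.compile(r"^[a-zA-Z0-9_-]+$")
# Any CHAR except CTLs and HTTP separators, as quoted in the thread.
EXACT_TOKEN = re.compile(r"^[^\0-\x1f\x7f-\xff()<>@,;:\\\"/[\]?={} \t]+$")


def classify(candidate):
    """Report which of the three patterns accept the candidate string."""
    return {
        "word": bool(WORD.match(candidate)),
        "lax": bool(LAX_TOKEN.match(candidate)),
        "exact": bool(EXACT_TOKEN.match(candidate)),
    }


for candidate in ("foo", "f-o-o", "to#en", "http://example.org/"):
    print(candidate, classify(candidate))
```

Under these assumptions, "f-o-o" is rejected only by \w, "to#en" is accepted only by the exact class, and the absolute URI is rejected by all three because ":" and "/" are separators.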
From: Thomas B. <t.b...@gm...> - 2006-03-30 08:45:13
I think this could be solved in several ways:

- either at line 606, by initializing "info" to a "Status: 504" message, but we must then make sure this message doesn't have an "etag" or anything else that could break the cache freshness computation
- or at line 647, by looking for an empty "info" (or an "info" lacking a "status", or catching the KeyError, or, better, looking for an empty cacheFullPath) and then returning a "Status: 504" message
- or at line 721, by using int(self.get('status', 504)) instead of int(self['status']), or by catching the KeyError exception and then defaulting to a "Status: 504" message.

I'd rather go for the second choice, replacing lines 646 to 649 with something like:

    if entry_disposition == "FRESH":
        if not os.path.exists(cacheFullPath):
            # This should be the case only for a "Cache-Control: only-if-cached" request
            return CACHED_VERSION_UNAVAILABLE
        else:
            response = Response(info)
            response.fromcache = True
            return (response, content)

after having defined a global variable:

    CACHED_VERSION_UNAVAILABLE = (
        Response(rfc822.Message(StringIO.StringIO("""\
    Status: 504
    Content-Type: text/plain
    """))),
        "You asked for a cached version only, and no cached version is available."
    )

--
Thomas Broyer
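As a rough illustration of the second and third options above, here is a self-contained Python 3 sketch. It does not use httplib2: the Response class is a simplified stand-in (a dict of headers with a status attribute), and fresh_response is a hypothetical function standing in for the "FRESH" branch discussed in the message.

```python
import os
import tempfile


class Response(dict):
    """Simplified stand-in for httplib2's Response (an assumption, not the real class)."""

    def __init__(self, info):
        super().__init__(info)
        # Third option from the message: fall back to 504 when "status" is missing.
        self.status = int(self.get("status", 504))
        self.fromcache = False


# Second option: a canned reply for "only-if-cached" requests with no cache entry.
CACHED_VERSION_UNAVAILABLE = (
    Response({"status": "504", "content-type": "text/plain"}),
    "You asked for a cached version only, and no cached version is available.",
)


def fresh_response(cache_full_path, info, content):
    """Answer a FRESH cache hit, or the canned 504 if no cache file exists."""
    if not os.path.exists(cache_full_path):
        return CACHED_VERSION_UNAVAILABLE
    response = Response(info)
    response.fromcache = True
    return (response, content)


with tempfile.TemporaryDirectory() as tmp:
    # No file was ever written at this path, so the canned 504 is returned.
    missing_response, missing_body = fresh_response(
        os.path.join(tmp, "no-such-entry"), {}, ""
    )
    print(missing_response.status, missing_body)
```

Checking for the cache file keeps the 504 handling in one place, whereas the status default in the stand-in Response only masks the missing key.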