httplib2-discuss Mailing List for httplib2 (Page 4)

Status: Beta

Brought to you by: jcgregorio

httplib2-discuss — The one and only mailing list for httplib2


2006	Jan	Feb	Mar (4)	Apr (11)	May	Jun	Jul (2)	Aug	Sep	Oct (7)	Nov (8)	Dec (8)
2007	Jan (9)	Feb (1)	Mar	Apr (4)	May (4)	Jun (1)	Jul (5)	Aug	Sep	Oct	Nov	Dec (3)
2008	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct (1)	Nov	Dec
2009	Jan	Feb (1)	Mar	Apr	May	Jun	Jul	Aug (3)	Sep (2)	Oct	Nov	Dec (8)
2010	Jan (6)	Feb (3)	Mar (3)	Apr (4)	May (4)	Jun (7)	Jul (1)	Aug (1)	Sep (2)	Oct (4)	Nov	Dec

[Httplib2-discuss] Re: Bug #1455955: Followup – matching "tokens" in regular expressions

From: Thomas B. <t.b...@gm...> - 2006-03-31 08:29:28

2006/3/31, Thomas Broyer <t.b...@gm...>:
> Moreover, matching tokens and quoted-strings can be done within a
> single regex, using (?<=…), (?=…), (?<!…) and (?!…) constructs.
[…]
> […] a small bug preventing commas from being prefixed with spaces
> (which is explicitly allowed by the definition of #-lists in HTTP).
[…]
> Back to the [a-zA-Z0-9_-] vs. \w problem, I've done some more research
> and actually, the exact regex for a quoted string (without the <">s)
> is […]
> The exact regex for a token is […]

I've created a bug report [1] (1461941 – Bugs in
_parse_www_authenticate's regex + use a single regex) with attached
patch, and an alternative "lax" regex (far more readable).

[1] http://sourceforge.net/tracker/index.php?func=detail&aid=1461941&group_id=161082&atid=818434

--
Thomas Broyer

[Httplib2-discuss] Bug #1455955: Followup – matching "tokens" in regular expressions

From: Thomas B. <t.b...@gm...> - 2006-03-31 08:01:52

Here's a copy of my followup to bug #1455955 (Support for HMACDigest
authentication) [1], about the use of "[a-zA-Z0-9_-]" vs. "\w" to
match "tokens" as defined by HTTP [2], so that it can be discussed (if
anybody is subscribed to this list of course! :-P )

Moreover, matching tokens and quoted-strings can be done within a
single regex, using (?<=…), (?=…), (?<!…) and (?!…) constructs. Here's
a regex matching both tokens and quoted-strings, at the expense of
being a bit harder to read:
WWW_AUTH = re.compile(r"^(?:\s*(?:,\s*)?([a-zA-Z0-9_-]+)\s*=\s*\"?((?<=\")(?:[^\\\"]|\\.)*?(?=\")|(?<!\")[a-zA-Z0-9_-]+(?!\"))\"?)(.*)$")
You then just have to remove any reference to WWW_AUTH2 and match2
from _parse_www_authenticate.

This regex also fixes a small bug preventing commas from being
prefixed with spaces (which is explicitly allowed by the definition
of #-lists in HTTP).
I just replaced "^,?\s*" with "^\s*(?:,\s*)?", i.e., match any
leading spaces, optionally followed by a comma and then optionally by
more spaces. I could have written "^\s*,?\s*" but I guess the
non-capturing group construct is a bit more efficient, as it doesn't
try to match "\s*" twice.
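
As a rough illustration of the single-regex approach and of the comma fix, here is a minimal sketch in current Python. The helper name parse_auth_params and the sample header are made up for this example; this is not httplib2's actual _parse_www_authenticate, it ignores the auth-scheme handling, and it does not unescape quoted pairs. OLD_PREFIX is derived here only to show the effect of the old "^,?\s*" prefix.

```python
import re

# The single regex proposed in the message above, unchanged.
WWW_AUTH = re.compile(r"^(?:\s*(?:,\s*)?([a-zA-Z0-9_-]+)\s*=\s*\"?((?<=\")(?:[^\\\"]|\\.)*?(?=\")|(?<!\")[a-zA-Z0-9_-]+(?!\"))\"?)(.*)$")

# Assumption for comparison only: the same regex with the old ",?\s*" prefix.
OLD_PREFIX = re.compile(WWW_AUTH.pattern.replace(r"\s*(?:,\s*)?", r",?\s*", 1))


def parse_auth_params(params, pattern=WWW_AUTH):
    """Hypothetical helper: peel key=value pairs off the front of `params`."""
    result = {}
    match = pattern.search(params)
    while match:
        # Group 1 is the key, group 2 the token or quoted-string body,
        # group 3 whatever is left to parse.
        key, value, params = match.groups()
        result[key.lower()] = value
        match = pattern.search(params)
    return result


header = 'realm="test realm", nonce=abc123 , qop="auth"'
new_result = parse_auth_params(header)
old_result = parse_auth_params(header, OLD_PREFIX)
```

With the new prefix all three parameters are picked up; with the old one, parsing stops at the space before the second comma, so "qop" is lost.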

Back to the [a-zA-Z0-9_-] vs. \w problem, I've done some more research
and actually, the exact regex for a quoted string (without the <">s)
is
    (?:[^\0-\x1f\x7f-\xff\\\"]|\\[\0-\x7f]|\r\n[ \t]+)*?
but given that LWS has already been replaced with a single space, it
can be simplified as:
    (?:[^\0-\x1f\x7f-\xff\\"]|\\[\0-\x7f])*?
The only difference with what's currently in Httplib2 ([^\\\"]|\\.) is
that the regex above excludes CTLs from the first part and any octet
with value >= 128 (\x80) from both parts.
The exact regex for a token is:
    [^\0-\x1f\x7f-\xff()<>@,;:\\\"/[\]?={} \t]+
Such an (admittedly unreadable) regex, compared to \w or
[a-zA-Z0-9_-]+, would match tokens such as to#en, to%en, to*en, to!en,
etc., which are valid tokens, even if probably never used.
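
To make the difference between the three candidates concrete, here is a small sketch that runs each of them against a few sample values. It uses bytes patterns on purpose: in Python 3, \w on str patterns is Unicode-aware, whereas on bytes it is limited to [a-zA-Z0-9_], which is the behaviour discussed here. The names WORD, LAX, EXACT and accepted_by are just labels for this example, and "[" is escaped inside the class for readability.

```python
import re

WORD = re.compile(rb"\w+")              # \w only
LAX = re.compile(rb"[a-zA-Z0-9_-]+")    # the proposed "lax" class
# The "exact" token class from above: anything but CTLs, octets >= 127
# and the HTTP separators.
EXACT = re.compile(rb"[^\0-\x1f\x7f-\xff()<>@,;:\\\"/\[\]?={} \t]+")


def accepted_by(candidate):
    """Return the names of the patterns that match the whole candidate."""
    patterns = (("word", WORD), ("lax", LAX), ("exact", EXACT))
    return [name for name, pattern in patterns if pattern.fullmatch(candidate)]


for sample in (b"foo", b"f-o-o", b"to#en", b"http://example.org/"):
    print(sample, accepted_by(sample))
```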

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=1455955&group_id=161082&atid=818437
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2




Er, Python doc for "re" says:
\w
    When the LOCALE and UNICODE flags are not specified, matches any
alphanumeric character and the underscore; this is equivalent to the
set [a-zA-Z0-9_].

So \w is equivalent here to [a-zA-Z0-9_] (because httplib2 uses
neither the LOCALE nor the UNICODE flags), which is even more
restrictive than my proposed [a-zA-Z0-9_-].

Compare re.match(r"^\w+$", "f-o-o") and re.match(r"^\w+$", "foo").

HTTP defines "token" as "1*<any CHAR except CTLs or separators>", and
both "/" and ":", which are present in almost every absoluteURI, are
"separators", so an absoluteURI is not a token (and as such must be
quoted).

[a-zA-Z0-9_-] is far from perfect, but at least a bit better than \w.
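
Seen from the sending side, the same rule decides when a parameter value has to be written as a quoted-string. The following is only a sketch under that reading of the HTTP grammar: format_param is a hypothetical helper, not something httplib2 provides, and the str.isascii() check (Python 3.7+) is there because the character class below only reasons about code points up to \xff.

```python
import re

# Token class as defined by HTTP: no CTLs, no separators, ASCII only.
TOKEN = re.compile(r"[^\0-\x1f\x7f-\xff()<>@,;:\\\"/\[\]?={} \t]+")


def format_param(name, value):
    """Hypothetical helper: emit name=value, quoting value if it is not a token."""
    if value.isascii() and TOKEN.fullmatch(value):
        return f"{name}={value}"
    # Not a token: wrap in double quotes and backslash-escape \ and ".
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


print(format_param("qop", "auth"))
print(format_param("uri", "http://example.org/"))
```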

--
Thomas Broyer

[Httplib2-discuss] Bug #1459543: headers={'cache-control':'only-if-cached'}

From: Thomas B. <t.b...@gm...> - 2006-03-30 08:45:13

I think this could be solved in many ways:
 - either line 606, by initializing "info" to a "Status: 504" message,
but we must then make sure this message doesn't have an "etag", or
other things that could break "cache freshness" computation
 - or line 647, looking for an empty "info" (or an "info" lacking a
"status", or catching the KeyError, or –better– looking for an empty
cacheFullPath) and then returning a "Status: 504" message
 - or line 721, using int(self.get('status', 504)) instead of
int(self['status']) --or catching the KeyError exception and then
defaulting to a "Status: 504" message.

I'd rather go for the second choice, replacing lines 646 to 649 with
something like:
                if entry_disposition == "FRESH":
                    if not os.path.exists(cacheFullPath):
                        # This should be the case only for a
                        # "Cache-Control: only-if-cached" request
                        return CACHED_VERSION_UNAVAILABLE
                    else:
                        response = Response(info)
                        response.fromcache = True
                        return (response, content)

after having defined a global variable:
CACHED_VERSION_UNAVAILABLE = (
    Response(rfc822.Message(StringIO.StringIO("""\
Status: 504
Content-Type: text/plain
"""))),
    "You asked for a cached version only, and no cached version is available."
)
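
For readers on current Python, where the rfc822 and StringIO modules no longer exist, the same idea can be sketched with the standard email package. This is an approximation, not httplib2 code: fresh_response is a hypothetical stand-in for the FRESH branch, and the parsed header message is returned directly instead of being wrapped in httplib2's Response.

```python
import email
import os

# Python 3 stand-in for the rfc822.Message(StringIO(...)) construction.
_HEADERS_504 = email.message_from_string(
    "Status: 504\nContent-Type: text/plain\n\n"
)

CACHED_VERSION_UNAVAILABLE = (
    _HEADERS_504,
    "You asked for a cached version only, and no cached version is available.",
)


def fresh_response(cache_full_path, info, content):
    """Hypothetical stand-in for the FRESH branch discussed above."""
    if not os.path.exists(cache_full_path):
        # Expected only for a "Cache-Control: only-if-cached" request
        # when nothing has been cached yet.
        return CACHED_VERSION_UNAVAILABLE
    return (info, content)


headers, body = fresh_response("/nonexistent/httplib2-cache-entry", None, "")
```

Header lookup on the parsed message is case-insensitive, so headers["status"] gives "504" even though the header was written as "Status".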

--
Thomas Broyer

39 messages have been excluded from this view by a project administrator.
