A url that contains a fragment which contains a newline
can cause SplitUriRef to raise an error. The problem is
that the code assumes that the regex
SPLIT_URI_REF_PATTERN "will match every possible
string" (per the code comments) but in this case it
doesn't match. As a result, this direct reference to
.groupdict() fails:
g = SPLIT_URI_REF_PATTERN.match(uriref).groupdict()
Passing this URL to SplitUriRef() demonstrates the problem:
http://example.com/#SpamSausage\nAndMoreSpam
I found that compiling SPLIT_URI_REF_PATTERN with the
re.MULTILINE flag made the problem go away. I realize
that URLs that contain newlines aren't valid anyway,
but I gather that the intent of (and my preference for)
SplitUriRef's behavior is to never raise an error.
I found this in version 1.0 of the code; not 1.0b2 as
reported. (There's no option for v1.0 here in the bug
submission form.)
I have sent millions of URLs through this function and
this is the first time it has failed. Thanks for a
great library! =)
Source code to recreate the problem is attached.
Sample code to recreate the problem
Logged In: YES
user_id=371366
I wrote the function with the intent of handling only valid
URI refs, although I knew it would probably handle pretty
much anything you throw at it. That said, if using
re.MULTILINE would have no ill effects, I see no reason not
to use it. We can make that change in the next release.
Thanks for the compliment and excellent bug report.
Logged In: YES
user_id=1639275
I appreciate that feature creep can be dangerous, but in
this case it makes sense to me to this function gain the
ability to handle all URIs, not just valid ones. I say this
because it's so close to doing so already. My comment about
having used it for millions of URLs was not exaggeration, so
it's already done a fine job of handling whatever I throw at it.
Please note that I only tested re.MULTILINE to see that it
fixed this particular bug. I didn't do any regression
testing and I haven't really entertained thoughts from the
"What could possibly go wrong?" department. You wrote the
code so I'll leave it to your informed judgement.