
#205 Ft.Lib.Uri.SplitUriRef can raise error w/malformed uri

Version: v1.0b2
Status: open
Priority: 5
Updated: 2006-11-12
Created: 2006-11-11
Private: No

A URL whose fragment contains a newline
can cause SplitUriRef to raise an error. The problem is
that the code assumes that the regex
SPLIT_URI_REF_PATTERN "will match every possible
string" (per the code comments) but in this case it
doesn't match. As a result, this direct reference to
.groupdict() fails:
g = SPLIT_URI_REF_PATTERN.match(uriref).groupdict()

Passing this URL to SplitUriRef() demonstrates the problem:
http://example.com/#SpamSausage\nAndMoreSpam

I found that compiling SPLIT_URI_REF_PATTERN with the
re.MULTILINE flag made the problem go away. I realize
that URLs that contain newlines aren't valid anyway,
but I gather that the intent of (and my preference for)
SplitUriRef's behavior is to never raise an error.
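The failure and the reported workaround can be sketched as follows. The regex below is a stand-in modeled on the URI-splitting pattern in RFC 3986 appendix B; the actual SPLIT_URI_REF_PATTERN in Ft.Lib.Uri may differ in detail:

```python
import re

# Stand-in for SPLIT_URI_REF_PATTERN, modeled on RFC 3986 appendix B;
# the real 4Suite pattern may differ.
URI_REGEX = (r'^(?:(?P<scheme>[^:/?#]+):)?(?:\/\/(?P<authority>[^/?#]*))?'
             r'(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$')

uriref = 'http://example.com/#SpamSausage\nAndMoreSpam'

# Without re.MULTILINE, '.' in the fragment group cannot cross the
# newline and '$' cannot match before it, so .match() returns None;
# calling .groupdict() on that None raises AttributeError.
plain = re.compile(URI_REGEX)
assert plain.match(uriref) is None

# With re.MULTILINE, '$' also matches just before a newline, so the
# match succeeds and the .groupdict() call no longer blows up.
multiline = re.compile(URI_REGEX, re.MULTILINE)
m = multiline.match(uriref)
assert m is not None
print(m.groupdict()['fragment'])  # prints 'SpamSausage'
```

Note that re.match does not require the whole string to participate in the match, so with re.MULTILINE the match simply stops at the newline.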

I found this in version 1.0 of the code; not 1.0b2 as
reported. (There's no option for v1.0 here in the bug
submission form.)

I have sent millions of URLs through this function and
this is the first time it has failed. Thanks for a
great library! =)

Source code to recreate the problem is attached.

Discussion

  • nikitathespider - 2006-11-11

    Sample code to recreate the problem

  • Mike Brown - 2006-11-12
    • assigned_to: nobody --> mike_j_brown

  • Mike Brown - 2006-11-12

    I wrote the function with the intent of handling only valid
    URI refs, although I knew it would probably handle pretty
    much anything you throw at it. That said, if using
    re.MULTILINE would have no ill effects, I see no reason not
    to use it. We can make that change in the next release.

    Thanks for the compliment and excellent bug report.

     
  • nikitathespider - 2006-11-12

    I appreciate that feature creep can be dangerous, but in
    this case it makes sense to me for this function to gain
    the ability to handle all URIs, not just valid ones. I say
    this because it's so close to doing so already. My comment
    about having used it for millions of URLs was not an
    exaggeration, so it has already done a fine job of handling
    whatever I throw at it.

    Please note that I only tested re.MULTILINE to see that it
    fixed this particular bug. I didn't do any regression
    testing and I haven't really entertained thoughts from the
    "What could possibly go wrong?" department. You wrote the
    code, so I'll leave it to your informed judgement.
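    One concrete behavior worth checking before release, sketched here with the RFC 3986 appendix B regex as a stand-in for the real SPLIT_URI_REF_PATTERN: under re.MULTILINE the match succeeds but the captured fragment is cut off at the embedded newline, whereas re.DOTALL would keep everything after the '#'.

```python
import re

# Stand-in for SPLIT_URI_REF_PATTERN (RFC 3986 appendix B style);
# the actual 4Suite regex may differ.
URI_REGEX = (r'^(?:(?P<scheme>[^:/?#]+):)?(?:\/\/(?P<authority>[^/?#]*))?'
             r'(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$')

uriref = 'http://example.com/#SpamSausage\nAndMoreSpam'

# re.MULTILINE: '$' matches before the embedded newline, so the match
# succeeds but the fragment is truncated at the newline.
frag_ml = re.compile(URI_REGEX, re.MULTILINE).match(uriref).group('fragment')
assert frag_ml == 'SpamSausage'

# re.DOTALL: '.' matches the newline itself, so the fragment keeps the
# whole tail after '#', newline included.
frag_da = re.compile(URI_REGEX, re.DOTALL).match(uriref).group('fragment')
assert frag_da == 'SpamSausage\nAndMoreSpam'
```

    Whether silent truncation is acceptable, or the full (invalid) fragment should be preserved, seems like the judgement call here.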

