
#205 Ft.Lib.Uri.SplitUriRef can raise error w/malformed uri

Version: v1.0b2
Status: open
Priority: 5
Updated: 2006-11-12
Created: 2006-11-11
Private: No

A URL whose fragment contains a newline
can cause SplitUriRef to raise an error. The problem is
that the code assumes that the regex
SPLIT_URI_REF_PATTERN "will match every possible
string" (per the code comments) but in this case it
doesn't match. As a result, this direct reference to
.groupdict() fails:
g = SPLIT_URI_REF_PATTERN.match(uriref).groupdict()

Passing this URL to SplitUriRef() demonstrates the problem:
http://example.com/#SpamSausage\nAndMoreSpam

I found that compiling SPLIT_URI_REF_PATTERN with the
re.MULTILINE flag made the problem go away. I realize
that URLs that contain newlines aren't valid anyway,
but I gather that the intent of (and my preference for)
SplitUriRef's behavior is to never raise an error.
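The failure and the reported workaround can be sketched as follows. The regex below is a stand-in modeled on the URI-splitting pattern in RFC 3986 appendix B; the actual SPLIT_URI_REF_PATTERN in Ft.Lib.Uri may differ in detail:

```python
import re

# Stand-in for SPLIT_URI_REF_PATTERN, modeled on RFC 3986 appendix B;
# the real 4Suite pattern may differ.
URI_REGEX = (r'^(?:(?P<scheme>[^:/?#]+):)?(?:\/\/(?P<authority>[^/?#]*))?'
             r'(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$')

uriref = 'http://example.com/#SpamSausage\nAndMoreSpam'

# Without re.MULTILINE, '.' in the fragment group cannot cross the
# newline and '$' cannot match before it, so .match() returns None;
# calling .groupdict() on that None raises AttributeError.
plain = re.compile(URI_REGEX)
assert plain.match(uriref) is None

# With re.MULTILINE, '$' also matches just before a newline, so the
# match succeeds and the .groupdict() call no longer blows up.
multiline = re.compile(URI_REGEX, re.MULTILINE)
m = multiline.match(uriref)
assert m is not None
print(m.groupdict()['fragment'])  # prints 'SpamSausage'
```

Note that re.match does not require the whole string to participate in the match, so with re.MULTILINE the match simply stops at the newline.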

I found this in version 1.0 of the code; not 1.0b2 as
reported. (There's no option for v1.0 here in the bug
submission form.)

I have sent millions of URLs through this function and
this is the first time it has failed. Thanks for a
great library! =)

Source code to recreate the problem is attached.

Discussion

  • nikitathespider - 2006-11-11

    Sample code to recreate the problem

  • Mike Brown - 2006-11-12
    • assigned_to: nobody --> mike_j_brown

  • Mike Brown - 2006-11-12

    I wrote the function with the intent of handling only valid
    URI refs, although I knew it would probably handle pretty
    much anything you throw at it. That said, if using
    re.MULTILINE would have no ill effects, I see no reason not
    to use it. We can make that change in the next release.

    Thanks for the compliment and excellent bug report.

     
  • nikitathespider - 2006-11-12

    I appreciate that feature creep can be dangerous, but in
    this case it makes sense to me for this function to gain
    the ability to handle all URIs, not just valid ones. I say
    this because it's so close to doing so already. My comment
    about having used it for millions of URLs was not an
    exaggeration, so it has already done a fine job of handling
    whatever I throw at it.

    Please note that I only tested re.MULTILINE to see that it
    fixed this particular bug. I didn't do any regression
    testing and I haven't really entertained thoughts from the
    "What could possibly go wrong?" department. You wrote the
    code, so I'll leave it to your informed judgement.
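    One concrete behavior worth checking before release, sketched here with the RFC 3986 appendix B regex as a stand-in for the real SPLIT_URI_REF_PATTERN: under re.MULTILINE the match succeeds but the captured fragment is cut off at the embedded newline, whereas re.DOTALL would keep everything after the '#'.

```python
import re

# Stand-in for SPLIT_URI_REF_PATTERN (RFC 3986 appendix B style);
# the actual 4Suite regex may differ.
URI_REGEX = (r'^(?:(?P<scheme>[^:/?#]+):)?(?:\/\/(?P<authority>[^/?#]*))?'
             r'(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$')

uriref = 'http://example.com/#SpamSausage\nAndMoreSpam'

# re.MULTILINE: '$' matches before the embedded newline, so the match
# succeeds but the fragment is truncated at the newline.
frag_ml = re.compile(URI_REGEX, re.MULTILINE).match(uriref).group('fragment')
assert frag_ml == 'SpamSausage'

# re.DOTALL: '.' matches the newline itself, so the fragment keeps the
# whole tail after '#', newline included.
frag_da = re.compile(URI_REGEX, re.DOTALL).match(uriref).group('fragment')
assert frag_da == 'SpamSausage\nAndMoreSpam'
```

    Whether silent truncation is acceptable, or the full (invalid) fragment should be preserved, seems like the judgement call here.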

