
#152 URI Bug: urllib.quote escaping reserved chars

Milestone: v2.0
Status: closed-fixed
Labels: zsi (169)
Priority: 5
Updated: 2006-10-20
Created: 2006-07-10
Private: No

The escaping for URIs is wrong:

From rfc2396:

"If the data for a URI component would conflict with the reserved
purpose, then the conflicting data must be escaped before forming the
URI."

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","

This implies that if ":" is used for a reserved purpose, as in

    if scheme is defined then
        append scheme to result
        append ":" to result

then it should not be escaped.

I'm not sure what the best solution to this problem is. But I think
"quoting" the query portion of a URI, if it exists, is probably all that can
be reasonably done.

-josh
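
As a rough illustration of quoting only the query portion, here is a minimal sketch. It assumes Python 3, where `urllib.quote` lives at `urllib.parse.quote`. The helper name `quote_query_only` and the choice of "=" and "&" as safe characters are assumptions made for the example, not anything ZSI provides.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_query_only(uri):
    """Escape only the query component; leave the other components as given."""
    parts = urlsplit(uri)
    # Keep "=" and "&" so the key=value&key=value structure survives.
    # Caveat: an already-escaped "%XX" in the query would be escaped again.
    quoted_query = quote(parts.query, safe="=&")
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, quoted_query, parts.fragment)
    )


print(quote_query_only("http://host/"))
print(quote_query_only("http://host/path?q=a b&x=1"))
```

With this approach the ":" after the scheme is never touched, because only the text after "?" is passed to `quote`.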

On Jul 6, 2006, at 1:17 PM, Chris Lambacher wrote:

Hi,

I am having some trouble talking to a server that takes an anyURI argument.
If I use TC.URI the URI gets encoded with urllib.quote, such that
http://host/ ends up as http%3A//host/. The end result is that the server
tells me that it can't find a scheme in the URI.

I have gotten around this by creating my own URI class that omits the
quote/unquote step in serialization.

Does this behaviour follow the standard, so that the server needs fixing,
or is TC.URI broken?

Thanks,
Chris
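
The symptom described in this message can be reproduced without ZSI. The snippet below is a small sketch, assuming Python 3 (`urllib.parse.quote` is the successor of `urllib.quote`). It only shows what the standard library does to a whole URI and does not reproduce ZSI's own code.

```python
from urllib.parse import quote, urlsplit

uri = "http://host/"

# By default quote() treats only "/" as safe, so the ":" is escaped.
escaped = quote(uri)
print(escaped)

# Without a literal ":" a parser can no longer find the scheme.
print(repr(urlsplit(uri).scheme))
print(repr(urlsplit(escaped).scheme))
```

The last line prints an empty string, which matches the server's complaint that the URI has no scheme.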

Discussion

  • Christopher Lambacher

    Logged In: YES
    user_id=122679

    You either need to trust that users are providing properly
    formatted URIs or you need to provide a data type that allows
    all the bits to be set independently and combined with the
    correct encoding.

    Obviously the first option is a lot easier. All you have to
    do is remove the functions overridden from String
    (text_to_data and get_formatted_content) and let the
    default string versions take over. Some form of validation
    could be done on the str as it is processed, raising an
    exception if it is not a properly formatted URI.

    The other way allows you to ensure that all the bits are
    encoded properly. It could be as simple as a dict that has
    the required keys scheme, host, path, query and with
    optional keys port, user, password. get_formatted_content
    could detect if it is a dict or a string and act accordingly.

    The current behavior is wrong and looks like it leaves you
    only able to send URI types to Python servers/clients. It
    would be nice if something could be done about this before 2.0.
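
The two options in this comment could be combined along the following lines. This is only a sketch under stated assumptions: it uses Python 3's `urllib.parse`, the function name `format_uri` is hypothetical, the string check (a scheme must be present and no whitespace is allowed) is a deliberately loose stand-in for real validation, and the query is assumed to be a dict of parameter names to values.

```python
from urllib.parse import quote, urlencode, urlsplit


def format_uri(value):
    """Return a URI string from a ready-made string or a dict of parts."""
    if isinstance(value, str):
        # Option 1: trust the caller, with a light sanity check only.
        if not urlsplit(value).scheme or any(ch.isspace() for ch in value):
            raise ValueError("not a properly formatted URI: %r" % value)
        return value

    # Option 2: escape each component on its own, then join them.
    netloc = value["host"]
    if "port" in value:
        netloc = "%s:%d" % (netloc, value["port"])
    if "user" in value:
        userinfo = quote(value["user"], safe="")
        if "password" in value:
            userinfo += ":" + quote(value["password"], safe="")
        netloc = userinfo + "@" + netloc

    # quote() keeps "/" by default, so path separators are preserved.
    uri = "%s://%s%s" % (value["scheme"], netloc, quote(value["path"]))
    if value.get("query"):
        # Assumption: the query is a dict of parameter names to values.
        uri += "?" + urlencode(value["query"], quote_via=quote)
    return uri


print(format_uri("http://host/"))
print(format_uri({"scheme": "http", "host": "host",
                  "path": "/a b", "query": {"q": "x&y"}}))
```

Because the delimiters (":", "//", "@", "?") are added after the individual parts are escaped, reserved characters used for their reserved purpose are never quoted.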

     
  • Joshua Boverhof

    Joshua Boverhof - 2006-10-20
    • status: open --> closed-fixed
     
  • Christopher Lambacher

    Logged In: YES
    user_id=122679
    Originator: NO

    Why is this marked as fixed? No change has been applied to the source, and Josh apparently agrees with me that this is broken. I would recommend just removing the text_to_data and get_formatted_content methods and letting the ones from String take over. That is what I have been using as a monkey patch to resolve the problem for the time being.

    Do you want to see a patch for that?
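
The kind of monkey patch described above might look roughly like this. The `String` and `URI` classes here are simplified stand-ins written for the example, not ZSI's real classes, whose method signatures may differ. The sketch assumes Python 3's `urllib.parse` and only shows the mechanism: deleting the overrides makes the base-class methods take over.

```python
from urllib.parse import quote, unquote


class String:
    """Stand-in base type: passes text through unchanged."""

    def text_to_data(self, text):
        return text

    def get_formatted_content(self, pyobj):
        return pyobj


class URI(String):
    """Stand-in for the problematic type: quotes and unquotes the whole URI."""

    def text_to_data(self, text):
        return unquote(String.text_to_data(self, text))

    def get_formatted_content(self, pyobj):
        return quote(String.get_formatted_content(self, pyobj))


before = URI().get_formatted_content("http://host/")

# The monkey patch: drop the overrides so the String versions are inherited.
del URI.text_to_data
del URI.get_formatted_content

after = URI().get_formatted_content("http://host/")
print(before)
print(after)
```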

     
