#21 incorrect decoding of URL in getServletPath()

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2007-08-15
Created: 2007-08-15
Private: No

HttpServletRequest.getServletPath() decodes the URL incorrectly. Specifically, it decodes '+' to ' ', a conversion that is defined only for the query string, not for the URL in general.

The bug is in WebAppConfiguration.java, line 1279. It looks like the path needs a different version of the decodeURLToken() method.

See http://en.wikipedia.org/wiki/Query_string#URL_encoding
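
To illustrate the kind of change being asked for, here is a minimal sketch of a decoder that takes a flag saying whether '+' should be turned into a space. The class name, method name, and the plusAsSpace parameter are hypothetical and are not taken from the Winstone source; the real decodeURLToken() may look quite different. UTF-8 is also an assumption here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Winstone's decodeURLToken().
final class UrlTokenDecoderSketch {

    // Decodes %XX escapes (assumed to be UTF-8 bytes). '+' becomes a space
    // only when plusAsSpace is true, i.e. for form-encoded query strings.
    static String decode(String in, boolean plusAsSpace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '%' && i + 2 < in.length()) {
                int hi = Character.digit(in.charAt(i + 1), 16);
                int lo = Character.digit(in.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
                // Malformed escape: fall through and keep the '%' literally.
            }
            if (c == '+' && plusAsSpace) {
                out.write(' ');
                continue;
            }
            // Copy any other character through as its UTF-8 bytes.
            int cp = in.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(raw, 0, raw.length);
            i += Character.charCount(cp) - 1;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this sketch, the servlet path would be decoded with decode(path, false), so "/a+b%20c" becomes "/a+b c", while query-string tokens would keep using decode(token, true).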

Discussion

  • Kohsuke Kawaguchi

    Logged In: YES
    user_id=179238
    Originator: YES

    Oh, and any plan to release 0.9.10? I know you've made a few other bug fixes and I'd like to pick them up, too.

     
  • Rick Knowles

    Rick Knowles - 2007-08-15

    Logged In: YES
    user_id=716353
    Originator: NO

    I'm thinking of making the next one v1.0 ...

    Not trying to argue the point here, but the Wikipedia article mentions two RFCs about URI encoding to back up its point, and I can't seem to find any mention in the RFCs it quotes that the "+" behaviour should differ.

    I'm "going deep" on the verification for this one because of some bad experiences with the MS WebFolders WebDAV client on Windows with Japanese chars, so if I change it I want to be 100% sure I understood the spec right. I don't put a lot of stock in Wikipedia directly, especially after the recent "Fox News whitewashing Wikipedia" stories on reddit.

    If you can help me by finding the section I couldn't, I'll be very appreciative and gladly change it to match, assuming it doesn't break IE and Firefox (which can't automatically be assumed from spec compliance).

    Thanks,

    Rick

     
  • Kohsuke Kawaguchi

    Logged In: YES
    user_id=179238
    Originator: YES

    http://tools.ietf.org/html/rfc1738 end of section 2.2, it says:

    > Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
    > reserved characters used for their reserved purposes may be used
    > unencoded within a URL.

    So '+' is indeed allowed un-escaped in a URL.

    Also see RFC 2396 (http://tools.ietf.org/html/rfc2396) and compare sections 3.3 and 3.4. The path component (3.3) includes '+' as a valid char, but 3.4 says '+' is reserved within the query component.
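
    The same path/query distinction can be seen in the JDK's own classes. The following is only an illustration: the class name and the example.com URL are made up, and it says nothing about how Winstone itself decodes. java.net.URI.getPath() decodes %XX escapes but leaves '+' alone, while java.net.URLDecoder implements form decoding and turns '+' into a space.

    ```java
    import java.io.UnsupportedEncodingException;
    import java.net.URI;
    import java.net.URLDecoder;

    // Illustration only: hypothetical class name and example URL.
    final class PlusInPathVsQuery {

        // Path decoding: escapes are decoded, '+' is left untouched.
        static String decodedPath(String url) {
            return URI.create(url).getPath();
        }

        // Form (query string) decoding: '+' is decoded to a space.
        static String formDecode(String raw) {
            try {
                return URLDecoder.decode(raw, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e); // UTF-8 is always available
            }
        }

        public static void main(String[] args) {
            String url = "http://example.com/a+b%20c?q=x+y";
            System.out.println(decodedPath(url));
            System.out.println(formDecode(URI.create(url).getRawQuery()));
        }
    }
    ```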

     
  • Rick Knowles

    Rick Knowles - 2007-08-15

    Logged In: YES
    user_id=716353
    Originator: NO

    OK thanks - that was exactly what I was looking for. It's clearly more than just the plus sign, which is what concerned me.

    I'll get in and do this in the next day or two, and if you can give it a try and say it's ok or not, I'll push it out as v0.9.10 straight away. v1.0 should probably wait a bit.

    Thanks again,

    Rick

     
