From: Pier F. <pi...@be...> - 2004-08-17 10:04:24
Jan,

No, I didn't see that FAQ before posting (I need to improve my searching skills).

As far as I can see, though, at least for the URL part there is an RFC (it's only an informational one, but still, it's an RFC) which deals with this:

http://www.ietf.org/rfc/rfc2718.txt

Section 2.2.5 mentions UTF-8:

> 2.2.5 Character encoding
>
> When describing URL schemes in which (some of) the elements of the URL
> are actually representations of sequences of characters, care should
> be taken not to introduce unnecessary variety in the ways in which
> characters are encoded into octets and then into URL characters.
> Unless there is some compelling reason for a particular scheme to do
> otherwise, translating character sequences into UTF-8 (RFC 2279) [3]
> and then subsequently using the %HH encoding for unsafe octets is
> recommended.

I know that it's nowhere near a standard, but as far as I can see, IE6, Safari and Mozilla all encode URLs in UTF-8.

Now, being myself _BLATANTLY_STUPID_ (and I apologise), I didn't see the "org.mortbay.util.URI.charset" system property. I just launched Jetty with -Dorg.mortbay.util.URI.charset=UTF-8, and all my problems went away.

Thanks y'all!

Pier

On 16 Aug 2004, at 22:59, Jan Bartel wrote:

> Hi Pier,
>
> Seems like this is a pretty messy area. Did you read the Jetty FAQ
> entry at
>
> http://jetty.mortbay.org/jetty/faq?s=900-Content&t=International
>
> cheers,
> Jan
>
> Pier Fumagalli wrote:
>> Ok, I admit that having a Japanese girlfriend is biasing me, now! :-P
>>
>> Seriously speaking, we're observing one slight problem in Cocoon: UTF-8
>> URLs are not parsed correctly :-( This is kind of a problem, as a few
>> people from China are starting to complain (and of course I won't be
>> able to handle "~谷理子" -my gf's name- on my web server).
>>
>> It appears that UTF-8 URLs are passed on the wire (as specified by
>> http://www.w3.org/International/O-URL-and-ident) as strings encoded in
>> UTF-8, then URL-encoded (for the above mentioned string, it's something
>> like %e8%b0%b7%e7%90%86%e5%ad%90).
>>
>> Now, the only way in which I can correctly get the "real" string that
>> the user typed in his browser is (for example) to do this:
>>
>> String path_info = new
>> String(request.getPathInfo().getBytes("ISO-8859-1"), "UTF-8");
>>
>> which is _SO_ ugly, as I have to get the bytes out of the path_info by
>> encoding it as ISO-8859-1, then take that byte sequence and decode it
>> into a string as if the bytes were UTF-8 characters.
>>
>> It works, indeed, but it'd be better to have Jetty process them
>> correctly in the first place.
>>
>> Now, I'm sure that this popped up before, but I wasn't able to find any
>> reference to how to modify the default ISO-8859-1 behavior... Any
>> pointer, clue, or do I have to work on a patch?
>>
>> Pier
>
> _______________________________________________
> jetty-discuss mailing list
> jet...@li...
> https://lists.sourceforge.net/lists/listinfo/jetty-discuss
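
For anyone finding this thread later, here is a minimal sketch of why the getBytes("ISO-8859-1") workaround quoted above recovers the right string, and why decoding as UTF-8 in the first place makes it unnecessary. It uses only java.net.URLDecoder from the JDK, not Jetty's own URI classes, so it illustrates the charset problem rather than showing what Jetty does internally. The class name Utf8PathDemo and the helpers decodeAs and repair are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jetty or Cocoon.
class Utf8PathDemo {

    // Percent-decode a string, interpreting the resulting octets in the
    // given charset. Note that URLDecoder also turns '+' into a space,
    // which does not matter for the sample string used here.
    static String decodeAs(String wire, Charset charset) {
        try {
            return URLDecoder.decode(wire, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from a Charset the JVM supports.
            throw new IllegalStateException(e);
        }
    }

    // The workaround from the thread: ISO-8859-1 maps every byte to exactly
    // one char, so encoding the mis-decoded string back to ISO-8859-1
    // recovers the original octets, which are then decoded as UTF-8.
    static String repair(String misdecoded) {
        byte[] octets = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(octets, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wire = "%e8%b0%b7%e7%90%86%e5%ad%90"; // string from the thread
        String expected = "\u8c37\u7406\u5b50";      // the three characters

        String wrong = decodeAs(wire, StandardCharsets.ISO_8859_1);
        String right = decodeAs(wire, StandardCharsets.UTF_8);

        System.out.println(wrong.length());                 // 9 (one char per octet)
        System.out.println(right.length());                 // 3
        System.out.println(right.equals(expected));         // true
        System.out.println(repair(wrong).equals(expected)); // true
    }
}
```

The repair step only works because the ISO-8859-1 round trip is lossless. With a default charset that cannot represent every byte value, the original octets could not be recovered this way.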