Thread: [Simpleweb-Support] What encoding does the String returned by Request.getParameter() use?
From: Carfield Y. <car...@ca...> - 2005-12-05 10:26:33
|
Is it the same as the servlet API, which uses ISO-8859-1? |
From: Niall G. <gal...@ya...> - 2005-12-06 01:25:20
|
Hi, Yes, all URIs use the ISO-8859-1 charset (see RFC 2396), as does the HTTP request (see RFC 2616). Niall |
From: Carfield Y. <car...@ca...> - 2005-12-06 04:25:35
|
I see. Then I have run into a strange problem. I was using this approach to convert a multibyte charset to UTF-8 for storage:

String newString = new String(inString.getBytes("ISO-8859-1"), "UTF-8");

But this doesn't work in simpleweb... |
From: Niall G. <gal...@ya...> - 2005-12-06 17:17:49
|
Hi Carfield,

This has nothing to do with Simple. If this causes problems, then inString is not ISO-8859-1. If you get this data from the InputStream, check the Request.getMimeType charset. If not, then it is related to something else.

Also, you should not try to promote ISO-8859-1 to UTF-8. You should try:

inString.getBytes("UTF-8");

Niall |
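The suggestion above can be sketched with plain JDK calls. This is only an illustration, assuming the string is already correct in memory and just needs to be stored as UTF-8; the class name Utf8Storage, its methods, and the sample value are hypothetical and not part of Simple.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Storage {
    // Encode a String to UTF-8 bytes, e.g. before writing it to a file or database.
    static byte[] toStorage(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8 back into a String.
    static String fromStorage(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String inString = "Bj\u00F6rnb\u00E4r"; // hypothetical parameter value
        byte[] stored = toStorage(inString);
        System.out.println(stored.length);                        // 10: the two accented letters take 2 bytes each
        System.out.println(fromStorage(stored).equals(inString)); // true
    }
}
```

The charset is named only where the text crosses into bytes and back; the String itself is never "converted".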
From: Martin N. <mar...@gm...> - 2005-12-07 09:31:42
|
On 12/6/05, Carfield Yim <car...@ca...> wrote:
> I see, then I encounter some strangle problem. I was using this
> approach to convert multibyte charset to UTF-8 to storage:
>
> String newString = new String( inString.getBytes("ISO-8859-1") , "UTF-8");
>
> But this don't work at simpleweb...

I don't quite understand this. Java Strings are charset-neutral as far as I know (and always stored internally as UTF-16), so there is no need to "convert" a string to anything, as the string does not retain charset information. The only conversion is done at the Input/Output stream level or when encoding to other formats (such as with URLEncoder).

/Martin |
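To illustrate the point that the charset matters only at the stream boundary, here is a small sketch using standard java.io classes. The class name StreamDecoding and the two sample bytes are made up for the example. The same bytes produce different strings depending on the charset handed to the reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamDecoding {
    // Read every character from the byte array, decoding with the given charset.
    static String read(byte[] raw, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xB6}; // the UTF-8 encoding of U+00F6
        System.out.println(read(raw, StandardCharsets.UTF_8).length());      // 1
        System.out.println(read(raw, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```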
From: Carfield Y. <car...@ca...> - 2005-12-07 10:14:06
|
> I dont quite understand this? Java Strings are charset-neutral as far
> as i know (and always stored internally as UTF-16), so there is no
> need to "convert" a string to anything as the string does not retain
> charset information. The only conversion is dont at Input/Output
> stream level or when encoding to other formats (such as URLEncoder).

In some servlet containers (Tomcat and Jetty, as far as I know), if you submit multibyte characters in an HTML form, the container incorrectly assumes that they are "ISO-8859-1" and returns an incorrectly decoded string from the request.getParameter() method. To get back the correct string, I need to do the above.

Of course, maybe it is just because I've set up the servlet container incorrectly, but it seems to me that this is a very common practice. |
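The workaround described here can be reproduced without any container. The sketch below only simulates the situation, under the assumption that the form bytes were UTF-8 and were decoded as ISO-8859-1; MisdecodeRepair and its methods are hypothetical names, not servlet or Simple APIs.

```java
import java.nio.charset.StandardCharsets;

final class MisdecodeRepair {
    // Simulates a container that decodes UTF-8 form bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the thread: recover the raw bytes, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] wire = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Bj\u00F6rnb\u00E4r";
        String broken = misdecode(original);
        System.out.println(broken.length());                 // 10: one char per UTF-8 byte
        System.out.println(repair(broken).equals(original)); // true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered from the misdecoded string.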
From: Martin N. <mar...@gm...> - 2005-12-07 14:32:24
|
On 12/7/05, Carfield Yim <car...@gm...> wrote:
> In some servlet container (tomcat and jetty as I know) If you submit
> multibytes character in HTML form, it will incorrectly assume that it
> is "ISO-8859-1", and return the incorrect encoded string at
> request.getparameter() method. In order to get back the correct
> string, I need to do the above.

I see how it works now. Anyway, might it have something to do with the fact that SimpleWeb actually tries to decode query parameters submitted in UTF-8, as I see the ParameterParser and URIParse classes do? I would think this causes a problem if the input is already in UTF: when you do inString.getBytes("ISO-8859-1") you get the ISO representations of the characters (for example, you get the string "Björnbär", which is already correct), which will be messed up when trying to interpret these as UTF-8 again (which will result in "Bj?rnb?").

/Martin |
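A short sketch of the failure mode described here, assuming the string is already correctly decoded when the ISO-8859-1 to UTF-8 reinterpretation is applied. DoubleDecode is a made-up name. The exact mangled output may differ between Java versions, so the example only checks that the result is damaged.

```java
import java.nio.charset.StandardCharsets;

final class DoubleDecode {
    // Re-reads the ISO-8859-1 bytes of a String as if they were UTF-8.
    static String reinterpret(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "Bj\u00F6rnb\u00E4r"; // already decoded properly
        String mangled = reinterpret(correct);
        // 0xF6 and 0xE4 are not valid stand-alone UTF-8 sequences, so the decoder
        // substitutes the replacement character U+FFFD for them.
        System.out.println(mangled.equals(correct));        // false
        System.out.println(mangled.indexOf('\uFFFD') >= 0); // true
        // Pure ASCII text survives, which is why the problem can go unnoticed.
        System.out.println(reinterpret("plain").equals("plain")); // true
    }
}
```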
From: Niall G. <gal...@ya...> - 2005-12-08 01:27:37
|
Hi,

All input, be it a URI, URI parameters, or POSTed parameters, is (by specification) meant to be ISO-8859-1. UTF is supported through the % HEX HEX escaping done by the client. You should not have to do any character encoding anywhere in Simple. All characters are converted correctly, or should be.

So, as Martin said, inString.getBytes("ISO-8859-1") will result in a conversion from Java UCS-2 (which is 16-bit) to ISO-8859-1 (which is 8-bit), resulting in a mangled string.

Niall |
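For reference, the % HEX HEX escaping mentioned above can be tried with the JDK's own URLEncoder and URLDecoder. This is a sketch of the escaping scheme only and is not a description of how Simple's parsers are implemented. PercentEscapes is a hypothetical helper, and UTF-8 is assumed as the charset the client used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class PercentEscapes {
    // Turns %HH escapes back into characters, treating the escaped bytes as UTF-8.
    static String decode(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Escapes non-ASCII characters as the %HH form of their UTF-8 bytes.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Bj\u00F6rnb\u00E4r"));                              // Bj%C3%B6rnb%C3%A4r
        System.out.println(decode("Bj%C3%B6rnb%C3%A4r").equals("Bj\u00F6rnb\u00E4r")); // true
    }
}
```

The escaped form is pure ASCII on the wire, so it passes through unchanged whatever 8-bit charset is assumed for the request line.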