From: RUCH,SCOTT (HP-NewJersey,ex2) <sco...@hp...> - 2001-07-06 20:32:46
Folks,

I just sent these results to two of our colleagues on the
ICU4J-discussion list. Since there was an attachment, I didn't want to
send it out to everyone. Here is the text of the mail. If anyone wants
to see the JSP I used to test with, please contact me and I will send
it to you.

Scott

----------------------------------------------------------------

Gentlemen,

I didn't want to copy the whole list with this attachment, so I'm
sending it to you two, since you expressed interest and provided code
to test.

The attachment is a UTF-8 JSP that demonstrates all 3 conversions: the
one from my first mail message to the list and the 2 suggestions you
provided. I tested this on WebLogic 6.1 with their distributed JDK
1.3.0 from Sun.

Use of the low-level converter classes in sun.io results in an
exception (no information in it) with the following stack trace:

    sun.io.MalformedInputException
        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:110)
        at sun.io.ByteToCharConverter.convertAll(ByteToCharConverter.java:146)
        at jsp_servlet._encoding_conversion_test._jspService(_encoding_conversion_test.java:112)

So, indeed, there is a failure down in the bowels, as we suspected. I
don't have the source for sun.io (as rightfully I shouldn't ;-) ), so I
can't debug any further.

If you can run this, try to cut and paste the various text samples.
The Hindi and Thai examples will actually let a character through the
conversion before failing. All the other ones I tried failed totally.
This is consistent with my earlier results using different text and
conversion method 1.

Any comments / opinions? I'd love to know if this is a bug in the
converters or if this:

    newString = new String(original.getBytes("8859_1"), "desiredEncoding")

just does *not* work.

Thanks for your help.

Scott

> -----Original Message-----
> From: Nicolas Braem [mailto:ni...@tr...]
> Sent: Thursday, July 05, 2001 7:36 AM
> To: 'RUCH,SCOTT (HP-NewJersey,ex2)'
> Subject: RE: [ICU4J-discussion] Converting form data from an HTTP POST
> request in a servlet
>
>
> Hi,
>
> I don't really know what the problem is here, but I do know that
> internally the constructor uses a sun.io.ByteToCharConverter. So I
> propose to use this directly, since if there's a failure a
> MalformedInputException is thrown and that will tell you in more
> detail what causes the failure. The code looks like this:
>
> sun.io.ByteToCharConverter b =
>     sun.io.ByteToCharConverter.getConverter("UTF8");
> char[] temp = b.convertAll(original.getBytes("8859_1"));
> String newString = new String(temp);
>
> which will hopefully tell you some more.
>
> Cheers,
> Nicolas
>
> -----Original Message-----
> From: RUCH,SCOTT (HP-NewJersey,ex2) [mailto:sco...@hp...]
> Sent: Thursday, July 05, 2001 04:52
> To: 'icu...@ww...'
> Subject: [ICU4J-discussion] Converting form data from an HTTP POST
> request in a servlet
>
>
>
> Not ICU4J-specific, but I figure there's a few ;-)
> Java I18n / J2EE experts lurking out here that might
> have an opinion on this:
>
> One of the ways that people deal with the fact that
> there is no information in the POST request specifying
> the underlying encoding of the form data is to let
> the servlet container apply the default ISO 8859-1
> encoding to the data and then convert to the desired
> encoding as such:
>
> newString = new String(original.getBytes("8859_1"), "desiredEncoding")
>
> I was testing this in a simple JSP with a sampling of
> text from various languages encoded in UTF-8. I found
> that depending on the original string content, the
> conversion would fail sometimes. (Failure = zero-length
> string. I interpret a zero-length string as a "trans-coding
> failure".)
>
> Consequently, I'm suspicious of this technique. Is there
> a reasonable explanation why UTF-8 and ISO 8859-1 are
> incompatible? Intuitively, it would seem that this
> should always work, but I've observed it failing...
>
> Thanks,
>
> Scott
>
> _______________________________________________
> ICU4J-discussion mailing list
> ICU...@ww...
> http://www-124.ibm.com/developerworks/opensource/mailman/listinfo/icu4j-discussion
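
For reference, the round trip discussed in this thread can be sketched
in isolation, away from any servlet container. This is only a minimal
sketch under stated assumptions: the class and method names are
hypothetical, the container is simulated by decoding the raw bytes as
ISO 8859-1, and it uses the public java.nio.charset API (which
postdates the JDK 1.3.0 used in the tests above) rather than the
internal sun.io converters. The decoder is set to report malformed
input, so a bad byte sequence raises an exception instead of silently
producing a short or empty string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical names, for illustration only; not code from the thread.
class TranscodeSketch {

    // Simulates a container that decoded the raw POST bytes as ISO 8859-1.
    // Every byte value 0x00..0xFF maps to exactly one char U+0000..U+00FF.
    static String decodedByContainer(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    // The technique from the thread: recover the raw bytes, then decode
    // them as UTF-8. The decoder is strict, so malformed input throws.
    static String recoverUtf8(String latin1Decoded) throws CharacterCodingException {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        String original = "\u0928\u092E\u0938\u094D\u0924\u0947"; // Hindi sample text
        byte[] posted = original.getBytes(StandardCharsets.UTF_8);

        // Intact bytes: the ISO 8859-1 detour is reversible.
        String recovered = recoverUtf8(decodedByContainer(posted));
        System.out.println("round trip intact: " + recovered.equals(original));

        // Damaged bytes (last byte dropped): the strict decoder rejects them.
        byte[] truncated = Arrays.copyOf(posted, posted.length - 1);
        try {
            recoverUtf8(decodedByContainer(truncated));
            System.out.println("truncated input accepted");
        } catch (CharacterCodingException e) {
            System.out.println("truncated input rejected: " + e);
        }
    }
}
```

If the posted bytes reach the string unaltered, the detour through
ISO 8859-1 is lossless in this sketch. A failure in the second decode
therefore suggests that the bytes were changed or cut short before the
conversion, which may be a useful thing to rule out when debugging the
JSP case.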