From: <bc...@wo...> - 2001-11-26 19:34:06
[dman]

>(
> Short version :
>     jython gives no result when running scripts encoded in latin1 with
>     non-ASCII chars in them.
>)

>| Whatever the encoding used is, it may be unable to handle 0xA9
>| correctly:
>
>Perhaps, and perhaps java is broken?

Don't think so. The first byte of a multibyte sequence must be in the
range 0xC0 to 0xFD, and 0xA9 is outside that range. So a file with a
latin1 copyright character is not a valid UTF-8 text file.

As an additional data point, my JDK1.2 and JDK1.3 also throw
exceptions, but JDK1.4 silently transforms the character into the
unicode-undefined character.

>As you can see, CPython (2.2b1) has no problems with the script
>regardless of environment and file encoding,

That simplicity will not last. Eventually even CPython will have ways to
deal with the encoding of python source files.

>$ LANG=en_US.UTF-8 jython
>Jython 2.1a1 on java1.3.1 (JIT: null)
>>>> from java.io import *
>>>> f = InputStreamReader( FileInputStream( "hello_latin1.py" ) )
>>>> while 1 : print f.read()
>...
>10
>35
>Traceback (innermost last):
>  File "<console>", line 1, in ?
>sun.io.MalformedInputException
>        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
>
>I'll attach the file so you can see it for yourself. It looks like
>jython catches this exception, but silently ignores it.

Yes. The generated tokenmanager catches all IOExceptions
(MalformedInputException is a subclass of IOException) and interprets
that as eof.

>Perhaps it would be a good idea to try and fall back to latin1,

Nah, no guessing IMO.

>then display an error message if that fails too.

That doesn't seem to be as easy as it rightly should have been.

regards,
finn
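
For anyone who wants to reproduce the 0xA9 point without a JVM, here is a minimal sketch in plain Python. It is not what jython or the tokenmanager does internally. The helper name decode_source and the sample bytes are made up for illustration. It decodes the same bytes strictly under a named encoding and reports the offending byte, rather than treating the failure as end of file.

```python
# Sketch only: decode_source and the sample bytes are illustrative
# assumptions, not part of jython.

def decode_source(data, encoding):
    """Decode bytes strictly; return (text, None) or (None, error message)."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        message = "cannot decode as %s: byte 0x%02X at offset %d" % (
            encoding, data[exc.start], exc.start)
        return None, message


# A comment line whose copyright sign is stored as the single latin1 byte 0xA9.
latin1_bytes = "# \u00a9 2001\n".encode("latin-1")

# 0xA9 cannot start a UTF-8 sequence, so strict UTF-8 decoding fails.
text, error = decode_source(latin1_bytes, "utf-8")
print(error)

# The same bytes decode cleanly when latin1 is named explicitly.
text, error = decode_source(latin1_bytes, "latin-1")
print(repr(text))
```

Note that the encoding is always passed in explicitly, so nothing here guesses at a fallback.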