Re: [Jython-users] jythonc not working -- solved, but strange


[dman]

>(
>  Short version :
>    jython gives no result when running scripts encoded in latin1 with
>    non-ASCII chars in them.
>)

>| Whatever the encoding used is, it may be unable to handle 0xA9
>| correctly:
>
>Perhaps, and perhaps java is broken?

Don't think so. The first byte of a multibyte UTF-8 sequence must be in
the range 0xC0 to 0xFD, and 0xA9 is outside that range, so it cannot start
a sequence. So a file with a Latin-1 copyright character (the single byte
0xA9) is not a valid UTF-8 text file.
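
A small way to check this for yourself, as a sketch in current CPython
rather than anything from the Jython sources. The helper names
(is_lead_byte, is_valid_utf8) are made up for illustration, and the
lead-byte range is the one quoted above:

```python
# Hypothetical helpers, not part of Jython or the JDK.
COPYRIGHT = "\u00a9"  # the copyright sign

latin1_bytes = COPYRIGHT.encode("latin-1")  # one byte: 0xA9
utf8_bytes = COPYRIGHT.encode("utf-8")      # two bytes: 0xC2 0xA9


def is_lead_byte(b):
    """True if b may start a multibyte sequence (range as quoted above)."""
    return 0xC0 <= b <= 0xFD


def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_lead_byte(latin1_bytes[0]))  # 0xA9 cannot start a sequence
print(is_valid_utf8(latin1_bytes))    # so the Latin-1 file is not UTF-8
print(is_valid_utf8(utf8_bytes))      # the UTF-8 encoding of the same char is
```

The same character is fine once it is actually encoded as UTF-8; it is
only the bare 0xA9 byte that the decoder has to reject.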

As an additional data point, my JDK1.2 and JDK1.3 also throw
exceptions, but JDK1.4 silently transforms the character into the
unicode-undefined character.

>As you can see, CPython (2.2b1) has no problems with the script
>regardless of environment and file encoding, 

That simplicity will not last. Eventually even CPython will have ways to
deal with the encoding of python source files.

>$ LANG=en_US.UTF-8 jython 
>Jython 2.1a1 on java1.3.1 (JIT: null)
>>>> from java.io import *
>>>> f = InputStreamReader( FileInputStream( "hello_latin1.py" ) )
>>>> while 1 : print f.read()
>... 
>10
>35
>Traceback (innermost last):
>  File "<console>", line 1, in ?
>sun.io.MalformedInputException
>        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
>
>I'll attach the file so you can see it for yourself.  It looks like
>jython catches this exception, but silently ignores it.

Yes. The generated tokenmanager catches all IOExceptions
(MalformedInputException is a subclass of IOException) and interprets
them as EOF, so the script just appears to end at the bad byte.
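
To illustrate the effect, here is a rough sketch in current CPython, not
the actual generated tokenmanager code. The function names and the sample
source bytes are invented for the example. One reader swallows the decode
error and treats it as end of input, the other lets it propagate:

```python
import codecs

# Hypothetical script bytes: a Latin-1 copyright sign in a comment.
SOURCE = b'# \xa9 2001\nprint("hello")\n'


def read_swallowing_errors(data, encoding="utf-8"):
    """Decode byte by byte; treat a decode error as end of input."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out = []
    for i in range(len(data)):
        try:
            out.append(decoder.decode(data[i:i + 1]))
        except UnicodeDecodeError:
            break  # the error is swallowed, the caller just sees EOF
    return "".join(out)


def read_strict(data, encoding="utf-8"):
    """Decode everything; let a decode error reach the caller."""
    return data.decode(encoding)


# The lenient reader stops at the bad byte and never sees the print line.
print(repr(read_swallowing_errors(SOURCE)))

# The strict reader reports the problem instead of hiding it.
try:
    read_strict(SOURCE)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

With the lenient reader everything after the bad byte is dropped without
a word, which matches the "no result" symptom described above.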

>Perhaps it would be a good idea to try and fall back to latin1, 

Nah, no guessing IMO.

>then display an error message if that fails too.

That doesn't seem to be as easy as it rightly should have been.

regards,
finn