From: dman <ds...@ri...> - 2001-11-26 19:57:37
On Mon, Nov 26, 2001 at 07:37:33PM +0000, Finn Bock wrote:
| [dman]
|
| >(
| >  Short version :
| >    jython gives no result when running scripts encoded in latin1 with
| >    non-ASCII chars in them.
| >)
|
| >| Whatever the encoding used is, it may be unable to handle 0xA9
| >| correctly:
| >
| >Perhaps, and perhaps java is broken?
|
| Don't think so. The first byte of a multibyte sequence must be in the
| range 0xC0 to 0xFD. So a file with a latin copyright character is not a
| valid UTF-8 text file.

At least someone here has read the spec :-).

| As an additional information point, my JDK1.2 and JDK1.3 also throws
| exceptions, but JDK1.4 silently transform the character into the
| unicode-undefined character.

I'm not sure that is a good thing (jdk1.4), but maybe you don't have
to deal with it.  Consider someone who has some source in latin1 (or
something else) and has

    aöc = "foo"
    aüc = "bar"

If java uses UTF-8 as the encoding, then those two names will end up
being the same if jython will treat the unicode-undefined character as
a regular character.  This would be an additional condition that
should raise an exception.

| >As you can see, CPython (2.2b1) has no problems with the script
| >regardless of environment and file encoding,
|
| That simplicity will not last. Eventually even CPython will have ways to
| deal with the encoding of python source files.

Ok.

| >$ LANG=en_US.UTF-8 jython
| >Jython 2.1a1 on java1.3.1 (JIT: null)
| >>>> from java.io import *
| >>>> f = InputStreamReader( FileInputStream( "hello_latin1.py" ) )
| >>>> while 1 : print f.read()
| >...
| >10
| >35
| >Traceback (innermost last):
| >  File "<console>", line 1, in ?
| >sun.io.MalformedInputException
| >        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
| >
| >I'll attach the file so you can see it for yourself.  It looks like
| >jython catches this exception, but silently ignores it.
|
| Yes. The generated tokenmanager catches all IOExceptions
| (MalformedInputException is a subclass of IOException) and interprets
| that as eof.

EOF would certainly explain why I didn't get any output or error
message.  Jython successfully executed nothing :-).

| >Perhaps it would be a good idea to try and fall back to latin1,
|
| Nah, no guessing IMO.

Ok.

| >then display an error message if that fails too.
|
| That doesn't seem to be as easy as it rightly should have been.

Couldn't you just catch that exception and print out a message then
exit right before catching IOException?  It might be better to convert
the exception into a different (python) exception.  Yeah, for
execfile() the interpreter shouldn't exit because the file is encoded
wrong.

-D

--
"...the word HACK is used as a verb to indicate a massive amount
of nerd-like effort."  -Harley Hahn, A Student's Guide to Unix
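
The two points argued above (names collapsing when undecodable bytes
are replaced, and raising a Python-level exception instead of treating
the failure as EOF) can be reproduced in a few lines.  This is only a
rough sketch: it assumes a current CPython 3 and uses its codecs as a
stand-in for the Java decoders in the thread.  SourceEncodingError,
decode_source and names_after_replacement are made-up names for
illustration, not part of Jython or CPython.

```python
# Sketch only: CPython 3 codecs stand in for the Java decoders discussed
# in the thread. All names defined here are hypothetical.

# The two assignments from the message, encoded as latin1 (0xF6, 0xFC).
LATIN1_SOURCE = b'a\xf6c = "foo"\na\xfcc = "bar"\n'


class SourceEncodingError(Exception):
    """Hypothetical exception: source bytes are invalid in the expected encoding."""


def decode_source(raw, encoding="utf-8"):
    """Decode source bytes strictly and report bad bytes instead of hiding them."""
    try:
        return raw.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        # Convert the low-level decoding failure into an explicit error
        # that names the offending byte and its position.
        raise SourceEncodingError(
            "byte 0x%02X at offset %d is not valid %s"
            % (raw[exc.start], exc.start, encoding)
        ) from exc


def names_after_replacement(raw):
    """Decode leniently (bad bytes become U+FFFD) and return the assigned names."""
    text = raw.decode("utf-8", errors="replace")
    return [line.split("=")[0].strip() for line in text.splitlines() if "=" in line]


if __name__ == "__main__":
    # Lenient decoding: print the names that result after replacement.
    print(names_after_replacement(LATIN1_SOURCE))

    # Strict decoding: the problem is reported rather than swallowed.
    try:
        decode_source(LATIN1_SOURCE)
    except SourceEncodingError as err:
        print("error:", err)

    # Decoding with the encoding the file was written in succeeds.
    print(decode_source(LATIN1_SOURCE, "latin-1"))
```

With lenient decoding both names come out as the same string, which is
the collision described above.  The strict path keeps the failure
visible to the caller, so something like execfile() could report it
without the interpreter exiting.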