From: dman <ds...@ri...> - 2001-11-26 17:11:56
On Mon, Nov 26, 2001 at 03:58:39PM +0000, Finn Bock wrote:
| [dman]
(Short version: jython silently produces no output when running
latin1-encoded scripts that contain non-ASCII chars under a UTF-8
locale.)
| >What difference does it make to jython whether a (python) source file
| >is saved in latin1 or utf-8? In any case, I think it is a gross error
| >to simply terminate with no message when encountering a file that it
| >doesn't like.
|
| Sure. Normally jython doesn't. So what is special about woody?
See below. I have now figured out the source of this problem.
| >I started the conversion to utf-8 from main.py,
|
| I have now removed the latin-1 copyright character in the CVS version.
Cool. That should fix the portability problem here, since ASCII is a
common subset of latin1 and utf-8 (and of most other encodings in
common use, AFAIK).
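
To make the subset point concrete, here is a minimal sketch. It assumes
a current CPython 3 rather than Jython 2.1, and the variable names are
made up for illustration. It only compares the bytes each encoding
produces for plain ASCII text and for the copyright sign.

```python
# Illustrative only: compare the bytes produced by latin1 and utf-8.
ascii_text = "print 'hello world'"   # pure ASCII sample line
copyright_sign = "\u00a9"            # the copyright symbol

# ASCII characters encode to the same single bytes in both encodings.
ascii_latin1 = ascii_text.encode("latin-1")
ascii_utf8 = ascii_text.encode("utf-8")
same_for_ascii = ascii_latin1 == ascii_utf8

# The copyright sign does not: one byte in latin1, two bytes in utf-8.
copyright_latin1 = copyright_sign.encode("latin-1")
copyright_utf8 = copyright_sign.encode("utf-8")

print(same_for_ascii)
print(copyright_latin1, copyright_utf8)
```

So a source file that contains only ASCII reads identically under
either default encoding, which is why dropping the copyright character
sidesteps the problem.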
| >...
| >The interesting thing about jythonc's source files is that they all
| >have the copyright symbol in a comment at the top of the file. In
| >'latin1' this is character 0xa9.
|
| The python source files are read as text files with an InputStreamReader
| using the default encoding for the platform. Normally that is a good way
| to read text files but a side effect is that python source programs with
| non-ascii characters aren't portable to other platforms with a different
| encoding.
|
| I don't know what the cause is, but these experiments might help shed
| light on it.
|
| What file encoding is used in your setup of woody?
|
| >>> import java
| >>> java.lang.System.getProperty("file.encoding")
| 'Cp1252'
| >>>
The woody machine I have at work had no problems running jythonc; only
my machine at home did. I remembered late last night that I had set $LANG
to en_US.UTF-8 at home. Now that I am at work, I checked that machine
and it has $LANG set to the default of "C". When I tried
"LANG=en_US.UTF-8 jythonc --help" there, it failed the same way it did
at home.
With LANG=C, the encoding used by java is "ISO-8859-1"; with
LANG=en_US.UTF-8 the encoding is "UTF-8".
| Whatever the encoding used is, it may be unable to handle 0xA9
| correctly:
Perhaps, and perhaps java is broken?
I created "hello world" with the copyright symbol in a comment. I did
this with both latin1 and utf-8.
$ LANG=en_US python2.2 hello_latin1.py
hello world
$ LANG=en_US python2.2 hello_utf-8.py
hello world
$ LANG=en_US.UTF-8 python2.2 hello_latin1.py
hello world
$ LANG=en_US.UTF-8 python2.2 hello_utf-8.py
hello world
$ LANG=en_US jython hello_latin1.py
hello world
$ LANG=en_US jython hello_utf-8.py
hello world
$ LANG=en_US.UTF-8 jython hello_latin1.py
$ LANG=en_US.UTF-8 jython hello_utf-8.py
hello world
$
As you can see, CPython (2.2b1) has no problems with the script
regardless of environment and file encoding; however, Java can't handle
a latin1 file with the environment set to UTF-8.
I should do some experiments at the Java level and see what it does in
that situation. Maybe it causes a problem in Jython's parsing (i.e. the
comment ends up extending to the end of the file) or maybe there is
some error that is silently ignored.
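
For what it's worth, the byte-level reason a latin1 file trips up a
UTF-8 decoder can be shown without Java. This is a sketch under some
assumptions: it uses a current CPython 3, not Jython 2.1; the bytes
below are a stand-in for hello_latin1.py, whose exact contents are not
shown here; and try_decode is a hypothetical helper. A lone 0xA9 byte
is not a legal start of a UTF-8 sequence, so a strict decoder has to
reject it.

```python
# Stand-in for a latin1-encoded script with a copyright sign in a comment.
latin1_source = (
    "\n# \u00a9 copyright notice\nprint('hello world')\n".encode("latin-1")
)

def try_decode(data, encoding):
    """Hypothetical helper: return (ok, detail) for a strict decode."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return False, "byte 0x%02x at offset %d" % (data[exc.start], exc.start)

utf8_ok, utf8_detail = try_decode(latin1_source, "utf-8")
latin1_ok, latin1_detail = try_decode(latin1_source, "latin-1")

print(utf8_ok, utf8_detail)
print(latin1_ok)
```

The same bytes decode cleanly as latin1 and fail as utf-8, which
matches the pattern in the runs above.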
| >>> from java import io
| >>> s = io.FileOutputStream("foo")
| >>> s.write("\xA9")
| >>> s.close()
| >>> s = io.FileReader("foo")
| >>> print hex(s.read())
| 0xa9
| >>> s.close()
| >>>
I just did a quick test using jython (interactive coding is very
cool!) :
$ LANG=en_US.UTF-8 jython
Jython 2.1a1 on java1.3.1 (JIT: null)
>>> from java.io import *
>>> f = InputStreamReader( FileInputStream( "hello_latin1.py" ) )
>>> while 1 : print f.read()
...
10
35
Traceback (innermost last):
  File "<console>", line 1, in ?
sun.io.MalformedInputException
        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
        at java.io.InputStreamReader.convertInto(InputStreamReader.java:137)
        at java.io.InputStreamReader.fill(InputStreamReader.java:166)
        at java.io.InputStreamReader.read(InputStreamReader.java:249)
        at java.io.InputStreamReader.read(InputStreamReader.java:222)
        at java.lang.reflect.Method.invoke(Native Method)
        at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:160)
        at org.python.core.PyMethod.__call__(PyMethod.java:96)
        at org.python.core.PyObject.__call__(PyObject.java:262)
        at org.python.core.PyInstance.invoke(PyInstance.java:244)
        at org.python.pycode._pyx3.f$0(<console>:1)
        at org.python.pycode._pyx3.call_function(<console>)
        at org.python.core.PyTableCode.call(PyTableCode.java:198)
        at org.python.core.PyCode.call(PyCode.java:13)
        at org.python.core.Py.runCode(Py.java:1075)
        at org.python.core.Py.exec(Py.java:1096)
        at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:145)
        at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:87)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:68)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:42)
        at org.python.util.InteractiveConsole.push(InteractiveConsole.java:83)
        at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:62)
        at org.python.util.jython.main(jython.java:183)
sun.io.MalformedInputException: sun.io.MalformedInputException
>>>
I'll attach the file so you can see it for yourself. It looks like
jython catches this exception, but silently ignores it. Perhaps it
would be a good idea to try and fall back to latin1, then display an
error message if that fails too.
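
A rough sketch of that fallback idea follows. It is not how jython
actually reads source files; decode_source and its parameters are
hypothetical, and it is written for a current CPython 3 rather than
Jython 2.1. It tries the preferred encoding first, warns on stderr
instead of failing silently, and then retries with the fallback.

```python
import sys

def decode_source(data, preferred="utf-8", fallback="latin-1"):
    """Hypothetical helper: return (text, encoding_used) for source bytes."""
    try:
        return data.decode(preferred), preferred
    except UnicodeDecodeError as exc:
        # Report the problem rather than swallowing it.
        sys.stderr.write(
            "warning: source is not valid %s (%s at offset %d); "
            "retrying as %s\n" % (preferred, exc.reason, exc.start, fallback)
        )
    # If the fallback also fails, the UnicodeDecodeError propagates to the caller.
    return data.decode(fallback), fallback

text, used = decode_source(b"# \xa9 example\n")
print(used)
```

One caveat: latin1 assigns a character to every byte value, so a latin1
fallback never raises. The "fails too" case would only arise with a
stricter fallback encoding, and a successful fallback may still have
guessed the wrong encoding, hence the warning.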
| >I use (g)vim 6.0 as my editor. As
| >you may already know it has two variables, 'enc' and 'fenc'.
|
| You could change the file encoding of the source files. You would then
I did.
| have to change the encoding used by java as well. But I strongly doubt
It was already changed -- changing the encoding of the files caused
them to match the encoding java was using.
| that you want to go there. If latin1 is suitable for your country and
| language, stick with that.
I suppose maybe I should. At least I know what to look for now if it
happens again :-).
-D
--
Even youths grow tired and weary,
and young men stumble and fall;
but those who hope in the Lord
will renew their strength.
They will soar on wings like eagles;
they will run and not grow weary,
they will walk and not be faint.
Isaiah 40:31