[Jython-users] Working with Unicode in Jython

Hello,

I use Jython as a scripting solution for a Java app. The Python code that's
executed in Jython is read from a Unicode database field. So, before the
interp.exec(myPyCode) call the code is stored in a Unicode string.

Unfortunately, I found that all three things I need to do fail as soon as
non-ASCII Unicode characters are involved:

1) Using Unicode characters in the Python code:

PythonInterpreter interp = new PythonInterpreter(null, new PySystemState());
interp.exec("print ord(\"ā\")");

This prints 63, the ordinal of "?", while 257 was expected. How can I pass a
string that contains Unicode characters to the interpreter without mangling them?

(I am not sure whether the special character "ā" will survive the mailing
list. It is a Unicode character with ordinal 257; you can easily get it in
Java like this:
String uniStr = Character.toString((char) 257);)
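
The value 63 in case 1) is what Java itself produces when a character that
ASCII cannot represent is encoded leniently. The sketch below reproduces that
symptom in plain Java, without Jython. It is only an illustration, on the
assumption that some ASCII conversion happens between the Java string and the
interpreter; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class AsciiReplacementDemo {
    // Encodes the text as US-ASCII and returns the first resulting byte.
    // String.getBytes(Charset) silently replaces unmappable characters
    // with the charset's replacement byte, which is '?' for US-ASCII.
    static int firstAsciiByte(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        return encoded[0];
    }

    public static void main(String[] args) {
        String uniStr = Character.toString((char) 257);
        System.out.println((int) uniStr.charAt(0));   // 257, intact in the Java string
        System.out.println(firstAsciiByte(uniStr));   // 63, the ordinal of '?'
    }
}
```

If this is what happens, the character is lost before the Python code ever
runs, so the fix would have to be on the side that hands the source to the
interpreter.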

2) Entering Unicode literals with escaping:
interp.exec("print u\"\\u0101\"");

This code causes this Python error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in
position 0: ordinal not in range(128)

So, what is the correct way to instantiate the interpreter so that Unicode
literals are processed without errors, just as they are by the Python
interpreter on Ubuntu?
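
The error in case 2) looks like the strict counterpart of the "?" replacement
in case 1): the literal is parsed correctly, but printing it goes through an
ASCII encoder that refuses the character instead of replacing it. The
following plain-Java sketch shows the same strict behaviour with a
CharsetEncoder. It is an analogy only and makes no claim about how Jython
encodes output internally; the class and method names are hypothetical.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class StrictAsciiDemo {
    // Returns true if every character of the text can be encoded as US-ASCII.
    // A freshly created encoder reports unmappable characters by throwing,
    // rather than substituting '?'.
    static boolean encodesAsAscii(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodesAsAscii("abc"));     // true
        System.out.println(encodesAsAscii("\u0101"));  // false, ordinal 257 is out of range
    }
}
```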

3) Reading Unicode characters from standard output

I can't seem to find a way for the code executed by Jython to pass Unicode
characters to standard output intact. My custom OutputStream class has a
write(int code) method that waits for data (to store it in a Unicode
string), but it never receives anything with a character code above 127.
Maybe the problem is that I never really managed to get any Unicode string
working in the Jython interpreter in the first place.
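
One point about case 3): OutputStream.write(int) carries a single byte per
call, so a character such as ordinal 257 can never arrive as one value. It
has to arrive as several bytes that are decoded afterwards. Below is a
minimal sketch of a collecting stream, assuming the writing side emits UTF-8
(an assumption, not something confirmed for Jython here). The class name
Utf8CollectingStream is hypothetical and is not the class from the
application described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8CollectingStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Called once per byte; only the low 8 bits of the argument are stored.
    @Override
    public void write(int b) {
        buffer.write(b);
    }

    // Decodes everything written so far, assuming the bytes are UTF-8.
    String text() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Utf8CollectingStream out = new Utf8CollectingStream();
        // A PrintStream with an explicit encoding stands in for the writer side.
        PrintStream printer = new PrintStream(out, true, "UTF-8");
        printer.print("\u0101");
        printer.flush();
        System.out.println((int) out.text().charAt(0));  // 257
    }
}
```

Decoding the whole buffer at the end, rather than converting each byte to a
char as it arrives, avoids splitting a multi-byte sequence in the middle.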

Unfortunately, I work in a country where many non-ASCII Unicode characters
are in everyday use, so I can't work around this problem by restricting
myself to the ANSI character set.

Thanks in advance for any help!