The PythonInterpreter API probably needs a closer review: its behavior is surprising unless you have a deep understanding of Jython internals and of how it passes the argument of exec through. That said, here's how to resolve #1 and #2. I don't have enough information to understand your exact requirement for #3, but it probably also comes down to carefully following the encoding/decoding steps.

So with this test code:

import org.python.core.Py;
import org.python.util.PythonInterpreter;

public class TestUnicode {

    public static void main(String[] args) {
        try {
            PythonInterpreter runtime = new PythonInterpreter();
            // Encode the source as UTF-8 so it matches the declared source encoding.
            runtime.exec(Py.newUnicode("# encoding=UTF-8\nprint u'ā'").encode("UTF-8"));
            runtime.exec(Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')").encode("UTF-8"));
            // Escaped literal: the source is plain ASCII, so no encoding step is needed.
            runtime.exec("print u'\\u0101'");
        } catch (Exception ex) {
            System.err.println("Exception: " + ex);
        }
    }
}

I get the following output:

$ java TestUnicode

which should be what you wanted.
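
For #3, here is a minimal sketch of the decoding side, under the assumption that the interpreter's output reaches your stream as UTF-8 bytes. The class name Utf8Capture is hypothetical, and the sketch uses only the JDK (a PrintStream stands in for the interpreter). The point is that write(int) receives one byte per call, never a whole character, so a character such as "ā" arrives as two byte values and only becomes a character again once the collected bytes are decoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: collects raw bytes and decodes them as UTF-8 on demand.
public class Utf8Capture extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        // One byte (0..255) per call; multi-byte characters span several calls.
        buffer.write(b);
    }

    // Decode everything written so far, assuming the writer used UTF-8.
    public String text() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Utf8Capture capture = new Utf8Capture();
        // Stand-in for the interpreter: a PrintStream that encodes as UTF-8.
        PrintStream out = new PrintStream(capture, true, "UTF-8");
        out.print("\u0101");
        out.flush();

        String s = capture.text();
        System.out.println(s.length());        // prints 1
        System.out.println((int) s.charAt(0)); // prints 257
    }
}
```

If the bytes arriving at your stream were produced with a different codec (for example ASCII with replacement), decoding as UTF-8 cannot recover the original characters, so the encoding has to be right on the writing side as well.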

On Wed, Oct 2, 2013 at 3:42 PM, Pāvils Jurjāns <> wrote:

I use Jython as a scripting solution for a Java app. The Python code that's executed in Jython is read from a Unicode database field. So, before the interp.exec(myPyCode) call the code is stored in a Unicode string.

Unfortunately, I found that all three functions I need fail when Unicode characters are involved:

1) Using unicode characters in the Python code:

PythonInterpreter interp = new PythonInterpreter(null, new PySystemState());
interp.exec("print ord(\"ā\")");

gives the ordinal of "?", 63, while 257 was expected. How can I pass a string that contains Unicode characters to the interpreter without mangling them?

Note: I'm not sure whether the special character "ā" will survive the mailing list. It's a Unicode character with ordinal 257; you can easily get it in Java like this:
String uniStr = Character.toString((char) 257);
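
As a side illustration of where the 63 may come from: 63 is the code of "?", which is the byte Java substitutes when a character cannot be represented in the target charset. The plain-Java sketch below (no Jython involved; the class and method names are hypothetical) shows that substitution for US-ASCII next to the two bytes UTF-8 produces for the same character. Whether Jython takes exactly this path internally is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares ASCII and UTF-8 encodings of U+0101.
public class EncodingDemo {

    // First byte of s encoded as US-ASCII; unmappable characters become '?'.
    static int firstAsciiByte(String s) {
        return s.getBytes(StandardCharsets.US_ASCII)[0];
    }

    // Bytes of s encoded as UTF-8, widened to unsigned int values.
    static int[] utf8Bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        String s = Character.toString((char) 257);
        System.out.println(firstAsciiByte(s)); // prints 63, the code of '?'
        int[] utf8 = utf8Bytes(s);
        System.out.println(utf8[0] + " " + utf8[1]); // prints 196 129
    }
}
```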

2) Entering unicode literals with escaping:
interp.exec("print u\"\\u0101\"");

This code causes this Python error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 0: ordinal not in range(128)

So, what would be the correct way to instantiate the interpreter so that Unicode literals are processed correctly, just as they are by the Python interpreter on Ubuntu?

3) Reading unicode characters at standard output

I can't find a way for code executed by Jython to pass Unicode characters cleanly to standard output. My custom OutputStream class's write(int code) method is waiting for data (to store it in a Unicode string), but it never receives anything with a character code above 127. Maybe the problem is that I never managed to get any Unicode string working in the Jython interpreter in the first place.

Unfortunately, I work in a country where plenty of Unicode characters are in frequent use, so I can't work around this problem by sticking to the ANSI table.

Thanks for helping in advance!

Jython-users mailing list