Re: [Jython-users] Working with Unicode in Jython

The API of PythonInterpreter probably needs to be reviewed more closely:
the way it passes the argument of exec through is surprising unless you
have a deep understanding of Jython internals. That said, here's how to
get #1 and #2 resolved. I don't have enough information to understand your
requirement for #3 precisely, but it probably needs the same careful
attention to encoding and decoding.

So with this test code:

import org.python.core.Py;
import org.python.util.PythonInterpreter;

public class TestUnicode {

    public static void main(String[] args) {
        try {
            PythonInterpreter runtime = new PythonInterpreter();
            // #1: encode the source to UTF-8 and declare that encoding in the script
            runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
            runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));
            // #2: with a \u escape the string passed to exec is pure ASCII
            runtime.exec("print u'\\u0101'");
        } catch (Exception ex) {
            System.err.println("Exception: " + ex);
        }
    }

}

I get the following output:

$ java TestUnicode
ā
257
ā

which should be what you wanted.
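
For #3, here is a rough sketch in plain Java (no Jython calls) of the encoding/decoding point. It assumes, and this is only an assumption about your setup, that the bytes reaching your custom OutputStream are UTF-8. The names EncodingSketch and Utf8Collector are made up for illustration. The key detail is that OutputStream.write(int) receives one byte at a time, not one character, so the bytes have to be collected and decoded together. The sketch also shows where a "?" (63) comes from when "ā" is forced through an ASCII encoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    // Hypothetical collector: buffers raw bytes and decodes them as UTF-8 on request.
    static class Utf8Collector extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        @Override
        public void write(int b) {
            // Only the low 8 bits of b are meaningful: this is a byte, not a char.
            buffer.write(b);
        }

        String text() {
            // Assumption: the writer produced UTF-8.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // First byte of s after encoding to US-ASCII; unmappable characters become '?'.
    static int firstAsciiByte(String s) {
        return s.getBytes(StandardCharsets.US_ASCII)[0];
    }

    // Pushes the UTF-8 bytes of s through the collector and decodes them again.
    static String roundTrip(String s) throws IOException {
        Utf8Collector out = new Utf8Collector();
        out.write(s.getBytes(StandardCharsets.UTF_8)); // delegates to write(int) per byte
        return out.text();
    }

    public static void main(String[] args) throws IOException {
        String s = Character.toString((char) 257); // "ā"

        System.out.println(firstAsciiByte(s));                           // 63
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);   // 2
        System.out.println(roundTrip(s).codePointAt(0));                 // 257
    }
}
```

If your stream instead sees only values up to 127, the text was most likely already encoded as ASCII before it reached the stream, and the fix belongs on the writing side rather than in the collector.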

On Wed, Oct 2, 2013 at 3:42 PM, Pāvils Jurjāns <pas...@gm...> wrote:

> Hello,
>
> I use Jython as a scripting solution for a Java app. The Python code
> that's executed in Jython is read from a Unicode database field. So, before
> the interp.exec(myPyCode) call the code is stored in a Unicode string.
>
> Unfortunately I found that all three functions I need are failing if some
> Unicode characters are involved:
>
> 1) Using unicode characters in the Python code:
>
>  PythonInterpreter interp = new PythonInterpreter(null, new
> PySystemState());
> interp.exec("print ord(\"ā\")");
>
> gives the ordinal of "?", or 63, while 257 was expected. How do I pass a string
> that contains Unicode characters to the interpreter without mangling them?
>
> (Not sure if the special character "ā" will survive the mailing list. It's
> a Unicode character with ordinal 257; you can easily get it in Java like this:
> String uniStr = Character.toString((char) 257); )
>
> 2) Entering unicode literals with escaping:
> interp.exec("print u\"\\u0101\"");
>
> This code causes this Python error:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in
> position 0: ordinal not in range(128)
>
> So, what would be the correct way of instantiating the interpreter so that
> the Unicode literals would be happily processed, just as they are processed
> with Python interpreter in Ubuntu?
>
> 3) Reading unicode characters at standard output
>
> I can't seem to find a way for the code executed by Jython to pass
> Unicode characters cleanly to standard output. My custom OutputStream
> class's write(int code) method waits for data (to store it in a
> Unicode string), but it never receives anything with a char code above 127.
> Maybe the problem is that I never really managed to get any Unicode string
> working in the Jython interpreter in the first place.
>
> Unfortunately, I work in a country where there are plenty of Unicode
> characters in frequent use, so I can't really cope with this problem by
> using only ANSI table.
>
> Thanks for helping in advance!