On Linux Mint (a lot like Ubuntu) this works for me just as it does for Jim, but on Windows I get an error similar to Pāvils's.
dev>jython
Jython 2.7b1+ (default:f731a595b90a+, Oct 1 2013, 08:42:42)
[Java HotSpot(TM) 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_35
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.stdout.encoding
Cp1252
>>> print u"\u0101"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0101' in position 0: character maps to <undefined>

I can make it work (on Windows) by choosing cp1257, or 65001/UTF-8, at least for output. But we're dealing with Linux here, not Windows, aren't we?
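To see why Cp1252 fails while cp1257 or UTF-8 works, you can ask the JVM which charsets can represent ā at all. This is a minimal sketch that uses only java.nio.charset. The class name CodePageCheck is hypothetical. windows-1257 may be missing from a stripped-down JRE, and the code treats that case as "cannot represent".

```java
import java.nio.charset.Charset;

// Checks which candidate console encodings can represent a given character.
public class CodePageCheck {

    // True if the named charset is installed in this JVM and can encode c.
    static boolean canRepresent(String charsetName, char c) {
        if (!Charset.isSupported(charsetName)) {
            return false; // charset not available in this JVM
        }
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char aMacron = '\u0101'; // LATIN SMALL LETTER A WITH MACRON
        String[] candidates = {"US-ASCII", "windows-1252", "windows-1257", "UTF-8"};
        for (String name : candidates) {
            System.out.println(name + ": " + canRepresent(name, aMacron));
        }
    }
}
```

US-ASCII and windows-1252 report false. UTF-8 reports true, and so does windows-1257 where it is installed, since it is a Baltic code page. This matches the Windows experiment above.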

The first problem is surely that the stream encoding is set to ascii: that's what the message in #2 means. So I'd look at whether the console itself has a UTF-8 locale (mine is en_GB.UTF-8), whether the terminal window encoding is consistent with that (on my system the gterm doesn't follow changes in locale), and whether Jython is picking this up correctly via the JVM (check sys.stdout.encoding). You can specify the encoding with -Dpython.console.encoding=UTF-8 on the command line, but the environment has to agree.
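For the JVM side of those checks, this sketch prints what the JVM believes about the platform encoding, plus the locale variable it inherits. The class name is hypothetical, and the code only reads standard system properties and the environment. python.console.encoding prints as null unless it was passed with -D, as described above.

```java
import java.nio.charset.Charset;

// Reports the encoding-related settings the JVM sees, as a first diagnostic
// when non-ASCII console output fails.
public class EncodingDiagnostics {

    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("defaultCharset = ").append(Charset.defaultCharset().name()).append('\n');
        sb.append("file.encoding = ").append(System.getProperty("file.encoding")).append('\n');
        // Only set when given on the command line, e.g. -Dpython.console.encoding=UTF-8
        sb.append("python.console.encoding = ")
          .append(System.getProperty("python.console.encoding")).append('\n');
        // The locale the terminal session exported (may be null on Windows)
        sb.append("LANG = ").append(System.getenv("LANG")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```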

Jeff
Jeff Allen
On 03/10/2013 09:45, Pāvils Jurjāns wrote:
Jim,

Thanks for showing the approach: (Py.newUnicode("# encoding=UTF-8\n< my unicode code here >")).encode("UTF-8")

This let me pass these tests:

print "ā"
print ord(u"ā")

However, this test

print u"\u0101"

still throws a Python exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 0: ordinal not in range(128)

Maybe that's not such a big deal, since I now have enough functionality to use Unicode characters in my code directly, but for the sake of completeness fixing it wouldn't hurt.

Once I had managed to get the Unicode characters over the fence to the Jython interpreter, I could also capture the standard output. The characters arrive more or less UTF-8 encoded, but strangely, the calls to OutputStream.write(int b) come with negative values, so I have to fix them by adding 256. Here's the full code of the OutputStream class I use to capture Jython's Unicode output:

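import java.io.OutputStream;
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.logging.Logger; // assumed: the original did not show which Logger class was imported

public class MyOutputStream extends OutputStream
{
    private final ByteArrayOutputStream buffer;
    private final Logger l;

    public MyOutputStream (Logger logger) {
        buffer = new ByteArrayOutputStream();
        l = logger; // the original never assigned this field
    }

    public void write(int b) {
        // Only the low 8 bits carry the byte. Bytes >= 0x80 arrive sign-extended
        // (negative), and masking gives the same result as the "+256" fix.
        b = b & 0xff;
        if (b == 0) {
            // A NUL byte is treated as the end of a message
            flush();
        } else {
            buffer.write(b);
        }
    }

    public void flush() {
        String txt;
        try {
            // Decode the whole buffer at once so multi-byte UTF-8 sequences stay intact
            txt = buffer.toString("utf-8");
        } catch (UnsupportedEncodingException e) {
            txt = "#Encoding error#";
        }
        // Use the value of txt for good
        buffer.reset();
    }
}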

Perhaps this code is somewhat hacky, but it works.
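The negative values are expected, not strange. This JDK-only sketch shows the mechanism (class and method names are hypothetical). The inherited OutputStream.write(byte[]) forwards each byte to write(int). Java's byte type is signed, so every byte of a multi-byte UTF-8 sequence (all of them are 0x80 or above) arrives as a negative int. We assume, without having checked, that Jython's output reaches the stream through such a bulk write.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Demonstrates where the negative values come from: the inherited bulk write
// calls write(int) once per byte, and a byte (-128..127) is sign-extended
// when it is widened to int.
public class SignedBytesDemo {

    // Records every value handed to write(int), unmodified.
    static class RecordingStream extends OutputStream {
        final List<Integer> received = new ArrayList<Integer>();

        @Override
        public void write(int b) {
            received.add(b);
        }
    }

    // Writes the UTF-8 bytes of s through the inherited write(byte[]) and
    // returns what write(int) actually saw.
    static List<Integer> capture(String s) {
        RecordingStream out = new RecordingStream();
        try {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for this in-memory stream
        }
        return out.received;
    }

    public static void main(String[] args) {
        for (int b : capture("\u0101")) { // encodes as 0xC4 0x81 in UTF-8
            // b & 0xff is the unsigned byte value, the same as adding 256 to a negative b
            System.out.println(b + " -> " + (b & 0xff));
        }
    }
}
```

Running main prints -60 -> 196 and -127 -> 129. Masking with 0xff is the conventional way to write the "+256" fix. It also matches the OutputStream.write(int) contract, which says that only the eight low-order bits of the argument are written.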

Pavils


On Thu, Oct 3, 2013 at 4:15 AM, Jim Baker <jbaker@zyasoft.com> wrote:
The API of PythonInterpreter probably needs a closer review, since its behavior is surprising without a deep understanding of Jython internals and of how it passes the argument of exec through. That said, here's how to resolve #1 and #2. I don't have enough information to understand your requirement for #3 precisely, but it probably likewise requires careful attention to encoding and decoding.

So with this test code:

import org.python.core.Py;
import org.python.util.PythonInterpreter;

public class TestUnicode {

    public static void main(String[] args) {
        try {
            PythonInterpreter runtime = new PythonInterpreter();
            // Encode the source to UTF-8 and declare that encoding for the parser
            runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
            runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));
            // This source is pure ASCII, so it needs no special treatment on the way in
            runtime.exec("print u'\\u0101'");
        } catch (Exception ex) {
            System.err.println("Exception: " + ex);
        }
    }

}

I get the following output:

$ java TestUnicode
ā
257
ā

which should be what you wanted.
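You can illustrate why encode("UTF-8") helps without Jython. As far as we understand it, Jython represents a Python byte string as a Java String that holds one char per byte. This is an assumption we have not checked against the Jython source. Under that assumption, the source reaching exec looks like the UTF-8 bytes reinterpreted as ISO-8859-1, and the "# encoding=UTF-8" line tells the parser to decode those bytes back. The sketch below performs both directions with the JDK alone. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Mimics the assumed shape of the string that Py.newUnicode(src).encode("UTF-8")
// hands to exec(): one Java char per UTF-8 byte, each in the range 0..255.
public class ByteStringDemo {

    static String asByteString(String source) {
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps byte n to char n, so no information is lost
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    static String fromByteString(String byteString) {
        // The reverse step, as a parser respecting "# encoding=UTF-8" would do it
        byte[] utf8 = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String src = "# encoding=UTF-8\nprint ord(u'\u0101')";
        String bytes = asByteString(src);
        System.out.println(src.length() + " chars -> " + bytes.length() + " byte-chars");
        System.out.println(fromByteString(bytes).equals(src)); // true
    }
}
```

The byte string is one char longer than the source because ā takes two UTF-8 bytes. The round trip recovers the original text exactly.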


On Wed, Oct 2, 2013 at 3:42 PM, Pāvils Jurjāns <passiday@gmail.com> wrote:
Hello,

I use Jython as a scripting solution for a Java app. The Python code executed in Jython is read from a Unicode database field, so before the interp.exec(myPyCode) call, the code is stored in a Unicode string.

Unfortunately I found that all three functions I need are failing if some Unicode characters are involved:

1) Using Unicode characters in the Python code:

PythonInterpreter interp = new PythonInterpreter(null, new PySystemState());
interp.exec("print ord(\"ā\")");

gives the ordinal of "?", i.e. 63, while 257 was expected. How can I pass a string that contains Unicode characters to the interpreter without mangling them?

I'm not sure whether the special character "ā" will survive the mailing list. It's the Unicode character with code point 257, which you can easily get in Java like this:
String uniStr = Character.toString((char) 257);
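A result of 63 is exactly what Java produces when it encodes a character into a charset that cannot represent it. String.getBytes substitutes the charset's replacement byte, which is '?' for US-ASCII. We haven't traced where in Jython's path this conversion happens. This sketch, with a hypothetical class name, only demonstrates the mechanism.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how a non-ASCII character silently becomes '?' (63) when it is
// encoded into a charset that lacks it.
public class QuestionMarkDemo {

    // Encodes s with cs and returns the unsigned value of the first byte.
    static int firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0] & 0xff;
    }

    public static void main(String[] args) {
        String aMacron = Character.toString((char) 257); // as in the message above
        System.out.println(firstByte(aMacron, StandardCharsets.US_ASCII)); // 63, i.e. '?'
        System.out.println(firstByte(aMacron, StandardCharsets.UTF_8));    // 196 (0xC4), first of two bytes
    }
}
```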

2) Entering Unicode literals with escaping:
interp.exec("print u\"\\u0101\"");

This code causes this Python error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 0: ordinal not in range(128)

So, what is the correct way to instantiate the interpreter so that Unicode literals are processed properly, just as they are by the Python interpreter on Ubuntu?
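Python's UnicodeEncodeError here is the strict counterpart of the silent '?' substitution: the encoder refuses instead of substituting. Java's CharsetEncoder behaves the same way by default, since a fresh encoder reports unmappable characters. This sketch is only a Java analog of the Python error, with a hypothetical class name. It shows that the failure depends entirely on the output encoding, not on the string itself.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Java analog of Python's strict encode: a newly created CharsetEncoder
// throws on unmappable characters instead of substituting them.
public class StrictEncodeDemo {

    static boolean encodesStrictly(String s, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder(); // default action: REPORT
        try {
            encoder.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // UnmappableCharacterException here plays the role of UnicodeEncodeError
            return false;
        }
    }

    public static void main(String[] args) {
        String aMacron = "\u0101";
        System.out.println("US-ASCII: " + encodesStrictly(aMacron, "US-ASCII")); // false
        System.out.println("UTF-8: " + encodesStrictly(aMacron, "UTF-8"));       // true
    }
}
```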

3) Reading Unicode characters from standard output

I can't find a way for code executed by Jython to pass Unicode characters cleanly to standard output. My custom OutputStream class's write(int code) method waits for the data (to store it in a Unicode string), but it never receives anything with a character code above 127. Maybe the problem is that I never actually managed to get any Unicode string working inside the Jython interpreter in the first place.
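Even after write(int) receives the bytes above 127, they have to be decoded as a group. A character like ā is two UTF-8 bytes, and neither byte is a valid character by itself. This JDK-only sketch (hypothetical names) contrasts decoding a collected buffer with decoding byte by byte. The MyOutputStream class shown above takes the first approach: it collects bytes in a ByteArrayOutputStream and decodes them in flush().

```java
import java.nio.charset.StandardCharsets;

// Shows why captured output must be buffered before decoding: a multi-byte
// UTF-8 sequence decoded one byte at a time becomes replacement characters.
public class BufferedDecodeDemo {

    static String decodeWhole(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeBytewise(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Each lone byte of a multi-byte sequence is malformed on its own
            sb.append(new String(new byte[] {b}, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0101".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeWhole(utf8).equals("\u0101"));          // true
        System.out.println(decodeBytewise(utf8).equals("\uFFFD\uFFFD")); // true: two U+FFFD
    }
}
```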

Unfortunately, I work in a country where plenty of non-ASCII characters are in everyday use, so I can't get around this problem by restricting myself to the ANSI character table.

Thanks in advance for your help!

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
_______________________________________________
Jython-users mailing list
Jython-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jython-users




