Re: [Jython-users] Working with Unicode in Jython

Jim,

Thanks for showing the approach (Py.newUnicode("# encoding=UTF-8\n< my
unicode code here >")).encode("UTF-8")

This let me pass these tests:

print "ā"
print ord(u"ā")

However, this test

print u"\u0101"

still throws a Python exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in
position 0: ordinal not in range(128)

Maybe that's not such a big deal, as I now have enough functionality to use
Unicode characters in my code directly, but for the sake of completeness
fixing that wouldn't hurt.
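
For illustration only: the traceback above is what a strict ASCII encoder
reports for U+0101, which has no ASCII representation. The plain-Java sketch
below shows the same two outcomes without Jython: a strict check that refuses
the character, and a lenient conversion that silently substitutes "?" (byte
63, the value reported in the original question quoted below). This is not
Jython's actual code path, and the class and method names are made up for the
example.

```java
import java.nio.charset.StandardCharsets;

public class AsciiEncodeDemo {
    // Strict check: can US-ASCII represent every character of the text?
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // Lenient conversion: String.getBytes replaces unmappable characters with '?'.
    static int firstByteAsAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII)[0];
    }

    public static void main(String[] args) {
        String s = "\u0101"; // the character ā
        System.out.println(fitsInAscii(s));      // false
        System.out.println(firstByteAsAscii(s)); // 63, i.e. '?'
        System.out.println((int) s.charAt(0));   // 257
    }
}
```

Whether a given Jython call fails loudly or degrades to "?" presumably
depends on which of these two styles of conversion sits on its path.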

Once I managed to get the Unicode characters over the fence to the Jython
interpreter, I could also capture the standard output. The characters
arrive as UTF-8 encoded bytes, but strangely, the calls to
OutputStream.write(int char) come with negative values, so I have to fix
them by adding 256. Here's the full code of the OutputStream class I use to
capture Jython Unicode output:

import java.io.OutputStream;
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
// Logger is whichever logging class the application uses (its import is not shown)

public class MyOutputStream extends OutputStream
{
    private ByteArrayOutputStream buffer;
    private Logger l;

    public MyOutputStream (Logger logger) {
        buffer = new ByteArrayOutputStream();
        l = logger; // keep the logger so flush() can hand the text to it
    }

    public void write(int b) {
        // Normalise to 0..255: negative values were observed arriving here
        b = b & 0xff;
        if (b == 0) {
            // A zero byte is treated as the end of a message
            flush();
        } else {
            buffer.write(b);
        }
    }

    public void flush() {
        String txt;
        try {
            // Decode the buffered bytes as UTF-8
            txt = buffer.toString("utf-8");
        } catch (UnsupportedEncodingException e) {
            txt = "#Encoding error#";
        }
        // Use the value of txt for good
        buffer.reset();
    }
}

Perhaps this code is somewhat hacky, but it works.
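
A note on the negative values, with a hedged sketch: Java's byte type is
signed, so the UTF-8 bytes of "ā" (0xC4 0x81) are -60 and -127 as bytes, and
they stay negative if they are widened to int without masking. Masking with
0xff recovers the unsigned value in one step. The stand-alone class below
(the names CaptureStream and text() are made up for this example, and it
does not use Jython) feeds such negative values into write(int) and decodes
the result as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CaptureStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final StringBuilder captured = new StringBuilder();

    @Override
    public void write(int b) {
        // Keep only the low 8 bits: maps -60 to 196 (0xC4), leaves 0..255 unchanged.
        buffer.write(b & 0xff);
    }

    @Override
    public void flush() {
        // Decode everything received so far as UTF-8 and start a fresh buffer.
        captured.append(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        buffer.reset();
    }

    // Returns all text captured so far.
    public String text() {
        flush();
        return captured.toString();
    }

    public static void main(String[] args) {
        CaptureStream out = new CaptureStream();
        // Each byte is widened to a negative int when passed to write(int).
        for (byte b : "\u0101".getBytes(StandardCharsets.UTF_8)) {
            out.write(b);
        }
        System.out.println((int) out.text().charAt(0)); // 257
    }
}
```

Unlike the class above, this sketch does not treat a zero byte as a flush
signal. Also note that flushing in the middle of a multi-byte sequence would
decode a partial character, so flush() should only be called at message
boundaries. The masking matters mostly when the value is inspected (for
example compared against 127); ByteArrayOutputStream.write(int) itself only
stores the low-order eight bits.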

Pavils

On Thu, Oct 3, 2013 at 4:15 AM, Jim Baker <jb...@zy...> wrote:

> The API of PythonInterpreter probably needs to be reviewed more closely,
> since it is surprising without a deep understanding of Jython internals and
> how it passes the arg of exec through. But with that said, here's how to
> get #1 and #2 resolved. I don't have enough info to understand precisely
> your requirement for #3, but it probably has a similar requirement to
> carefully follow encoding/decoding.
>
> So with this test code:
>
> import org.python.core.Py;
> import org.python.util.PythonInterpreter;
>
> public class TestUnicode {
>
>     public static void main(String[] args) {
>         try {
>             PythonInterpreter runtime = new PythonInterpreter();
>             runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
>             runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));
>             runtime.exec("print u'\\u0101'");
>         } catch (Exception ex) {
>             System.err.println("Exception: " + ex);
>         }
>     }
>
> }
>
> I get the following output:
>
> $ java TestUnicode
> ā
> 257
> ā
>
> which should be what you wanted.
>
>
> On Wed, Oct 2, 2013 at 3:42 PM, Pāvils Jurjāns <pas...@gm...> wrote:
>
>> Hello,
>>
>> I use Jython as a scripting solution for a Java app. The Python code
>> that's executed in Jython is read from a Unicode database field. So, before
>> the interp.exec(myPyCode) call the code is stored in a Unicode string.
>>
>> Unfortunately I found that all three functions I need are failing if some
>> Unicode characters are involved:
>>
>> 1) Using unicode characters in the Python code:
>>
>> PythonInterpreter interp = new PythonInterpreter(null, new PySystemState());
>> interp.exec("print ord(\"ā\")");
>>
>> gives the ordinal of "?", or 63, while 257 was expected. How can I pass a
>> string that contains Unicode characters to the interpreter without mangling
>> them?
>>
>> I'm not sure the special character "ā" will survive the mailing list.
>> It's a Unicode character with ordinal 257; you can easily get it in Java
>> like this:
>> String uniStr = Character.toString((char) 257);
>>
>> 2) Entering unicode literals with escaping:
>> interp.exec("print u\"\\u0101\"");
>>
>> This code causes this Python error:
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in
>> position 0: ordinal not in range(128)
>>
>> So, what would be the correct way of instantiating the interpreter so
>> that the Unicode literals would be happily processed, just as they are
>> processed with Python interpreter in Ubuntu?
>>
>> 3) Reading unicode characters at standard output
>>
>> I can't seem to find a way for the code executed by Jython to pass
>> Unicode characters cleanly to the standard output. I have my custom
>> OutputStream class's write(int code) method waiting for data (to store it
>> in a Unicode string), but it never receives anything with a character code
>> above 127. Maybe the problem is that I never really managed to get any
>> Unicode string working in the Jython interpreter in the first place.
>>
>> Unfortunately, I work in a country where there are plenty of Unicode
>> characters in frequent use, so I can't really cope with this problem by
>> using only the ANSI table.
>>
>> Thanks in advance for your help!