Thread: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

[Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Rose P. <ros...@or...> - 2008-12-18 00:37:31

Hi, Jython gurus:

I need some help on running Jython 2.2.1 with multi-byte strings.

Jython 2.2.1 cannot pass a unicode String correctly to a function 
defined in a py script. The value of the parameter is converted to 
different \x format.
This did not happen in Jython 2.1.

To reproduce it, define a py script, test.py file. The test.py file 
defines a function called create() which simply returns the value of the 
parameter:

======= start of test.py   ======
def create(name):
    return name
======= end of test.py  =====

Then start Jython 2.1 and run the function create() from the py file:

java -classpath jython.jar.2.1 org.python.util.jython
Jython 2.1 on java1.6.0_05 (JIT: null)

execfile("test.py")
create('\u4f7f\u7528')  <-- input Japanese characters
u'\u4F7F\u7528'             <-- return the same unicode representing the
                              Japanese characters with length 2


We can see that create() returns a two-character unicode string, which
Java's System.out.println() method displays correctly.

Then we try Jython 2.2.1 with the same step:

java -classpath jython.jar.2.2.1 org.python.util.jython
Jython 2.2.1 on java1.6.0_05

execfile("test.py")
create('\u4f7f\u7528')   <-- input Japanese characters
'\xBB\xC8\xCD\xD1'           <-- returns different values with length 4.

The \xBB\xC8\xCD\xD1 values are not recognized by Java, so we always get
"????" when we print them with System.out.println().

This is a regression for Jython 2.2.1.

This is going to affect all customer-written py files. Is there a
workaround for this in Jython? Jython 2.5 seems to have the same issue.

Thanks,
Rose

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Charlie G. <cha...@gm...> - 2008-12-22 19:57:14

Hi Rose,

On Wed, Dec 17, 2008 at 4:37 PM, Rose Pan <ros...@or...> wrote:
> Jython 2.2.1 cannot pass a unicode String correctly to a function
> defined in a py script. The value of the parameter is converted to
> different \x format.

You're actually not passing unicode strings to the function.  To
create a unicode string in Python, you prepend u to it.

> create('\u4f7f\u7528')  <-- input Japanese characters

This should be changed to create(u'\u4f7f\u7528').  If I do that, I
get a unicode string out of the create function using Jython 2.2.
Jython 2.1's implementation had a bug in it that allowed unicode
strings to be passed around in a byte string e.g. those created with
str or quotes with no u, and it had undefined behavior when converting
into real bytes.  This was fixed in 2.2.
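
To make the byte string / unicode distinction concrete: the four \x values
in the 2.2.1 output match the EUC-JP encoding of the two characters, which
is what a byte string holding console-encoded input would contain. A minimal
sketch of that arithmetic follows. It is written for CPython 3 rather than
Jython 2.2.1, it assumes the console encoding is EUC-JP (as reported later
in this thread), and the variable names are illustrative only:

```python
# Assumption: the console encodes typed input as EUC-JP.
text = '\u4f7f\u7528'              # the two characters as text
encoded = text.encode('euc_jp')    # the same characters as encoded bytes

print(len(text))       # number of characters
print(len(encoded))    # number of bytes
print(encoded)         # the raw byte values
```

The character count and the byte count differ, which is why the 2.2.1
result looks twice as long as the 2.1 result.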

Charlie

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Rose P. <ros...@or...> - 2009-01-11 08:49:51

Hi, Charlie,

Thanks for the info.

I understand the sample here is for passing a Japanese literal from a py
file to a java method. This makes sense to me.

I have a further question on this. Could you please show us an
example where InteractiveInterpreter is embedded in Java? I am most
interested in the following case:

1. Read a string from Java. The string can represent either a
variable assignment (like a="multi-byte string") or a call to a function
defined in a py file (like print("multi-byte string")).

2. Pass the string to InteractiveInterpreter.runsource(string) method.

3. Jython invokes the function in the py file which simply passes the
parameter (the Japanese literal for example) of the function to a java
method.

With the current model in Jython 2.2.1, the string read from Java needs
to have a 'u' prepended before it is passed to the
InteractiveInterpreter.runsource() method, so that the string is passed
correctly to the function in the py file and then to the Java method.
This is already shown in the sample.

But the issue here is that the string representing the variable
assignment can also be passed to InteractiveInterpreter.runsource() with
the 'u' prepended. When we then try to show the value of the variable
using the Jython print command, we get an error.

Jython 2.2.1 on java1.6.0_05
Type "copyright", "credits" or "license" for more information.
>>> execfile("test.py")
>>> testdo(u'\u4F7F\u7528')
u'\u4F7F\u7528'
>>> a=u'\u4F7F\u7528'
>>> print a
Traceback (innermost last):
File "<console>", line 1, in ?
UnicodeError: ascii encoding error: ordinal not in range(128)

The print command in Jython only prints the string in a format like
"'\xBB\xC8\xCD\xD1'".

So I might have two more questions:

1. Is there a way to handle both cases (setting variables and calling
functions) when embedding InteractiveInterpreter in java?

2. Since the unicode characters read from Java cannot be passed directly
to InteractiveInterpreter.runsource(), they have to be converted to a
Jython unicode string. Is there a convenient method in Jython, callable
from Java code, that converts a Java string into a Jython unicode string,
so that the 'u' is prepended to the multi-byte string literal rather than
to the whole source string?

If there is a sample for both cases, that would be great.

Really appreciate your help.

Thanks,
Rose


Charlie Groves wrote:
> On Thu, Jan 8, 2009 at 12:39 PM, Peter Bower <pet...@or...> wrote:
>   
>> Given the following scenario:
>>
>> 1) assign a Japanese literal to a variable (in console or in py file)
>>
>> 2) print the variable
>>
>> 3) pass the variable to a java method
>>     
>
> The following Python module:
>
> j = u'\u521d\u671f'
> for c in j:
>     print ord(c)
> import sys
> sys.setdefaultencoding('utf-8')
> print j
> import Test
> Test.print(j)
>
> and Java class:
>
> import java.io.PrintStream;
> import java.io.UnsupportedEncodingException;
>
> public class Test
> {
>     public static void print (String val) throws UnsupportedEncodingException
>     {
>         for (int i = 0; i < val.length(); i++) {
>             System.out.println((int)val.charAt(i));
>         }
>         PrintStream utf8Stream = new PrintStream(System.out, true, "UTF-8");
>         utf8Stream.println(val);
>     }
> }
>
> prints
>
> 21021
> 26399
> 初期
> 21021
> 26399
> 初期
>
> on my terminal in Mac OS X (the third and sixth lines may be garbled in
> this email, but they actually print out as the characters represented
> by \u521d\u671f, I swear :).
>
>   
>> In Jython 2.1, it was very simple
>>
>>     name = "<Japanese characters>"
>>
>>     print name
>>
>>     test.create(name)
>>
>> Everything works, it prints correctly, and the Java method gets the expected
>> string. The model appears
>> to be:
>>
>>     - Literals are read with the default character set (or that of the
>> console encoding)
>>
>>     - Strings can flow from Jython to Java and back without requiring
>> conversion
>>
>>     - Strings are printed using the String.getBytes() method which encodes
>> using default character set
>>     
>
> This actually doesn't work in all cases, and is one of the reasons
> this was changed for 2.2.  Java's default character set doesn't always
> match the encoding of the console it's using e.g. the default encoding
> is MacRoman on Mac OS X, but the console uses utf-8 by default.
> That's why my Java source above makes its own PrintStream.  System.out
> uses MacRoman and doesn't print properly to the console.  This was
> particularly troublesome as Jython would read source files in the
> default encoding on one system, and if that source file was used on a
> system with a different default encoding, it would either explode or
> produce gibberish when the differences in encoding were encountered.
>
> The bigger reason for the change was to better conform to Python's
> Unicode model.  Python has two "String" types, str and unicode.  str
> is a byte string and is created by unadorned quotes.  unicode is a
> sequence of unicode characters like Java's String and is created by
> prepending a u to the quotes.  Allowing unicode characters in str, as
> Jython 2.1 did, led to mismatches between CPython's and Jython's models,
> and caused the unicode values in the strings to be truncated when
> various str operations were performed.  Whenever you have character
> data, you want unicode objects and strings created with u''.
>
>   
>> The 2.2.1 model appears to be
>>
>>     - literals are read with the ISO-8859-1 character set from .py files and
>> by default in the console.
>>
>>     - they flow from Jython to Java as is
>>
>>     - strings are printed using the raw bytes (PyString.to_bytes())
>>     
>
> This is correct.  The encoding used to read from the interpreter is
> controlled with python.console.encoding, but otherwise things are
> assumed to be raw byte values.  There's no way to have encoded unicode
> values in source files in Python 2.2.  That was added by PEP 263 in
> Python 2.3.  The only way to make unicode literals in 2.2 is with \u
> values for characters outside the ascii character set.
>
>   
>> - is u' required? Does Jython 2.2.1 continue to support the non u' format?
>> Or should
>>   unicode("japanese characters", "jp charset") be used instead (if a jp
>> charset was available)?
>>     
>
> Either u or calling unicode will work.  If you have a large body of
> existing source, you can use something like the native2ascii tool that
> comes with Java to convert the encoded Japanese values into unicode
> escapes.  If you need to do it dynamically, something like
> http://www.google.com/codesearch/p?hl=en#MzR-vajYaSo/kaffe-1.1.5/libraries/javalib/gnu/classpath/tools/native2ascii/Native2ASCII.java
> would work.
>
>   
>> - should print <unicode variable> work out of the box? Or do we [and
>> customers] need to set the default encoding?
>>     
>
> Yes, you'll need to set Python's default encoding to the encoding that
> the console uses.  I don't know of a way to do this across the Java
> platform.  System.getProperty("file.encoding") returns Java's default
> encoding, but that doesn't always line up with what the console
> expects.
>
>   
>> - what character set should Java methods expect the string to be in:
>> ("ISO-8859-1", the
>>   default character set, or something else)?
>>     
>
> If you've got a unicode value in Python, the String will consist of
> the same unicode characters and no encoding is needed.  If you have a
> str of encoded characters, the String will consist of chars of the
> same length in whatever encoding the str came in as.
>
> I'm sorry this transition is proving to be so painful; Jython's
> support for unicode was pretty broken in 2.1, and it'll finally work
> decently in 2.5 with the addition of PEP 263.
>
>
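
The native2ascii-style conversion mentioned in the quoted reply can be
sketched roughly as follows. This is not the native2ascii tool itself, only
an illustration of the idea in CPython 3; the function name
to_unicode_escapes is made up for this example:

```python
import struct

def to_unicode_escapes(text):
    """Replace every non-ASCII character with \\uXXXX escapes (hypothetical helper)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            # Emit one escape per UTF-16 code unit, so characters outside
            # the BMP become a surrogate pair of two escapes.
            data = ch.encode('utf-16-be')
            for (unit,) in struct.iter_unpack('>H', data):
                out.append('\\u%04x' % unit)
    return ''.join(out)

print(to_unicode_escapes("a = u'\u4f7f\u7528'"))
```

Source text converted this way contains only ASCII, so it no longer depends
on the encoding used to read the file.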

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Charlie G. <cha...@gm...> - 2009-01-12 17:17:54

On Sun, Jan 11, 2009 at 12:49 AM, Rose Pan <ros...@or...> wrote:
> Jython 2.2.1 on java1.6.0_05
> Type "copyright", "credits" or "license" for more information.
>>>> execfile("test.py")
>>>> testdo(u'\u4F7F\u7528')
> u'\u4F7F\u7528'
>>>> a=u'\u4F7F\u7528'
>>>> print a
> Traceback (innermost last):
> File "<console>", line 1, in ?
> UnicodeError: ascii encoding error: ordinal not in range(128)
>
> The print command in Jython only print the string with the format like
> "'\xBB\xC8\xCD\xD1'".

This happens because Jython's default encoding is ascii, and
that's what it uses to encode things through print.  If you call
sys.setdefaultencoding(<your console's encoding>) before this, jython
will print properly.
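
The failure mode can be reproduced outside of print by doing the encoding
step explicitly. The following is a minimal sketch in CPython 3, not Jython
2.2.1, and it uses UTF-8 only as a stand-in for whatever encoding the
console actually expects:

```python
text = '\u4f7f\u7528'

# Encoding with ASCII fails because neither character is in range(128).
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# Encoding with an encoding that covers the characters succeeds.
# Assumption: UTF-8 stands in for the console's encoding here.
console_bytes = text.encode('utf-8')

print(ascii_ok)
print(console_bytes)
```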

> 1. Is there a way to handle both cases (setting variables and calling
> functions) when embedding InteractiveInterpreter in java?

I'm not sure what you're asking here.  The cases are setting a
variable with a unicode string and calling a function with that string
from an embedded InteractiveInterpreter?  I don't understand how
that's different than running a script directly or by using jython at
the console.

> 2. Since the unicode characters read from java can not be directly
> passed to InteractiveInterpreter.runsource(), it has to convert to
> jython unicode string. Is there a convenient method in Jython to convert
> java string into jython unicode string that we can call in java code, so
> the 'u' can be prepended at the beginning of the multi-byte string, not
> the beginning of the whole string?

Jython has no builtin way to convert str literals to unicode literals.
 However, you can encode the Java String source you're passing in to
the interpreter, and then decode the Strings that come out of Jython
into your Java code.  As long as your users aren't writing the Java
themselves, nothing on their end will need to change.  Here's an
example of that:

import java.io.PrintStream;
import java.nio.charset.Charset;

import org.python.util.InteractiveInterpreter;

public class Test
{
    // Encoding the console expects; change this to match your terminal.
    static String encoding = "UTF-8";

    public static void main (String[] args)
        throws Exception
    {
        String unicode = "a = '\u4F7F\u7528'";
        new PrintStream(System.out, true, encoding).println("From Java: " + unicode);
        InteractiveInterpreter interp = new InteractiveInterpreter();
        String source = unicode
            + "; print 'From Jython: %s' % a; import Test; Test.print(a)";
        // Encode the source, then wrap each byte in one char via ISO-8859-1.
        byte[] encoded = source.getBytes(encoding);
        String encodedSourceInString =
            new String(encoded, Charset.forName("ISO-8859-1"));
        interp.runsource(encodedSourceInString);
    }

    public static void print (String encodedStringFromPython)
        throws Exception
    {
        // Reverse the trick: unwrap the bytes, then decode them properly.
        byte[] encoded = encodedStringFromPython.getBytes("ISO-8859-1");
        String realString = new String(encoded, encoding);
        new PrintStream(System.out, true, encoding).println(
            "From Java From Jython: " + realString);
    }
}

which prints out the Japanese String directly from Java, from Jython,
and then in Java again in a call from Jython.  There's some weird
stuff going on in there, so it's probably worth examining a few of the
bits more closely.

First, I set an encoding I'm going to use for printing to the console
from Java and for sending Strings into Jython.  On my Mac, the console
uses UTF-8, so I use that as the encoding, but you'll need to find out
which encoding your terminal expects and use that instead.

With that encoding, I print a Japanese String to the console from Java
just to make sure things are hunky-dory at a base level.  I then make
an InteractiveInterpreter and some Python source to run in it.  The
Python source runs the assignment, prints the assigned variable and then
calls back into the Test class with that variable.  I encode the
String into bytes using the console encoding, and then I turn it back
into a String for use in InteractiveInterpreter.runsource.  This is a
slightly bizarre use of Strings and Charsets.  It uses the fact that
ISO-8859-1 is a direct mapping between its byte and char
representation to make a String out of the encoded bytes.  This lets
the encoded representation pass into the interpreter unmolested.  With
that encoded string, the Python source's print of the variable works
properly as it's a str already encoded in the console's encoding, and
doesn't pass through Jython's default encoding.

Finally, Jython calls back into Test.print with the value from a in
the Python source.  This is still an encoded Python str, so I use the
same ISO-8859-1 trick in reverse to get the encoded bytes out, and
turn those bytes back into a String with its constructor that takes an
encoding.  With a real Java String again, I'm able to print the value
from Java.

This isn't the prettiest of solutions, but it's the only way I can
think of to make this work without changing the underlying source to
use unicode literals.  If you do have some leeway on that, I'd
recommend going that way, but if you're stuck with the encoded source,
I believe this will work.
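
The ISO-8859-1 round trip described above can also be shown in isolation. A
minimal sketch in CPython 3, where the two helper names are hypothetical and
UTF-8 is assumed as the console encoding:

```python
def wrap_as_latin1(text, console_encoding):
    # Encode to the console's bytes, then map each byte to one character.
    return text.encode(console_encoding).decode('iso-8859-1')

def unwrap_from_latin1(wrapped, console_encoding):
    # Map each character back to its byte, then decode the bytes properly.
    return wrapped.encode('iso-8859-1').decode(console_encoding)

original = '\u4f7f\u7528'
wrapped = wrap_as_latin1(original, 'utf-8')      # one char per encoded byte
restored = unwrap_from_latin1(wrapped, 'utf-8')  # the original text again

print(len(original), len(wrapped), restored == original)
```

The wrapped form is longer than the original because it holds one character
per encoded byte, and the round trip only works if both sides use the same
console encoding.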

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Rose P. <ros...@or...> - 2009-01-12 22:27:21

Hi, Charlie,

Thanks for the detailed explanation. I changed the encoding to "euc_jp",
which my terminal is using, and tried out the sample again. It works when
printing from Java directly, but it does not work when printing from
Jython or from Java in the call from Jython. Here is the result:

 From Java: a = '\u4f7f\u7528'
 From Jython: (
 From Java From Jython: ??

Thanks,
Rose



Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Charlie G. <cha...@gm...> - 2009-01-13 17:56:29

On Mon, Jan 12, 2009 at 1:45 PM, Rose Pan <ros...@or...> wrote:
> Hi, Charlie,
>
> Thanks for the detailed explanation. I changed the encoding to "euc_jp",
> which my terminal is using, and tried out the sample again. It works when
> printing from Java directly, but it does not work when printing from
> Jython or from Java in the call from Jython. Here is the result:
>
> From Java: a = '\u4f7f\u7528'
> From Jython: (
> From Java From Jython: ??

I'm not sure what's going on without being able to reproduce your
terminal setup.  It works for me if I set my console's encoding to
euc_jp along with the encoding in the test Java file.  Is your result
really printing the escaped unicode values instead of rendering single
characters?  That seems like it's broken in Java before it even gets to
Jython.

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Rose P. <ros...@or...> - 2009-01-12 22:45:30

Hi, Charlie,

Also, the encoding "euc_jp" is not supported in Jython 2.2.1 yet. We
can't run sys.setdefaultencoding("euc_jp") in a py file for now, and
this prevents us from printing u'\u4F7F\u7528' on the console
correctly.

Any other solution?

Thanks,
Rose

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Charlie G. <cha...@gm...> - 2009-01-13 18:00:14

On Mon, Jan 12, 2009 at 2:45 PM, Rose Pan <ros...@or...> wrote:
> Hi, Charlie,
>
> Also, the encoding "euc_jp" is not supported in Jython 2.2.1 yet. We can't
> run sys.setdefaultencoding("euc_jp") in a py file for now, and this
> prevents us from printing u'\u4F7F\u7528' on the console correctly.
>
> Any other solution?

It's available from the 2.2 branch in subversion.  You'll need to add
two files to your Lib/encodings directory in Jython:
http://fisheye3.atlassian.com/browse/~raw,r=3747/jython/branches/Release_2_2maint/jython/Lib/encodings/euc_jp.py
and http://fisheye3.atlassian.com/browse/~raw,r=3747/jython/branches/Release_2_2maint/jython/Lib/encodings/java_charset_wrapper.py
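
Once the files are in place, one quick way to check that the codec can be
found is a lookup through the standard codecs module. A minimal sketch; the
helper name codec_available is hypothetical, and the example was written
against CPython, so its behavior on a patched Jython 2.2.1 is an assumption:

```python
import codecs

def codec_available(name):
    """Return True if the named codec can be looked up (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(codec_available('euc_jp'))
```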

Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Rose P. <ros...@or...> - 2008-12-18 15:28:45

Hi,

More info on Jython 2.2.1.

Setting the property -Dpython.console.encoding=EUC_JP_LINUX does not
produce the correct unicode either.

In our java code, if we use the following method to copy the string 
"\xBB\xC8\xCD\xD1" to byte array after Jython returned the value from 
running the py file, then the java System.out.println() can print the 
correct multi-byte string on the console.

public static byte[] to_bytes(String s) {
    int len = s.length();
    byte[] b = new byte[len];
    // Copies characters from this string into the destination byte array.
    // Each byte receives the 8 low-order bits of the corresponding character.
    // The eight high-order bits of each character are not copied and do not
    // participate in the transfer in any way.
    s.getBytes(0, len, b, 0);
    return b;
}

But with this workaround, we have to convert every String returned from 
the py files to a byte array using the method above. This is not 
acceptable as we have more than 100 functions defined in the py files 
and each function has multiple parameters of type String.

Has anybody encountered the same issue? I think this is a very common 
problem for Jython as Jython is now used worldwide.
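
For reference, the same low-byte copy can probably be written without the deprecated String.getBytes(int, int, byte[], int) overload, and the bytes can then be decoded into a real Java String. This is only a sketch: the class and method names are made up for illustration, it assumes the bytes are EUC-JP (which matches the values shown in this thread), and it assumes the JVM ships the EUC-JP charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Jython.
final class ByteStringDecoder {

    // Treats each char of a "byte string" as one raw byte.
    // ISO-8859-1 maps chars 0x00..0xFF to the same byte values, so for
    // strings whose chars are all <= 0xFF this matches a low-byte copy.
    static byte[] toBytes(String byteString) {
        return byteString.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reinterprets the raw bytes in the given charset (assumed, not detected).
    static String decode(String byteString, String charsetName) {
        return new String(toBytes(byteString), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The four chars Jython 2.2.1 returned in the report above.
        String fromJython = "\u00BB\u00C8\u00CD\u00D1";
        String decoded = decode(fromJython, "EUC-JP");
        System.out.println(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            System.out.println(Integer.toHexString(decoded.charAt(i)));
        }
    }
}
```

With the EUC-JP charset available, this should print 2, then 4f7f and 7528, i.e. the two characters of the original input. It does not remove the need to convert each returned value; it only shows where the four chars come from.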

Any help / comments would be really appreciated.

Thanks,
Rose


Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file

From: Charlie G. <cha...@gm...> - 2009-01-10 12:31:12

On Thu, Jan 8, 2009 at 12:39 PM, Peter Bower <pet...@or...> wrote:
> Given the following scenario:
>
> 1) assign a Japanese literal to a variable (in console or in py file)
>
> 2) print the variable
>
> 3) pass the variable to a java method

The following Python module:

j = u'\u521d\u671f'
for c in j:
    print ord(c)
import sys
sys.setdefaultencoding('utf-8')
print j
import Test
Test.print(j)

and Java class:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Test
{
    public static void print (String val) throws UnsupportedEncodingException
    {
        for (int i = 0; i < val.length(); i++) {
            System.out.println((int)val.charAt(i));
        }
        PrintStream utf8Stream = new PrintStream(System.out, true, "UTF-8");
        utf8Stream.println(val);
    }
}

prints

21021
26399
初期
21021
26399
初期

on my terminal in Mac OS X (the third and sixth lines may be garbled in
this email, but they actually print out as the characters represented
by \u521d\u671f, I swear :).

> In Jython 2.1, it was very simple
>
>     name = "<Japanese characters>"
>
>     print name
>
>     test.create(name)
>
> Everything works, it prints correctly, and the Java method gets the expected
> string. The model appears
> to be:
>
>     - Literals are read with the default character set (or that of the
> console encoding)
>
>     - Strings can flow from Jython to Java and back without requiring
> conversion
>
>     - Strings are printed using the String.getBytes() method which encodes
> using default character set

This actually doesn't work in all cases, and is one of the reasons
this was changed for 2.2.  Java's default character set doesn't always
match the encoding of the console it's using, e.g. the default encoding
is MacRoman on Mac OS X, but the console uses utf-8 by default.
That's why my Java source above makes its own PrintStream.  System.out
uses MacRoman and doesn't print properly to the console.  This was
particularly troublesome as Jython would read source files in the
default encoding on one system, and if that source file was used on a
system with a different default encoding, it would either explode or
produce gibberish when the differences in encoding were encountered.
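
To make the mismatch concrete, here is a small sketch that encodes the same two characters with two different charsets. The class and method names are invented for this example, and ISO-8859-1 is used only as a stand-in for any default charset that cannot represent the text (MacRoman may not be present in every JVM).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: the same String yields different bytes per charset.
final class EncodingMismatch {

    // Formats a byte array as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u521d\u671f";
        // Whatever the JVM picked up from the platform; varies by system.
        System.out.println("default charset: " + Charset.defaultCharset());
        // UTF-8 can represent both characters (three bytes each).
        System.out.println("UTF-8:      " + hex(text.getBytes(StandardCharsets.UTF_8)));
        // ISO-8859-1 cannot, so each character becomes the replacement '?'.
        System.out.println("ISO-8859-1: " + hex(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The UTF-8 line prints e5889de69c9f and the ISO-8859-1 line prints 3f3f, which is two question marks. A console expecting one encoding while the stream writes another produces exactly the kind of "????" output reported earlier in the thread.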

The bigger reason for the change was to better conform to Python's
Unicode model.  Python has two "String" types, str and unicode.  str
is a byte string and is created by unadorned quotes.  unicode is a
sequence of unicode characters like Java's String and is created by
prepending a u to the quotes.  Allowing unicode characters in str, as
Jython 2.1 did, led to mismatches between CPython's and Jython's models,
and caused the unicode values in the strings to be truncated when
various str operations were performed.  Whenever you have character
data, you want unicode objects and strings created with u''.
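
As a rough picture of what that truncation looks like, the sketch below keeps only the low 8 bits of each char, the way a byte-oriented operation would. This is not Jython's actual code; the class and method names are hypothetical and the example only illustrates the effect.

```java
// Illustration only: what happens when 16-bit chars are squeezed into bytes.
final class TruncationDemo {

    // Keeps the 8 low-order bits of each char and discards the rest.
    static String truncateToBytes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String truncated = truncateToBytes("\u4F7F\u7528");
        for (int i = 0; i < truncated.length(); i++) {
            // Print code points as hex, since the results are not printable text.
            System.out.println(Integer.toHexString(truncated.charAt(i)));
        }
    }
}
```

This prints 7f and 28: the two Japanese characters collapse into a control character and an opening parenthesis, and the original values cannot be recovered afterwards.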

> The 2.2.1 model appears to be
>
>     - literals are read with the ISO-8859-1 character set from .py files and
> by default in the console.
>
>     - they flow from Jython to Java as is
>
>     - strings are printed using the raw bytes (PyString.to_bytes())

This is correct.  The encoding used to read from the interpreter is
controlled with python.console.encoding, but otherwise things are
assumed to be raw byte values.  There's no way to have encoded unicode
values in source files in Python 2.2.  That was added by PEP 263 in
Python 2.3.  The only way to make unicode literals in 2.2 is with \u
values for characters outside the ascii character set.

> - is u' required? Does Jython 2.2.1 continue to support the non u' format?
> Or should
>   unicode("japanese characters", "jp charset") be used instead (if a jp
> charset was available)?

Either u or calling unicode will work.  If you have a large body of
existing source, you can use something like the native2ascii tool that
comes with Java to convert the encoded Japanese values into unicode
escapes.  If you need to do it dynamically, something like
http://www.google.com/codesearch/p?hl=en#MzR-vajYaSo/kaffe-1.1.5/libraries/javalib/gnu/classpath/tools/native2ascii/Native2ASCII.java
would work.
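
If pulling in a whole tool is too much, the core of such a conversion is small. The following is a minimal sketch of the idea, not the native2ascii implementation: the class and method names are made up, and it assumes the text has already been read into a String using the file's real encoding (for example through an InputStreamReader created with that charset).

```java
// Hypothetical helper: rewrites non-ASCII chars as backslash-u escapes.
final class UnicodeEscaper {

    // Leaves 7-bit ASCII alone and replaces every other char with a
    // backslash, the letter u, and four lowercase hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A line of Python source containing two Japanese characters.
        String line = "name = u'\u521d\u671f'";
        System.out.println(escapeNonAscii(line));
    }
}
```

The printed line contains only ASCII, with the two characters spelled out as escapes, so it can be used as a u'' literal in a 2.2 source file. Note that the escapes are only interpreted inside u'' strings, so existing unadorned literals would still need the u prefix added.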

> - should print <unicode variable> work out of the box? Or do we [and
> customers] need to set the default encoding?

Yes, you'll need to set Python's default encoding to the encoding that
the console uses.  I don't know of a way to do this across the Java
platform.  System.getProperty("file.encoding") returns Java's default
encoding, but that doesn't always line up with what the console
expects.
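
On the Java side, one way to cope is to stop relying on System.out's default and choose the console encoding explicitly. The sketch below is an assumption-laden example, not an established recipe: the class and method names are invented, and reusing the python.console.encoding property as an override read from plain Java is my own choice for illustration, with file.encoding as the fallback.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: wraps a stream with an explicitly chosen encoding.
final class ConsoleEncoding {

    // Prefers an explicit override, then Java's default, then UTF-8.
    // Reading python.console.encoding here is an assumption of this sketch.
    static String chooseEncoding() {
        String fallback = System.getProperty("file.encoding", "UTF-8");
        return System.getProperty("python.console.encoding", fallback);
    }

    // Returns an auto-flushing PrintStream that encodes with the given name.
    static PrintStream wrap(OutputStream out, String encoding) {
        try {
            return new PrintStream(out, true, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        PrintStream console = wrap(System.out, chooseEncoding());
        console.println("\u521d\u671f");
    }
}
```

Whether the characters display correctly still depends on the chosen name matching what the terminal really expects, which is the part that cannot be detected reliably from Java.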

> - what character set should Java methods expect the string to be in:
> ("ISO-8859-1", the
>   default character set, or something else)?

If you've got a unicode value in Python, the String will consist of
the same unicode characters and no encoding is needed.  If you have a
str of encoded characters, the String will consist of the same number
of chars as the str has bytes, in whatever encoding the str came in as.

I'm sorry this transition is proving to be so painful; Jython's
support for unicode was pretty broken in 2.1, and it'll finally work
decently in 2.5 with the addition of PEP 263.