Re: [Jython-users] Help for issue 1183 Jython 2.2.1 cannot pass unicode to a func in a py file


Hi,

More info on Jython 2.2.1.

Setting the property -Dpython.console.encoding=EUC_JP_LINUX does not 
help; the returned value is still not the correct unicode string.

In our Java code, if we use the following method to copy the string 
"\xBB\xC8\xCD\xD1" into a byte array after Jython returns the value from 
running the py file, then Java's System.out.println() prints the correct 
multi-byte string on the console.

public static byte[] to_bytes(String s) {
        int len = s.length();
        byte[] b = new byte[len];
        // Copies characters from this string into the destination byte array.
        // Each byte receives the 8 low-order bits of the corresponding character.
        // The eight high-order bits of each character are not copied and do not
        // participate in the transfer in any way.
        s.getBytes(0, len, b, 0);
        return b;
    }

But with this workaround, we have to convert every String returned from 
the py files to a byte array using the method above. This is not 
acceptable, as we have more than 100 functions defined in the py files 
and each function has multiple parameters of type String.
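
For reference, the conversion could be kept in one helper that both 
extracts the low bytes and decodes them again. The following is only a 
minimal sketch: the class name JythonStringFix and the method redecode 
are hypothetical, and it assumes that every character in the returned 
String is in the range 0x00-0xFF and that the underlying bytes are EUC-JP 
(which matches our console encoding). It does not remove the need to call 
the helper on each returned value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JythonStringFix {

    // Hypothetical helper, not part of Jython.
    // Assumes each char of s holds one raw byte (value 0x00-0xFF).
    // ISO-8859-1 maps those chars one-to-one onto bytes; any char above
    // 0xFF would be replaced by '?', unlike the deprecated
    // String.getBytes(int, int, byte[], int), which keeps the low 8 bits.
    public static String redecode(String s, Charset actualEncoding) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actualEncoding);
    }

    public static void main(String[] args) {
        // The four-character value we get back from Jython 2.2.1.
        String fromJython = "\u00BB\u00C8\u00CD\u00D1";

        // Assumption: the bytes are EUC-JP and the JRE provides that charset.
        String fixed = redecode(fromJython, Charset.forName("EUC-JP"));

        System.out.println("length = " + fixed.length());
        System.out.println("matches \\u4F7F\\u7528: " + fixed.equals("\u4F7F\u7528"));
    }
}
```

Passing the Charset as a parameter keeps the assumption about the 
encoding visible at the call site instead of hiding it in the helper.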

Has anybody encountered the same issue? I think this is a very common 
problem for Jython, as Jython is now used worldwide.

Any help / comments would be really appreciated.

Thanks,
Rose

Rose Pan wrote:
> Hi, Jython gurus:
>
> I need some help on running Jython 2.2.1 with multi-byte strings.
>
> Jython 2.2.1 cannot pass a unicode String correctly to a function 
> defined in a py script. The value of the parameter is converted to 
> a different \x format.
> This does not happen in Jython 2.1.
>
> To reproduce it, define a py script, test.py file. The test.py file 
> defines a function called create() which simply returns the value of the 
> parameter:
>
> ======= start of test.py   ======
> def create(name):
>     return name
> ======= end of test.py  =====
>
> Then start Jython 2.1 and run the function create() from the py file:
>
> java -classpath jython.jar.2.1 org.python.util.jython
> Jython 2.1 on java1.6.0_05 (JIT: null)
>
> execfile("test.py")
> create('\u4f7f\u7528')  <-- input Japanese characters
> u'\u4F7F\u7528'             <-- return the same unicode representing the
>                               Japanese characters with length 2
>
>
> We can see that the create function returns a two-character unicode 
> string, which Java's System.out.println() method displays correctly.
>
> Then we try Jython 2.2.1 with the same steps:
>
> java -classpath jython.jar.2.2.1 org.python.util.jython
> Jython 2.2.1 on java1.6.0_05
>
> execfile("test.py")
> create('\u4f7f\u7528')   <-- input Japanese characters
> '\xBB\xC8\xCD\xD1'           <-- returns different values with length 4.
>
> The \xBB\xC8\xCD\xD1 values are not recognized by Java, so we always 
> get "????" if we use System.out.println() to print them.
>
> This is a regression in Jython 2.2.1.
>
> This is going to affect all the customer-written py files. Is there a 
> workaround for this in Jython? Jython 2.5 seems to have the same issue.
>
> Thanks,
> Rose