From: Rose P. <ros...@or...> - 2008-12-18 15:28:45
Hi,

More info on Jython 2.2.1: setting the property -Dpython.console.encoding=EUC_JP_LINUX does not help to get the correct Unicode string.

In our Java code, if we use the following method to copy the string "\xBB\xC8\xCD\xD1" into a byte array after Jython returns the value from running the py file, then the Java System.out.println() can print the correct multi-byte string on the console.

    public static byte[] to_bytes(String s) {
        int len = s.length();
        byte[] b = new byte[len];
        // Copies characters from this string into the destination byte array.
        // Each byte receives the 8 low-order bits of the corresponding
        // character. The eight high-order bits of each character are not
        // copied and do not participate in the transfer in any way.
        s.getBytes(0, len, b, 0);
        return b;
    }

But with this workaround, we have to convert every String returned from the py files to a byte array using the method above. This is not acceptable, as we have more than 100 functions defined in the py files and each function has multiple parameters of type String.

Has anybody encountered the same issue? I think this is a very common problem for Jython, as Jython is now used worldwide.

Any help or comments would be really appreciated.

Thanks,
Rose

Rose Pan wrote:
> Hi, Jython gurus:
>
> I need some help on running Jython 2.2.1 with multi-byte strings.
>
> Jython 2.2.1 cannot pass a unicode String correctly to a function
> defined in a py script. The value of the parameter is converted to
> a different \x format.
> This did not happen in Jython 2.1.
>
> To reproduce it, define a py script, test.py.
> The test.py file defines a function called create() which simply
> returns the value of the parameter:
>
> ======= start of test.py ======
> def create(name):
>     return name
> ======= end of test.py =====
>
> Then start Jython 2.1 and run the function create() from the py file:
>
> java -classpath jython.jar.2.1 org.python.util.jython
> Jython 2.1 on java1.6.0_05 (JIT: null)
>
> execfile("test.py")
> create('\u4f7f\u7528')   <-- input Japanese characters
> u'\u4F7F\u7528'          <-- returns the same unicode representing the
>                              Japanese characters, with length 2
>
> We can see that the create function returns a unicode string of
> length 2, which can be displayed correctly by the Java
> System.out.println() method.
>
> Then we try Jython 2.2.1 with the same steps:
>
> java -classpath jython.jar.2.2.1 org.python.util.jython
> Jython 2.2.1 on java1.6.0_05
>
> execfile("test.py")
> create('\u4f7f\u7528')   <-- input Japanese characters
> '\xBB\xC8\xCD\xD1'       <-- returns different values, with length 4
>
> The \xBB\xC8\xCD\xD1 values are not recognized by Java, so we always
> get "????" if we use System.out.println() to print them.
>
> This is a regression in Jython 2.2.1.
>
> This is going to affect all the customer-written py files. Is there a
> workaround for this in Jython? Jython 2.5 seems to have the same issue.
>
> Thanks,
> Rose
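
For reference, the four values \xBB\xC8\xCD\xD1 are exactly the EUC-JP bytes of the two characters \u4f7f\u7528, which would explain why the byte-copy workaround above prints correctly. The sketch below illustrates that relationship in standard Python 3, not Jython 2.2.1. The helper name recover() is hypothetical, and the idea that Jython 2.2.1 stores each encoded byte as one character is an assumption based on the values reported here, not something confirmed from the Jython source.

```python
# Sketch only: standard Python 3, not Jython 2.2.1.

original = "\u4f7f\u7528"  # the two Japanese characters from the report

# Encoding the two characters as EUC-JP gives four bytes.
euc_bytes = original.encode("euc_jp")

# Assumption: each encoded byte ends up as one character of the result,
# which would give the length-4 string reported for Jython 2.2.1.
widened = euc_bytes.decode("latin-1")


def recover(s, encoding="euc_jp"):
    """Hypothetical helper mirroring to_bytes(): keep the low 8 bits of
    each character, then decode the resulting bytes with the encoding."""
    raw = bytes(ord(ch) & 0xFF for ch in s)
    return raw.decode(encoding)


print(euc_bytes.hex())
print(len(widened))
print(recover(widened) == original)
```

If that assumption holds, the conversion only needs to happen once, at the point where values cross from Jython back into Java, rather than in each of the py functions.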