From: Rose P. <ros...@or...> - 2009-01-12 22:27:21
Hi Charlie,

Thanks for the detailed explanation. I changed the encoding to "euc_jp",
which is what my terminal uses, and tried the sample again. It works when
printing from Java directly, but it does not work when printing from Jython
or from Java in the call from Jython.

Here is the result:

From Java: a = '\u4f7f\u7528'
From Jython: (
From Java From Jython: ??

Thanks,
Rose

Charlie Groves wrote:
> On Sun, Jan 11, 2009 at 12:49 AM, Rose Pan <ros...@or...> wrote:
>
>> Jython 2.2.1 on java1.6.0_05
>> Type "copyright", "credits" or "license" for more information.
>> >>> execfile("test.py")
>> >>> testdo(u'\u4F7F\u7528')
>> u'\u4F7F\u7528'
>> >>> a=u'\u4F7F\u7528'
>> >>> print a
>> Traceback (innermost last):
>>   File "<console>", line 1, in ?
>> UnicodeError: ascii encoding error: ordinal not in range(128)
>>
>> The print command in Jython only prints the string in a format like
>> "'\xBB\xC8\xCD\xD1'".
>
> This happens because Jython's default encoding is ascii, and
> that's what it uses to encode things through print. If you call
> sys.setdefaultencoding(<your console's encoding>) before this, Jython
> will print properly.
>
>> 1. Is there a way to handle both cases (setting variables and calling
>> functions) when embedding InteractiveInterpreter in Java?
>
> I'm not sure what you're asking here. The cases are setting a
> variable with a unicode string and calling a function with that string
> from an embedded InteractiveInterpreter? I don't understand how
> that's different from running a script directly or using Jython at
> the console.
>
>> 2. Since the unicode characters read from Java cannot be passed directly
>> to InteractiveInterpreter.runsource(), they have to be converted to a
>> Jython unicode string.
>> Is there a convenient method in Jython to convert a
>> Java string into a Jython unicode string that we can call in Java code, so
>> the 'u' can be prepended at the beginning of the multi-byte string, not
>> the beginning of the whole string?
>
> Jython has no builtin way to convert str literals to unicode literals.
> However, you can encode the Java String source you're passing in to
> the interpreter, and then decode the Strings that come out of Jython
> into your Java code. As long as your users aren't writing the Java
> themselves, nothing on their end will need to change. Here's an
> example of that:
>
> import java.io.PrintStream;
> import java.nio.charset.Charset;
>
> import org.python.util.InteractiveInterpreter;
>
> public class Test
> {
>     static String encoding = "UTF-8";
>
>     public static void main (String[] args)
>         throws Exception
>     {
>         String unicode = "a = '\u4F7F\u7528'";
>         new PrintStream(System.out, true, encoding).println("From Java: " + unicode);
>         InteractiveInterpreter interp = new InteractiveInterpreter();
>         String source = unicode + "; print 'From Jython: %s' % a; import Test; Test.print(a)";
>         byte[] encoded = source.getBytes(encoding);
>         String encodedSourceInString = new String(encoded, Charset.forName("ISO-8859-1"));
>         interp.runsource(encodedSourceInString);
>     }
>
>     public static void print (String encodedStringFromPython)
>         throws Exception
>     {
>         byte[] encoded = encodedStringFromPython.getBytes("ISO-8859-1");
>         String realString = new String(encoded, encoding);
>         new PrintStream(System.out, true, encoding).println("From Java From Jython: " + realString);
>     }
> }
>
> which prints out the Japanese String directly from Java, from Jython,
> and then in Java again in a call from Jython. There's some weird
> stuff going on in there, so it's probably worth examining a few of the
> bits more closely.
>
> First, I set an encoding I'm going to use for printing to the console
> from Java and for sending Strings into Jython.
> On my Mac, the console
> uses UTF-8, so I use that as the encoding, but you'll need to find out
> which encoding your terminal expects and use that instead.
>
> With that encoding, I print a Japanese String to the console from Java
> just to make sure things are hunky-dory at a base level. I then make
> an InteractiveInterpreter and some Python source to run in it. The
> Python source runs the assignment, prints the assigned variable and then
> calls back into the Test class with that variable. I encode the
> String into bytes using the console encoding, and then I turn it back
> into a String for use in InteractiveInterpreter.runsource. This is a
> slightly bizarre use of Strings and Charsets. It uses the fact that
> ISO-8859-1 is a direct mapping between its byte and char
> representation to make a String out of the encoded bytes. This lets
> the encoded representation pass into the interpreter unmolested. With
> that encoded string, the Python source's print of the variable works
> properly as it's a str already encoded in the console's encoding, and
> doesn't pass through Jython's default encoding.
>
> Finally, Jython calls back into Test.print with the value from a in
> the Python source. This is still an encoded Python str, so I use the
> same ISO-8859-1 trick in reverse to get the encoded bytes out, and
> turn those bytes back into a String with its constructor that takes an
> encoding. With a real Java String again, I'm able to print the value
> from Java.
>
> This isn't the prettiest of solutions, but it's the only way I can
> think of to make this work without changing the underlying source to
> use unicode literals. If you do have some leeway on that, I'd
> recommend going that way, but if you're stuck with the encoded source,
> I believe this will work.
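
The ISO-8859-1 round trip described in the quoted message can be tried in isolation, without Jython on the classpath. The following is only a minimal sketch and not code from the thread: the class name Latin1RoundTrip and the helpers toByteString and fromByteString are hypothetical names chosen for illustration, and the encoding defaults to UTF-8 as an assumption (pass another charset name, such as the terminal's, as the first argument).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Jython or of the original example.
public class Latin1RoundTrip {

    // Encode the text in the target encoding, then wrap every resulting byte
    // in exactly one char. ISO-8859-1 maps bytes 0x00-0xFF to U+0000-U+00FF
    // one to one, so no byte is lost or altered.
    public static String toByteString(String text, String encoding) {
        byte[] encoded = text.getBytes(Charset.forName(encoding));
        return new String(encoded, StandardCharsets.ISO_8859_1);
    }

    // Reverse direction: unwrap the chars back into the raw bytes, then
    // decode those bytes with the real encoding.
    public static String fromByteString(String byteString, String encoding) {
        byte[] encoded = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(encoded, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // Assumption: UTF-8 unless a charset name is given on the command line.
        String encoding = args.length > 0 ? args[0] : "UTF-8";
        String original = "\u4F7F\u7528";

        String wrapped = toByteString(original, encoding);
        String restored = fromByteString(wrapped, encoding);

        System.out.println("chars in wrapped form: " + wrapped.length());
        System.out.println("round trip intact: " + original.equals(restored));
    }
}
```

With UTF-8, each of the two characters encodes to three bytes, so the wrapped form is six chars long. If the round trip holds for the terminal's encoding but the output is still garbled, the mismatch is more likely between that encoding and what the terminal actually expects than in the wrapping step itself.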