From: Pāvils J. <pas...@gm...> - 2013-10-02 21:42:36
Hello,

I use Jython as a scripting solution for a Java app. The Python code that's executed in Jython is read from a Unicode database field. So, before the interp.exec(myPyCode) call the code is stored in a Unicode string.

Unfortunately I found that all three functions I need are failing if some Unicode characters are involved:

1) Using unicode characters in the Python code:

    PythonInterpreter interp = new PythonInterpreter(null, new PySystemState());
    interp.exec("print ord(\"ā\")");

gives ordinal of "?", or 63, while 257 was expected. How to pass a string that contains Unicode characters to the interpreter without messing them up?

(Not sure if the special character "ā" will survive the mailing list. It's a unicode character with ordinal 257; you can easily get it in Java like this: String uniStr = Character.toString((char) 257);)

2) Entering unicode literals with escaping:

    interp.exec("print u\"\\u0101\"");

This code causes this Python error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 0: ordinal not in range(128)

So, what would be the correct way of instantiating the interpreter so that the Unicode literals would be happily processed, just as they are processed with the Python interpreter in Ubuntu?

3) Reading unicode characters at standard output

I can't seem to find a way for the code executed by Jython to pass Unicode characters nicely to the standard output. I have my custom OutputStream class' write(int code) method waiting for data (to store it in a Unicode string), but it never receives anything with charcode above 127. Maybe the problem is that I never really managed to get any Unicode string working in the Jython interpreter in the first place.

Unfortunately, I work in a country where there are plenty of Unicode characters in frequent use, so I can't really cope with this problem by using only the ANSI table.

Thanks for helping in advance!
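The 63 reported in point 1 is the code of "?", which is what a lossy charset conversion substitutes for a character it cannot represent. The following is a minimal pure-Java sketch of that effect, not Jython's actual code path. The class name LossyEncodingDemo and the helper firstByte are made up for illustration, and the sketch assumes only that the source text gets narrowed to an ASCII-like encoding somewhere before the interpreter sees it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use Jython at all.
final class LossyEncodingDemo {

    /** Encodes text with the given charset and returns the first byte as an unsigned value. */
    static int firstByte(String text, Charset charset) {
        return text.getBytes(charset)[0] & 0xFF;
    }

    public static void main(String[] args) {
        String aMacron = Character.toString((char) 257); // U+0101

        // US-ASCII cannot represent U+0101, so getBytes substitutes '?'.
        System.out.println(firstByte(aMacron, StandardCharsets.US_ASCII));

        // UTF-8 keeps the character, as the two bytes 0xC4 0x81.
        System.out.println(aMacron.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If the same substitution happens on the way into the interpreter, the Python code only ever sees "?", which would explain why ord() returns 63 rather than 257.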
From: Jim B. <jb...@zy...> - 2013-10-03 01:16:16
The API of PythonInterpreter probably needs to be reviewed more closely, since it is surprising without a deep understanding of Jython internals and how it passes the arg of exec through. But with that said, here's how to get #1 and #2 resolved. I don't have enough info to understand precisely your requirement for #3, but it probably has a similar requirement to carefully follow encoding/decoding.

So with this test code:

    import org.python.core.Py;
    import org.python.util.PythonInterpreter;

    public class TestUnicode {

        public static void main(String[] args) {
            try {
                PythonInterpreter runtime = new PythonInterpreter();
                runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
                runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));
                runtime.exec("print u'\\u0101'");
            } catch (Exception ex) {
                System.err.println("Exception: " + ex);
            }
        }

    }

I get the following output:

    $ java TestUnicode
    ā
    257
    ā

which should be what you wanted.
From: Pāvils J. <pas...@gm...> - 2013-10-03 08:46:19
Jim,

Thanks for showing the approach (Py.newUnicode("# encoding=UTF-8\n< my unicode code here >")).encode("UTF-8")

This let me pass these tests:

    print "ā"
    print ord(u"ā")

However, this test

    print u"\u0101"

still throws Python exception:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 0: ordinal not in range(128)

Maybe that's not such a big deal, as I now have enough functionality to use Unicode characters in my code directly, but for the sake of completion fixing that wouldn't hurt.

Once I've managed to get the Unicode chars over the fence to the Jython interpreter, I also could capture the standard output. The characters arrive sort of utf-8 encoded, but strangely, the calls to OutputStream.write(int char) come with a negative value, so I have to fix it by adding 256. Here's the full code of my OutputStream class used to capture Jython Unicode output:

    import java.io.OutputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.UnsupportedEncodingException;

    public class MyOutputStream extends OutputStream
    {
        private ByteArrayOutputStream buffer;
        private Logger l;

        public MyOutputStream (Logger logger) {
            buffer = new ByteArrayOutputStream();
        }

        public void write(int b) {
            if (b < 0) {
                b += 256;
            }
            if (b > 0xff) {
                b = b & 0xff;
            }
            if (b == 0) {
                flush();
            } else {
                buffer.write(b);
            }
        }

        public void flush() {
            String txt;
            try {
                txt = buffer.toString("utf-8");
            }
            catch (UnsupportedEncodingException e) {
                txt = "#Encoding error#";
            }
            // Use the value of txt for good
            buffer.reset();
        }
    }

Perhaps this code is somewhat hacky, but it works.

Pavils
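On the negative values: OutputStream.write(int) is documented to use only the eight low-order bits of its argument, so a caller that passes a signed byte widened to int shows up as a negative number. Masking with 0xFF covers both the "add 256" and the "greater than 0xff" cases in one step. The sketch below is a hedged simplification of the class above, not a drop-in replacement: the name Utf8Capture and the drain() method are hypothetical, the Logger and the flush-on-zero convention are left out, and it assumes the bytes arriving really are UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical capture stream; collects raw bytes and decodes them as UTF-8 on demand.
final class Utf8Capture extends OutputStream {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        // Keep only the low 8 bits, so -60 and 196 both mean the byte 0xC4.
        buffer.write(b & 0xFF);
    }

    /** Returns everything captured so far as text and empties the buffer. */
    String drain() {
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        buffer.reset();
        return text;
    }

    public static void main(String[] args) {
        Utf8Capture out = new Utf8Capture();
        out.write(-60);   // 0xC4 seen as a signed byte
        out.write(-127);  // 0x81 seen as a signed byte
        System.out.println(out.drain().equals("\u0101"));
    }
}
```

Decoding only in drain(), rather than per byte, matters because a multi-byte UTF-8 sequence cannot be decoded until all of its bytes have arrived.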
From: Jeff A. <ja...@fa...> - 2013-10-04 08:08:11
This works for me the way it does for Jim on Linux Mint (a lot like Ubuntu), but on Windows I receive a similar error to Pāvils.

    dev>jython
    Jython 2.7b1+ (default:f731a595b90a+, Oct 1 2013, 08:42:42)
    [Java HotSpot(TM) 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_35
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> print sys.stdout.encoding
    Cp1252
    >>> print u"\u0101"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\Lib\encodings\cp1252.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_table)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\u0101' in position 0: character maps to <undefined>

I can make it work (on Windows) by choosing cp1257, or 65001/UTF-8 at least for output.

But we are dealing with Linux here, not Windows, aren't we? The first problem is surely that the stream encoding is set to ascii: that's what the message in #2 means. So I'd look at whether the console itself has a UTF-8 locale (mine is en_GB.UTF-8), whether the terminal window encoding is consistent with that (on my system the gterm doesn't follow changes in locale), and whether Jython is picking this up correctly via the JVM (check sys.stdout.encoding). You can specify the encoding with -Dpython.console.encoding=UTF-8 on the command line, but the environment has to agree.

Jeff Allen
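As a companion to checking sys.stdout.encoding from Python, the JVM side can be inspected directly. This is a small sketch under stated assumptions: the class name CharsetProbe and the helper canEncode are hypothetical, it only reports what the JVM considers its default charset, and it says nothing about how Jython derives its own console encoding from that.

```java
import java.nio.charset.Charset;

// Hypothetical probe; reports which charsets can represent U+0101.
final class CharsetProbe {

    /** True if the named charset has a byte representation for the character. */
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char aMacron = '\u0101';

        // The default depends on the platform, locale, and JVM options.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(name + " can encode U+0101: " + canEncode(name, aMacron));
        }
    }
}
```

A charset that reports false here will either raise an encode error or substitute a replacement character, depending on how the caller configured the encoder.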
From: Jeff A. <ja...@fa...> - 2013-10-07 22:58:18
I think we're not far off now ...

On 06/10/2013 00:30, Pavils Jurjans wrote:
> You are right. Of course, there is encoding involved between the
> Jython interpreter and output stream. I used your advice to look up
> the sys.stdout.encoding, but I don't seem to get any progress there:
> running the code below gives me "sys.stdout.encoding = None".
>
>     sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
>     print "sys.stdout.encoding = %s" % sys.stdout.encoding
>
> So, in fact, my previous test of
>
>     print "ā"
>
> is a false success - Jython treats that "ā" literal as two byte string
> (ie, utf-8 version of "ā"), passes it to output stream where those two
> bytes are converted back to "ā", just to mislead me.

Of course! I couldn't understand why MyOutputStream was working for you at all, but it is because your "ā" is still a byte-string. Presumably len("ā")==2.

> I can process the unicode values:
>
>     a = u"ā"
>     print ord(a)
>
> But send the unicode character to output is possible only with some
> processing:
>
>     print u"ā" # Fails
>     print u"ā".encode("utf-8") # Success
>
> Seems that telling the Jython interpreter to automatically encode all
> outgoing text using utf-8 encoding would be the solution. But how to
> do that?

When print comes to encode those characters (as in line "# Fails") it checks the encoding of the output stream. encoding==None means use the default, which is normally ascii, but can be set using org.python.core.codecs.setDefaultEncoding("utf-8").

Also, when you supplied your own OutputStream to setOut(), it was wrapped for you in a PyFile() that has the encoding property. You could supply your own PyFile that wraps (say) System.out. There's no constructor argument setting the encoding, but the attribute is public, writable from Java. On my machine I get success with:

    PyFile myOut = new PyFile(System.out);
    myOut.encoding = "UTF-8";
    runtime.setOut(myOut);

    runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
    runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));

    runtime.exec("print u'\\u0101'");

Jeff
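The "false success" with the plain "ā" literal can be reproduced without Jython. The sketch below is a pure-Java illustration under the assumption that the literal travelled as its two UTF-8 bytes and was decoded only at the receiving end; the class name ByteStringRoundTrip and the helper utf8 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo; mimics a byte string that happens to hold UTF-8.
final class ByteStringRoundTrip {

    /** Returns the UTF-8 bytes of the text. */
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = utf8("\u0101");

        // Viewed one byte per character, the "string" has length 2, not 1.
        String oneCharPerByte = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(oneCharPerByte.length());

        // A receiver that decodes the bytes as UTF-8 gets the single character back.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length());
    }
}
```

No encoding step ever runs on the sending side in this scenario, which is why the byte-string version passes while the genuine unicode literal trips over the ascii default.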
From: Pāvils J. <pas...@gm...> - 2013-10-09 20:56:47
Great, I can report two paths to success:

Keeping the existing setOut(OutputStream) setting as it is, it really was enough to do

    org.python.core.codecs.setDefaultEncoding("utf-8")

to make Jython tolerate

    print u"\u0101"

That alone would be enough, but I also checked your suggestion of passing a PyFile instance to setOut, to see if that gets rid of the UnicodeEncodeError exception and/or there is any change in the stream of bytes arriving at MyOutputStream, and it works like a charm. So I will perhaps go the second path, to keep the Jython code clean and pure without ugly init code.

Thanks, Jeff. Mission complete, case closed.

Pavils