From: Jeff A. <ja...@fa...> - 2013-10-07 22:58:18
I think we're not far off now ...

On 06/10/2013 00:30, Pavils Jurjans wrote:
> You are right. Of course, there is encoding involved between the
> Jython interpreter and output stream. I used your advice to look up
> the sys.stdout.encoding, but I don't seem to get any progress there:
> running the code below gives me "sys.stdout.encoding = None".
>
> sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
> print "sys.stdout.encoding = %s" % sys.stdout.encoding
>
> So, in fact, my previous test of
>
> print "ā"
>
> is a false success - Jython treats that "ā" literal as two byte string
> (ie, utf-8 version of "ā"), passes it to output stream where those two
> bytes are converted back to "ā", just to mislead me.
>

Of course! I couldn't understand why MyOutputStream was working for you
at all, but it is because your "ā" is still a byte-string. Presumably
len("ā")==2.

> I can process the unicode values:
>
> a = u"ā"
> print ord(a)
>
> But send the unicode character to output is possible only with some
> processing:
>
> print u"ā" # Fails
> print u"ā".encode("utf-8") # Success
>
> Seems that telling the Jython interpreter to automatically encode all
> outgoing text using utf-8 encoding would be the solution. But how to
> do that?
>

When print comes to encode those characters (as in line "# Fails") it
checks the encoding of the output stream. encoding==None means use the
default, which is normally ascii, but can be set using
org.python.core.codecs.setDefaultEncoding("utf-8").

Also, when you supplied your own OutputStream to setOut(), it was
wrapped for you in a PyFile() that has the encoding property. You could
supply your own PyFile that wraps (say) System.out. There's no
constructor argument setting the encoding, but the attribute is public,
writable from Java.
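
The byte-string versus unicode distinction can be reproduced in plain Java, without Jython, if that helps to see it. This is only a sketch under assumptions: the names EncodingDemo, utf8Length and asciiCanEncode are made up for illustration, and the ascii check mirrors the situation print is in when the stream encoding is None. It is not Jython's actual code path.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of Jython).
class EncodingDemo {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True only if every character of the text is representable in US-ASCII.
    static boolean asciiCanEncode(String text) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        return ascii.canEncode(text);
    }

    public static void main(String[] args) {
        // "ā" written as an escape so the source file encoding does not matter.
        String a = "\u0101";
        System.out.println(a.length());         // one character
        System.out.println(utf8Length(a));      // two bytes in UTF-8
        System.out.println((int) a.charAt(0));  // 257, what ord(u"ā") reports
        System.out.println(asciiCanEncode(a));  // false: ascii has no such character
    }
}
```

One character that occupies two bytes in UTF-8 is why a byte-string "ā" would have length 2, and the failed ascii check is the analogue of the line marked "# Fails".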

On my machine I get success with:

    PyFile myOut = new PyFile(System.out);
    myOut.encoding = "UTF-8";
    runtime.setOut(myOut);
    runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
    runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));
    runtime.exec("print u'\\u0101'");

Jeff