Great, I can report two paths to success:

Keeping the existing setOut(OutputStream) call as it is, it really was enough to call

org.python.core.codecs.setDefaultEncoding("utf-8")

to make Jython tolerate print u"\u0101"

That alone would be enough, but I also tried your suggestion of passing a PyFile instance to setOut, to see whether it gets rid of the UnicodeEncodeError exception and whether it changes the stream of bytes arriving at MyOutputStream, and it works like a charm.

So I will probably take the second path, to keep the Jython code clean and free of ugly init code.

Thanks, Jeff. Mission complete, case closed.

Pavils

On Tue, Oct 8, 2013 at 1:57 AM, Jeff Allen <ja.py@farowl.co.uk> wrote:
I think we're not far off now ...

On 06/10/2013 00:30, Pavils Jurjans wrote:
> You are right. Of course, there is encoding involved between the
> Jython interpreter and output stream. I used your advice to look up
> the sys.stdout.encoding, but I don't seem to get any progress there:
> running the code below gives me "sys.stdout.encoding = None".
>
> sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
> print "sys.stdout.encoding = %s" % sys.stdout.encoding
>
> So, in fact, my previous test of
>
> print "ā"
>
> is a false success - Jython treats that "ā" literal as a two-byte
> string (i.e., the UTF-8 encoding of "ā") and passes it to the output
> stream, where those two bytes are converted back to "ā", just to
> mislead me.
>
Of course! I couldn't understand why MyOutputStream was working for you
at all, but it is because your "ā" is still a byte-string. Presumably
len( "ā")==2.
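The difference between the one-character unicode value and its two-byte UTF-8 form can be checked with a small sketch like the one below. It is only an illustration, written to run on plain Python 2.7 or 3.x rather than tested under Jython, and the variable names are made up for the example.

```python
# A minimal sketch (not Jython-specific): one code point versus its UTF-8 bytes.
text = u"\u0101"             # one code point, LATIN SMALL LETTER A WITH MACRON
raw = text.encode("utf-8")   # the two bytes a UTF-8 source file holds for it

char_count = len(text)       # counts code points
byte_count = len(raw)        # counts bytes
round_trip = raw.decode("utf-8")

print("%d character, %d bytes" % (char_count, byte_count))
```

An output stream that decodes those two bytes as UTF-8 shows "ā" again, which is why the byte-string test looked like a success.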

> I can process the unicode values:
>
> a = u"ā"
> print ord(a)
>
> But sending the unicode character to the output is possible only with
> some processing:
>
> print u"ā"                           # Fails
> print u"ā".encode("utf-8")     # Success
>
> Seems that telling the Jython interpreter to automatically encode all
> outgoing text using utf-8 encoding would be the solution. But how to
> do that?
>
>
When print comes to encode those characters (as in line "# Fails") it
checks the encoding of the output stream. encoding==None means use the
default, which is normally ascii, but can be set using
org.python.core.codecs.setDefaultEncoding("utf-8").
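The rule just described can be modelled in a few lines. This is a rough sketch of the decision only, not Jython's actual implementation: encode_for_stream and try_encode are hypothetical helpers invented for the illustration, and the code is written for plain Python 2.7 or 3.x.

```python
def encode_for_stream(text, stream_encoding=None, default_encoding="ascii"):
    # Hypothetical helper, not part of Jython: use the stream's encoding if
    # it has one, otherwise fall back to the interpreter-wide default.
    return text.encode(stream_encoding or default_encoding)


def try_encode(text, stream_encoding=None, default_encoding="ascii"):
    # Return the encoded bytes, or None when the text cannot be encoded.
    try:
        return encode_for_stream(text, stream_encoding, default_encoding)
    except UnicodeEncodeError:
        return None


a_macron = u"\u0101"

# encoding is None and the default is ascii: the "# Fails" case
ascii_result = try_encode(a_macron)

# encoding is None but the default was changed to utf-8
utf8_default = try_encode(a_macron, default_encoding="utf-8")

# the stream itself carries an encoding, so the default is never consulted
utf8_stream = try_encode(a_macron, stream_encoding="UTF-8")
```

Either change, a different default or an encoding on the stream, yields the same two UTF-8 bytes; only the ascii fallback fails.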

Also, when you supplied your own OutputStream to setOut(), it was
wrapped for you in a PyFile() that has the encoding property. You could
supply your own PyFile that wraps (say) System.out. There's no
constructor argument setting the encoding, but the attribute is public,
writable from Java. On my machine I get success with:

             PyFile myOut = new PyFile(System.out);
             myOut.encoding = "UTF-8";
             runtime.setOut(myOut);

             runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint u'ā'")).encode("UTF-8"));
             runtime.exec((Py.newUnicode("# encoding=UTF-8\nprint ord(u'ā')")).encode("UTF-8"));

             runtime.exec("print u'\\u0101'");
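
To see which bytes a stream wrapper with a fixed UTF-8 encoding hands to the underlying byte sink, a comparable experiment can be run in plain Python. This is only an analogy and an assumption-laden sketch: io.BytesIO stands in for the Java OutputStream, codecs.getwriter stands in for the PyFile with its encoding set, and nothing here has been run under Jython.

```python
import codecs
import io

# In-memory byte sink, standing in for the Java OutputStream (an assumption
# made for this illustration only).
sink = io.BytesIO()

# File-like wrapper that encodes unicode text as UTF-8 before writing.
writer = codecs.getwriter("utf-8")(sink)
writer.write(u"\u0101\n")

# What the byte sink actually received.
received = sink.getvalue()
received_values = list(bytearray(received))
print(received_values)
```

The sink receives the two UTF-8 bytes for "ā" followed by the newline, which is the byte stream one would expect to arrive at a custom OutputStream when the wrapper's encoding is UTF-8.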

Jeff
