sorry, forgot to reply to list, see below

On 08/28/2013 10:43 PM, Jim Baker wrote:
...
# -*- coding: utf-8 -*- 

from java.io import File, FileOutputStream
from contextlib import closing

x = u"首页"
with closing(FileOutputStream(File("output.txt"))) as f:
    f.write("<b>%s</b>" % (x,))  # FileOutputStream takes bytes...
    f.write("<b>%s</b>" % (x.encode("UTF-8"),))

This results in the following output:

<b>??</b><b>首页</b>

Interestingly, if you were to change the above to x = "首页" (or equivalently in 2.7, x = b"首页"), you would be setting x to a bytestring whose underlying bytes are UTF-8 (because that's the source encoding we declared in the code), and you would get this output:

<b>首页</b><b>首页</b>

The moral here is that you need to carefully track your encodings.

Thanks Jim,

I'm using jython 2.5 and this works:

x = "José"
toClient.println("<li>%s</li>" % x)

In my example, row[0] is of type PyUnicode and the db uses utf8 encoding. So some part of the pipeline is changing the encoding. After a search, I got to

http://old.nabble.com/Re%3A-On-String,-PyString-and-PyUnicode-p18767753.html

where you state that PyUnicode uses utf-16. I can't test now, but I guess .encode('utf-8') should do the trick, right? And I don't suppose there is a way to avoid this extra conversion?
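
Something like the following is what I have in mind. It is only a sketch: li_bytes is a made-up helper name, and the row tuple is a stand-in for my real db row. The idea is to build the markup as unicode and encode it once, right before it goes out.

```python
# -*- coding: utf-8 -*-

def li_bytes(value):
    """Return one <li> element as UTF-8 encoded bytes (hypothetical helper)."""
    # Format as unicode first, so no implicit conversion happens here.
    markup = u"<li>%s</li>" % (value,)
    # Encode explicitly at the boundary, just before the bytes leave Python.
    return markup.encode("utf-8")

# Stand-in for a db row whose first column comes back as unicode.
row = (u"Jos\u00e9",)
data = li_bytes(row[0])
```

The result would then be handed to the output stream instead of the unicode string. Whether toClient.println takes it as-is is something I have not checked.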

BTW, what is the state of jython 2.7 bugwise?

Regards,
Fernando