sorry, forgot to reply to list, see below

On 08/28/2013 10:43 PM, Jim Baker wrote:
# -*- coding: utf-8 -*- 

from java.io import File, FileOutputStream
from contextlib import closing

x = u"首页"
with closing(FileOutputStream(File("output.txt"))) as f:
    f.write("<b>%s</b>" % (x,))  # FileOutputStream takes bytes...
    f.write("<b>%s</b>" % (x.encode("UTF-8"),))

This results in the following output:


Interestingly, if you were to change the above to x = "首页" (or equivalently in 2.7, x = b"首页"), you would be setting x to a bytestring whose underlying bytes are UTF-8 (because that's the source encoding we declared in the code), and you would get this output:


The moral here is that you need to carefully track your encodings.

Thanks Jim,

I'm using jython 2.5 and this works:

x = "José"
toClient.println("<li>%s</li>" % x)

In my example, row[0] is of type PyUnicode and the db uses encoding utf8. So some part of the pipeline must be changing the encoding. After a search, I got to,-PyString-and-PyUnicode-p18767753.html

where you state that PyUnicode uses utf-16. I can't test now, but I guess .encode('utf-8') should do the trick, right? And I don't suppose there is a way to avoid this extra conversion?
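
To make the question concrete, here is a minimal sketch of what I have in mind. It is untested on Jython itself, and render_item is just a hypothetical helper name for illustration, not anything from Jython or the servlet API. The idea is to keep the text as unicode while formatting and encode to UTF-8 only once, right before the bytes go out:

```python
# -*- coding: utf-8 -*-
# Sketch only: render_item is a hypothetical helper, not part of any library.

def render_item(value):
    """Format a unicode value as an <li> element and return UTF-8 bytes."""
    # Build the markup as unicode first, so no implicit conversion happens.
    markup = u"<li>%s</li>" % (value,)
    # Encode exactly once, at the boundary where bytes are required.
    return markup.encode("utf-8")

name = u"Jos\u00e9"  # same text as u"José", written with an escape
payload = render_item(name)

print(len(name))     # number of characters in the unicode string
print(len(payload))  # number of bytes after UTF-8 encoding
```

The byte count is larger than the character count because é takes two bytes in UTF-8. As far as I can tell, a byte stream needs bytes in the end, so the encoding step has to happen somewhere; the sketch only makes it explicit.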

BTW, what is the state of jython 2.7 bugwise?