Re: [Jython-users] encoded (UTF) strings in Jython

Hi,

For me the solution was as follows ('myfile' is a file with a single UTF-8 encoded line containing the six-character Russian word 'Привет' (== 'Hello')):

  f = open('myfile', 'r')
  ll = f.readlines()
  f.close()
  print ll

['\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82\n']

(so you can see it isn't a Unicode string: it is a raw byte string in which each Cyrillic letter takes two bytes). Now:

  import codecs
  for l in ll:
    # utf_8_decode returns a (unicode_string, bytes_consumed) tuple
    uni_l = codecs.utf_8_decode(l)
    print uni_l

(u'\u041F\u0440\u0438\u0432\u0435\u0442\n', 13)

Now it's decoded correctly. It looks like a kludge IMO, but I hope this helps.
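
A possibly less kludgy variant is to let codecs.open do the decoding while the file is read, so that readlines() hands back Unicode strings directly. This is only a minimal sketch: I'm assuming the standard codecs, os and tempfile modules behave under your Jython version the way they do in CPython, which I have not checked. The helper name read_utf8_lines and the throwaway temporary file are made up for illustration.

```python
import codecs
import os
import tempfile


def read_utf8_lines(path):
    # Hypothetical helper: codecs.open wraps the file so that every
    # line is decoded from UTF-8 as it is read.
    f = codecs.open(path, 'r', 'utf-8')
    try:
        return f.readlines()
    finally:
        f.close()


# Create a throwaway file holding the same single line as 'myfile'.
fd, path = tempfile.mkstemp()
os.close(fd)
out = codecs.open(path, 'w', 'utf-8')
out.write(u'\u041f\u0440\u0438\u0432\u0435\u0442\n')
out.close()

lines = read_utf8_lines(path)
os.remove(path)

# Each element is now a Unicode string, not a (string, length) tuple.
print(repr(lines))
```

With this approach there is no tuple to unpack, because the decoding happens inside the file wrapper rather than in a separate utf_8_decode call.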

By the way, if the file is read in one go with f.read() instead of f.readlines(), everything comes out correctly as a u'' string. Does anyone know why? In Python, there's no difference between the read() and readlines() results (I've tried).
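
To compare the two calls on a given interpreter, a small check along these lines may help. It is a sketch under assumptions: it writes its own temporary file instead of using 'myfile', it opens the file in binary mode ('rb') rather than 'r', and the helper name raw_read_both is made up. On CPython the two results should hold the same bytes; I can't say what Jython will print.

```python
import os
import tempfile


def raw_read_both(path):
    # Hypothetical helper: read the same file once with read() and
    # once with readlines(), both in binary mode.
    f = open(path, 'rb')
    whole = f.read()
    f.close()
    f = open(path, 'rb')
    lines = f.readlines()
    f.close()
    return whole, lines


# Write the UTF-8 bytes of the sample line to a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, u'\u041f\u0440\u0438\u0432\u0435\u0442\n'.encode('utf-8'))
os.close(fd)

whole, lines = raw_read_both(path)
os.remove(path)

# Show the type and raw contents of each result for comparison.
print('%s %r' % (type(whole), whole))
print('%s %r' % (type(lines[0]), lines[0]))
```

If the two printed types differ on your interpreter, that would narrow the question down to how each call builds its result.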

--

   Ivan