From: Иван Ч. <cam...@ya...> - 2008-03-19 06:14:31
Hi,

for me the solution was as follows ('myfile' is a file with a single UTF-8 encoded line containing the six-character Russian word 'Привет' (== 'Hello')):

    f = open('myfile', 'r')
    ll = f.readlines()
    f.close()
    print ll

    ['\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82\n']

(so you can see it isn't a Unicode string; it is a byte string with two bytes per character). Now:

    import codecs
    for l in ll:
        uni_l = codecs.utf_8_decode(l)
        print uni_l

    (u'\u041F\u0440\u0438\u0432\u0435\u0442\n', 13)

Now it's decoded correctly: the first element of the tuple is the Unicode string, the second is the number of bytes consumed (13). Looks like a kludge IMO, but I hope this helps.

By the way, if the file is read in full with f.read() instead of f.readlines(), everything comes out correctly as a u'' string. Does anyone know why? As far as Python is concerned, there's no difference between the read() and readlines() results (I've tried).

--
Ivan
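
For reference, here is a minimal sketch of the same decoding step, written so that it is self-contained. It is not part of the original solution: the helper names (write_sample, read_decoded_lines, demo) are hypothetical, a temporary file stands in for 'myfile', and the io.open(..., encoding=...) variant is shown only as one possible alternative to decoding each line by hand.

```python
import codecs
import io
import os
import tempfile

# Raw UTF-8 bytes of the line from the post: 'Привет' plus a newline.
RAW_LINE = b"\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82\n"


def write_sample(path):
    """Write the sample line as raw bytes (hypothetical stand-in for 'myfile')."""
    with io.open(path, "wb") as f:
        f.write(RAW_LINE)


def read_decoded_lines(path, encoding="utf-8"):
    """Let the file object decode while reading, so each line is already text."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.readlines()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_sample(path)

        # Binary mode: readlines() gives undecoded byte strings.
        with io.open(path, "rb") as f:
            raw_lines = f.readlines()

        # codecs.utf_8_decode returns a (text, bytes_consumed) tuple.
        text, consumed = codecs.utf_8_decode(raw_lines[0])

        # The decode() method returns only the text.
        same_text = raw_lines[0].decode("utf-8")

        # Decoding at read time avoids the per-line step entirely.
        decoded_lines = read_decoded_lines(path)
    finally:
        os.remove(path)
    return raw_lines, text, consumed, same_text, decoded_lines
```

Calling demo() returns the raw byte lines, the decoded text, the byte count reported by codecs.utf_8_decode, and the lines as read through the decoding file object, so the three routes can be compared side by side.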