[Rest2web-develop] unicode problems
From: martin f k. <ma...@ma...> - 2006-08-05 21:05:43
Michael,

I see restutils.encode uses the string encode function. I don't think
this is what you want.

  < madduck> so i am baffled
  < madduck> >>> type('bla'.encode('utf-8'))
  < madduck> <type 'str'>
  < cracki> encode returns 8 bit
  < madduck> or even worse,
  < madduck> >>> type(u'bla'.encode('utf-8'))
  < madduck> <type 'str'>
  < cracki> you want unicode("bla")
  < cracki> encode encodes to binary representations
  < madduck> what's the point of "encode('utf-8')" then?
  < cracki> unicode("foo", "utf-8")
  < cracki> encode(u"someunicodestr", "weirdencoding") transforms to a
            binary representation
  < cracki> in memory, unicode strings are multibyte, constant width

Please also see

  http://docs.python.org/tut/node5.html#SECTION005130000000000000000
  http://www.reportlab.com/i18n/python_unicode_tutorial.html

The reason I am posting this is because I am getting an error

  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
  2555: ordinal not in range(128)

This is due to a file that says "Zürich", and the file itself is UTF-8,
as is the template:

  lapse:~/phd/web> head -15 imprint.txt                          [390]
  restindex
      encoding: utf8
      template-encoding:
  /restindex
  [...]
  8050 Zürich

The exception is thrown in line 75 of embedded_code.py:

  template = template.replace(occ, value)

when template holds the template text just after body had been filled
in with the result from the imprint.txt transformed to HTML.

Template is a str, not a unicode object, which is the root of all evil.

Am I doing something wrong?

-- 
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

spamtraps: mad...@ma...

"violence is the last refuge of the incompetent"
-- isaac asimov
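
For reference, a minimal sketch of the failure described above and one way around it. It is written for Python 3, where bytes plays the role of the old str and str the role of the old unicode. The template text, the placeholder and all variable names are made up for illustration; this is not rest2web code. In Python 2 the ASCII decode happens implicitly when a str meets a unicode object, so here it is done explicitly to trigger the same error.

```python
# Hypothetical stand-ins for the template and the rendered body.
raw_template = "<p><% body %></p>".encode("utf-8")   # bytes, like a Python 2 str
raw_value = "8050 Z\u00fcrich".encode("utf-8")       # UTF-8 bytes, contains 0xc3 0xbc

# Decoding UTF-8 bytes with the ASCII codec fails on the first
# non-ASCII byte, which is the error quoted in the message above.
try:
    raw_value.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decode everything once at the boundary, then work on text only.
template = raw_template.decode("utf-8")
value = raw_value.decode("utf-8")
page = template.replace("<% body %>", value)

# Encode again only when writing the result out.
output_bytes = page.encode("utf-8")
```

The point of the sketch is that the replace only ever sees text objects, so no implicit ASCII decode can occur; whether that fits the way embedded_code.py handles the template is an assumption.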