[Rest2web-develop] unicode problems
From: martin f k. <ma...@ma...> - 2006-08-05 21:05:43
Michael,
I see restutils.encode uses the string encode function. I don't
think this is what you want.
< madduck> so i am baffled
< madduck> >>> type('bla'.encode('utf-8'))
< madduck> <type 'str'>
< cracki> encode returns 8 bit
< madduck> or even worse,
< madduck> >>> type(u'bla'.encode('utf-8'))
< madduck> <type 'str'>
< cracki> you want unicode("bla")
< cracki> encode encodes to binary representations
< madduck> what's the point of "encode('utf-8')" then?
< cracki> unicode("foo", "utf-8")
< cracki> encode(u"someunicodestr", "weirdencoding") transforms to a
binary representation
< cracki> in memory, unicode strings are multibyte, constant width
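The distinction drawn in the log above can be sketched as follows. This is only an illustration, written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`; the variable names are made up for the example.

```python
# Text (code points) versus its encoded, 8-bit representation.
text = "Z\u00fcrich"                # 6 code points, the third is U+00FC

# encode(): text -> bytes. This is the direction cracki describes.
encoded = text.encode("utf-8")      # U+00FC becomes the two bytes C3 BC

# decode(): bytes -> text. This is what unicode("foo", "utf-8") did in Python 2.
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, len(encoded))
print(type(decoded).__name__, len(decoded))
```

The encoded form is one byte longer than the text because the single non-ASCII character takes two bytes in UTF-8.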
Please also see
http://docs.python.org/tut/node5.html#SECTION005130000000000000000
http://www.reportlab.com/i18n/python_unicode_tutorial.html
The reason I am posting this is that I am getting the error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 2555: ordinal not in range(128)
This is due to a file that says "Zürich", and the file itself is
UTF-8, as is the template:
lapse:~/phd/web> head -15 imprint.txt [390]
restindex
encoding: utf8
template-encoding:
/restindex
[...]
8050 Zürich
The exception is thrown in line 75 of embedded_code.py:
template = template.replace(occ, value)
when template holds the template text just after body had been
filled in with the result from the imprint.txt transformed to HTML.
Template is a str, not a unicode object, which is the root of all
evil.
Am I doing something wrong?
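A minimal sketch of the failure and one possible way around it, assuming the template arrives as UTF-8 bytes. It is written for Python 3 (`bytes` standing in for Python 2's `str`, `str` for `unicode`), so the implicit ASCII decode that Python 2 performs when a byte string meets a unicode object is spelled out explicitly. The placeholder text and variable names are hypothetical and are not taken from embedded_code.py.

```python
# Hypothetical template, held as UTF-8 encoded bytes.
template_bytes = "<p>8050 Z\u00fcrich</p> <# body #>".encode("utf-8")
occ = "<# body #>"      # hypothetical placeholder to replace
value = "Imprint"       # hypothetical replacement text

# Python 2 implicitly decoded the byte string as ASCII when it was mixed
# with a unicode object. Doing that step by hand reproduces the error.
try:
    template_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Possible fix: decode once, with the declared encoding, before any
# text operation, so that both operands of replace() are text.
template = template_bytes.decode("utf-8")
template = template.replace(occ, value)

print(error)
print(template)
```

The idea is to decode at the point where the file is read and to encode again only when the result is written out.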
-- 
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: mad...@ma...
"violence is the last refuge of the incompetent"
-- isaac asimov