[Rest2web-develop] unicode problems

Michael,

I see restutils.encode uses the string encode function. I don't
think this is what you want.

< madduck> so i am baffled
< madduck> >>> type('bla'.encode('utf-8'))
< madduck> <type 'str'>
< cracki> encode returns 8 bit
< madduck> or even worse,
< madduck> >>> type(u'bla'.encode('utf-8'))
< madduck> <type 'str'>
< cracki> you want unicode("bla")
< cracki> encode encodes to binary representations
< madduck> what's the point of "encode('utf-8')" then?
< cracki> unicode("foo", "utf-8")
< cracki> encode(u"someunicodestr", "weirdencoding") transforms to a
          binary representation
< cracki> in memory, unicode strings are multibyte, constant width
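
To spell out the direction of the two calls, here is a minimal sketch.
It is written for Python 3, where the text type is called str and the
8-bit type is called bytes (in the Python 2 session above these are
unicode and str respectively). The variable names are only for
illustration:

```python
# Python 3 naming: text is `str` (Python 2: unicode),
# 8-bit data is `bytes` (Python 2: str).
text = "Z\u00fcrich"                 # six code points, including u-umlaut

# encode: text -> bytes in a given encoding
encoded = text.encode("utf-8")

# decode: bytes -> text, given the encoding the bytes were written in
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, len(encoded))   # bytes 7
print(type(decoded).__name__, len(decoded))   # str 6
```

So encode always leaves you with an 8-bit string; to get a unicode
object out of 8-bit data you need decode (or, in Python 2, the
unicode(data, "utf-8") call cracki mentions).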

Please also see
  http://docs.python.org/tut/node5.html#SECTION005130000000000000000
  http://www.reportlab.com/i18n/python_unicode_tutorial.html

The reason I am posting this is because I am getting an error

  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
  position 2555: ordinal not in range(128)

This is due to a file that says "Zürich", and the file itself is
UTF-8, as is the template:

  lapse:~/phd/web> head -15 imprint.txt                                [390]
  restindex
    encoding: utf8
    template-encoding: 
  /restindex
  [...]
    8050 Zürich

The exception is thrown in line 75 of embedded_code.py:

  template = template.replace(occ, value)

when template holds the template text just after the body has been
filled in with the result of transforming imprint.txt to HTML. At
that point template is a str, not a unicode object, which is the
root of all evil.
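
A minimal sketch of what I believe is going on, assuming that occ or
value is a unicode object (I have not verified which): when Python 2
mixes a str with a unicode object, it first decodes the str using the
ascii codec, and that fails on the first byte of the UTF-8 encoded
"ü". The sketch below is written for Python 3 and makes that implicit
decode explicit; the template text and the "<% title %>" placeholder
are made up for illustration and are not taken from rest2web:

```python
# UTF-8 encoded template text as an 8-bit string (Python 2: str).
# The content and the placeholder are hypothetical.
template_bytes = "<p>8050 Z\u00fcrich</p> <% title %>".encode("utf-8")

# Python 2 does this decode implicitly when a str meets a unicode object.
try:
    template_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc                      # the failure reported above

# Decoding once with the encoding the file declares avoids the problem:
template = template_bytes.decode("utf-8")
result = template.replace("<% title %>", "Imprint")

print(error)
print(result)
```

In other words, if the template were decoded to a unicode object as
soon as it is read, the replace would only ever see unicode objects.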

Am I doing something wrong?

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

spamtraps: mad...@ma...

"violence is the last refuge of the incompetent"
                                                       -- isaac asimov