Mathis Hofer wrote on 6/29/07 12:29 PM:
> There were two problems:
> 1. I apparently used a unicode string for the "Content-Disposition"
> which I had to convert to ASCII.
> 2. I had to override the HTTPContent.write() method and put out the
> strings without converting them to "str". Otherwise WebKit tried to
> convert the CP1252-encoded string into ASCII which also resulted in a
> UnicodeDecodeError for special chars.
>
> The latter is a general problem for web pages in Unicode and I think it
> should be changed (if not already done):
> The HTTPContent.write() should not convert the output to ASCII, since
> this makes output in different encodings impossible... it should either
> leave the output like it is so it's up to the developer to perform the
> encoding, or it should use a customizable encoding.
>
>
HTTPContent.write() does not explicitly convert to ASCII. It converts
whatever object is about to be written to a string. Because you are
passing a unicode object, Python has to encode it with some character
set to produce a string, and the default site-wide encoding for a
Python installation is ASCII. You can change this default for your
Python installation if you need a different character set.

In general, if you use unicode objects in your code, you as the
developer need to encode them with a character set before they can be
"written" anywhere (to a file, a socket, etc.). Even without the
explicit str() call in HTTPContent.write(), I think you would get the
same error, because Python needs a string to append to the buffer, and
the default __str__ implementation for unicode objects encodes them
with the site-wide character set (ASCII by default). If it did not,
removing that str() call would give you a TypeError or something along
those lines.

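
To illustrate why a character set has to be chosen before writing, here
is a minimal sketch. It assumes Python 3, where "str" plays the role of
the unicode object discussed above and "bytes" the role of the byte
string; the helper name try_encode and the sample text are made up for
this example.

```python
# Python 3 sketch: "str" stands in for the thread's unicode object,
# "bytes" for its byte string.
text = "Gr\u00fc\u00dfe"  # "Grüße", contains two non-ASCII characters


def try_encode(value, charset):
    """Return value encoded as bytes, or None if charset cannot represent it."""
    try:
        return value.encode(charset)
    except UnicodeEncodeError:
        return None


ascii_result = try_encode(text, "ascii")    # fails: ASCII has no ü or ß
cp1252_result = try_encode(text, "cp1252")  # one byte per character
utf8_result = try_encode(text, "utf-8")     # two bytes for each of ü and ß
```

The same text yields different bytes depending on the character set,
and no bytes at all under ASCII, which is why the choice cannot be left
implicit.
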
I would recommend overriding write() in your main servlet class and
having that method compensate for the missing character set. For
example, you could change the signature to def write(self,
unicodeObject, charSet):, or have write() accept either a string or a
unicode object together with a character set, or have it reference an
attribute on the servlet that specifies the charset to use for unicode
objects, etc.

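
A rough sketch of that kind of write() follows. It is not Webware's
actual HTTPContent: the class EncodingWriter, its charset attribute,
and its output() method are hypothetical stand-ins, and the code
assumes Python 3 (text is "str", already-encoded data is "bytes").

```python
class EncodingWriter:
    """Hypothetical stand-in for a servlet; not Webware's HTTPContent."""

    # Assumed default; a subclass or an instance may override it.
    charset = "utf-8"

    def __init__(self):
        self._chunks = []  # encoded output collected so far

    def write(self, value, charset=None):
        """Append value to the buffer, encoding text explicitly.

        Bytes are passed through untouched. Anything else is converted
        to text and encoded with the given charset, falling back to the
        charset attribute.
        """
        if isinstance(value, bytes):
            data = value
        else:
            data = str(value).encode(charset or self.charset)
        self._chunks.append(data)

    def output(self):
        """Return everything written so far as one byte string."""
        return b"".join(self._chunks)


# Charset taken from the attribute on the object.
page = EncodingWriter()
page.charset = "cp1252"
page.write("Gr\u00fc\u00dfe ")
page.write(b"raw")
cp1252_page = page.output()

# Charset passed explicitly for a single call.
other = EncodingWriter()
other.write("\u20ac5", charset="utf-8")
utf8_page = other.output()
```

Whichever variant you pick, the charset used here should match the one
announced in the response's Content-Type header.
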
Regards - Ben