From: Mathis H. <mat...@dr...> - 2007-06-29 10:26:43
|
Hi all

I'm trying to output a PDF generated with Reportlab within a WebKit servlet. The PDF data seems to be in WinAnsiEncoding (CP1252). If I just output the data with self.write(pdf), I get this error:

    File "/opt/Webware/WebKit/ASStreamOut.py", line 84, in flush
      self._buffer += ''.join(self._chunks)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 11: ordinal not in range(128)

I don't know why it wants to decode it as ASCII... so let's try to decode it myself and send it as Unicode:

    self.write(unicode(pdf, "CP1252"))

Which leads me to a problem within the HTTPContent class:

    File "/opt/Webware/WebKit/HTTPContent.py", line 176, in write
      self._response.write(str(arg))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

I can work around this by using UTF-8 in HTTPContent.write() (which is what I actually want to do for the normal HTML content anyway):

    self._response.write(unicode(arg).encode(self._encoding))

But that doesn't help, still the same problem in ASStreamOut:

    File "/opt/Webware/WebKit/ASStreamOut.py", line 84, in flush
      self._buffer += ''.join(self._chunks)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 11: ordinal not in range(128)

I really can't make it work; I'm stuck in encoding hell. Can somebody help me? I just want to output that PDF data :-/

Regards,
Mathis

--
DreamLab Technologies AG
Monbijoustrasse 36
3011 Bern
Switzerland
Tel: +41 31 398 66 66
Fax: +41 31 398 66 69
PGP Key ID: 2462240B
|
From: Christoph Z. <ci...@on...> - 2007-06-29 10:47:53
|
Mathis,

can you send a complete example servlet so I can reproduce this? I am using Reportlab with Webware but never saw such problems.

-- Chris
|
From: Mathis H. <mat...@dr...> - 2007-06-29 11:25:52
Attachments:
ReportlabTest.py
|
Hi Christoph

Here is a stripped-down servlet which gives me a UnicodeDecodeError. I found out that the table is the problem; there is no exception with paragraphs.

Mathis
|
From: Christoph Z. <ci...@on...> - 2007-06-29 13:43:18
|
Mathis Hofer wrote:
> Here is a stripped down servlet which gives me a UnicodeDecodeError. I
> found out that the table is the problem, there is no exception with
> paragraphs.

The servlet works for me, I get the PDF with the table with no errors.

Which versions of Python, Webware, Reportlab and OS are you using?

Did you change sys.setdefaultencoding or something like that?

-- Chris
|
From: Mathis H. <mat...@dr...> - 2007-06-29 19:29:41
|
Hey Chris

> The servlet works for me, I get the PDF with the table with no errors.
> Which versions of Python, Webware, Reportlab and OS are you using?
> Did you change sys.setdefaultencoding or something like that?

Thank you for helping. I was finally able to fix it.

There were two problems:
1. I apparently used a unicode string for the "Content-Disposition" header, which I had to convert to ASCII.
2. I had to override the HTTPContent.write() method and put out the strings without converting them to "str". Otherwise WebKit tried to convert the CP1252-encoded string into ASCII, which also resulted in a UnicodeDecodeError for special chars.

The latter is a general problem for web pages in Unicode and I think it should be changed (if not already done): HTTPContent.write() should not convert the output to ASCII, since this makes output in different encodings impossible... it should either leave the output as it is, so it's up to the developer to perform the encoding, or it should use a customizable encoding.

Greetings,
Mathis
|
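Editor's sketch of the first problem: the thread runs on Python 2, where joining byte strings with even one unicode string silently decodes the byte strings as ASCII. Python 3 refuses such a mix outright, so the helper below only emulates the Python 2 behaviour. The function name, the header value and the sample bytes are illustrative assumptions, not taken from WebKit or from the attached servlet.

```python
def join_like_python2(chunks):
    """Emulate how Python 2's ''.join() treats mixed byte/unicode chunks."""
    if any(isinstance(chunk, str) for chunk in chunks):
        # A single unicode chunk promotes the result to unicode, so each
        # byte string is implicitly decoded with the default codec (ASCII).
        return "".join(
            chunk if isinstance(chunk, str) else chunk.decode("ascii")
            for chunk in chunks
        )
    return b"".join(chunks)


# Stand-in for the start of the PDF data: byte 0x93 sits at position 11,
# as in the traceback. The exact bytes are an illustrative assumption.
pdf_chunk = b"%PDF-1.3\r\n%\x93\x8c\x8b\x9e"
header = "attachment; filename=report.pdf"  # hypothetical header value

try:
    join_like_python2([header, pdf_chunk])
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Encoding the header to ASCII first keeps every chunk a byte string,
# so nothing has to be decoded and the binary data passes through intact.
body = join_like_python2([header.encode("ascii"), pdf_chunk])
```

With the unicode header the emulated join fails on byte 0x93; once the header is encoded, the same binary chunk is joined without any decoding step.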
From: Christoph Z. <ci...@on...> - 2007-06-29 20:15:23
|
Mathis Hofer wrote:
> 2. I had to override the HTTPContent.write() method and put out the
> strings without converting them to "str". Otherwise WebKit tried to
> convert the CP1252-encoded string into ASCII which also resulted in a
> UnicodeDecodeError for special chars.
>
> The latter is a general problem for web pages in Unicode and I think it
> should be changed (if not already done):
> The HTTPContent.write() should not convert the output to ASCII, since
> this makes output in different encodings impossible... it should either
> leave the output like it is so it's up to the developer to perform the
> encoding, or it should use a customizable encoding.

Actually HTTPContent.write() does not convert its arguments to ASCII, but only to str; it's perfectly ok if they are CP1252-encoded strings.

The problem comes when you pass unicode arguments. HTTPContent *must* convert them to str, because otherwise they cannot be appended to the output buffer. So the only thing HTTPContent could do is encode them according to some default encoding, probably utf-8. In your case utf-8 would have been the wrong guess and you might have been even more confused. But we could make the default encoding configurable.

-- Chris
|
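Editor's sketch of why a guessed default encoding can be wrong, written in Python 3 notation, where `str` corresponds to Python 2's unicode and `bytes` to Python 2's str. The sample text is made up for illustration.

```python
# Python 3 sketch: str here corresponds to Python 2's unicode,
# bytes to Python 2's str. The sample text is an illustrative assumption.
text = "\u201cquoted\u201d"  # a word wrapped in curly double quotes

as_cp1252 = text.encode("cp1252")  # one byte per curly quote
as_utf8 = text.encode("utf-8")     # three bytes per curly quote

# The ASCII codec cannot represent the curly quotes at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Bytes written as CP1252 are not valid UTF-8, so a reader that assumes
# the wrong encoding rejects (or garbles) the data.
try:
    as_cp1252.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

The same text yields different byte sequences under CP1252 and UTF-8, so whichever default the framework picks has to match what the rest of the application expects.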
From: Ben P. <be...@pa...> - 2007-06-29 20:31:21
|
Mathis Hofer wrote on 6/29/07 12:29 PM:
> There were two problems:
> 1. I apparently used a unicode string for the "Content-Disposition"
> which I had to convert to ASCII.
> 2. I had to override the HTTPContent.write() method and put out the
> strings without converting them to "str". Otherwise WebKit tried to
> convert the CP1252-encoded string into ASCII which also resulted in a
> UnicodeDecodeError for special chars.
>
> The latter is a general problem for web pages in Unicode and I think it
> should be changed (if not already done):
> The HTTPContent.write() should not convert the output to ASCII, since
> this makes output in different encodings impossible... it should either
> leave the output like it is so it's up to the developer to perform the
> encoding, or it should use a customizable encoding.

HTTPContent.write() is not explicitly converting to ASCII. It is converting whatever object is about to be written to a string. Because you are passing a unicode object, Python needs to encode it with a character set to get a string, and the default site-wide character set for a Python installation is ASCII. You can configure this for your Python installation if you need a different default character set.

In general, if you are using unicode objects in your code, you as the developer need to apply a character set to them before they can be "written" anywhere (to a file, socket, etc.). Even without the explicit str() call in HTTPContent.write(), you would get the same error, I think, because Python will need to have a string to append to the buffer, and the default __str__ implementation for unicode objects is to encode them with the site-wide character set (ASCII by default). If not, removing that str() call would give you a TypeError or something along those lines.

I would recommend overriding .write() in your main servlet class, and having that function compensate for the lack of a character set. For example, you could change the function signature to def write(self, unicodeObject, charSet):, or you could have the write() function accept either a string or both a unicode object and a character set, or reference an attribute on the servlet which specifies the charset to be used with unicode objects, etc.

Regards - Ben
|
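Editor's sketch of the override Ben describes, in Python 3 notation (`str` for text, `bytes` for encoded data). `StubHTTPContent` is a hypothetical stand-in that only mimics a byte-only output buffer; it is not WebKit's real HTTPContent, and the `charset` attribute and keyword argument are assumed names.

```python
class StubHTTPContent:
    """Hypothetical stand-in for WebKit's HTTPContent (not the real class)."""

    def __init__(self):
        self.buffer = []

    def write(self, *args):
        for arg in args:
            if not isinstance(arg, bytes):
                raise TypeError("output buffer only accepts byte strings")
            self.buffer.append(arg)


class CharsetPage(StubHTTPContent):
    """Servlet base class that encodes text before it reaches the buffer."""

    charset = "utf-8"  # per-servlet default; subclasses may override it

    def write(self, data, charset=None):
        # Text is encoded with the explicit charset if one is given,
        # otherwise with the servlet's attribute; bytes pass through as-is.
        if isinstance(data, str):
            data = data.encode(charset or self.charset)
        super().write(data)


page = CharsetPage()
page.write("Gr\u00fc\u00dfe ")                    # text, servlet default
page.write("\u201cPDF\u201d ", charset="cp1252")  # text, explicit charset
page.write(b"%PDF-1.3")                           # already encoded data
output = b"".join(page.buffer)
```

This combines two of the variants mentioned above: an attribute on the servlet supplies the default, and an optional argument overrides it for a single call. Binary data such as a PDF is written unchanged.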
From: Christoph Z. <ci...@on...> - 2007-06-30 07:40:12
|
Ben Parker wrote:
> I would recommend overriding .write() in your main servlet class, and
> have that function compensate for the lack of a character set. For
> example, you could change the function signature to be def
> write(self, unicodeObject, charSet):, or you could have the write() function
> accept either a string or both a unicode object and character set, or
> reference an attribute on the servlet which specifies the charset to be
> used with unicode objects, etc.

Do we want to offer something like this as the default? The encoding attribute could be set to a configurable (via Application.config) default such as 'utf-8', and could be used in other places as well. For instance, the Page class could write a content-type meta tag with the inherited attribute of HTTPContent in its writeMetaData method. And of course it could be overridden by individual servlets.

I personally avoid using unicode in Webware; instead I write my servlets in latin-1 or utf-8, add a content-type tag to my base class and a "coding:" line at the top of my servlets, use the same encoding in the database and everything fits together, with no unicode objects around.

But maybe if you get unicode content from other sources or want to switch between different encodings, using unicode will be an option.

Please give me your opinions, since the next release is coming soon, so it can be included already if we think it's useful.

-- Chris
|
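Editor's sketch of the proposal: an encoding attribute that falls back to an application-wide default and is reused for the content-type meta tag. The class, the `OutputEncoding` setting name and the method names are hypothetical; they are not Webware's actual configuration keys or API.

```python
DEFAULT_SETTINGS = {"OutputEncoding": "utf-8"}  # hypothetical setting name


class EncodedPage:
    """Hypothetical page base class; not WebKit's real Page."""

    encoding = None  # None means: fall back to the application setting

    def __init__(self, settings=None):
        # Merge application settings over the built-in defaults.
        self._settings = dict(DEFAULT_SETTINGS, **(settings or {}))

    def outputEncoding(self):
        return self.encoding or self._settings["OutputEncoding"]

    def metaData(self):
        # The same encoding that is used for output is announced to the browser.
        return ('<meta http-equiv="Content-Type" '
                'content="text/html; charset=%s">' % self.outputEncoding())


class Latin1Page(EncodedPage):
    encoding = "iso-8859-1"  # an individual servlet overriding the default


default_tag = EncodedPage().metaData()
latin1_tag = Latin1Page().metaData()
configured = EncodedPage({"OutputEncoding": "cp1252"}).outputEncoding()
```

Keeping the lookup in one method means the meta tag and the actual output encoding cannot drift apart.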
From: Ben P. <be...@pa...> - 2007-06-30 08:45:45
|
Christoph Zwerschke wrote on 6/30/07 12:40 AM:
> Do we want to offer something like this as the default? The encoding
> attribute could be set to a configurable (via Application.config)
> default such as 'utf-8', and could be used in other places as well. For
> instance, the Page class could write a content-type meta tag with the
> inherited attribute of HTTPContent in its writeMetaData method. And of
> course it could be overwritten by individual servlets.

I don't see a need for it in the Page class. How about implementing it in an example UnicodePage.py similar to the example SidebarPage.py?

It seems like the main reason for using unicode objects would be to display various character sets for any page; otherwise, why not simply use strings in the global character set of choice? If that assumption is true, then it makes more sense to have the character set configurable per servlet transaction, determined by whatever business logic decides the correct character set based on the user's request. Because this would be dependent on the web application, I think having some function on the UnicodePage which returns the character set for the servlet transaction would be more useful than an application-wide setting.

> I personally avoid using unicode in Webware; instead I write my servlets
> in latin-1 or utf-8, add a content-type tag to my base class and a
> "coding:" line at the top of my servlets, use the same encoding in the
> database and everything fits together, with no unicode objects around.

I agree. On our multilingual projects, we use UTF-8 strings everywhere. One character set for all the languages, very simple.

- Ben
|
From: Christoph Z. <ci...@on...> - 2007-06-30 09:26:33
|
Ben Parker wrote:
> I don't see a need for it in the Page class. How about implementing it
> in an example UnicodePage.py similar to the example SidebarPage.py?

I'd rather implement it as a mixin, then, which can be added to any other base class (SidebarPage is actually intended as a base class).

> I agree. On our multilingual projects, we use UTF-8 strings everywhere.
> One character set for all the languages, very simple.

Exactly. In the past, utf-8 was not so well supported by some browsers, so I preferred latin-1, but nowadays it is much better. You sometimes read the recommendation "only use unicode for strings", but using ordinary strings with a consistent encoding of utf-8 is much simpler, at least for Webware.

By the way, in Python 3.0 these issues will eventually go away. All strings will be unicode and use a default encoding of utf-8.

-- Chris
|
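Editor's sketch of the mixin idea, in Python 3 notation. `UnicodeOutputMixin` and its `charset()` hook are hypothetical names, and `StubPage` merely stands in for a Webware base class such as Page or SidebarPage.

```python
class UnicodeOutputMixin:
    """Hypothetical mixin: encode text passed to write() before delegating."""

    def charset(self):
        # Hook for per-request logic; returning utf-8 here is an assumption.
        return "utf-8"

    def write(self, *args):
        encoded = [
            arg.encode(self.charset()) if isinstance(arg, str) else arg
            for arg in args
        ]
        # Delegate to whichever base class follows the mixin in the MRO.
        super().write(*encoded)


class StubPage:
    """Stand-in for a Webware base class such as Page or SidebarPage."""

    def __init__(self):
        self.buffer = []

    def write(self, *args):
        self.buffer.extend(args)


class GermanPage(UnicodeOutputMixin, StubPage):
    def charset(self):
        # A real servlet might derive this from the current request instead.
        return "iso-8859-1"


page = GermanPage()
page.write("Z\u00fcrich", b" ok")
```

The mixin has to be listed before the base class so that its write() runs first; the charset() hook leaves room for the per-transaction decision Ben suggested.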