ReportLab / Bugs / #29 Non-ASCII characters broken

Milestone: essential_fix

Status: closed

Owner: Andy Robinson

Labels: pdfgen

Priority: 5

Updated: 2002-01-17

Created: 2001-05-30

Creator: Anonymous

Private: No

Non-ASCII characters used to be shown correctly
in the generated PDF files. Python recently (since
1.6? 2.0?) changed the escape codes for such
characters in a repr'ed string from octal \nnn to hex \xhh.
Reportlab copies these into the output file unchanged; only
parentheses are treated specially. PDF literal strings
accept octal \ddd escapes but have no \xhh escape, so the
characters come out garbled.
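
The sketch below is a minimal Python 3 illustration of the mismatch. It makes two assumptions. First, it uses a bytes repr, which in Python 3 still produces \xhh escapes like the repr'ed strings described above. Second, `pdf_unescape` is a hypothetical, simplified decoder that mimics how a PDF reader treats literal-string escapes. In PDF, a backslash followed by an unrecognised character is ignored. As a result, the octal form survives, while \xe9 degrades to the plain text "xe9".

```python
def pdf_unescape(s: str) -> str:
    """Hypothetical, simplified decoder for PDF literal-string escapes
    (ignores line continuations)."""
    simple = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
              "(": "(", ")": ")", "\\": "\\"}
    out, i = [], 0
    while i < len(s):
        if s[i] != "\\" or i + 1 == len(s):
            out.append(s[i])
            i += 1
        elif s[i + 1] in simple:
            out.append(simple[s[i + 1]])
            i += 2
        elif s[i + 1] in "01234567":
            # \d, \dd or \ddd: up to three octal digits
            j = i + 1
            while j < len(s) and j < i + 4 and s[j] in "01234567":
                j += 1
            out.append(chr(int(s[i + 1:j], 8) & 0xFF))
            i = j
        else:
            # Unknown escape such as \x: the backslash is dropped, the rest kept
            i += 1
    return "".join(out)


text = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
repr_escaped = repr(text)[2:-1]        # 'caf\\xe9', what repr-based escaping emits
octal_escaped = "caf\\351"             # the form PDF understands (0xE9 == 0o351)

print(pdf_unescape(repr_escaped))   # cafxe9
print(pdf_unescape(octal_escaped))  # café
```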

In order to display Latin-1 encoded text properly
(again!) I applied the following simplistic patch.
A better solution might involve Unicode operations.
(Patch against Reportlab_1_07.)

[Sorry, the sf login didn't work, somehow.]

*** canvas.py.orig Fri Apr 20 07:52:05 2001
--- canvas.py Thu May 31 00:57:17 2001
***************
*** 51,56 ****
--- 51,62 ----
      (1, 1, FILL_NON_ZERO) : 'B', #Stroke and Fill
      }

+ import re
+ _re_ascii = re.compile(r"[\x7f-\xff]")
+ def _oct_escape(matchobj):
+     return "\\%o" % ord(matchobj.group(0))
+ 
+ 
  class Canvas:
      """This class is the programmer's interface to the PDF file format. Methods
      are (or will be) provided here to do just about everything PDF can do.
***************
*** 183,191 ****
      def _escape(self, s):
          """PDF escapes are like Python ones, but brackets need slashes before them too.
          Use Python's repr function and chop off the quotes first"""
!         s = repr(s)[1:-1]
!         s = string.replace(s, '(','\(')
!         s = string.replace(s, ')','\)')
          return s

      #info functions - non-standard
--- 189,197 ----
      def _escape(self, s):
          """PDF escapes are like Python ones, but brackets need slashes before them too.
          Use Python's repr function and chop off the quotes first"""
!         s = _re_ascii.sub(_oct_escape, s)
!         s = s.replace('(','\(')
!         s = s.replace(')','\)')
          return s

      #info functions - non-standard
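
For readers on a current Python, here is a minimal sketch of the same idea as the patch, written as a standalone function. `pdf_escape` is a hypothetical name, not ReportLab's API, and the sketch assumes every character in the input fits in Latin-1. It differs from the patch in one respect. Because the patched `_escape` no longer goes through repr, a literal backslash in the text is no longer doubled. The sketch therefore escapes backslashes explicitly before adding the octal escapes.

```python
import re

# Hypothetical standalone version of the patched Canvas._escape (not ReportLab's API).
# Assumes every character in the input is in Latin-1 (code points 0-255).
_NON_ASCII = re.compile(r"[\x7f-\xff]")


def _oct_escape(match: "re.Match[str]") -> str:
    # Three-digit octal escape, e.g. U+00E9 (233) -> \351
    return "\\%03o" % ord(match.group(0))


def pdf_escape(s: str) -> str:
    # Double backslashes first so the escapes added below are not doubled too.
    s = s.replace("\\", "\\\\")
    # Parentheses delimit PDF literal strings, so they must be escaped.
    s = s.replace("(", "\\(").replace(")", "\\)")
    # Characters 0x7F-0xFF become octal escapes, which PDF readers understand.
    return _NON_ASCII.sub(_oct_escape, s)


print(pdf_escape("caf\u00e9 (1)"))   # caf\351 \(1\)
print(pdf_escape("a\\b \x97 dash"))  # a\\b \227 dash
```

The order of the replacements matters. Doubling backslashes last would corrupt the octal escapes just added. Characters above U+00FF are left untouched here; handling them would need an encoding step first.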

Discussion

Mats Wichmann - 2001-06-02

Just ran into this as well... I tried to fool around some with PythonPoint
after lots of changes since the last time. I still have the old PDF file
generated from pythonpoint.xml, and it's fine, but the one I just generated
is full of crud for bullets and other non-ASCII output (e.g., the page
footer text is messed up). I'm happy to forward a one-page sample PDF file
if it helps. I wasn't going to say anything, since so much has changed
since the first time I tried this that it is too hard to attribute blame:
a new version of reportlab (1.07, was 1.06); a new version of Acrobat
Reader (5, was 4); a new version of Python (2.1); and Win2k is now Service
Pack 2 (it might even have been WinNT, not 2000, when I last played
with this). My suspicion is that Python 2.1 caused this, through its
change in character escapes from octal to hex.

Peter Harris - 2001-09-20

Thanks to whoever came up with this patch! I was using
reportlab to turn text reports into PDF (substituting
multiple '-' with wide dashes, '\x97'). Needless to say, the
reports looked horrible.

I am really delighted to find the answer within 10 minutes
of noticing the problem.

Robin Becker - 2002-01-17

status: open --> closed

Robin Becker - 2002-01-17

assigned_to: nobody --> andy_robinson