#29 Non-ASCII characters broken

essential_fix
closed
pdfgen (12)
5
2002-01-17
2001-05-30
Anonymous
No

Non-ASCII characters used to be shown correctly
in the generated PDF files. Python recently (since
1.6? 2.0?) changed the escape codes for such
characters in a repr'ed string from \nnn to \xhh.
Reportlab puts these into the output file; only
parentheses are treated specially.
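
To make the mismatch concrete, here is a small sketch in current Python 3. It assumes bytes objects as a stand-in for the Python 2 byte strings discussed here, and the sample text is made up for illustration. PDF literal strings accept a backslash followed by octal digits, but have no \xhh form.

```python
# Sketch only: Python 3 bytes stand in for the old Python 2 byte strings.
latin1 = "caf\u00e9".encode("latin-1")   # one byte per character, last one is 0xE9

# repr() writes bytes >= 0x80 as hex escapes, which a PDF reader does not decode.
hex_form = repr(latin1)[2:-1]            # strip the leading b' and trailing '
print(hex_form)

# The form a PDF literal string understands: backslash plus octal digits.
# The 0x7f cut-off mirrors the character range used in the patch below.
octal_form = "".join(chr(b) if b < 0x7F else "\\%o" % b for b in latin1)
print(octal_form)
```

The first line printed contains `\xe9`, the second `\351`; only the second is a valid escape inside a PDF string.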

In order to display Latin-1 encoded text properly
(again!) I applied the following simplistic patch.
A better solution might involve Unicode operations.
(Patch against Reportlab_1_07.)

[Sorry, the sf login didn't work, somehow.]

*** canvas.py.orig Fri Apr 20 07:52:05 2001
--- canvas.py Thu May 31 00:57:17 2001
***************
*** 51,56 ****
--- 51,62 ----
          (1, 1, FILL_NON_ZERO) : 'B',  #Stroke and Fill
          }

+ import re
+ _re_ascii = re.compile(r"[\x7f-\xff]")
+ def _oct_escape(matchobj):
+     return "\\%o" % ord(matchobj.group(0))
+
+
  class Canvas:
      """This class is the programmer's interface to the PDF file format.  Methods
      are (or will be) provided here to do just about everything PDF can do.
***************
*** 183,191 ****
      def _escape(self, s):
          """PDF escapes are like Python ones, but brackets need slashes before them too.
          Use Python's repr function and chop off the quotes first"""
!         s = repr(s)[1:-1]
!         s = string.replace(s, '(','\(')
!         s = string.replace(s, ')','\)')
          return s

      #info functions - non-standard
--- 189,197 ----
      def _escape(self, s):
          """PDF escapes are like Python ones, but brackets need slashes before them too.
          Use Python's repr function and chop off the quotes first"""
!         s = _re_ascii.sub(_oct_escape, s)
!         s = s.replace('(','\(')
!         s = s.replace(')','\)')
          return s

      #info functions - non-standard
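
For readers who want to try the idea outside ReportLab, the following is a minimal standalone sketch in current Python 3. The names `escape_pdf_string` and `_NON_ASCII` are hypothetical and are not part of ReportLab. Unlike the patch above, this sketch also escapes backslashes, on the assumption that the input is raw text rather than an already escaped string. Characters above U+00FF are left untouched, so it only covers Latin-1 text.

```python
import re

# Hypothetical helper for illustration; not ReportLab's API.
# Same character range as the patch: DEL and the upper half of Latin-1.
_NON_ASCII = re.compile(r"[\x7f-\xff]")


def escape_pdf_string(s: str) -> str:
    """Escape Latin-1 text for use inside a PDF literal string."""
    # Backslashes go first so the escapes added below are not doubled.
    s = s.replace("\\", "\\\\")
    # Parentheses delimit PDF literal strings and need a backslash.
    s = s.replace("(", "\\(").replace(")", "\\)")
    # Remaining high characters become three-digit octal escapes.
    return _NON_ASCII.sub(lambda m: "\\%03o" % ord(m.group(0)), s)


print(escape_pdf_string("caf\xe9 (x)"))
```

The octal substitution runs last because its pattern never matches a backslash or parenthesis, so the earlier replacements cannot interfere with it.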

Discussion

  • Mats Wichmann

    Mats Wichmann - 2001-06-02

    Logged In: YES
    user_id=53605

    Just ran into this as well... tried to fool around some with PythonPoint,
    after lots of changes since the last time. I still have the old pdf file
    generated from pythonpoint.xml, and it's fine, but the one just generated
    is full of crud for bullets and other non-ascii output (e.g., the page
    footer text is messed up). I'm happy to forward a one-page sample pdf file
    if it helps. I wasn't going to say anything since so much changed
    since the first time I tried this - too hard to attribute blame: new
    version of reportlab (1.07, was 1.06); new version of Acrobat reader
    (5, was 4); new version of Python (2.1); and Win2k is now Service
    Pack 2 (might even have been WinNT, not 2000, when I last played
    with this). The suspicion is that Python 2.1 caused this through
    the change in character escapes from octal to hex.

     
  • Peter Harris

    Peter Harris - 2001-09-20

    Logged In: YES
    user_id=8911

    Thanks to whoever came up with this patch! I was using
    reportlab to turn text reports into PDF (substituting
    multiple '-' with wide dashes '\x97'). Needless to say the
    reports looked horrible.

    I am really delighted to find the answer within 10 minutes
    of noticing the problem.

     
  • Robin Becker

    Robin Becker - 2002-01-17
    • status: open --> closed
     
  • Robin Becker

    Robin Becker - 2002-01-17
    • assigned_to: nobody --> andy_robinson
     
