Unicode, Greek letters and super-/subscript

Help
2006-06-11
2012-12-04
  • Matthias Steffens

    Hi,

    I'm trying to display special characters such as an 'endash', Greek letters or super-/subscripted characters in a PDF file. How can I achieve this using the pdf-php class? Or is this possible at all?

    I also can't figure out the syntax for referencing a special character (e.g. an endash) in PDF by its character code. I've tried to find information about this in the PDF reference files, but to no avail. I'd appreciate your help!

    My attempts to pass unicode-encoded entities to the pdf-php class have failed as well. Is this possible? Is there a Unicode encoding that can be specified with the 'selectFont' command?

    Many thanks for your help,

    Matthias

     
    • BB

      BB - 2009-01-07

      Hi

      I am having the same problems...do you already have a solution for this?

      Thanks

       
    • Matthias Steffens

      Hi Balazs,

      > I am having the same problems...
      > do you already have a solution for this?

      No, not really. AFAIK, pdf-php doesn't support Unicode/UTF-8.

      Some workarounds:

      - If all of your non-ASCII characters are part of the latin1 (ISO-8859-1) character set, you can pass these characters (encoded as latin1 chars) directly to the pdf-php class and they should display correctly.

      - To enable the correct display of a few more characters (which are not part of the latin1 character set), you can pass a $diff array as a parameter to the selectFont() function. This is described in the ezPDF manual.

      In short, the pdf-php package lets you replace an (unused) character with any other PostScript character.

      The PDF reference has a list of supported PostScript/PDF character names: http://www.adobe.com/devnet/pdf/pdf_reference.html

      For the decimal code numbers of the ISO-8859-1 character set, see e.g.: http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html

      Here's what I'm using:

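      // Key: decimal latin1 code to reassign; value: PostScript character name.
      // Each trailing comment shows the latin1 character that is displaced.
      $diff = array(
                    166 => 'endash', // "¦"
                    169 => 'emdash', // "©"
                    170 => 'quotedblleft', // "ª"
                    172 => 'quotedblright', // "¬"
                    174 => 'quoteleft', // "®"
                    182 => 'quoteright' // "¶"
                   );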

      $pdf->selectFont($textBodyFont, array('encoding' => 'WinAnsiEncoding', 'differences' => $diff));

      However, note that this gives you only a few additional characters, and not the full Unicode character set.
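
      A minimal sketch of how text could be routed into those remapped slots, assuming the input is UTF-8 and the $diff array above is in effect. The helper name mapToDiffSlots() and its lookup table are hypothetical and not part of pdf-php: each typographic character is first swapped for the latin1 character whose code was reassigned, and the result is then converted to latin1 with iconv().

```php
<?php
// Hypothetical helper, not part of pdf-php. Assumes UTF-8 input and the
// $diff remapping shown above (166 => endash, 169 => emdash, ...).
function mapToDiffSlots($utf8Text)
{
    // UTF-8 bytes of the wanted character => UTF-8 bytes of the latin1
    // character whose slot was reassigned in $diff.
    $slots = array(
        "\xE2\x80\x93" => "\xC2\xA6", // endash        -> code 166
        "\xE2\x80\x94" => "\xC2\xA9", // emdash        -> code 169
        "\xE2\x80\x9C" => "\xC2\xAA", // quotedblleft  -> code 170
        "\xE2\x80\x9D" => "\xC2\xAC", // quotedblright -> code 172
        "\xE2\x80\x98" => "\xC2\xAE", // quoteleft     -> code 174
        "\xE2\x80\x99" => "\xC2\xB6", // quoteright    -> code 182
    );
    $text = strtr($utf8Text, $slots);

    // Everything left is expected to be latin1-representable here;
    // convert to the single-byte string that pdf-php receives.
    return iconv('UTF-8', 'ISO-8859-1', $text);
}

// "pages 10–20" with a real endash between the numbers
$bytes = mapToDiffSlots("pages 10\xE2\x80\x9320");
```

      The returned string carries byte 166 where the endash was, which a font selected with the 'differences' array above should draw as an endash. The displaced characters themselves ("¦", "©", etc.) can then no longer be printed with that font.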

      HTH, Matthias

       
    • BB

      BB - 2009-01-08

      Hi Matthias,

      yeah I saw the encoding option in the docs, but it doesn't work for me.

      I installed my app on a new laptop with new versions of Apache (2.2.x) and PHP (5.2.8). I use PHP4 and Apache 2.0.x on my other systems.

      The strange thing is that I have no problems with umlauts and € on my other systems. On the new one they don't work, and some other things, like the tables (ezTable), are messed up.

      I'm trying to figure out whether that depends on the new PHP version or on some character set settings in MySQL.

      Could you be so kind as to list your system's versions to compare? Also, the character sets and collation settings of your MySQL DBs would be nice!

      Since the ezTables are also messed up, I think PHP is the culprit here....:/

      I'll run a test with PHP4.9 or so...

      Cheers,
      Balazs

       
    • BB

      BB - 2009-01-09

      Okay, I got Apache 2.0.63 and PHP 4.4.9 running, with the same result.

      Now I am going to investigate the collation and encoding settings in MySQL, as that seems to be the only remaining possible cause of this error.

      Cheers

       
    • Matthias Steffens

      Hi Balazs,

      > Could you be so nice to list your system's versions to compare?
      > Also, the char-sets and collation-settings of your MySQL DB's would
      > be nice!

      On my local system I'm using PHP 5.2.6 and MySQL 5.0.67 with these settings:

      SHOW VARIABLES LIKE '%character%';

      +--------------------------+-----------------------------------------+
      | Variable_name            | Value                                   |
      +--------------------------+-----------------------------------------+
      | character_set_client     | latin1                                  |
      | character_set_connection | latin1                                  |
      | character_set_database   | latin1                                  |
      | character_set_filesystem | binary                                  |
      | character_set_results    | latin1                                  |
      | character_set_server     | latin1                                  |
      | character_set_system     | utf8                                    |
      | character_sets_dir       | /opt/local/share/mysql5/mysql/charsets/ |
      +--------------------------+-----------------------------------------+

      SHOW VARIABLES LIKE '%collation%';

      +----------------------+-------------------+
      | Variable_name        | Value             |
      +----------------------+-------------------+
      | collation_connection | latin1_swedish_ci |
      | collation_database   | latin1_swedish_ci |
      | collation_server     | latin1_swedish_ci |
      +----------------------+-------------------+

      To debug your MySQL server's character set and collation settings, you may be interested in parts 'a)' and 'b)' at:

      http://www.refbase.net/index.php/Troubleshooting#MySQL_migration_and_character_set_problems

      As mentioned above, if you have Unicode data, you'll need to convert it to latin1 in order to have it displayed properly with the pdf-php class. This can be done via the iconv() function.
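
      A minimal sketch of that conversion, assuming the source data is valid UTF-8. The wrapper name utf8ToLatin1() is hypothetical; only iconv() itself is a PHP built-in.

```php
<?php
// Hypothetical wrapper around PHP's built-in iconv().
function utf8ToLatin1($utf8Text)
{
    // When a character has no latin1 equivalent, iconv() usually reports
    // failure by returning false (the exact behaviour depends on the
    // system's iconv implementation). The @ suppresses the notice so the
    // caller can test the return value instead.
    return @iconv('UTF-8', 'ISO-8859-1', $utf8Text);
}

// "Grüße" as UTF-8 bytes: two bytes each for "ü" and "ß"
$latin1 = utf8ToLatin1("Gr\xC3\xBC\xC3\x9Fe");
```

      Here $latin1 holds one byte per character (0xFC for "ü", 0xDF for "ß") and can be passed to the pdf-php text functions.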

      Matthias

       
    • BB

      BB - 2009-01-12

      Thanks!

      Somehow I got it right...just don't ask me how... :)

      BTW, have you ever tried utf8_decode() to transform your UTF-8 characters to ISO-8859-1?

      Cheers

       
    • Matthias Steffens

      > BTW, have you ever tried utf8_decode() to
      > transform your UTF-8 characters to ISO-8859-1?

      Yes, but I couldn't get it working for non-latin1 characters (where utf8_decode() seems to produce just question marks). In my app, I'm using a self-made conversion table:

      http://refbase.svn.sourceforge.net/viewvc/refbase/branches/bleeding-edge/includes/transtab_unicode_latin1.inc.php?revision=1145&view=markup

      which allows converting UTF-8 characters (that are not part of the latin1 character set) to suitable ASCII representations.

      E.g., the "™" symbol gets converted into "[TM]", "Ω" becomes "ohm", and "Ỳ" gets reduced to "Y", etc.

      For transliteration of non-latin1 chars one could probably also use:

      iconv("UTF-8", "ISO-8859-1//TRANSLIT", $sourceString)

      but this gave me errors (e.g. when a Greek delta character was encountered).

      Following the above-mentioned transliteration, I use iconv() to strip any remaining non-latin1 characters:

      iconv("UTF-8", "ISO-8859-1//IGNORE", $sourceString);

      This has worked quite well for me.
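
      The two steps can be sketched roughly as follows. This is an illustration under assumptions, not the refbase code: the function name transliterateToLatin1() is hypothetical, the table is a three-entry stand-in for the real conversion table, and the stripping step uses a regular expression instead of iconv() with //IGNORE so that the result does not depend on the local iconv implementation.

```php
<?php
// Hypothetical two-step conversion: transliterate, then strip, then encode.
function transliterateToLatin1($utf8Text)
{
    // Tiny stand-in for a full transliteration table (UTF-8 bytes => ASCII)
    $table = array(
        "\xE2\x84\xA2" => "[TM]", // trade mark sign
        "\xCE\xA9"     => "ohm",  // Greek capital omega
        "\xE1\xBB\xB2" => "Y",    // Latin capital Y with grave
    );

    // Step 1: replace known non-latin1 characters with ASCII stand-ins
    $text = strtr($utf8Text, $table);

    // Step 2: drop whatever is still outside the latin1 range (U+0000-U+00FF)
    $text = preg_replace('/[^\x{0000}-\x{00FF}]/u', '', $text);

    // Step 3: every remaining character now has a latin1 equivalent
    return iconv('UTF-8', 'ISO-8859-1', $text);
}

// Omega, trade mark sign and Y-grave in a row
$latin1 = transliterateToLatin1("\xCE\xA9\xE2\x84\xA2\xE1\xBB\xB2");
```

      Characters without a table entry, such as a Greek delta, are silently removed in step 2, so the table needs an entry for anything that should survive in some form.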

      Matthias

       
  • Eibo Thieme

    Eibo Thieme - 2009-11-03

    However, if you are just trying to use PHP Pdf on an otherwise pure UTF-8 system to produce output containing only ISO-8859-1 characters, you might get away with one additional line in class.pdf.php:

        --- /tmp/class.pdf.php  2009-11-03 01:20:17.000000000 +0100
        +++ class.pdf.php       2009-11-03 01:20:02.000000000 +0100
        @@ -2154,6 +2154,7 @@
         * add text to the document, at a specified location, size and angle on the page
         */
         function addText($x,$y,$size,$text,$angle=0,$wordSpaceAdjust=0){
        +  $text = utf8_decode($text);
         if (!$this->numFonts){$this->selectFont('./fonts/Helvetica');}
       
         // if there are any open callbacks, then they should be called, to show the start of the line

    This is obviously an ugly hack: afterwards, the class will only work with UTF-8 strings. But if that's what you want, this could get you there fast.

     
