Problems with text rendering on some PDF documents

  • Ruslan

    Ruslan - 2013-11-07

    Hi, first of all thanks for the lib, it's quite nice.

    But recently I came across some PDF documents on which text isn't drawn properly. When I write English text with the "Arial" font on some of these documents, the text is displayed incorrectly: the Latin characters are replaced with unreadable ones.
    Here is one such document on which the problem can be reproduced:

    Code to reproduce the problem:

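    import de.intarsys.pdf.content.common.CSCreator;
    import de.intarsys.pdf.font.PDFont;
    import de.intarsys.pdf.font.PDFontStyle;
    import de.intarsys.pdf.font.outlet.FontOutlet;
    import de.intarsys.pdf.font.outlet.FontQuery;
    import de.intarsys.pdf.font.outlet.IFontFactory;
    import de.intarsys.pdf.pd.PDDocument;
    import de.intarsys.pdf.pd.PDPage;
    import de.intarsys.tools.locator.FileLocator;

    public class DrawTextOnDocument {
        public static void main(String[] args) throws Exception {
            PDDocument doc = null;
            try {
                String inputFileName = "testMod.pdf";
                String outputFileName = "testMod_result.pdf";
                FileLocator inlocator = new FileLocator(inputFileName);
                doc = PDDocument.createFromLocator(inlocator);
                // Look up "Arial" through the document's font factory
                IFontFactory factory = FontOutlet.get().lookupFontFactory(doc);
                FontQuery query = new FontQuery("Arial", PDFontStyle.REGULAR);
                PDFont font = factory.getFont(query);
                float fontSize = 20;
                // Draw the same line of text on every page
                PDPage page = doc.getPageTree().getFirstPage();
                while (page != null) {
                    CSCreator creator = CSCreator.createFromProvider(page);
                    creator.textSetFont(null, font, fontSize);
                    creator.textLineMoveTo(100, 700);
                    creator.textShow("Hello, World!");
                    creator.close(); // flush the content stream to the page
                    page = page.getNextPage();
                }
                FileLocator outlocator = new FileLocator(outputFileName);
                doc.save(outlocator, null);
            } finally {
                if (doc != null) {
                    doc.close();
                }
            }
        }
    }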

    Can I somehow use the "Arial" font to draw text on this PDF? Is there anything else I need to do to get this font to render properly?

    Thanks in advance

    Last edit: Ruslan 2013-11-07
  • Elfi Heck

    Elfi Heck - 2013-11-07

    The font "Arial" in the existing PDF is defined as a Type0 font with an Identity-H map and a Type2 CID font as its descendant font. This means that the code points in the PDF's contents must be the same as the actual glyph indices in the font file. To get at these indices you'd have to use a character map contained in the font file itself, but CSCreator does not do this and instead writes the characters' Unicode code points into the PDF contents. As a font file's glyph indices are almost never the same as the corresponding characters' Unicode code points, you get the wrong glyphs.
    It should be possible to implement a mapping to handle this, but I don't think we will do that. You can have a go at it if you want to dig deeper into font handling.
    As a quick workaround you could use another font instead of the one from the document. The built-in fonts are quite easy to access. See the examples.
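
To make the glyph-index mismatch described in this reply concrete, here is a minimal, library-independent sketch. It is not jPod code: the class `IdentityHSketch`, its methods, and the glyph indices in `toyCmap()` are all invented for illustration, and a real implementation would have to read the mapping from the font file's own character map. It only shows that, for an Identity-H font, the two-byte codes written to the content stream have to be glyph indices, and that writing Unicode code points instead produces different bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only; not part of jPod. Handles BMP characters only. */
final class IdentityHSketch {

    /** Hypothetical stand-in for a font file's character map (code point -> glyph index). */
    static Map<Integer, Integer> toyCmap() {
        Map<Integer, Integer> cmap = new LinkedHashMap<>();
        // Invented glyph indices; a real font file defines its own values.
        cmap.put((int) 'H', 43);
        cmap.put((int) 'i', 76);
        return cmap;
    }

    /** Encodes text as two-byte big-endian glyph indices, the form Identity-H expects. */
    static byte[] encodeGlyphIds(String text, Map<Integer, Integer> cmap) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            // Unmapped characters fall back to glyph 0 (.notdef).
            int gid = cmap.getOrDefault((int) text.charAt(i), 0);
            out[2 * i] = (byte) (gid >> 8);
            out[2 * i + 1] = (byte) gid;
        }
        return out;
    }

    /** The problematic variant: the Unicode code points themselves are written as codes. */
    static byte[] encodeCodePoints(String text) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            int cp = text.charAt(i);
            out[2 * i] = (byte) (cp >> 8);
            out[2 * i + 1] = (byte) cp;
        }
        return out;
    }
}
```

With this toy map, `encodeGlyphIds("Hi", toyCmap())` and `encodeCodePoints("Hi")` return different byte sequences, so a viewer that interprets the codes as glyph indices would pick different glyphs in the second case.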

