Problem with drawing text on some PDFs

Help
Ruslan
2013-10-16
2013-10-16
  • Ruslan
    2013-10-16

    Hello!

    I'm using jPodRenderer to draw text on existing PDF documents.
    Everything worked fine, but recently I came across some PDF documents on which the text isn't drawn.
    Checking the version headers of these documents, I found that all of them are marked with %PDF-1.4 Sharp Scanned ImagePDF. Here is one such document (not mine, just found on the Internet) on which the problem can be reproduced: http://www.stammering.org/childleaflet.pdf

    I'm using the code:

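    import de.intarsys.pdf.content.common.CSCreator;
    import de.intarsys.pdf.encoding.WinAnsiEncoding;
    import de.intarsys.pdf.font.PDFont;
    import de.intarsys.pdf.font.PDFontType1;
    import de.intarsys.pdf.pd.PDDocument;
    import de.intarsys.pdf.pd.PDPage;
    import de.intarsys.tools.locator.FileLocator;

    public class DrawTextOnDoc {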
        public static void main(String[] args) throws Exception {
            PDDocument doc = null;
            try {
                String inputFileName = args[0];
                String outputFileName = args[1];
    
                FileLocator inlocator = new FileLocator(inputFileName);
                doc = PDDocument.createFromLocator(inlocator);
    
                PDFont font = PDFontType1.createNew(PDFontType1.FONT_Helvetica);
                font.setEncoding(WinAnsiEncoding.UNIQUE);
                float fontSize = 20;
    
                PDPage page = doc.getPageTree().getFirstPage();
                while (page != null)
                {
                    CSCreator creator = CSCreator.createFromProvider(page);
                    creator.textSetFont(null, font, fontSize);
                    creator.textLineMoveTo(100, 700);
                    creator.textShow("Hello, World!");
                    creator.close();
    
                    page = page.getNextPage();
                }
    
                FileLocator outlocator = new FileLocator(outputFileName);
                doc.save(outlocator, null);
            } finally {
                if (doc != null) {                
                    doc.close();                
                }
            }
        }
    }
    

    It should be noted that jPodRenderer does not report any errors and the documents are actually changed (their size increases). But for some reason these changes are not visible when viewing the output PDFs.

    Any ideas?

    Thanks in advance,
    Ruslan

     
    Last edit: Ruslan 2013-10-16
  • mtraut
    2013-10-16

    Your document is not very "well behaved".

    The content stream reads like

    599 0 0 841 0 0 cm
    /Img1 Do
    

    This means that the user space is scaled up to display the image, but is never reset. Your text is somewhere out there...
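
    To make "somewhere out there" concrete, here is a small sketch of the arithmetic. It uses java.awt.geom.AffineTransform as a stand-in for the PDF transformation matrix; the class and method names are made up for illustration and are not part of jPod. It assumes the page itself is roughly 599 x 841 units, as the scale factors suggest.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class ScaledUserSpace {
    // Maps a user-space point through the matrix installed by "a b c d e f cm".
    // AffineTransform takes its six entries in the same order as the PDF operands.
    static Point2D map(double a, double b, double c, double d, double e, double f,
                       double x, double y) {
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // "599 0 0 841 0 0 cm" followed by text positioned at (100, 700)
        Point2D p = map(599, 0, 0, 841, 0, 0, 100, 700);
        System.out.println(p.getX() + ", " + p.getY());
    }
}
```

    With the scale still active, (100, 700) ends up at (59900, 588700) in the original coordinate system, which is far outside a page of that size.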

    To defend against such situations you can do something like this (JavaScript code, should be easy to translate to Java):

    var page = ...
    var oldContent = page.contentStream;
    var creator = Packages.de.intarsys.pdf.content.common.CSCreator.createNew(page);
    creator.saveState();
    creator.copy(oldContent);
    creator.restoreState();
    creator.saveState();
    creator.textSetFont(null, font, fontSize);
    creator.textLineMoveTo(100, 700);
    creator.textShow("Hello, World!");
    creator.restoreState();
    creator.close();
    

    This will copy the complete old content in between "q ... Q" and append your own code. It's good style anyway to encapsulate your snippets in save/restore.
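
    As an illustration of what the resulting content stream should roughly look like, here is a plain-string sketch. It does not use jPod at all; the class name, the wrap helper and the /F1 font resource name are hypothetical, and the exact operators jPod emits may differ.

```java
public class WrapContent {
    // Encloses the existing operators in q ... Q, then appends the new
    // operators in their own q ... Q pair.
    static String wrap(String oldContent, String newContent) {
        return "q\n" + oldContent.trim() + "\nQ\n"
             + "q\n" + newContent.trim() + "\nQ\n";
    }

    public static void main(String[] args) {
        // The original content from the document discussed above.
        String old = "599 0 0 841 0 0 cm\n/Img1 Do";
        // /F1 is a hypothetical font resource name.
        String text = "BT\n/F1 20 Tf\n100 700 Td\n(Hello, World!) Tj\nET";
        System.out.print(wrap(old, text));
    }
}
```

    Because the "cm" now sits between q and Q, the scaling is undone before the text operators run, so (100, 700) is interpreted in the default user space again.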

    Be warned that this will parse and copy the old content, which may be time-consuming on some documents (not in this case). You can resort to a more sophisticated strategy: prepend a stream containing only "q", append a stream containing only "Q", and then append your own stream, giving an array of 4 content streams in total.

     
  • Ruslan
    2013-10-16

    Many thanks! I checked this solution and it does work.
    But I wonder whether there is a simple way to recognize such PDFs that are not "well behaved".

     
  • mtraut
    2013-10-16

    Besides parsing the content stream: no.
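
    If a rough check is good enough, a very crude scan could look like the sketch below. It assumes you already have the decoded content stream as text (how to get that from jPod is not shown), and it only splits on whitespace, so string literals or inline image data that happen to contain "q", "Q" or "cm" as separate tokens will fool it. The class and method names are made up for illustration.

```java
public class ContentStreamCheck {
    // Returns true if the stream changes the transformation matrix outside
    // any q ... Q pair, or leaves a q without a matching Q.
    // Naive: tokens are split on whitespace only, no real PDF parsing.
    static boolean leaksGraphicsState(String content) {
        int depth = 0;
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q":
                    depth++;
                    break;
                case "Q":
                    if (depth > 0) depth--;
                    break;
                case "cm":
                    if (depth == 0) return true;
                    break;
                default:
                    break;
            }
        }
        return depth > 0;
    }

    public static void main(String[] args) {
        System.out.println(leaksGraphicsState("599 0 0 841 0 0 cm\n/Img1 Do"));
        System.out.println(leaksGraphicsState("q\n599 0 0 841 0 0 cm\n/Img1 Do\nQ"));
    }
}
```

    For the document in this thread the first call reports true, and the wrapped variant reports false. Wrapping the old content in save/restore unconditionally is still the safer route.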