help using snippets

Help
caseta
2008-07-22
2013-05-28
  • caseta
    2008-07-22

    Hi,

    First, thanks for your work providing a free PDF API.
    Now for my question:
    I need to do a text replace in a PDF doc, and your API seems to be the only one able to do that.
    I checked the examples and the snippets, yet I still can't even print the text of a PDF page. :(
    There is something I am missing, so could you please help?

    Here's the code I am currently using:

    FileLocator locator = new FileLocator("C:\\Sample1.PDF");
    PDDocument pdfDocument = PDDocument.createFromLocator(locator);
    PDPageTree pageTree = pdfDocument.getPageTree();
    PDPage page = pdfDocument.getPageTree().getFirstPage();
    CSContent content = page.getContentStream();
    CSCreator creator = CSCreator.createFromContent(content, page);
    ICSInterpreter interpreter = creator.getInterpreter();
    CSTextExtractor textExtractor = new CSTextExtractor();
    textExtractor.open(interpreter);
    System.out.println(textExtractor.getContent());

    So could you please tell me why nothing gets printed to System.out?

    Here's the PDF doc I am testing on (just a sample I picked off the net):
    http://www.pdfpdf.com/samples/Sample1.PDF

    I am also trying to search within the document, of course without any success so far.
    The code I use is:

    FileLocator locator = new FileLocator("C:\\Sample1.PDF");
    PDDocument pdfDocument = PDDocument.createFromLocator(locator);
    PDPageTree pageTree = pdfDocument.getPageTree();
    PDPage page = pdfDocument.getPageTree().getFirstPage();
    CSContent content = page.getContentStream();
    CSCreator creator = CSCreator.createFromContent(content, page);
    ICSInterpreter interpreter = creator.getInterpreter();
    CSTextSearcher textSearcher = new CSTextSearcher();
    textSearcher.setSearchString("vous");
    textSearcher.open(interpreter);
    System.out.println(textSearcher.getHits().size());

    Could you please provide a couple of code samples that just work?
    Thank you very much!

     
    • mtraut
      2008-07-22

      The pattern to use is a little bit different:

      FileLocator locator = new FileLocator("C:\\Sample1.PDF");
      PDDocument pdfDocument = PDDocument.createFromLocator(locator);
      PDPageTree pageTree = pdfDocument.getPageTree();
      PDPage page = pdfDocument.getPageTree().getFirstPage();
      CSContent content = page.getContentStream();

      /*
      fine until here:

      you have page and content stream.

      now you need an interpreter for processing the content stream and a device to receive
      the interpreter's commands...
      */

      CSTextExtractor textExtractor = new CSTextExtractor();
      CSDeviceBasedInterpreter interpreter = new CSDeviceBasedInterpreter(null, textExtractor);

      /*
      now you can interpret the content and your device will receive events..
      */
      interpreter.process(content, page.getResources());

      /*
      if the content can be interpreted (which may not be implemented for unusual encodings)
      you will find it here
      */

      System.out.println(textExtractor.getContent());

      /***********************************************************************************************

      some comments on the rest of the code:

      a creator device is for creating a PDF content stream
      */
      CSCreator creator = CSCreator.createFromContent(content, page);

      /*
      it has no interpreter - an interpreter is connected with a device only
      via the code pattern above.

      So you could create an interpreter on a content stream and connect it to a
      creator device - for example, to filter certain operators and create new content directly.
      */
      ICSInterpreter interpreter = creator.getInterpreter(); // this is null
      CSTextExtractor textExtractor = new CSTextExtractor();

      /*
      this method is called by an interpreter only and should somehow disappear from the interface.
      maybe someday I will spend some thought on it...
      */
      textExtractor.open(interpreter);

      /*
      so this is empty... it simply has not received any command from an interpreter
      */
      System.out.println(textExtractor.getContent());

      /**********************************************************************************************/

      The other example should be rewritten the same way as mentioned above.
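
To make the interpreter/device relationship concrete, here is a minimal, self-contained sketch in plain Java. It is a toy model and not jPod code: ToyDevice, ToyInterpreter, ToyTextExtractor and ToyTextSearcher are hypothetical stand-ins invented for illustration, and the "content stream" is just a list of strings. It only tries to show why a device that was never driven by an interpreter stays empty, and that a searcher would be wired up the same way as an extractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the interpreter/device pattern. None of these types are jPod classes. */
class InterpreterDeviceSketch {

    /** Hypothetical device: receives one event per piece of text shown. */
    interface ToyDevice {
        void textShow(String text);
    }

    /** Hypothetical interpreter: walks the content and forwards events to its device. */
    static class ToyInterpreter {
        private final ToyDevice device;

        ToyInterpreter(ToyDevice device) {
            this.device = device;
        }

        void process(List<String> content) {
            for (String chunk : content) {
                device.textShow(chunk);
            }
        }
    }

    /** Device that concatenates every piece of text it receives. */
    static class ToyTextExtractor implements ToyDevice {
        private final StringBuilder content = new StringBuilder();

        @Override
        public void textShow(String text) {
            content.append(text);
        }

        String getContent() {
            return content.toString();
        }
    }

    /** Device that records every received chunk containing the search string. */
    static class ToyTextSearcher implements ToyDevice {
        private final String searchString;
        private final List<String> hits = new ArrayList<>();

        ToyTextSearcher(String searchString) {
            this.searchString = searchString;
        }

        @Override
        public void textShow(String text) {
            if (text.contains(searchString)) {
                hits.add(text);
            }
        }

        List<String> getHits() {
            return hits;
        }
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("Bonjour, ", "comment allez-vous ?");

        // A device that no interpreter has driven has received no events, so it stays empty.
        ToyTextExtractor unconnected = new ToyTextExtractor();
        System.out.println("unconnected: [" + unconnected.getContent() + "]");

        // Same device type, but now an interpreter processes the content and feeds it.
        ToyTextExtractor extractor = new ToyTextExtractor();
        new ToyInterpreter(extractor).process(content);
        System.out.println("extracted: " + extractor.getContent());

        // The searcher follows the same pattern: create device, create interpreter, process.
        ToyTextSearcher searcher = new ToyTextSearcher("vous");
        new ToyInterpreter(searcher).process(content);
        System.out.println("hits: " + searcher.getHits().size());
    }
}
```

In this model the device never pulls anything by itself; it only accumulates what the interpreter pushes into it. That mirrors the explanation above of why getContent() came back empty in the original code.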

      The text extractor and text searcher are not yet part of an official release, and you may
      have some work to do to get them working for your case.

      I hope that with the next release (August) we will be able to provide some more examples for
      the powerful interpreter/device framework. So far there is just no device (besides CSCreator,
      whose main purpose is in another domain) to show useful examples (remember, the snippets are
      provided later and are not yet "released").