Iterate Objects and read/modify properties

2008-01-29
2013-01-26
  • Gorden Wiegels
    2008-01-29

    Hi,

    First of all - great project and great product as far as I could see :)

    I am looking for a way to iterate through all objects in a PDF document, read the line thickness, manipulate it if needed and then store the file.

    Unfortunately I have no clue how to start. I browsed through the Java API and created a first class. So far I can

    - Open a PDF file
    - Iterate the pages
    - Iterate the Contents/ContentObject

    Even though you have included an introduction to the PDF Object, I am too dumb to understand it. Could you please point me in the right direction on how to navigate the objects and how to manipulate them? I really do not understand where to start, and I am puzzled that there is a class "SetLineWidth" and not a method...

    Thanks in advance and best regards,
    Gorden

     
    • Hi Gorden,
      you blazed a nice trail in discovering PDF Clown's Contents collection [1].

      First of all, I'd like to introduce you to some basic concepts that are crucial for working correctly with such a collection.
      Each page has its own Contents collection, which represents its content stream (i.e. all its contents) expressed as a sequence (ordered collection, i.e. list) of ContentObject instances [2]. Content objects instruct the consumer application to execute specific rendering actions on the page canvas (e.g. draw a path, show some text, set the line width etc.). Content objects may be simple operations (deriving from the Operation class [3], as in the case of SetLineWidth [5]) or composite objects (deriving from the CompositeObject class [4]). As composite objects contain other content objects, you need to recurse through them in order to fully scan your contents.
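
      As a rough illustration of the recursion described above, here is a minimal, self-contained sketch. It deliberately does not use PDF Clown: Node, Op and Composite are hypothetical stand-ins (assumptions made for this example only) for ContentObject, Operation and CompositeObject, and the operator names are just labels. It only shows why a flat loop over the top-level list is not enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT PDF Clown classes.
class ContentTreeSketch {
    // Plays the role of a content object.
    interface Node {}

    // A simple operation: an operator label and no children.
    static final class Op implements Node {
        final String operator;
        Op(String operator) { this.operator = operator; }
    }

    // A composite object: contains operations and/or further composites.
    static final class Composite implements Node {
        final List<Node> children;
        Composite(Node... children) { this.children = Arrays.asList(children); }
    }

    // Depth-first walk that appends operator labels in document order.
    static void collect(Node node, List<String> out) {
        if (node instanceof Op) {
            out.add(((Op) node).operator);
        } else if (node instanceof Composite) {
            for (Node child : ((Composite) node).children) {
                collect(child, out); // recurse into nested contents
            }
        }
    }

    // Walks every top-level node of a (stand-in) contents list.
    static List<String> flatten(List<Node> contents) {
        List<String> out = new ArrayList<>();
        for (Node node : contents) {
            collect(node, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Node> contents = Arrays.<Node>asList(
            new Op("w"),
            new Composite(new Op("m"), new Op("l"),
                new Composite(new Op("w"), new Op("S"))),
            new Op("S"));
        // The second "w" is nested two levels deep; a flat loop would miss it.
        System.out.println(flatten(contents)); // prints [w, m, l, w, S, S]
    }
}
```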

      So, why is there a SetLineWidth class and no "method" to set the line width? Well, when working directly on the Contents collection, you have to assume that you are looking at a sequential, low-level description of the page, whose building blocks are content objects like SetLineWidth; when you find a SetLineWidth instance in a Contents collection, it means that it's changing the graphics state (in particular the line width) that will be used to render the following contents on the page.
      A higher-level approach (method-based, as you asked for) to content manipulation is provided by the composition functionality [6], but currently there's only very partial support for path-related operations (just drawing rectangles...). Work in progress! ;-)
      Anyway, your task is well suited for direct access to the Contents collection, as I demonstrate with this code (it does just what you need: searching for SetLineWidth occurrences and modifying them accordingly -- I successfully tested it using the eastman.pdf sample included in the downloadable release):
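
      To make the "sequential graphics state" idea concrete, here is a small self-contained sketch, again without PDF Clown. The Op class and strokeWidths method are hypothetical names invented for this example. The operators are the PDF ones: "w" sets the line width and "S" strokes a path; "q" and "Q" (save/restore graphics state) are not discussed in this thread and are included only as an assumption about what a real stream may contain. The initial line width is 1.0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model for illustration only; not PDF Clown's API.
class LineWidthStateSketch {
    static final class Op {
        final String operator;
        final double operand; // only meaningful for "w"
        Op(String operator, double operand) {
            this.operator = operator;
            this.operand = operand;
        }
        Op(String operator) { this(operator, 0.0); }
    }

    // Returns the line width in effect at each stroke ("S"), in stream order.
    static List<Double> strokeWidths(List<Op> ops) {
        double lineWidth = 1.0;                      // initial graphics-state value
        Deque<Double> saved = new ArrayDeque<>();    // widths saved by "q"
        List<Double> widths = new ArrayList<>();
        for (Op op : ops) {
            switch (op.operator) {
                case "w": lineWidth = op.operand; break;   // affects what follows
                case "q": saved.push(lineWidth); break;    // save current state
                case "Q":                                  // restore saved state
                    if (!saved.isEmpty()) { lineWidth = saved.pop(); }
                    break;
                case "S": widths.add(lineWidth); break;    // stroke with current width
                default: break;                            // other operators ignored here
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
            new Op("S"),            // no "w" seen yet: the initial 1.0 applies
            new Op("w", 3.0), new Op("S"),
            new Op("q"), new Op("w", 0.5), new Op("S"), new Op("Q"),
            new Op("S"));           // back to 3.0 after the restore
        System.out.println(strokeWidths(ops)); // prints [1.0, 3.0, 0.5, 3.0]
    }
}
```

      The first stroke in the example shows why a line can be thin even though no SetLineWidth precedes it: it simply inherits the initial value.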

      import it.stefanochizzolini.clown.documents.Document;
      import it.stefanochizzolini.clown.documents.Page;
      import it.stefanochizzolini.clown.documents.Pages;
      import it.stefanochizzolini.clown.documents.contents.Contents;
      import it.stefanochizzolini.clown.documents.contents.objects.CompositeObject;
      import it.stefanochizzolini.clown.documents.contents.objects.ContentObject;
      import it.stefanochizzolini.clown.documents.contents.objects.Operation;
      import it.stefanochizzolini.clown.documents.contents.objects.SetLineWidth;
      import it.stefanochizzolini.clown.files.File;
      import it.stefanochizzolini.clown.objects.IPdfNumber;
      import it.stefanochizzolini.clown.tokens.FileFormatException;

      import java.util.List;

      ...
        public static void main(String[] args)
        {
          ... // (opening the PDF file) ...

          // Get the PDF document!
          Document document = file.getDocument();
          // Iterating through the pages...
          for(Page page : document.getPages())
          {
            // Get the page contents!
            Contents contents = page.getContents();
            contents.add(0,new SetLineWidth(10)); // Forces the override of line width's initial value (1.0) [PDF:1.6:4.3] setting it at 10 user-space units.
            for(ContentObject obj : contents)
            {normalizeLineWidth(obj);}
            // Update the page contents!
            contents.flush();
          }

          ... // (serialization) ...
        }

        // Must be static: it is called from the static main method above.
        private static void normalizeLineWidth(
          ContentObject content
          )
        {
          if(content instanceof SetLineWidth)
          {
            SetLineWidth setLineWidth = (SetLineWidth)content;
            // Force lines under 10 user-space units to be set to 10!
            if(setLineWidth.getValue() < 10)
            {
              /*
                NOTE: The current PDF Clown version (0.0.5) doesn't yet implement a setter method
                to change the operation value (i.e. setValue(double)), so we temporarily use
                its lower-level access (1st item in the operand list).
              */
              ((IPdfNumber)setLineWidth.getOperands().get(0)).setNumberValue(10);
            }
          }
          else if(content instanceof CompositeObject)
          {
            List<? extends ContentObject> objects = ((CompositeObject)content).getObjects();
            for(ContentObject obj : objects)
            {normalizeLineWidth(obj);}
          }
        }

      I have to warn you that the pages resulting from the above code may not appear as expected: when the lines are part of raster graphics (images) shown on the page, there's no way to alter them, as they are outside the syntactic domain of the PDF format (they "look like" lines, but they are actually just an arbitrary cluster of coloured pixels on a bitmap).

      Thank you for your request
      Stefano

      [1] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/Contents.html
      [2] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/objects/ContentObject.html
      [3] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/objects/Operation.html
      [4] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/objects/CompositeObject.html
      [5] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/objects/SetLineWidth.html
      [6] http://clown.sourceforge.net/API/it/stefanochizzolini/clown/documents/contents/composition/package-summary.html

       
    • Gorden Wiegels
      2008-02-05

      Stefano that is brilliant!

      Thank you very much! I just need to verify the output PDF on a printer now and voilà :)

      Best regards,
      Gorden