How can I get image coordinates?

  • Fran Zx

    Fran Zx - 2008-06-05

    I have an existing PDF and need to get the coordinates of the images it contains.
    Is that possible?

    • Stefano Chizzolini


      Here's a sample that I successfully tested. NOTE: for simplicity, it handles only the most common case (external images) and ignores inline images; extending it to cover inline images is straightforward, see [1].



      // BEGIN ContentScanningSample
      package it.stefanochizzolini.clown.samples;

      import it.stefanochizzolini.clown.documents.Document;
      import it.stefanochizzolini.clown.documents.Page;
      import it.stefanochizzolini.clown.documents.contents.Contents;
      import it.stefanochizzolini.clown.documents.contents.ContentScanner;
      import it.stefanochizzolini.clown.documents.contents.XObjects;
      import it.stefanochizzolini.clown.documents.contents.objects.ContentObject;
      import it.stefanochizzolini.clown.documents.contents.objects.PaintXObject;
      import it.stefanochizzolini.clown.documents.contents.xObjects.ImageXObject;
      import it.stefanochizzolini.clown.documents.contents.xObjects.XObject;
      import it.stefanochizzolini.clown.objects.PdfDictionary;
      import it.stefanochizzolini.clown.objects.PdfName;
      import it.stefanochizzolini.clown.objects.PdfStream;
      import it.stefanochizzolini.clown.files.File;

      import java.awt.geom.Dimension2D;

      /**
        This sample demonstrates how to retrieve the precise position (page and coordinates) of each image
        within a PDF document.
        <p>This sample leverages the ContentScanner class.</p>
      */
      public class ContentScanningSample
        implements ISample
      {
        public void run(
          PDFClownSampleLoader loader
          )
        {
          // (boilerplate user choice -- ignore it)
          String filePath = loader.getPdfFileChoice("Please select a PDF file");

          // 1. Open the PDF file!
          File file;
          try{file = new File(filePath);}
          catch(Exception e){throw new RuntimeException(filePath + " file access error.",e);}

          // 2. Parsing the document...
          // Get the PDF document!
          Document document = file.getDocument();
          System.out.println("\nLooking for images...");
          // Iterating through the pages...
          for(Page page : document.getPages())
          {
            // Get the page contents!
            Contents contents = page.getContents();
            // Wrap the contents into the scanner!
            ContentScanner scanner = new ContentScanner(contents);
            // Get the external objects referenced by the page!
            XObjects xObjects = page.getResources().getXObjects();

            // Page height is useful to translate native (bottom-up) vertical coordinates to common (top-down) ones.
            double pageHeight = page.getSize().getHeight();
            // Parsing the page...
            /*
              NOTE: Page contents are represented by a sequence of content objects,
              possibly nested into multiple levels.
            */
            while(scanner.getCurrent() != null) // Keeps scanning till there's a content object available.
            {
              /*
                NOTE: This inner loop is a temporary workaround to a non-conformant behavior
                of the current implementation (that is: moveInnerNext() method should
                go down just 1 level at a time into the object hierarchy,
                whilst its current implementation expands down to all the available levels at a time!).
              */
              ContentScanner level = scanner;
              do
              {
                ContentObject object = level.getCurrent();
                // Is it an operation that shows an external object?
                /*
                  NOTE: Images can be represented on a page either as inline objects
                  or as external objects (XObject).
                  External objects are represented through PaintXObject operations.
                */
                if(object instanceof PaintXObject)
                {
                  // Get the reference key of the shown external object!
                  PdfName xObjectKey = (PdfName)((PaintXObject)object).getOperands().get(0);
                  // Get the external object associated to the reference key!
                  XObject xObject = xObjects.get(xObjectKey);
                  // Is the external object an image?
                  if(xObject instanceof ImageXObject)
                  {
                    System.out.println(
                      "Image '" + xObjectKey + "' (" + xObject.getBaseObject() + ") " // Image key and indirect reference.
                        + "on page " + (page.getIndex() + 1) + " (" + page.getBaseObject() + ")" // Page index and indirect reference.
                      );
                    // Get the coordinates of the image!
                    double[] ctm = level.getState().getCTM(); // Current transformation matrix.
                    Dimension2D imageSize = xObject.getSize(); // Image native size.
                    System.out.println("  Coordinates:");
                    System.out.println("     x: " + Math.round(ctm[4]));
                    System.out.println("     y: " + Math.round(pageHeight - ctm[5]));
                    System.out.println("     width: " + Math.round(ctm[0]) + " (native: " + Math.round(imageSize.getWidth()) + ")");
                    System.out.println("     height: " + Math.round(Math.abs(ctm[3])) + " (native: " + Math.round(imageSize.getHeight()) + ")");
                  }
                }
              } while((level = level.getChildLevel()) != null);

              // Move to the next content object.
              // ASSUMPTION: the posted listing omits the advancing call; moveInnerNext() is the method
              // named in the note above. Without it the outer loop would never terminate.
              scanner.moveInnerNext();
            }
          }
        }
      }
      // END ContentScanningSample
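
The coordinate arithmetic in the sample can be checked without any PDF library. The sketch below is not part of PDF Clown: the class `ImageBoxSketch` and the method `topDownBox` are hypothetical names made up for illustration. It assumes the CTM is the usual six-element PDF matrix [a b c d e f], that an image is painted into the unit square transformed by that matrix, and that the page box starts at the origin. It returns the bounding box of the image in top-down coordinates. For an unrotated, unflipped image, `pageHeight - ctm[5]` as printed by the sample is the lower edge of the image measured from the top of the page; the top edge is obtained by also subtracting the height.

```java
import java.util.Arrays;

// Hypothetical helper, independent of PDF Clown.
final class ImageBoxSketch {

    /**
     * Maps the unit square through ctm = [a, b, c, d, e, f] and returns
     * {x, y, width, height} of its bounding box, with y measured from the
     * top of the page (top-down) instead of from the bottom (PDF native).
     */
    static double[] topDownBox(double[] ctm, double pageHeight) {
        double a = ctm[0], b = ctm[1], c = ctm[2], d = ctm[3], e = ctm[4], f = ctm[5];
        // Corners of the unit square in image space.
        double[][] corners = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : corners) {
            // PDF convention: x' = a*x + c*y + e, y' = b*x + d*y + f.
            double x = a * p[0] + c * p[1] + e;
            double y = b * p[0] + d * p[1] + f;
            minX = Math.min(minX, x);
            maxX = Math.max(maxX, x);
            minY = Math.min(minY, y);
            maxY = Math.max(maxY, y);
        }
        // Flip the vertical axis: the highest native y becomes the top edge.
        return new double[] {minX, pageHeight - maxY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // A 200x100 image whose lower-left corner sits at (72, 500) on a 792pt-high page.
        double[] box = topDownBox(new double[] {200, 0, 0, 100, 72, 500}, 792);
        System.out.println(Arrays.toString(box));
    }
}
```

Taking the bounding box of all four corners, rather than reading `ctm[0]` and `ctm[3]` directly, keeps the result meaningful when the image is flipped (negative `d`) or rotated, at the cost of reporting the enclosing rectangle instead of the exact outline.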

