Extract images from pdf

  • Jos Bergmans

    Jos Bergmans - 2008-11-24

    For a given PDF, I need to extract all of its images as JPEG files. Would jPod be the right choice for this, and if so, could someone please post some sample code showing how to do it?

    • mtraut

      mtraut - 2008-11-24


      jPod gives access to PDF internals and therefore to the PDImage object. A PDImage can take many forms: PDF-specific pixel encoding, plain JPEG, JBIG, CCITT, or a variety of other formats. On top of that come a number of possible filters and a range of color spaces...
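
      For the "plain JPEG" case mentioned above, the stored stream bytes are already JPEG data, so in principle they could be written to a .jpg file unchanged. The sketch below only shows how to recognise such data by its leading marker bytes. How to obtain the raw bytes from a PDImage is deliberately left out, since no jPod method is assumed here; the class and method names are hypothetical.

```java
// Minimal sketch: recognise JPEG data by its leading marker bytes.
// JpegSniffer and looksLikeJpeg are illustrative names, not part of jPod.
class JpegSniffer {

    // A JPEG stream starts with the SOI marker (FF D8) followed by
    // the first byte of the next marker (FF).
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] pdfStart = {0x25, 0x50, 0x44, 0x46}; // "%PDF"

        System.out.println(looksLikeJpeg(jpegStart)); // prints true
        System.out.println(looksLikeJpeg(pdfStart));  // prints false
    }
}
```

      Images in any other encoding fail this check and would have to go through a renderer or decoder first.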

      All of these objects are accessible, but jPod itself does not convert them to a common platform image representation. The good news is that "jPod Renderer", a project built on jPod and also available here on SourceForge (GPL), can do exactly this. It can render PDF content to an SWT or AWT image, which you can then save in any format supported by, for example, Java Image I/O.
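
      As a rough illustration of the last step only, here is a sketch that saves an AWT BufferedImage as a JPEG using the standard javax.imageio API. The image is created by hand as a stand-in for whatever the renderer produces; no jPod Renderer calls are shown or assumed, and the names SaveAsJpeg, toRgb and saveAsJpeg are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Minimal sketch: write an AWT image to disk as JPEG.
class SaveAsJpeg {

    // JPEG has no alpha channel, so copy the image onto an opaque
    // RGB canvas with a white background before encoding.
    static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Returns false if no suitable JPEG writer was found.
    static boolean saveAsJpeg(BufferedImage image, File target) throws IOException {
        return ImageIO.write(toRgb(image), "jpg", target);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a rendered page or extracted image.
        BufferedImage rendered = new BufferedImage(200, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = rendered.createGraphics();
        g.setColor(Color.BLUE);
        g.fillRect(20, 20, 160, 60);
        g.dispose();

        File out = File.createTempFile("rendered-", ".jpg");
        System.out.println("written: " + saveAsJpeg(rendered, out) + " -> " + out);
    }
}
```

      The conversion to an opaque RGB image is a precaution: an image with an alpha channel may not be accepted by the JPEG writer.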

      jPod Renderer currently ships no example that specifically extracts images, but "RenderDoc", for example, renders a whole document to an image. With some digging in "CSPLatformDevice" and "CWTPlatformImage" you should be able to arrive at a more specific solution if needed.

      I hope I can add some image-related examples soon (perhaps donated by you?).

