Removing/Resizing images in PDF

  • lehd

    lehd - 2009-03-04

    Hello All

    I think the jPod library can help me remove all images from a PDF, or resize (resample) them, in order to make the PDF smaller.

    What is the best approach for this?
    I tried using a walker object, as in the extract images example. It visited some images but not all of them; the others I can find using the storage layer:
             Iterator it = getDoc().cosGetDoc().stGetDoc().objects();
             while (it.hasNext()) {
                COSIndirectObject o = (COSIndirectObject) it.next();
                if ( o.dereference().asStream() == null) {
    So the first subtask is a correct traversal that sees all images.
    The second is: how can I remove an image completely? I used a dirty trick, setting stream.setEncodedBytes(new byte[] {}), which does make the PDF smaller, but when I open the file the reader reports that it cannot open some images, although it otherwise works fine. So the question is: how can I remove images in a better way? Any hints on implementing image resizing would also help :)

    Thank you.

    • mtraut

      mtraut - 2009-03-04

      First, to iterate over the valid PDF content you should really use a walker object. If correctly implemented, it should visit every USED object in the file. Shrinking by removing unused objects should not be implemented this way (use "garbageCollect", for example). By using "objects" on the storage layer you will almost certainly also get dangling objects of the PDF document.

      "Removing" images should not be done by simply making them invalid, because the metadata and the image data are then no longer consistent. "Correctly" removing images would mean removing them from the content stream. "Shrinking" should change the image XObject by replacing all (meta)data correctly. For example, create a PDF image raster of the desired size, containing all black or all white, adapt the metadata, and add a flate filter to compress it. Or you could create an AWT image of the desired size, draw on it, and use an ImageConverterAwt2Pdf to get correct PDImage data to replace the original.
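
      The "blank raster plus flate filter" idea can be sketched with plain JDK classes. This is only an illustration, not jPod code: the class and method names (BlankRasterSketch, blankGrayRaster, flate) are made up for this example, and writing the bytes back into the image XObject through jPod is not shown. It assumes an 8-bit grayscale raster, so the image dictionary would have to be adapted to match (/Width, /Height, /BitsPerComponent 8, /ColorSpace /DeviceGray, /Filter /FlateDecode).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

class BlankRasterSketch {

    // Hypothetical helper: one byte per pixel (8-bit gray), all black or all white.
    static byte[] blankGrayRaster(int width, int height, boolean white) {
        byte[] raster = new byte[width * height]; // zero-filled, i.e. black
        if (white) {
            Arrays.fill(raster, (byte) 0xFF);
        }
        return raster;
    }

    // Hypothetical helper: zlib/deflate compression, the format FlateDecode expects.
    static byte[] flate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = blankGrayRaster(64, 64, true);
        byte[] packed = flate(raw);
        System.out.println(raw.length + " raw bytes -> " + packed.length + " compressed bytes");
    }
}
```

      A uniform raster compresses to a small fraction of its raw size, which is why a blank replacement image costs almost nothing in the file.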

      Remember, PDF content streams can contain embedded (inline) images that are not available as objects. As these images are rarely large, this should not bother you.

      Image resizing should be no problem when you deal with AWT BufferedImages; just use the plain associated graphics context (Graphics2D) or Java Advanced Imaging.
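
      A minimal sketch of the AWT route, using only standard JDK classes. The class name ImageResizeSketch and the resize helper are made up for this example. Getting the BufferedImage out of the PDF, and converting the result back (for example with ImageConverterAwt2Pdf), is not shown here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class ImageResizeSketch {

    // Hypothetical helper: draws src scaled into a new image of the target size.
    static BufferedImage resize(BufferedImage src, int width, int height) {
        // TYPE_CUSTOM cannot be passed to the constructor, so fall back to ARGB.
        int type = src.getType() == BufferedImage.TYPE_CUSTOM
                ? BufferedImage.TYPE_INT_ARGB
                : src.getType();
        BufferedImage dst = new BufferedImage(width, height, type);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage original = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = resize(original, 100, 50);
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

      Halving both dimensions quarters the number of pixels, so even a moderate downscale can reduce the stored image data considerably.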

    • Palani

      Palani - 2009-03-16

      I am trying to get all images in a PDF document in order to insert them into another, new PDF document. For that I am using CSCreator.inlineImage(PDImage). How do I get the PDImage object for an image in a PDPage?



