Extracting images sorted per page

Help
2013-04-27
2013-04-29
  • Kristian Jörg
    2013-04-27

    Hi!

    I am extracting images from PDF files that are made up of scanned images in JPEG format. That is, each PDF page contains only one image: the image of a document or book page.

    I am using ImageExtractionSample.java as the basis for my code and everything works fine, except that the images are extracted out of order. For example, the image that is on page 1 when viewed in a PDF viewer comes out as image 113... Some PDF files do produce images in the right order, some do not. I guess the ones that come out in the right order do so by luck :)

    So I need to either find out the corresponding PDF page number for each extracted image (by tracing the PDF structure back to the container with the page info, I guess), or sort the indirect object list before the extraction loop. So far I have failed at both and need some guidance.

    Thanks
    Kristian

     
  • Kristian Jörg
    2013-04-29

    I solved this!

    I used the ContentScanner approach from ContentScanningSample to go through the content page by page, and merged that with the image extraction method from ImageExtractionSample. That solved the problem.
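
    For anyone reading later, here is a minimal sketch of the idea only, not the actual PDF Clown code. It assumes the per-page scan has already produced, for each page, the object numbers of the images that page draws. The class name, the method name, and the list-of-lists model are all hypothetical stand-ins. The point is that walking the pages in order, instead of looping over the indirect object list, gives each image its page number for free.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT the PDF Clown API: each page is reduced to the
// object numbers of the images that its content stream draws.
final class PageOrderedImages {

    // Walks the pages in document order and returns one output name per
    // image, so the export order follows the pages rather than the order
    // of the indirect objects in the file.
    static List<String> namesInPageOrder(List<List<Integer>> imagesPerPage) {
        List<String> names = new ArrayList<>();
        Set<Integer> alreadyExported = new HashSet<>();
        for (int p = 0; p < imagesPerPage.size(); p++) {
            int positionOnPage = 0;
            for (int objectNumber : imagesPerPage.get(p)) {
                positionOnPage++;
                // An image shared by several pages is exported only once,
                // under the first page that uses it.
                if (!alreadyExported.add(objectNumber)) {
                    continue;
                }
                names.add(String.format("page%03d_img%02d_obj%d",
                        p + 1, positionOnPage, objectNumber));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Page 1 draws object 113, page 2 draws object 7, page 3 draws object 42.
        // Looping over the object list would yield 7, 42, 113 instead.
        List<List<Integer>> pages = List.of(List.of(113), List.of(7), List.of(42));
        for (String name : namesInPageOrder(pages)) {
            System.out.println(name);
        }
    }
}
```

    In the real program, the inner list for each page would come from scanning that page's content with ContentScanner, and the generated name would be used as the output file name in the extraction step.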

    Suggestion: create a new sample program, ImageExtractionSampleInOrder, that does this. I guess it would help a lot of people!