I am extracting images from PDF files that are made up of scanned images in JPEG format. That is, each PDF page contains only one image: the image of a document or book page.
I am using ImageExtractionSample.java as the basis for my code and everything works fine, except that the images are extracted out of order. For example, the image that is on page 1 when viewed in a PDF viewer comes out as image 113. Some PDF files do produce images in the right order, some do not. I guess the ones that come out in the right order do so by luck :)
So I need either to find the corresponding PDF page number for each extracted image (by tracing the PDF structure back to the container with the page info, I guess), or to sort the indirect object list before the extraction loop. So far I have failed at both and need some guidance.
I solved this!
Merging the ContentScanner approach from ContentScanningSample with the image extraction method in ImageExtractionSample solved the problem.
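For anyone who wants the idea without the library details, here is a minimal sketch in plain Java. It does not use the PDF Clown API: the class `PageOrderImageNaming`, the method `namesInPageOrder`, and the file name pattern are all hypothetical, and the input list stands in for whatever a per-page content scan reports. The point it illustrates is that walking the pages in document order, rather than the indirect object list, lets each image be named after the page it appears on.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: no PDF library is involved here.
final class PageOrderImageNaming {

    /**
     * imageIdsPerPage.get(i) holds the object ids of the images drawn on page i + 1,
     * in drawing order (assumed to come from scanning that page's content).
     * Returns object id -> output file name, in page order.
     */
    static Map<Integer, String> namesInPageOrder(List<List<Integer>> imageIdsPerPage) {
        Map<Integer, String> names = new LinkedHashMap<>();
        for (int pageIndex = 0; pageIndex < imageIdsPerPage.size(); pageIndex++) {
            int indexOnPage = 0;
            for (int objectId : imageIdsPerPage.get(pageIndex)) {
                indexOnPage++;
                String name = String.format(Locale.ROOT, "page-%04d-img-%d.jpg",
                        pageIndex + 1, indexOnPage);
                // An image reused on a later page keeps the name from its first appearance.
                names.putIfAbsent(objectId, name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Object 113 sits on page 1, as in the file described above; the other ids are made up.
        List<List<Integer>> scanned = List.of(List.of(113), List.of(7), List.of(42));
        namesInPageOrder(scanned).forEach((id, name) -> System.out.println(id + " -> " + name));
    }
}
```

With zero-padded page numbers in the file names, a plain alphabetical sort of the output folder matches the page order.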
Suggestion: create a new sample program, ImageExtractionSampleInOrder, that does this. I guess it would help a lot of people!