Hi!
I am extracting images from PDF files that are made up of scanned images in JPEG format, i.e. each PDF page contains only one image: the image of a document or book page.
I am using ImageExtractionSample.java as a basis for my code and everything works fine, except that the images are extracted out of order. E.g. the image that is on page 1 when viewed through a PDF viewer comes out as image 113... Some PDF files do produce images in the right order, some do not. I guess the ones that come out in the right order do so by luck :)
So I need to either find out the corresponding PDF page number for each extracted image (by tracing the PDF structure back to the container with the page info, I guess), or sort the indirect object list prior to the extraction loop. As it stands, I have failed in both attempts and need some guidance.
Thanks
Kristian
I solved this!
The problem was solved by using the ContentScanner approach from the ContentScanningSample and merging it with the image extraction method from ImageExtractionSample. That way the images are found by walking the pages in order rather than by looping over the indirect object list.
Suggestion: create a new sample program, ImageExtractionSampleInOrder, that does this. I guess it would help a lot of people!
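
For anyone reading later, here is a rough sketch of why walking the pages fixes the ordering. It is plain Java and deliberately does not use the library's API: PageModel, namesInPageOrder and the file name pattern are hypothetical stand-ins, and the object numbers are made up, with 113 borrowed from the example above. The assumption is that each page's content stream tells you which image objects it draws, so visiting pages in document order gives the images in reading order no matter how the objects are numbered in the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PageOrderExtraction {

    /** Hypothetical stand-in for a page: its number and the image object numbers it draws. */
    public static final class PageModel {
        final int pageNumber;
        final List<Integer> imageObjectNumbers;

        public PageModel(int pageNumber, List<Integer> imageObjectNumbers) {
            this.pageNumber = pageNumber;
            this.imageObjectNumbers = imageObjectNumbers;
        }
    }

    /**
     * Builds output file names by visiting pages in document order.
     * An image object referenced from several pages is named only once,
     * after the first page that uses it.
     */
    public static List<String> namesInPageOrder(List<PageModel> pages) {
        Set<Integer> seen = new HashSet<>();
        List<String> names = new ArrayList<>();
        for (PageModel page : pages) {
            for (int objectNumber : page.imageObjectNumbers) {
                if (seen.add(objectNumber)) {
                    names.add(String.format(Locale.ROOT, "page%04d_obj%d.jpg",
                            page.pageNumber, objectNumber));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up object numbers: page 1 draws object 113, as in the question.
        List<PageModel> pages = Arrays.asList(
                new PageModel(1, Arrays.asList(113)),
                new PageModel(2, Arrays.asList(7)),
                new PageModel(3, Arrays.asList(42)));
        for (String name : namesInPageOrder(pages)) {
            System.out.println(name);
        }
    }
}
```

In the real program, the inner loop would be replaced by the content scanning step from ContentScanningSample, and the name would be passed to the export code from ImageExtractionSample. Zero-padding the page number keeps the files sorted correctly in a file browser.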