I need to extract basic elements from a PDF file, such as text, images and graphical paths.
I saw that it's possible to extract text and images.
In particular, I need to know the exact coordinates of all elements in the PDF document.
Is it possible?
The library contains a content interpreter framework that should make it quite easy to fulfill the task you mentioned. As an example, you can look at CSTextExtractor.
Image extraction should work the same way: redefining doImage and accessing the current graphics state transformation should be enough.
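To illustrate how the coordinates fall out of the graphics state transformation: in PDF an image is always painted into the unit square (0,0)-(1,1), and the current transformation matrix (CTM) maps that square onto the page. The sketch below uses only java.awt.geom and does not call the library itself; the class name ImageBounds and the method boundsOf are made up for this example, and it assumes you have already obtained the CTM as an AffineTransform inside your doImage override.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of the library.
final class ImageBounds {

    // An image occupies the unit square in its own space; transforming that
    // square by the CTM gives the area it covers in user space coordinates.
    static Rectangle2D boundsOf(AffineTransform ctm) {
        Rectangle2D unitSquare = new Rectangle2D.Double(0, 0, 1, 1);
        return ctm.createTransformedShape(unitSquare).getBounds2D();
    }

    public static void main(String[] args) {
        // Corresponds to the content stream operator "200 0 0 100 50 300 cm":
        // scale to 200 x 100 units and move the lower left corner to (50, 300).
        AffineTransform ctm = new AffineTransform(200, 0, 0, 100, 50, 300);
        Rectangle2D box = boundsOf(ctm);
        System.out.println(box.getX() + " " + box.getY() + " "
                + box.getWidth() + " " + box.getHeight());
    }
}
```

If the image is rotated or skewed, the result is the axis-aligned box enclosing it; transform the four corners individually if you need the exact quadrilateral. The values are in PDF user space, with the origin at the lower left and y pointing up.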