The given PDF contains text that Adobe Acrobat calls "hidden text". When using CSTextExtract, this hidden text is returned instead of the visible text. I'd like to get both the hidden and the visible text.
Hmmm, I still can't get it (but I didn't compare character by character). What text do you see that is not extracted?
On first inspection, there is no Tr 3 (invisible rendering mode) text in the document. The "ghost" text may stem from text that has been moved out of the visible area or lies behind some other object.
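The "moved out of the visible area" case can be checked mechanically once fragment positions are known. The following is only a minimal sketch in plain Python, not jPod or iText code: the `Fragment` type, the `split_by_visibility` helper, and the sample coordinates are all hypothetical, and it assumes the fragment origins and the crop box are given in the same page coordinate system. It does not detect text hidden behind other objects.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical extracted text fragment with its origin in points."""
    text: str
    x: float
    y: float


def split_by_visibility(fragments, crop_box):
    """Split fragments into those whose origin is inside crop_box and the rest.

    crop_box is (llx, lly, urx, ury): lower-left and upper-right corners.
    """
    llx, lly, urx, ury = crop_box
    inside, outside = [], []
    for frag in fragments:
        if llx <= frag.x <= urx and lly <= frag.y <= ury:
            inside.append(frag)
        else:
            outside.append(frag)
    return inside, outside


# Hypothetical data: an A4-sized crop box and two fragments,
# one of which sits above the top edge of the page.
crop_box = (0.0, 0.0, 595.0, 842.0)
fragments = [
    Fragment("current page text", 72.0, 700.0),
    Fragment("previous page text", 72.0, 1542.0),
]

visible, hidden = split_by_visibility(fragments, crop_box)
```

Comparing the two lists against what a viewer displays should show whether the extra text is simply positioned outside the page area.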
Sorry, Acrobat Standard tricked me on this. I compared the extracted text and found that both the text from the previous page and the text from the current page are in the PDF for this single page.
So this isn't a jPod problem at all. I guess the problem is caused by iText, which is used to split the document into single pages. Depending on how a PDF is built, it might put invisible or cropped text and graphics into the file.
Anyway, thanks for the fast feedback.