From: Alexei Colin <alexei@al...> - 2013-03-10 20:56:27
How can I diagnose why the text position of OCRed text is not
correctly determined/recorded in the saved PDF? (Using tesseract, if
that matters.)
Evince and acroread can search, but no highlights are visible, and the text
selection tool is not active. Acroread in fact shows a box with a wacky
position and size when iterating through search results.
On 10 March 2013 21:38, Alexei Colin <alexei@...> wrote:
> How can I diagnose why the the text position of OCRed text is not
> correctly determined/recorded in the saved PDF? (Using tesseracts if
> that matters).
There are two possibilities - that tesseract has reported the text in
the wrong place, or that gscan2pdf has embedded it incorrectly.
Note that tesseract has only been reporting the position since v3 (I
think). If you are using an earlier version, then gscan2pdf embeds the
text in a miniature font in the top left-hand corner of the page.
will log the boxes that gscan2pdf reads from the tesseract output.
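
To separate the two possibilities above without going through gscan2pdf, one option is to inspect tesseract's own hOCR output directly. The following is only a rough sketch, assuming tesseract was run with hOCR output enabled and that boxes appear in the usual hOCR form title="bbox x0 y0 x1 y1". The function names, the sample markup and the page size are made up for illustration, and the conversion is not gscan2pdf's actual code.

```python
import re

# hOCR stores each box in a title attribute, e.g. title="bbox 36 92 96 116".
# Coordinates are image pixels measured from the top-left corner.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def extract_boxes(hocr_text):
    """Return every bbox found in the hOCR text as (x0, y0, x1, y1)."""
    return [tuple(int(v) for v in m.groups())
            for m in BBOX_RE.finditer(hocr_text)]


def suspicious_boxes(boxes, page_width, page_height):
    """Return boxes that are empty, inverted, or extend past the page."""
    return [(x0, y0, x1, y1) for x0, y0, x1, y1 in boxes
            if x0 >= x1 or y0 >= y1 or x1 > page_width or y1 > page_height]


def to_pdf_points(box, dpi, page_height_px):
    """Convert a pixel box (top-left origin) to PDF points (bottom-left origin).

    Assumes 72 points per inch and a known scan resolution in dpi.
    """
    x0, y0, x1, y1 = box
    return (x0 * 72.0 / dpi,
            (page_height_px - y1) * 72.0 / dpi,
            x1 * 72.0 / dpi,
            (page_height_px - y0) * 72.0 / dpi)


# Hypothetical hOCR fragment: one plausible word box and one that is off-page.
SAMPLE = """<div class='ocr_page' title='image "page.png"; bbox 0 0 2480 3508'>
<span class='ocrx_word' title="bbox 36 92 96 116">Hello</span>
<span class='ocrx_word' title="bbox 2400 3400 2600 3600">world</span></div>"""

if __name__ == "__main__":
    boxes = extract_boxes(SAMPLE)
    print(suspicious_boxes(boxes, 2480, 3508))
```

If the boxes in the hOCR already look wrong, the problem is on the tesseract side. If they look sensible there but the highlight in the PDF is misplaced, the conversion from pixels to PDF points (resolution and the flipped vertical axis) is the more likely place to look.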