There seems to be a problem with some OCR output in gscan2pdf in that
each piece of text in the OCR output from a scan is surrounded by a
box, making it very difficult to edit. Some other users have
reported a similar problem but there doesn't appear to be an answer as
yet. To try to do some debugging, I scanned a text document to TIFF
using XSane, then executed tesseract from a terminal and checked the
output, which was plain text with no boxes and mostly readable.
The output from running gscan2pdf in log mode is in the attached text
file with the config data appended.
Hope this info helps and hope a solution will be forthcoming.
On 27 August 2012 09:36, Ron Chambers <locksalordy@...> wrote:
> There seems to be a problem with some OCR output in gscan2pdf in that each
> piece of text in the OCR output from a scan is surrounded by a box, making
> it very difficult to edit etc. Some other users have reported a similar
This isn't a problem per se, but a feature. The idea is that some
OCR engines can output the text position, and therefore we can use
this to put the OCR output behind the image in the correct place such
that when you search for the text in the PDF viewer, it is
highlighted at the right position on the page.
If you turn off the text position, then this feature is lost.
I grant you, though, that at the moment, the user interface for
editing the text in boxes could be substantially improved, and that
also the boxes could be made optional.
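
To illustrate the text-position idea described above, here is a minimal
sketch in Python. It assumes the OCR engine emits hOCR-style markup in
which each line carries a "bbox x0 y0 x1 y1" attribute; the SAMPLE
fragment and the helper names extract_boxes and find_text are made up
for illustration and are not taken from gscan2pdf itself.

```python
import re

# Hypothetical hOCR-style fragment; real engine output will differ in detail.
SAMPLE = (
    "<span class='ocr_line' title='bbox 36 92 580 122'>Hello world</span>\n"
    "<span class='ocr_line' title='bbox 36 130 410 160'>Second line</span>\n"
)

# Matches one OCR line: four integer box coordinates, then the recognised text.
LINE_RE = re.compile(
    r"<span class='ocr_line' title='bbox (\d+) (\d+) (\d+) (\d+)'>(.*?)</span>"
)


def extract_boxes(hocr):
    """Return a list of (text, (x0, y0, x1, y1)) tuples, one per OCR line."""
    boxes = []
    for match in LINE_RE.finditer(hocr):
        x0, y0, x1, y1 = (int(v) for v in match.groups()[:4])
        boxes.append((match.group(5), (x0, y0, x1, y1)))
    return boxes


def find_text(boxes, query):
    """Return the bounding boxes of every line whose text contains query."""
    return [bbox for text, bbox in boxes if query.lower() in text.lower()]


boxes = extract_boxes(SAMPLE)
```

Each box is what lets a viewer place the hidden text behind the matching
part of the scanned image. Without the coordinates, only the plain text
remains and a search hit cannot be tied to a location on the page.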