Re: [gscan2pdf-help] ocropus integration?

On Wed, Feb 03, 2010 at 02:51:56PM +0100, Bernhard Reiter wrote:
> I'm away from my scanner right now, but i had an old (newspaper) scan
> sample png which i imported and ran unpaper and ocropus on.
> Interestingly, the error did not occur in this case. Pdf export,
> however, did not work (stuck at half of the progress bar); and text
> export, again, produced an empty file.

[...]

>  ocroscript recognize --tesslanguage=eng /tmp/BBR6OrlpE6/edzE5xhGfF.pnm > /tmp/BBR6OrlpE6/HDQHZ61OhG.txt
> Forked PID 3697
> ocroscript: /usr/share/ocropus/scripts/recognize.lua:113: CHECK ./ocr-utils/ocr-utils.cc:833 background_seems_white(a)
> Process 3697 exited.

Apologies for the late response. I wonder whether the output from
ocropus is somehow confusing gscan2pdf, which is therefore not writing
the PDF correctly.

Please reproduce the problem with a test image and send it to me so
that I can check the output from ocropus. If you can't do that, please
at least post the output from the ocropus command above, adjusted, of
course, for filenames.
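
If it is easier, a small Python sketch along these lines would capture
stdout, stderr and the exit status of that command in one go. The
function names are only for illustration, and it assumes ocroscript is
on the PATH; the command line itself is the one from the log above.

```python
import subprocess
import sys


def build_command(image_path, language="eng"):
    # Same invocation as in the quoted log; only the filename changes.
    return ["ocroscript", "recognize", f"--tesslanguage={language}", image_path]


def run_and_capture(command):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python capture_ocr.py /path/to/test.pnm
    code, out, err = run_and_capture(build_command(sys.argv[1]))
    print("exit code:", code)
    print("--- stdout ---")
    print(out)
    print("--- stderr ---")
    print(err)
```

Posting all three pieces would show whether ocropus produced any
output at all before the CHECK failure.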

I had to refactor the hOCR parser to cope with cuneiform, so it is
possible that the problem will go away with the new release.

Regards

Jeff