Can't do OCR of loaded image with cuneiform
In 1.1.2, I loaded a PDF and then tried to do Tools | OCR on it with cuneiform. It failed with this error in the window from which I launched gscan2pdf:
Cuneiform for Linux 1.1.0
PUMA_XFinalrecognition failed.
Both gocr and tesseract worked, but cuneiform failed.
Cuneiform did work when I scanned directly from the scanner instead of loading a PDF.
It may be relevant that the images in the PDF were monochrome, i.e., black and white rather than color or greyscale.
Are you sure you are using 1.1.2? I fixed this in 1.1.1.
If so, can you post an example image that cuneiform doesn't process, please?
Yes, I am using 1.1.2.
The document I have that exhibits this problem has some text in it that I do not want to make public, and I don't have time to produce a redacted version, so if you send me your email address (jik@kamens.us) I will email it to you privately, assuming you will keep it private and delete it when you are done with it.
Last edit: Jonathan Kamens 2013-02-14
gscan2pdf/cuneiform processed your sample PDF perfectly - i.e. I can't reproduce your problem.
Which distro and architecture are you using?
Fedora 18, x86_64, cuneiform version 1.1.0.
I wonder whether this is down to different versions of ImageMagick converting the image in different ways - or alternatively the Fedora version of cuneiform not linking to ImageMagick properly.
Which version of ImageMagick do you have?
What do you get if you import the PDF you sent me into gscan2pdf, save one of the pages as a PNG, and then
identify <image.png>
from the command line?
$ rpm -q ImageMagick
ImageMagick-6.7.7.5-3.fc18.x86_64
$ identify ~/Desktop/foo.png
/home/jik/Desktop/foo.png PNG 2544x3299 2544x3299+0+0 8-bit PseudoClass 2c 64.2KB 0.000u 0:00.000
$
The image imports from PDF as 8-bit - i.e. greyscale or colour.
What happens if you use Tools/Threshold after importing? Can you then get cuneiform to process the image?
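For anyone following along, here is a minimal sketch of how the identify line quoted above can be read mechanically. It assumes the ImageMagick 6 output layout shown in this thread ("8-bit PseudoClass 2c"); parse_identify and stored_as_bilevel are hypothetical helper names for illustration, not part of gscan2pdf, cuneiform, or ImageMagick.

```python
import re

# Hypothetical helper, not part of gscan2pdf: extract the bit depth, storage
# class and (if present) colour count from one line of `identify` output.
# Assumes the ImageMagick 6 layout seen in this thread.
IDENTIFY_RE = re.compile(
    r"(?P<depth>\d+)-bit\s+(?P<klass>\S+)(?:\s+(?P<colors>\d+)c)?"
)


def parse_identify(line):
    """Return a dict with 'depth', 'class' and 'colors' (None if absent)."""
    m = IDENTIFY_RE.search(line)
    if m is None:
        raise ValueError("unrecognised identify output: %r" % line)
    colors = m.group("colors")
    return {
        "depth": int(m.group("depth")),
        "class": m.group("klass"),
        "colors": int(colors) if colors is not None else None,
    }


def stored_as_bilevel(info):
    """True only if the file itself is stored at 1 bit per sample."""
    return info["depth"] == 1


# The exact line reported earlier in this thread.
line = ("/home/jik/Desktop/foo.png PNG 2544x3299 2544x3299+0+0 "
        "8-bit PseudoClass 2c 64.2KB 0.000u 0:00.000")
info = parse_identify(line)
print(info)
print("stored as bilevel:", stored_as_bilevel(info))
```

The point of the sketch is that the page holds only two colours but is still stored at 8-bit depth, which is the distinction the Tools/Threshold suggestion is meant to test.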
OCR with Cuneiform is successful after I do Tools/Threshold (but BTW there's another bug there -- after I do Tools/Threshold and click Apply, the Tools/Threshold window stays up; shouldn't it go away after the work is done?).
As far as Cuneiform is concerned, do you consider the bug closed?
For the Threshold dialog, I see your point. The OCR dialog is hidden when you start the process. However the Scan dialog is not...
I don't think the cuneiform bug is closed because I shouldn't have to run Tools / Threshold to get cuneiform to work, and in any case there's no way for anyone who hasn't read this bug to know that will fix the issue.
But it's not gscan2pdf's fault that your cuneiform build can't deal with 8-bit depth images. It works fine here.
I'm using cuneiform 1.1.0, too, so I wonder whether Fedora 18 is building it against imagemagick properly.
Fair enough. If you tell me what distribution and architecture you're using on which it works, I'll file a bug for it with the Fedora folks and then you can consider the issue closed in gscan2pdf.
Gentoo, amd64.