On 18 March 2013 18:32, Marcos Barbosa <marcosestevesbarbosa@...> wrote:
> This is my first post. Is possible use TIFF format instead PNM in
> gscan2pdf? OCR works better in TIFF format.
You don't say what version of gscan2pdf, or what OCR engine you are using.
gscan2pdf used to store scans as PNM, as that is the format in which
SANE provides them, but recent versions convert images to PNG
internally, as PNG is lossless and offers good compression.
Different OCR engines require different image formats, although
recent versions tend to be format-agnostic.
Tesseract used to require TIFF (with no alpha channel). Cuneiform used
to require BMP. The current versions of both engines will accept
anything that ImageMagick can understand.
Don't confuse the image format (TIFF, PNG, PNM) with the image type
(1-bit, grey-scale or colour). Typically, OCR engines require a 1-bit
image, so if you pass them a grey-scale or colour image, they
internally threshold it. Often, you will get better results if you
threshold the image yourself first.
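
To illustrate what thresholding means, here is a minimal sketch in plain
Python. It is not what gscan2pdf or any OCR engine actually runs; the
function name threshold and the default cutoff of 128 are assumptions
made up for this example, and the image is just a list of rows of
grey values (0 = black, 255 = white).

```python
def threshold(pixels, cutoff=128):
    """Return a 1-bit copy of a grey-scale image.

    pixels: rows of integers in 0..255 (0 = black, 255 = white).
    cutoff: illustrative default; values at or above it become
            1 (white), everything else becomes 0 (black).
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in pixels]


# A tiny 2x3 grey-scale "scan" used only for demonstration.
grey = [
    [12, 40, 200],
    [130, 127, 255],
]
bilevel = threshold(grey)
```

Doing this step yourself lets you choose the cutoff to suit the scan (a
faint or unevenly lit page usually needs a different value), rather than
relying on whatever the engine picks internally.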
Please post a test image that demonstrates your concern so that I can
address it more specifically.