Simplifying Tesseract call in Tesseract.pm
Brought to you by:
ra28145
In Tesseract.pm, instead of writing a temporary config file hocr.config, it is much easier to set the parameter directly with the Tesseract -c command-line option (this option can be given more than once, if needed).
In your current code, at line 236 of lib/Gscan2pdf/Tesseract.pm, you would be better off using
$cmd = "tesseract $tif $path$name -l $options{language} -c tessedit_create_hocr=1";
Addition:
In my version I use
$cmd = "tesseract $tif $path$name -l $options{language} -c tessedit_create_hocr=1 -c tessedit_create_pdf=1";
to create hOCR and PDF output in one go.
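Since -c can be repeated, the parameters could also be kept in a hash and expanded into one -c per entry. The following is only a minimal sketch, not the code from Tesseract.pm: build_tesseract_cmd and the example file names are hypothetical, and it returns an argument list (suitable for the list form of system) rather than a single command string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gscan2pdf): build the tesseract
# argument list, emitting one "-c name=value" pair per config variable.
sub build_tesseract_cmd {
    my ( $tif, $outbase, $language, %config ) = @_;
    my @cmd = ( 'tesseract', $tif, $outbase, '-l', $language );

    # Sort the names so the resulting command line is deterministic.
    for my $name ( sort keys %config ) {
        push @cmd, '-c', "$name=$config{$name}";
    }
    return @cmd;
}

# Example file names and language are placeholders.
my @cmd = build_tesseract_cmd(
    'page.tif', '/tmp/page', 'eng',
    tessedit_create_hocr => 1,
    tessedit_create_pdf  => 1,
);
print join( ' ', @cmd ), "\n";
```

Passing @cmd to system as a list avoids shell quoting problems with unusual file names, which is why the sketch does not build one interpolated string.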
Thank you for this. I didn't know this possibility existed.
Do you know in which version this option was introduced?
It is implemented in tesseract-3.03-rc1.tar.gz, and it was already part of Tesseract 3.02.02.
See and cherry-pick https://github.com/Wikinaut/gscan2pdf/commit/7ec74003da8e805865c0cbf86c574ff2a32512ad
Ticket moved from /p/gscan2pdf/bugs/204/
I've just applied this. However, as 3.02.01 is still in precise, I have left the old method in, and the new one is only used from 3.02.02 onwards.
Thanks for the patch
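For illustration, a version check of the kind described above might look roughly like this. It is a sketch under assumptions, not the code that was committed: version_at_least is a hypothetical helper, and it assumes the version string is purely numeric and dot-separated (a suffix such as -rc1 would have to be stripped first).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $version is at least $minimum,
# comparing the dot-separated components numerically, left to right.
sub version_at_least {
    my ( $version, $minimum ) = @_;
    my @have = split /\./, $version;
    my @need = split /\./, $minimum;
    while ( @have or @need ) {
        my $h = shift(@have) // 0;    # missing components count as 0
        my $n = shift(@need) // 0;
        return 1 if $h > $n;
        return 0 if $h < $n;
    }
    return 1;                         # all components equal
}

# Choose between the -c option and the old hocr.config method.
for my $version (qw(3.02.01 3.02.02 3.03)) {
    my $method =
      version_at_least( $version, '3.02.02' )
      ? '-c option'
      : 'hocr.config file';
    print "$version: $method\n";
}
```

With these inputs, 3.02.01 falls back to the config file, while 3.02.02 and 3.03 use the -c option.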