
#15 Simplifying Tesseract call in Tesseract.pm

Group: Unstable_(example)
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-03-24
Created: 2015-08-31
Creator: Wikinaut
Private: No

In Tesseract.pm, instead of writing a temporary config file (hocr.config), it is much simpler to set the parameter directly with Tesseract's -c command-line option, which can be given more than once if several parameters are needed:

In the current code (lib/Gscan2pdf/Tesseract.pm, line 236) it would be better to use

$cmd = "tesseract $tif $path$name -l $options{language} -c tessedit_create_hocr=1";

Addendum:

In my version I use

$cmd = "tesseract $tif $path$name -l $options{language} -c tessedit_create_hocr=1 -c tessedit_create_pdf=1";

to create both the hOCR and the PDF output in one go.
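A possible variation on the commands above, offered as a sketch under assumptions: rather than interpolating the file names into one string, the arguments can be collected in a list, with one -c pair per parameter. The helper name tesseract_args and the sample file names are hypothetical and not part of gscan2pdf; only the tesseract options themselves come from the commands above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gscan2pdf): build the tesseract
# command as a list instead of one interpolated shell string.
sub tesseract_args {
    my ( $tif, $outbase, $language, %config ) = @_;
    my @args = ( 'tesseract', $tif, $outbase, '-l', $language );

    # -c may be repeated, once per VARIABLE=VALUE pair; keys are
    # sorted only to make the argument order deterministic
    for my $key ( sort keys %config ) {
        push @args, '-c', "$key=$config{$key}";
    }
    return @args;
}

# Example values (hypothetical file names)
my @cmd = tesseract_args(
    'page.tif', '/tmp/page', 'eng',
    tessedit_create_hocr => 1,
    tessedit_create_pdf  => 1,
);
print join( ' ', @cmd ), "\n";

# To actually run it (list form, so no shell is involved):
# system(@cmd) == 0 or die "tesseract failed: $?";
```

Passing the list to system() in its multi-argument form runs tesseract without a shell, so file names containing spaces or shell metacharacters are passed through unchanged.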

Discussion

  • Jeffrey Ratcliffe

    Thank you for this. I didn't know this possibility existed.

    Do you know in which version this option was introduced?

     
  • Wikinaut

    Wikinaut - 2015-09-15

    It is implemented in tesseract-3.03-rc1.tar.gz and was already part of Tesseract 3.02.02.

     
  • Jeffrey Ratcliffe

    Ticket moved from /p/gscan2pdf/bugs/204/

     
  • Jeffrey Ratcliffe

    I've just applied this. However, as 3.02.01 is still the version in Ubuntu Precise, I have kept the old method, and the new one is only used from 3.02.02 onwards.

    Thanks for the patch.

     
  • Jeffrey Ratcliffe

    • status: open --> accepted
    • Group: v1.0_(example) --> Unstable_(example)
     
  • Jeffrey Ratcliffe

    • status: accepted --> closed
     
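The version check described in the thread (keep the config-file method for 3.02.01, use -c from 3.02.02 on) could look roughly like the following Perl sketch. The helper names, the config-file path and its assumed contents (tessedit_create_hocr 1) are hypothetical, not taken from gscan2pdf. The sketch compares only the leading dotted numbers of a version string, so it does not order release candidates before final releases (3.03-rc1 and 3.03 compare equal).

```perl
use strict;
use warnings;

# Hypothetical helper: split a version string into its leading numeric
# components, e.g. "3.02.01" -> (3, 02, 01), "3.03-rc1" -> (3, 03).
sub version_parts {
    my ($v) = @_;
    my ($num) = $v =~ /^(\d+(?:\.\d+)*)/;
    return defined $num ? split( /\./, $num ) : ();
}

# Compare two version strings component by component (missing
# components count as 0); returns -1, 0 or 1 like <=>.
sub version_cmp {
    my @x = version_parts(shift);
    my @y = version_parts(shift);
    my $n = @x > @y ? scalar @x : scalar @y;
    for my $i ( 0 .. $n - 1 ) {
        my $cmp = ( $x[$i] // 0 ) <=> ( $y[$i] // 0 );
        return $cmp if $cmp;
    }
    return 0;
}

# Extra tesseract arguments for hOCR output: -c from 3.02.02 on,
# otherwise the path of a temporary config file (assumed to contain
# the line "tessedit_create_hocr 1").
sub hocr_args {
    my ( $version, $config_file ) = @_;
    return version_cmp( $version, '3.02.02' ) >= 0
      ? ( '-c', 'tessedit_create_hocr=1' )
      : ($config_file);
}

print join( ' ', hocr_args( '3.02.01',  '/tmp/hocr.config' ) ), "\n";  # /tmp/hocr.config
print join( ' ', hocr_args( '3.03-rc1', '/tmp/hocr.config' ) ), "\n";  # -c tessedit_create_hocr=1
```

Comparing numerically component by component avoids the pitfalls of a plain string comparison, and padding missing components with 0 lets "3.03" be compared against "3.02.02".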
