Feature request: pass custom options to tesseract

gerlos
2013-05-04
2013-06-20
  • gerlos
    2013-05-04

    Hello,
    Many thanks for your great work on gImageReader! You saved my life!

    I am using your software to digitize historical scientific data that we have in print and now need in spreadsheets. gImageReader is helping me a lot, but it would be even more helpful if we could manually specify (e.g. in the configuration window) some additional options to pass to tesseract.

    Since I'm scanning tables of digits, I'd like to add the option "-psm 5", as well as "outputbase digits".
    Is it possible to add such a feature?

    So far I have looked in the source and manually added them to the subprocess.Popen command in main.py, but it would be nice if we could do it without such a hack.

    thanks
    gerlos
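
For readers who want to try the workaround described above, here is a minimal sketch of how such a command line might be assembled. It is not the code from gImageReader's main.py; the function name build_tesseract_command and the file names are hypothetical. It assumes the Tesseract 3.x command-line layout (image, output base, options, then config names), in which the page segmentation mode is given as "-psm" (later Tesseract releases spell it "--psm").

```python
def build_tesseract_command(image_path, output_base,
                            extra_options=(), config_names=()):
    """Return the argument list for a Tesseract 3.x style command-line call.

    Hypothetical helper for illustration only; it does not run tesseract.
    """
    command = ["tesseract", image_path, output_base]
    # Extra options come after the output base, e.g. ["-psm", "5"].
    command.extend(extra_options)
    # Config file names such as "digits" go last on the command line.
    command.extend(config_names)
    return command


# Example: a scanned table of digits (file names are made up).
command = build_tesseract_command(
    "table.png", "table", extra_options=["-psm", "5"], config_names=["digits"]
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.Popen unchanged, so user-supplied options only need to be split into separate arguments before they are appended.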

     
  • Sandro Mani
    2013-06-20

    Hello, and sorry for the very late answer; I must have missed the mail in my mailbox. Since I've rewritten gImageReader to use the tesseract C++ API directly (you can find the current version in the git repo here on the SourceForge page), the program no longer calls tesseract via the command line. However, one alternative I see is to support a config file, as described here [1]. Would that work?

    Best,
    Sandro

    [1] http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
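
To illustrate the config-file idea, the sketch below reads a Tesseract-style config file, which holds one "name value" pair per line, into a dictionary. This is only an assumption about how such support could look, not gImageReader's implementation; the function name parse_tesseract_config is hypothetical, and skipping lines that start with "#" is assumed here. The two parameters in the example, tessedit_pageseg_mode and tessedit_char_whitelist, are existing Tesseract variables that roughly correspond to the "-psm 5" and "digits" settings requested above.

```python
def parse_tesseract_config(text):
    """Parse 'name value' lines into a dict of parameter strings.

    Hypothetical helper for illustration; blank lines and lines starting
    with '#' are skipped (an assumption about the file format).
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: name, then the rest as value.
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        params[name] = value
    return params


example = """\
# settings for tables of digits
tessedit_pageseg_mode 5
tessedit_char_whitelist 0123456789
"""
settings = parse_tesseract_config(example)
print(settings)
```

Each resulting pair could then be applied through the API's variable-setting call (SetVariable in the C++ TessBaseAPI) before recognition starts.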