Automatically setting the location of the languages
An interface to tesseract ocr
Brought to you by:
zdpo777
Originally created by: chopinX04@gmail.com
Originally owned by: chopinX04@gmail.com
Features:
1. Use the default language if it exists.
2. Offer to download it if it does not exist.
3. There must be a menu for choosing whether to download.
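A minimal sketch of the decision in points 1 and 2, in Python. The function name `choose_language` and the returned action strings are hypothetical, made up for illustration; the real menu and download code would live elsewhere in Lector.

```python
def choose_language(default, installed):
    """Decide what to do with the default language.

    `default` is a language code such as "eng"; `installed` is any
    collection of installed language codes. Both names and the returned
    action strings are assumptions for illustration, not Lector's API.
    """
    if default in installed:
        # Point 1: the default language exists, so just use it.
        return ("use", default)
    # Point 2: the default language is missing, so propose a download.
    # Point 3 (the menu) would be driven by this result.
    return ("offer_download", default)
```

For example, `choose_language("eng", ["eng", "deu"])` selects the installed language, while `choose_language("cze", ["eng"])` signals that the download menu should be shown.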
Originally posted by: filip.do...@gmail.com
Moreover, the available tesseract languages should be autodetected. On startup,
Lector should check for the required files and show all installed languages in the left
panel switch.
These files are stored in the /usr/share/tesseract/tessdata/ directory, eight for each
language (???.DangAmbigs ???.inttemp ???.pffmtable ???.user-words ???.freq-dawg
???.normproto ???.unicharset ???.word-dawg), where ??? is the language code from
[1]. Also, there were requests for detection of digits 0-9 only.
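The startup check described above could be sketched roughly as follows. This is only an illustration: the function name `detect_languages` is hypothetical, and it assumes a language counts as installed only when all eight of the files listed above are present in the directory.

```python
import os

# The eight per-language file suffixes listed in this ticket.
REQUIRED_SUFFIXES = (
    "DangAmbigs", "inttemp", "pffmtable", "user-words",
    "freq-dawg", "normproto", "unicharset", "word-dawg",
)


def detect_languages(tessdata_dir="/usr/share/tesseract/tessdata/"):
    """Return the sorted language codes that have all required files."""
    try:
        names = os.listdir(tessdata_dir)
    except OSError:
        # Directory missing or unreadable: report no languages.
        return []

    found = {}  # language code -> set of suffixes seen for it
    for name in names:
        # "eng.user-words" -> code "eng", suffix "user-words"
        code, dot, suffix = name.partition(".")
        if dot and suffix in REQUIRED_SUFFIXES:
            found.setdefault(code, set()).add(suffix)

    complete = set(REQUIRED_SUFFIXES)
    return sorted(code for code, seen in found.items() if seen == complete)
```

The returned list could then be used to fill the language switch in the left panel. Languages with only some of the files are skipped rather than shown as broken entries; that is a design choice of this sketch, not something the ticket specifies.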
I include a file extracted from [1], containing languages in the format
cze Czech Čeština
deu German Deutsch
and two additional files containing the code along with only the original or the English name.
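Reading the three-column file could look something like the sketch below. The function names are hypothetical, and the parsing rule is an assumption: columns are split on tabs if the line contains any, otherwise on whitespace, with the first token taken as the code, the second as the English name, and the rest as the original name. Multi-word English names would need a tab-separated file.

```python
def parse_language_line(line):
    """Split one line into (code, english_name, native_name), or None."""
    if "\t" in line:
        parts = line.strip().split("\t")
    else:
        # At most three fields; the native name keeps any inner spaces.
        parts = line.split(None, 2)
    if len(parts) != 3:
        return None  # blank or malformed line
    code, english, native = (p.strip() for p in parts)
    return code, english, native


def parse_language_lines(lines):
    """Build {code: (english_name, native_name)} from an iterable of lines."""
    table = {}
    for line in lines:
        parsed = parse_language_line(line)
        if parsed is not None:
            code, english, native = parsed
            table[code] = (english, native)
    return table
```

With the two sample lines above, `parse_language_lines` would map `cze` to `("Czech", "Čeština")` and `deu` to `("German", "Deutsch")`, which is enough to label detected language codes with readable names.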
____
[1]: http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes