Anonymous - 2009-02-10

Originally posted by: filip.do...@gmail.com

Moreover, the available Tesseract languages should be autodetected. On startup,
Lector should check for the required files and show all installed languages in the
language switch in the left panel.

These files are stored in the /usr/share/tesseract/tessdata/ directory, eight per
language (???.DangAmbigs, ???.inttemp, ???.pffmtable, ???.user-words, ???.freq-dawg,
???.normproto, ???.unicharset, ???.word-dawg), where ??? is the language code from
[1]. Also, there were requests for detection of digits 0-9 only.
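
To make the check concrete, here is a rough sketch of how the detection could work.
It is only an illustration, not Lector's actual code: the function name
installed_languages is made up, and the default path is the one mentioned above,
which may differ on other distributions. A language counts as installed only if all
eight files are present.

```python
import os

# The eight per-language file suffixes listed in this post.
REQUIRED_SUFFIXES = (
    "DangAmbigs", "inttemp", "pffmtable", "user-words",
    "freq-dawg", "normproto", "unicharset", "word-dawg",
)


def installed_languages(tessdata_dir="/usr/share/tesseract/tessdata"):
    """Return the sorted codes of languages that have all eight data files.

    Hypothetical helper; the default directory is an assumption taken
    from the path quoted in the post.
    """
    try:
        names = os.listdir(tessdata_dir)
    except OSError:
        # Directory missing or unreadable: report no languages.
        return []

    found = {}  # language code -> set of suffixes seen for it
    for name in names:
        code, sep, suffix = name.partition(".")
        if sep and suffix in REQUIRED_SUFFIXES:
            found.setdefault(code, set()).add(suffix)

    required = set(REQUIRED_SUFFIXES)
    return sorted(code for code, seen in found.items() if seen == required)
```

The returned list could then be used to fill the language switch on startup.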

I include a file extracted from [1], containing languages in the format
  cze     Czech      Čeština
  deu     German     Deutsch
and two additional files containing the code along with only the original name or only
the English name.
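
A file in the three-column format shown above could be read roughly like this. This
is only a sketch: parse_language_table is a made-up name, and it assumes that columns
are separated by two or more whitespace characters, so that names containing a single
space stay in one piece.

```python
import re


def parse_language_table(text):
    """Map language code -> (English name, original name).

    Hypothetical helper. Assumes three columns separated by runs of
    two or more whitespace characters; other lines are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 3:
            continue  # blank or malformed line
        code, english, original = fields
        table[code] = (english, original)
    return table
```

Lector could then look up the display name for each detected code, falling back to
the bare code when it is not in the table.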
____
[1]: http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes