setting automatically the location of the languages

An interface to tesseract ocr

Brought to you by: zdpo777

#16 setting automatically the location of the languages

Milestone: Release0.3

Status: Accepted

Owner: nobody

Labels: Usability (9)

Priority: Medium

Component:

OpSys:

Type: Defect

Updated: 2009-02-10

Created: 2008-11-21

Creator: Anonymous

Private: No

Originally created by: chopinX04@gmail.com
Originally owned by: chopinX04@gmail.com

Features:
1. Use the default language if it exists.
2. Offer to download it if it does not exist.
3. There must be a menu for choosing whether to download.

Discussion

Anonymous - 2009-02-10

Originally posted by: filip.do...@gmail.com

Moreover, the available Tesseract languages should be autodetected. On startup,
Lector should check for the required files and show all installed languages in the
left panel switch.

These files are stored in the /usr/share/tesseract/tessdata/ directory, eight per
language (???.DangAmbigs ???.inttemp ???.pffmtable ???.user-words ???.freq-dawg
???.normproto ???.unicharset ???.word-dawg), where ??? is the language code from
[1]. There were also requests for detecting digits 0-9 only.
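
A rough sketch of how the autodetection described above could work, in case it helps. It treats a language as installed only when all eight files named in this comment are present. The function name `installed_languages` and the constant `REQUIRED_SUFFIXES` are made up for illustration and are not part of Lector; the default path is the one quoted above and may differ between distributions.

```python
import os

# The eight per-language file suffixes listed in the comment above.
REQUIRED_SUFFIXES = (
    "DangAmbigs", "inttemp", "pffmtable", "user-words",
    "freq-dawg", "normproto", "unicharset", "word-dawg",
)


def installed_languages(tessdata_dir="/usr/share/tesseract/tessdata/"):
    """Return the sorted language codes that have all eight required files.

    Hypothetical helper: the name and the return type are assumptions.
    """
    try:
        names = set(os.listdir(tessdata_dir))
    except OSError:
        # Directory missing or unreadable: report no installed languages.
        return []
    # Candidate codes are whatever precedes the first dot in a file name.
    candidates = {name.split(".", 1)[0] for name in names if "." in name}
    return sorted(
        code for code in candidates
        if all(f"{code}.{suffix}" in names for suffix in REQUIRED_SUFFIXES)
    )
```

An empty result could then be the trigger for offering the download mentioned in the ticket description.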

I include a file extracted from [1], containing languages in the format
cze     Czech      Čeština
deu     German     Deutsch
and two additional files containing the code along with only the original or the English name.
____
[1]: http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
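
For reading the attached three-column list, something like the following might do. It is only a sketch: it assumes the columns are separated by tabs or by runs of two or more spaces, as in the two sample lines above, so that names containing a single space stay intact. The function name `parse_language_list` is hypothetical.

```python
import re


def parse_language_list(text):
    """Map each language code to an (English name, original name) pair.

    Assumes three columns separated by tabs or by two or more spaces.
    """
    table = {}
    for line in text.splitlines():
        fields = re.split(r"\t+| {2,}", line.strip())
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        code, english, original = fields
        table[code] = (english, original)
    return table
```

The resulting mapping could be used to label detected language codes with a readable name in the panel.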

Attachments:

langs-list-eng

langs-list-full

langs-list-orig
