Hi there, maybe my problem is the same as evkons, I'm not sure.
Although Configuration/Languages shows that a lot of languages are installed, I can select only English and German. Now I have a Czech document. I even installed Czech language support in the system settings, but I can still select only English and German. Btw, two more languages are installed according to the system preferences, but I cannot select them either.
With the wrong language setting, the Czech document is recognized completely wrong.
Are you sure you have installed the actual tesseract language files, not just the spellchecking dictionaries? If you have installed the tesseract language files, are they located in the correct path, as set in the gImageReader settings (usually /usr/share/tesseract/tessdata or similar on Linux)?
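One quick way to check this is to list the .traineddata files in the tessdata directory. The following is only a rough sketch: the helper name list_tess_langs is made up for illustration, and the default directory is an assumption (it is the Ubuntu path; other distributions differ), so pass whatever path is set in the gImageReader settings.

```shell
# list_tess_langs is a hypothetical helper, not part of tesseract or gImageReader.
# It prints one language code per .traineddata file found in the directory.
list_tess_langs() {
    # Assumed default; replace with the path from the gImageReader settings.
    dir="${1:-/usr/share/tesseract-ocr/tessdata}"
    for f in "$dir"/*.traineddata; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        # Strip the directory and the .traineddata suffix, e.g. nld.traineddata -> nld
        basename "$f" .traineddata
    done
}
```

Calling list_tess_langs with the configured path should print codes such as nld or ces if the corresponding language files are really there. If nothing is printed, the language files are either not installed or installed in a different directory.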
I have the same problem as w-sky. I've installed tesseract, gImageReader and the Dutch language files from the Ubuntu 12.04 LTS repos.
/usr/share/tesseract-ocr/tessdata does contain a file named nld.traineddata.
The path in the config window is set to:
However, switching the language to "Netherlands -> nl_NL" results in a missing dictionary error.
Same problem with the French files. The only languages available are English and German.
What did I miss?
Is the error you are talking about the one displayed in the notification bar in the bottom part of the application? If so, what you are missing is the spellcheck dictionary. Search for hunspell-nl or similar in the package manager and install the package. Otherwise, you need to give me more information on what error is displayed.
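To see whether a spelling dictionary is already present before installing anything, something like the sketch below might help. The helper name has_spell_dict is made up for illustration, and /usr/share/hunspell is only an assumed default (some systems use /usr/share/myspell/dicts or another directory). It checks for the .dic and .aff pair that a hunspell dictionary consists of.

```shell
# has_spell_dict is a hypothetical helper: it succeeds only if both the
# .dic and the .aff file for a locale (e.g. nl_NL) exist in the directory.
has_spell_dict() {
    locale_code="$1"
    dir="${2:-/usr/share/hunspell}"   # assumed default; varies by distribution
    [ -f "$dir/$locale_code.dic" ] && [ -f "$dir/$locale_code.aff" ]
}

# Example: check for the Dutch dictionary in the assumed default directory.
if has_spell_dict nl_NL; then
    echo "nl_NL spelling dictionary found"
else
    echo "nl_NL spelling dictionary missing"
fi
```

If the dictionary turns out to be missing, installing the matching hunspell package should provide it.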
That did work. Thank you!
Do you need both tesseract-ocr-XXX and the hunspell-XXX dictionary to get gImageReader going?
What is the difference?
Tesseract itself works without the hunspell files, e.g.:
tesseract xyz.tiff abc -l nld
The spelling dictionary is used in the output pane to highlight spelling errors in the recognized text.