Russian is listed among the predefined tesseract dictionaries, but I cannot choose it as the preferred language. What can I do? I also have aspell-ru installed.
I cannot reproduce the problem here on Fedora 14 with tesseract 3.00. Which platform are you on, and which tesseract version are you using? How did you install the Russian language data?
Oh, I'm sorry, I clicked on the main page and got a .deb package, but the package manager pulled in tesseract-ocr 2.04 as a dependency on my Ubuntu 10.04. I added an additional repository (https://launchpad.net/~alex-p/+archive/notesalexp), where it is possible to get newer versions of tesseract and tesseract-ocr-rus.
And now, how can I combine them to get ru-en recognition and an aspell ru-en dictionary at the same time?
It should actually work with both tesseract 2.04 and 3.00.
What do you mean by "combine"? gImageReader automatically chooses the spellcheck dictionary corresponding to the recognition language if it is installed; otherwise it displays a message about the missing spellcheck dictionary. What does not work as expected in your case?
It tries to replace the English text with suitable Russian letters, and vice versa.
But in http://i52.tinypic.com/3g3k8.png the recognition language you chose was English (the button to the left of the recognize button)? Btw, what version of gImageReader are you using?
gImageReader version is 0.8.1-1 from the main page.
In http://i56.tinypic.com/r0w9vo.png the recognition language is Russian and the phrase "Open System Interconnection - OSI" is unreadable. Note the highlighted words in this and the previous screenshot. I would like to recognize text in different languages.
Sorry for my English if I am hard to understand.
Now I understand your issue. That is something beyond my control: whether tesseract is able to recognize mixed alphabets (i.e. Latin and Cyrillic characters) depends on the language traineddata (i.e. the file included in the language pack for tesseract). You may want to contact the maintainer of the Russian language pack and ask him to address this issue.
Thank you very much. I wish you success and progress with the project.
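For the mixed Russian/English case discussed above, one option that does not depend on the Russian traineddata alone is to call tesseract directly with several languages at once. This is only a sketch under assumptions: it requires tesseract 3.02 or later (newer than the 2.04 and 3.00 versions mentioned in this thread), which accepts language codes joined with "+" in the -l option, and it requires both the rus and eng language data to be installed. The function name and the file names are hypothetical, and this is not how gImageReader itself invokes tesseract.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a multi-language tesseract run.

    Assumes tesseract 3.02+, where "-l rus+eng" loads both languages.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages are passed as one argument, joined with '+'.
    lang_arg = "+".join(languages)
    # CLI shape: tesseract <image> <output base name> -l <languages>
    return ["tesseract", image_path, output_base, "-l", lang_arg]


if __name__ == "__main__":
    # "page.png" and "page_text" are placeholder names for illustration.
    command = build_tesseract_command("page.png", "page_text", ["rus", "eng"])
    print(" ".join(command))
```

With the placeholder names above, the printed command is tesseract page.png page_text -l rus+eng. The list can be handed to subprocess.run once tesseract and both language packs are confirmed to be installed.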
Does anyone know if there is a way to get this to read Japanese or other Asian languages?
What is the specific issue you are encountering? For example: tesseract language files not found, font rendering issues, …?
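To rule out the "language files not found" case, a quick check of the tessdata directory can help. The sketch below assumes tesseract 3.x, where each language ships as a single <code>.traineddata file (for example jpn.traineddata for Japanese); tesseract 2.04 uses a different multi-file layout that this does not cover. The directory path in the example is an assumption and varies by distribution, and the function names are made up for illustration.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory simply means no languages were found.
        return []
    # "jpn.traineddata" -> "jpn", "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def has_language(tessdata_dir, code):
    """Return True if the given language code is installed."""
    return code in installed_languages(tessdata_dir)


if __name__ == "__main__":
    # Assumed location; adjust to wherever your distribution puts tessdata.
    tessdata = "/usr/share/tesseract-ocr/tessdata"
    print("Installed:", installed_languages(tessdata))
    print("Japanese available:", has_language(tessdata, "jpn"))
```

If jpn is missing from the list, installing the Japanese language pack is the first step; if it is present, the problem is more likely font rendering or something else.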