How to use other languages for recognition?

Help
Anonymous
2011-02-14
2013-04-21
  • Anonymous

    Anonymous - 2011-02-14

    Russian is listed among the predefined tesseract dictionaries, but I cannot select it as the preferred language. What can I do? I also have aspell-ru installed.

     
  • Sandro Mani

    Sandro Mani - 2011-02-14

    Hello,
    I cannot reproduce the problem here on Fedora 14 with tesseract 3.00. Which platform are you on, and which tesseract version are you using? How did you install the Russian language data?
    Best,
    Sandro

     
  • Anonymous

    Anonymous - 2011-02-14

    Oh, I'm sorry. I clicked the download on the main page and got a .deb package, but the package manager pulled in tesseract-ocr 2.04 as a dependency on my Ubuntu 10.04. I added an additional repository (https://launchpad.net/~alex-p/+archive/notesalexp), where it is possible to get a newer version of tesseract and tesseract-ocr-rus.

    Now, how can I combine them to get ru-en recognition and the aspell ru-en dictionaries at the same time?

     
  • Sandro Mani

    Sandro Mani - 2011-02-14

    It should actually work with both tesseract 2.04 and 3.00.
    What do you mean by "combine"? gImageReader automatically chooses the spellcheck dictionary corresponding to the recognition language if it is installed; otherwise it displays a message about the missing spellcheck dictionary. What does not work as expected in your case?
    Best,
    Sandro

     
  • Sandro Mani

    Sandro Mani - 2011-02-15

    But in http://i52.tinypic.com/3g3k8.png the recognition language you chose was English (the button to the left of the recognize button)? By the way, what version of gImageReader are you using?

     
  • Anonymous

    Anonymous - 2011-02-15

    The gImageReader version is 0.8.1-1, from the main page.
    In http://i56.tinypic.com/r0w9vo.png the recognition language is Russian, and the phrase "Open System Interconnection - OSI" is unreadable. Note the highlighted words in this and the previous screenshot. I would like to recognize text that contains different languages.
    Sorry for my English if I am hard to understand.

     
  • Sandro Mani

    Sandro Mani - 2011-02-15

    Hello,
    Now I understand your issue. That is something beyond my control: whether tesseract is able to recognize mixed alphabets (i.e. Latin and Cyrillic characters) depends on the language traineddata (i.e. the file included in the language pack for tesseract). You may want to contact the maintainer of the Russian language pack and ask him to address this issue.
    Best,
    Sandro
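
    A note for readers on later Tesseract releases: to my knowledge, Tesseract 3.02 and newer accept several languages joined with `+` in the `-l` option (for example `rus+eng`), which the 2.04 and 3.00 versions discussed here did not offer. The Python sketch below only builds such a command line. The helper name `build_tesseract_command` and the file names are hypothetical, and it assumes that both the `rus` and `eng` language data are installed.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a multi-language tesseract call.

    Hypothetical helper: it assumes a Tesseract version that supports
    combined languages such as "rus+eng" (3.02 or newer, as far as I know).
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract expects the language codes joined by "+", e.g. "rus+eng".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


if __name__ == "__main__":
    # Prints the command instead of running it, so tesseract need not be installed.
    print(" ".join(build_tesseract_command("page.png", "page", ["rus", "eng"])))
```

    The returned list can be passed to `subprocess.run` to start the recognition. Whether the mixed-language result is good still depends on the installed language data.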

     
  • Anonymous

    Anonymous - 2011-02-16

    Thank you very much. I wish you success and progress with the project.

     
  • Anonymous

    Anonymous - 2011-04-25

    Does anyone know if there is a way to get this to read Japanese or other Asian languages?

     
  • Sandro Mani

    Sandro Mani - 2011-04-25

    What is the specific issue you are encountering? For example: tesseract language files not found / font rendering issues / … ?
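
    To rule out the first possibility, one can check whether the language file is present. The Python sketch below assumes Tesseract 3.x, where each language ships as a single `<code>.traineddata` file (e.g. `jpn.traineddata` for Japanese). The function name `missing_language_files` is hypothetical, and the tessdata path in the example is an assumption that differs between distributions.

```python
import os


def missing_language_files(tessdata_dir, languages):
    """Return the language codes whose <code>.traineddata file is absent."""
    missing = []
    for code in languages:
        path = os.path.join(tessdata_dir, code + ".traineddata")
        if not os.path.isfile(path):
            missing.append(code)
    return missing


if __name__ == "__main__":
    # The tessdata location is an assumption; adjust it for your system.
    print(missing_language_files("/usr/share/tesseract-ocr/tessdata", ["jpn", "eng"]))
```

    An empty list means every requested language file was found in that directory; any code that is returned points to a language pack that still needs to be installed.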

     
