which languages?

  • Enrico SEGRE

    Enrico SEGRE - 2010-12-16

    A curiosity: where does gimagereader get its language list? I thought it took it from the contents of
    tessdata/*.traineddata, but in my installation of tesseract I have ara, heb and jap.traineddata, which do not show up in the list. Any reason?

  • Sandro Mani

    Sandro Mani - 2010-12-16

    Unfortunately, the traineddata files do not provide enough information to identify the languages (i.e. the language and country codes). At the moment, a predefined list of dictionaries is stored in the config.py file, but the next version will have a more comfortable GUI front end for adding custom dictionaries. You can check out the SVN, then run "make deb" in the trunk folder to generate deb packages. I am close to releasing the new version, but probably won't manage it until Christmas (= semester end).
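
To illustrate why some installed traineddata files can be missing from the list, here is a minimal sketch of the approach described above: scan the tessdata folder, then keep only the codes found in a predefined table. This is not gimagereader's actual code; `KNOWN_LANGUAGES`, `list_languages` and the table entries are hypothetical names chosen for the example, and the real list lives in config.py in whatever form the author chose.

```python
from pathlib import Path

# Hypothetical predefined table (assumption, not the real config.py contents):
# maps a traineddata file stem to a human-readable language name.
KNOWN_LANGUAGES = {
    "eng": "English",
    "deu": "German",
    "fra": "French",
}


def list_languages(tessdata_dir, known=KNOWN_LANGUAGES):
    """Split installed traineddata files into listed and unlisted codes.

    Returns (listed, unlisted): `listed` holds (code, name) pairs for codes
    present in `known`; `unlisted` holds codes that are installed but have
    no entry in the predefined table, so a GUI built on it would skip them.
    """
    stems = sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
    listed = [(code, known[code]) for code in stems if code in known]
    unlisted = [code for code in stems if code not in known]
    return listed, unlisted
```

With a table like this, files such as heb.traineddata end up in `unlisted` until a matching entry is added, which matches the behaviour reported in the question.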

  • Enrico SEGRE

    Enrico SEGRE - 2010-12-18

    Thanks, I located the file and succeeded. In fact, tesseract's support for the language I was interested in (heb) is very underdeveloped as of now, but at least this way I have an easy way to test it.

