A curiosity: where does gImageReader get its language list? I thought it got it from the contents of tessdata/*.traineddata, but my Tesseract installation has ara, heb and jap.traineddata, and those do not show up in the list. Any reason?
Unfortunately, the traineddata files do not provide enough information to identify the languages (i.e. language and country codes). At the moment a predefined list of dictionaries is stored in the config.py file, but the next version will have a more comfortable GUI front end for adding custom dictionaries. You can check out the SVN, then run "make deb" in the trunk folder to generate deb packages. I am close to releasing the new version, but probably won't manage it until Christmas, which is when the semester ends.
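To illustrate why installed files like ara or heb can be missing from the list: if the program only shows languages it finds in a predefined table, any traineddata file without an entry is silently skipped. The sketch below is hypothetical and is not gImageReader's actual code; the table `KNOWN_LANGUAGES` and the function `available_languages` are made-up names, and the table contents are only examples of the kind of language/country information the traineddata files lack.

```python
from pathlib import Path

# Hypothetical predefined table, in the spirit of the list kept in config.py:
# tesseract language prefix -> (ISO language code, country code).
# Not gImageReader's real table; entries are illustrative only.
KNOWN_LANGUAGES = {
    "eng": ("en", "US"),
    "deu": ("de", "DE"),
    "fra": ("fr", "FR"),
}


def available_languages(tessdata_dir):
    """Split the installed *.traineddata prefixes into those the table knows
    about (shown in the list) and those it does not (silently skipped)."""
    # "eng.traineddata" -> "eng"
    prefixes = sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
    listed = [p for p in prefixes if p in KNOWN_LANGUAGES]
    unlisted = [p for p in prefixes if p not in KNOWN_LANGUAGES]
    return listed, unlisted
```

With eng, ara and heb installed, only eng would be listed; ara and heb end up in `unlisted` because the table has no language/country codes for them. Adding an entry to the table (or, in the upcoming version, through the GUI) is what makes them appear.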
Thanks, I located the file and it worked. Tesseract support for the language I was interested in (heb) is still very underdeveloped, but at least this gives me an easy way to test it.