Is there a way to train GOCR for Indian languages?
Although I'm new to gocr, I can give you one hint. In the directory '/gocr-0.39/bin' you will find a script called 'create_db'. This script creates the glyph database with the character templates. It's a simple bash script that creates the glyphs from TeX files, so if your LaTeX version supports Indian glyphs, simply try customising this script. Then it could work with Indian languages...
But the main point is that gocr does not work with a trained db at all (as kNN algorithms do in other programs); it works with the template method.
I hope that, even as a 'noob', I was able to help a little bit.. ;-)
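To make the difference from a trained classifier concrete, here is a minimal sketch of the general template-matching idea: an unknown glyph is compared pixel by pixel against stored templates and the closest one wins. This is not gocr's actual code or database format; the 3x3 bitmaps and the names TEMPLATES, pixel_distance and match_glyph are made up for illustration.

```python
# Hypothetical 3x3 binary glyph templates ("1" = ink, "0" = background).
# This is NOT gocr's real database format, only an illustration.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def pixel_distance(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def match_glyph(bitmap, templates=TEMPLATES):
    """Return the label of the stored template closest to the bitmap."""
    return min(templates, key=lambda label: pixel_distance(bitmap, templates[label]))


# A "T" with one extra pixel set in the bottom-right corner.
noisy_t = ("111", "010", "011")
print(match_glyph(noisy_t))
```

Nothing is learned from examples here: supporting a new script means adding templates for its glyphs, which is the role the post above attributes to 'create_db'.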
Unfortunately, as mentioned by Joerg, the code for the db is bad, and because of the j/gocr concept I don't think it will work even with a correctly created db in some other language. In my case, Cyrillic, where some letters look like Latin ones, completely mixes everything up.
I tried to 'sed' the output, but the results were very poor :-( because only Latin encoding is supported. I ended up with a text containing both Latin- and Cyrillic-encoded chars.
I think this information will help you save time.
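For anyone who wants to try the same kind of post-processing, here is a rough sketch of that cleanup step in Python instead of sed. It is a heuristic under stated assumptions, not a fix for gocr itself: the lookalike table is illustrative and incomplete, the names fix_word and fix_text are made up, and a word is only converted when it already contains at least one Cyrillic character.

```python
# Latin letters that look like Cyrillic ones, in matching order.
# Illustrative and incomplete; written with escapes to avoid lookalike typos.
LATIN_LOOKALIKES = "ABCEHKMOPTXaceopxy"
CYRILLIC_LOOKALIKES = (
    "\u0410\u0412\u0421\u0415\u041d\u041a\u041c\u041e\u0420\u0422\u0425"  # upper case
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443"  # lower case
)
LATIN_TO_CYRILLIC = str.maketrans(LATIN_LOOKALIKES, CYRILLIC_LOOKALIKES)


def is_cyrillic(ch):
    """True if ch lies in the basic Unicode Cyrillic block (U+0400..U+04FF)."""
    return "\u0400" <= ch <= "\u04ff"


def fix_word(word):
    """Convert lookalike Latin letters only in words that already contain Cyrillic."""
    if any(is_cyrillic(ch) for ch in word):
        return word.translate(LATIN_TO_CYRILLIC)
    return word


def fix_text(text):
    """Apply fix_word to every space-separated word, keeping the spacing."""
    return " ".join(fix_word(word) for word in text.split(" "))


# Latin "M" and "a" mixed into an otherwise Cyrillic word.
mixed = "M\u043e\u0441\u043a\u0432a is OK"
print(fix_text(mixed))
```

Words made up entirely of lookalike letters stay ambiguous with this approach, so results on real OCR output will probably still need manual checking.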