Re: [Ankur-core] Bangla OCR progress

Dear Salahuddin,
>
>
> I was working with OCR for my university. I took most of the ideas
> from bocra.sourceforge.net
>
> It is written using the GraphicsMagick library and C++. Do you have
> any suggestions about matching the alphabet?

You now need a recogniser. You could use a neural network library or
an adaptive classifier. Tesseract-OCR, the engine I am trying to adapt,
previously used a neural net named aspirine/migraine and then switched
to a nearest-neighbour based adaptive classifier engine. I believe this
switch was made due to licensing issues with aspirine.
The challenge, of course, is not to build a recogniser, since you can
use one of the available ones. The challenge is to gather sufficient
training data or, better yet, to create a tool that automatically
generates training data for this OCR system (given a font name and
size) by rendering images in a matter of seconds.
I have been trying to build such a tool, but my initial approach was
wrong. However, I believe I now know the correct approach. Kindly go
through http://hacking-tesseract.blogspot.com/.
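
To make the nearest-neighbour idea concrete, here is a minimal Python
sketch, assuming each segmented glyph has already been reduced to a
fixed-length feature vector (for example a flattened, size-normalised
binary bitmap). The class `NearestNeighbourClassifier` and its methods
are hypothetical names for illustration; this is a plain k-nearest-
neighbour vote, not Tesseract's actual adaptive classifier, which uses
its own features and matching.

```python
import math
from collections import Counter


class NearestNeighbourClassifier:
    """Hypothetical minimal k-nearest-neighbour glyph classifier.

    Each sample is a fixed-length feature vector, assumed here to be a
    flattened, size-normalised binary bitmap of one segmented glyph.
    This is a generic k-NN sketch, NOT Tesseract's adaptive classifier.
    """

    def __init__(self, k=1):
        self.k = k
        self.samples = []  # list of (feature_vector, label) pairs

    def add_sample(self, features, label):
        # "Adaptive" in the simplest sense: labelled samples can be added
        # at any time, including after the classifier is already in use.
        self.samples.append((list(features), label))

    @staticmethod
    def _distance(a, b):
        # Euclidean distance between two feature vectors of equal length.
        if len(a) != len(b):
            raise ValueError("feature vectors must have the same length")
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def classify(self, features):
        # Rank stored samples by distance and let the k closest vote.
        if not self.samples:
            raise ValueError("classifier has no training samples")
        ranked = sorted(self.samples,
                        key=lambda s: self._distance(features, s[0]))
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


# Usage with two made-up 3x3 "bitmaps" labelled as Bangla letters.
clf = NearestNeighbourClassifier(k=1)
clf.add_sample([0, 1, 0, 1, 1, 1, 0, 1, 0], "ক")
clf.add_sample([1, 0, 1, 0, 1, 0, 1, 0, 1], "খ")
print(clf.classify([0, 1, 0, 1, 1, 1, 0, 0, 0]))  # prints ক
```

With `k=1` this is pure nearest-neighbour matching; a larger `k` lets
several close samples vote, which can help with noisy samples. Either
way, accuracy depends mostly on how many varied training samples you
have, which is why generating them automatically matters.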


-- 
Be Intelligent, Use GNU/Linux

http://debayanin.googlepages.com/
http://debayan.wordpress.com
http://lug.nitdgp.ac.in