From: Debayan B. <deb...@gm...> - 2009-04-19 13:17:23
Dear Salahuddin,

> I was working with OCR for my university. I took most of the idea
> from bocra.sourceforge.net
>
> It is written using graphicsmagick library & C++. Any suggestion from
> you about matching alphabet.

You now need a recogniser. You could use a neural network library or an
adaptive classifier. Tesseract-OCR, the one I am trying to adapt,
previously used a neural net named aspirine/migraine and then switched to
a nearest-neighbour based adaptive classifier engine. I believe this
switch was made due to licensing issues with aspirine.

The challenge, of course, is not to build a recogniser, since you can use
one of the available ones. The challenge is to gather sufficient training
data or, better yet, to create a tool that automatically generates
training data for this OCR system (given a font name and size) using
image rendering, in a matter of seconds. I have been trying to do it, but
my initial approach was wrong. However, I believe I now know the correct
approach. Kindly go through http://hacking-tesseract.blogspot.com/.

--
Be Intelligent, Use GNU/Linux
http://debayanin.googlepages.com/
http://debayan.wordpress.com
http://lug.nitdgp.ac.in
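
For illustration, here is a minimal sketch of the nearest-neighbour idea
mentioned above, written in Python for brevity. It assumes each glyph has
already been segmented and scaled to a fixed-size binary bitmap. The 3x3
templates and the names hamming, classify and learn are hypothetical and
chosen only for this example; this is not Tesseract's classifier or its
training data.

```python
# Illustrative sketch only: a 1-nearest-neighbour glyph matcher.
# Templates and function names are made up for this example.

# Each glyph is a flat tuple of 0/1 pixels from a 3x3 bitmap (row by row).
TEMPLATES = {
    "I": [(0, 1, 0,
           0, 1, 0,
           0, 1, 0)],
    "L": [(1, 0, 0,
           1, 0, 0,
           1, 1, 1)],
    "T": [(1, 1, 1,
           0, 1, 0,
           0, 1, 0)],
}


def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same size")
    return sum(x != y for x, y in zip(a, b))


def classify(glyph, templates):
    """Return the label whose closest stored sample is nearest to glyph."""
    return min(
        templates,
        key=lambda label: min(hamming(glyph, t) for t in templates[label]),
    )


def learn(glyph, label, templates):
    """Adaptive step: store a newly confirmed sample under its label."""
    templates.setdefault(label, []).append(glyph)


if __name__ == "__main__":
    # A "T" with one pixel missing from the stem.
    noisy_t = (1, 1, 1,
               0, 1, 0,
               0, 0, 0)
    print(classify(noisy_t, TEMPLATES))
```

A real recogniser would compare richer features than raw pixels and would
need many samples per character, which is why the training data is the
hard part.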