From: Omega S. <ome...@gm...> - 2007-06-22 19:13:08
Hi everyone!

I have finished the first phase, segmenting the Sahana OCR form, and have started work on character identification. The Sahana OCR form is meant to be web-generated, and its final format has not been settled yet. For now I am therefore using a draft OCR form that I developed (available at http://www.cse.mrt.ac.lk/~omega/draft.pdf) as the input. In the draft form I can already pick out the fields needed for character and data field recognition. A sample output is available at http://www.cse.mrt.ac.lk/~omega/sahana_ocr/output.jpg

The source code is available at http://www.cse.mrt.ac.lk/~omega/sahana_ocr/sahana_ocr_source.tar.gz. To compile and run it you need OpenCV installed on your PC, available at http://sourceforge.net/projects/opencvlibrary (for more info, see http://opencvlibrary.sourceforge.net). Instructions for compiling and running are in a separate file included with the source.

The core concept behind the segmentation is to recognize squares/rectangles in any given image. (Almost all OCR forms in use today provide square-shaped areas for the user to enter data, because this minimizes errors.) So although I am using the draft OCR form at the moment, the same code can segment the final Sahana OCR form with very little modification, since that form should and will be based on square-shaped input areas. The code uses only thresholding, contour tracking and the Hough transform, together with some mathematical testing, to identify the square/rectangle-shaped contours. It can be improved to do more thorough processing to identify squares if needed.

One important thing I realized during this development is that we should have some kind of standard for designing the OCR form. Since each Sahana web interface should have a corresponding OCR form, a standard is very much needed to improve accuracy.
Moreover, a standard will allow users other than Sahana to come up with their own OCR forms that adhere to the standard, and to use the OCR module to process those forms as well. I'm doing some homework on this and will start a discussion on the topic in the very near future.

I would really appreciate any comments or suggestions on my work so that I can take corrective action and improve its functionality.

--
Regards,
Omega Silva
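
P.S. To make the "mathematical testing" step mentioned above a bit more concrete, here is a minimal Python sketch of one common way to decide whether a contour looks like a square or rectangle. It is an illustration only, not the code in sahana_ocr_source.tar.gz. It assumes the contour has already been reduced to a short list of corner points (for example by polygon approximation after contour tracking). The function names and the two thresholds (max_cosine=0.3, min_area=100.0) are hypothetical values chosen for the example.

```python
import math


def corner_cosine(p0, p1, p2):
    """Cosine of the angle at p1 between the segments p1->p0 and p1->p2."""
    dx1, dy1 = p0[0] - p1[0], p0[1] - p1[1]
    dx2, dy2 = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx1, dy1) * math.hypot(dx2, dy2)
    if norm == 0:
        # Repeated point: no meaningful angle, so report "not a right angle".
        return 1.0
    return (dx1 * dx2 + dy1 * dy2) / norm


def polygon_area(pts):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def looks_like_rectangle(pts, max_cosine=0.3, min_area=100.0):
    """Return True if pts is a 4-corner polygon with near-right angles.

    max_cosine and min_area are illustrative thresholds (assumptions),
    to be tuned for the scan resolution of the real form.
    """
    if len(pts) != 4:
        return False
    if polygon_area(pts) < min_area:
        return False  # too small: most likely noise or a stray mark
    worst = max(
        abs(corner_cosine(pts[i - 1], pts[i], pts[(i + 1) % 4]))
        for i in range(4)
    )
    # A right angle has cosine 0, so a small worst-case cosine means
    # every corner is close to 90 degrees.
    return worst < max_cosine


box = [(10, 10), (60, 10), (60, 40), (10, 40)]
skewed = [(10, 10), (60, 10), (90, 40), (40, 40)]
print(looks_like_rectangle(box), looks_like_rectangle(skewed))
```

The axis-aligned box passes the test, while the skewed parallelogram is rejected because its corners are far from 90 degrees. A check like this tolerates slightly rotated or distorted scans, which is why the angle threshold is not set to exactly zero.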