Hello, I am working on a project called Smart GS, a program (written in Java) for analyzing handwritten notes. We are looking at OpenIMAJ as a possible open-source option for our search function, because we have run into portability issues with the closed-source search engine library we currently use, and we are wondering whether OpenIMAJ would be suitable for what we need. Our program is designed to match text patterns in the handwriting of ANY language and therefore does not rely on OCR; it simply identifies visual similarities between regions of handwritten text. Many languages and handwriting styles have no OCR engine capable of recognizing them, so we match image patterns directly in order to maximize the range of text our program can be applied to.
I have found the earlier discussion about this library for use with OCR, but I was wondering if it would be possible to use the library to simply identify a text pattern and then match that pattern against other sections of text. That would be all we would need for our purposes. Thank you for your time in advance.
EDIT: I accidentally posted this topic earlier without having logged in. Please delete the earlier post; I apologize for double posting.
(apologies for the slow reply - just got back from holiday)
I'd think that OpenIMAJ has the functionality you need; take a look at the https://sourceforge.net/p/openimaj/code/HEAD/tree/trunk/demos/SimpleOCR/src/main/java/org/openimaj/image/ocr/SimpleOCR.java demo that I wrote to extract the date and time embedded in camera frames (see http://glacsweb.org/glacsweb-meets-openimaj/ for details). This uses a simple technique called template matching to search for likely instances of a previously identified template pattern in an image.
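To make the idea of template matching concrete, here is a minimal plain-Java sketch. It is not the code from the SimpleOCR demo and it does not use any OpenIMAJ classes; the class name `TemplateMatchSketch`, the method names, and the choice of sum of squared differences (SSD) as the score are all assumptions made for illustration. It assumes greyscale images stored as `float[row][column]` arrays and a template no larger than the image.

```java
// Illustrative sketch only: plain Java, no OpenIMAJ dependency.
// All names here are hypothetical and chosen for this example.
class TemplateMatchSketch {

    /** Sum of squared differences between the template and the image window whose top-left corner is (x, y). */
    static double ssd(float[][] image, float[][] template, int x, int y) {
        double sum = 0;
        for (int ty = 0; ty < template.length; ty++) {
            for (int tx = 0; tx < template[0].length; tx++) {
                double d = image[y + ty][x + tx] - template[ty][tx];
                sum += d * d;
            }
        }
        return sum;
    }

    /** Slides the template over every position in the image and returns {x, y} of the window with the lowest SSD. */
    static int[] bestMatch(float[][] image, float[][] template) {
        int maxY = image.length - template.length;
        int maxX = image[0].length - template[0].length;
        double best = Double.POSITIVE_INFINITY;
        int[] pos = {-1, -1};
        for (int y = 0; y <= maxY; y++) {
            for (int x = 0; x <= maxX; x++) {
                double score = ssd(image, template, x, y);
                if (score < best) {
                    best = score;
                    pos = new int[] {x, y};
                }
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        // A blank 5x6 image with a 2x2 pattern pasted at column 3, row 2.
        float[][] image = new float[5][6];
        float[][] template = {{1f, 0.5f}, {0.25f, 1f}};
        for (int ty = 0; ty < 2; ty++) {
            for (int tx = 0; tx < 2; tx++) {
                image[2 + ty][3 + tx] = template[ty][tx];
            }
        }
        int[] pos = bestMatch(image, template);
        System.out.println("best match at x=" + pos[0] + ", y=" + pos[1]);
    }
}
```

For handwriting you would probably want every window scoring under some threshold rather than only the single best one, and a score that tolerates changes in ink darkness (normalised correlation, for example), but the sliding-window structure stays the same.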
For identifying potential areas of text, the https://sourceforge.net/p/openimaj/code/HEAD/tree/trunk/image/image-processing/src/main/java/org/openimaj/image/processing/edges/StrokeWidthTransform.java class might be useful (see http://research.microsoft.com/apps/pubs/default.aspx?id=149305 for details).
Thank you very much for your reply. I will respond again shortly if we have any questions for you.