JavaOCR is optimised for performance and a low memory footprint. Images are monochrome, and pixel data is stored in linear arrays (byte, integer and float pixels are supported). Several image objects can be created over the same pixel data with different offsets and sizes without copying it (chiseling): each image object looks up pixels in the shared array using its origin offset and scan length. This feature allows easy creation of image processing pipelines.
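The idea behind chiseling can be illustrated with a minimal sketch. The class below is hypothetical and is not the JavaOCR image class; the names PixelView, origin and scanLength are assumptions chosen to mirror the description above. It only shows how a sub-image can share the backing array of its parent.

```java
// Hypothetical sketch, not the JavaOCR API: a rectangular view over a shared pixel array.
final class PixelView {
    final int[] pixels;   // shared backing array, never copied
    final int origin;     // index of the view's top-left pixel in the array
    final int scanLength; // array elements per row of the underlying full image
    final int width;
    final int height;

    PixelView(int[] pixels, int origin, int scanLength, int width, int height) {
        this.pixels = pixels;
        this.origin = origin;
        this.scanLength = scanLength;
        this.width = width;
        this.height = height;
    }

    /** Wraps a full image stored row by row. */
    static PixelView of(int[] pixels, int width, int height) {
        return new PixelView(pixels, 0, width, width, height);
    }

    int get(int x, int y) {
        return pixels[origin + y * scanLength + x];
    }

    void put(int x, int y, int value) {
        pixels[origin + y * scanLength + x] = value;
    }

    /** Returns a sub-view that shares the same array; only the origin and size change. */
    PixelView chisel(int x, int y, int w, int h) {
        if (x < 0 || y < 0 || w < 0 || h < 0 || x + w > width || y + h > height) {
            throw new IllegalArgumentException("sub-image out of bounds");
        }
        return new PixelView(pixels, origin + y * scanLength + x, scanLength, w, h);
    }
}
```

Writing through the chiseled view changes the parent image as well, which is what lets several pipeline stages work on the same data without copies.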
Image objects provide iteration routines for image filtering, and filter objects do the actual work. Some filters do not modify images, some work in place, and some write results to separate images. Basic filters include thresholding, lookup, range and histogram processing. See the sources in: core/src/main/java/net/sourceforge/javaocr/filter/
Prior to text recognition, the incoming image has to be converted to a black-and-white binary image. There are many possible filters, ranging from the simplest (and thus fastest) threshold through adaptive thresholds up to sophisticated Sauvola filters.
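For orientation, the simplest case (a fixed global threshold) might look like the sketch below. It is a standalone illustration, not the JavaOCR threshold filter; the class name and the parameters above and below are assumptions.

```java
// Hypothetical sketch of a global threshold working in place on a linear pixel array.
final class ThresholdSketch {
    /** Replaces every pixel with 'above' if it is >= threshold, otherwise with 'below'. */
    static void threshold(int[] pixels, int threshold, int above, int below) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = pixels[i] >= threshold ? above : below;
        }
    }
}
```

A single pass with one comparison per pixel is why this is the fastest option, but it copes poorly with uneven lighting.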
Sauvola binarisation is computationally intensive (it is a windowed filter), but it also provides a kind of high-pass and low-pass filtering, which can be tuned through the filter parameters for optimal binarisation results (the implementation is located in: core/src/main/java/net/sourceforge/javaocr/filter/SauvolaBinarisationFilter.java).
This filter is tunable by weight (determines the weight of the variance-over-the-window component in the computation) and window size (determines the computation window; as we use integral images for the computation, overall processing time is independent of window size; I use 50x50 on Android phones):
    // process image has borders of proper size (half window)
    processImage = new PixelImage(w + SAUVOLA_WINDOW, h + SAUVOLA_WINDOW);
    // return image is chiseled out of process image (less borders)
    returnImage = (PixelImage) processImage.chisel(SAUVOLA_WINDOW / 2, SAUVOLA_WINDOW / 2, w, h);
    sauvolaBinarisationFilter = new SauvolaBinarisationFilter(above, below, processImage, 256, SAUVOLA_WEIGHT, SAUVOLA_WINDOW);
This sample demonstrates Sauvola filter preparation for image processing. Note that the result image (returnImage) is cut out of the filter destination image: for performance reasons, results of window filters are not computed for image borders of the respective size.
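To make the role of weight, window and integral images more concrete, here is a standalone sketch of the textbook Sauvola rule T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation over the window. It is not the code of SauvolaBinarisationFilter; the method name, the parameters k and r, and the output convention (1 = dark) are assumptions. Unlike the bordered approach above, this sketch simply clamps the window at the image edges.

```java
// Hypothetical sketch of Sauvola binarisation using integral images.
final class SauvolaSketch {
    /**
     * Returns a binary image: 1 where the pixel is at or below the local threshold, else 0.
     * k is the weight of the deviation term, r the assumed dynamic range of the deviation
     * (128 is the usual choice for 8-bit images).
     */
    static int[] binarise(int[] gray, int w, int h, int window, double k, double r) {
        int stride = w + 1;
        // Integral images of values and squared values, padded with a zero row and column.
        long[] sum = new long[stride * (h + 1)];
        long[] sq = new long[stride * (h + 1)];
        for (int y = 1; y <= h; y++) {
            for (int x = 1; x <= w; x++) {
                int v = gray[(y - 1) * w + (x - 1)];
                int i = y * stride + x;
                sum[i] = v + sum[i - 1] + sum[i - stride] - sum[i - stride - 1];
                sq[i] = (long) v * v + sq[i - 1] + sq[i - stride] - sq[i - stride - 1];
            }
        }
        int half = window / 2;
        int[] out = new int[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Window bounds, clamped to the image (upper bounds are exclusive).
                int x0 = Math.max(0, x - half), y0 = Math.max(0, y - half);
                int x1 = Math.min(w, x + half + 1), y1 = Math.min(h, y + half + 1);
                int n = (x1 - x0) * (y1 - y0);
                // Four lookups per window, regardless of the window size.
                long s = sum[y1 * stride + x1] - sum[y0 * stride + x1]
                        - sum[y1 * stride + x0] + sum[y0 * stride + x0];
                long s2 = sq[y1 * stride + x1] - sq[y0 * stride + x1]
                        - sq[y1 * stride + x0] + sq[y0 * stride + x0];
                double mean = (double) s / n;
                double variance = Math.max(0.0, (double) s2 / n - mean * mean);
                double threshold = mean * (1.0 + k * (Math.sqrt(variance) / r - 1.0));
                out[y * w + x] = gray[y * w + x] <= threshold ? 1 : 0;
            }
        }
        return out;
    }
}
```

The per-pixel cost is constant because each window sum comes from four integral image lookups, which matches the observation that processing time does not depend on the window size.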
JavaOCR has no direct support for locating lines of text and individual characters (volunteers are welcome); the Android demos use an aiming aid to help the user position the camera properly. In case you are sure that text lines are aligned horizontally, you may use integral images to determine the position of text lines and for further glyph extraction.
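One possible way to find horizontally aligned text lines is sketched below. Note that it uses plain per-row sums (a projection profile) rather than a full integral image, which is a simplification of the suggestion above; the class name, the minInk parameter and the input convention (1 = ink, 0 = background) are assumptions.

```java
// Hypothetical sketch: locate text lines in a binary image by counting ink per row.
final class LineFinderSketch {
    /**
     * Returns {firstRow, lastRow} pairs (inclusive) for every run of consecutive rows
     * that contain at least minInk foreground pixels.
     */
    static int[][] findLines(int[] binary, int w, int h, int minInk) {
        java.util.List<int[]> lines = new java.util.ArrayList<>();
        int start = -1; // first row of the line currently being tracked, or -1
        for (int y = 0; y < h; y++) {
            int ink = 0;
            for (int x = 0; x < w; x++) {
                ink += binary[y * w + x];
            }
            if (ink >= minInk && start < 0) {
                start = y;                            // a text line begins
            } else if (ink < minInk && start >= 0) {
                lines.add(new int[] {start, y - 1});  // the line ended on the previous row
                start = -1;
            }
        }
        if (start >= 0) {
            lines.add(new int[] {start, h - 1});      // line touching the bottom edge
        }
        return lines.toArray(new int[0][]);
    }
}
```

The same counting applied to columns within one detected line would give candidate glyph boundaries.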
Invariant moments computed over an image have properties useful for image recognition, most notably independence from translation, rotation and scaling of images. A carefully chosen set of moments (just a vector of float values) can be used to match against known characters using standard techniques of cluster analysis. Hu moments are the most popular and well-known set of moments used for image recognition.
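As an illustration, the first two Hu invariants can be computed from normalised central moments as phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4 * eta11^2. The sketch below is standalone and does not use the JavaOCR feature extractors; the class and method names are assumptions.

```java
// Hypothetical sketch: first two Hu moment invariants of an image in a linear array.
final class HuMomentsSketch {
    static double[] firstTwo(int[] pixels, int w, int h) {
        // Raw moments give the total mass and the centroid.
        double m00 = 0, m10 = 0, m01 = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = pixels[y * w + x];
                m00 += v;
                m10 += (double) x * v;
                m01 += (double) y * v;
            }
        }
        if (m00 == 0) {
            throw new IllegalArgumentException("image has no mass");
        }
        double cx = m10 / m00, cy = m01 / m00;
        // Central moments are taken around the centroid (translation invariance).
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = pixels[y * w + x];
                double dx = x - cx, dy = y - cy;
                mu20 += dx * dx * v;
                mu02 += dy * dy * v;
                mu11 += dx * dy * v;
            }
        }
        // Normalisation by m00^(1 + (p+q)/2) with p + q = 2 (scale invariance).
        double norm = m00 * m00;
        double n20 = mu20 / norm, n02 = mu02 / norm, n11 = mu11 / norm;
        double phi1 = n20 + n02;
        double phi2 = (n20 - n02) * (n20 - n02) + 4 * n11 * n11;
        return new double[] {phi1, phi2};
    }
}
```

The same shape drawn at a different position or rotated by 90 degrees yields the same two values; on a pixel grid, scale invariance holds only approximately.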
Once you have extracted image features (with invariant moments or by other means), you have just a vector of floating point numbers; it represents a point in some space and can be matched to a specific character class. To perform matching you will need:
Clusters are established by a learning process. First you have to collect samples for the desired characters / fonts / resolutions. You should have a reasonable number of samples (you decide what is reasonable here; I try to have about 200 per character). JavaOCR provides a demo application for sampling with an Android phone (demos/sampler/ subdirectory).
Once you have samples, you can start training. demos/trainer contains a sample Java application which you can use as a template. Training is a simple process.
After you have achieved an acceptable configuration, you can serialise the cluster objects to JSON (or by other means) and use them in real-world applications. There will be one or more clusters for every trained character.
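What training produces depends on the cluster type, which is not detailed here. As a rough illustration only, assuming the simplest possible cluster (a centre point obtained by averaging the feature vectors of all samples of one character), the core step could look like this. The class and method names are hypothetical and do not come from demos/trainer.

```java
// Hypothetical sketch: derive a cluster centre from the feature vectors of one character.
final class CentroidTrainerSketch {
    /** Averages the sample vectors component by component. */
    static double[] centre(double[][] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] c = new double[samples[0].length];
        for (double[] s : samples) {
            if (s.length != c.length) {
                throw new IllegalArgumentException("feature vectors differ in length");
            }
            for (int i = 0; i < c.length; i++) {
                c[i] += s[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= samples.length;
        }
        return c;
    }
}
```

The resulting array is plain data, so it serialises to JSON without any special handling.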
Once you have feature extractors and configured cluster objects, matching is easy.
Now you have a list of clusters (corresponding to characters) ordered by probability. Take the first one; this is most probably your result.
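The ranking step can be sketched as follows. This is a simplified stand-in, not the JavaOCR matcher: it assumes each cluster is represented by a single centre point and uses squared Euclidean distance in place of a probability, so the nearest centre comes first. All names are hypothetical.

```java
// Hypothetical sketch: rank cluster centres by distance to an extracted feature vector.
final class MatcherSketch {
    /** Returns the indices of the centres ordered from nearest to farthest. */
    static int[] rank(double[] features, double[][] centres) {
        int n = centres.length;
        double[] dist = new double[n];
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            double d = 0;
            for (int j = 0; j < features.length; j++) {
                double diff = features[j] - centres[i][j];
                d += diff * diff; // squared distance is enough for ordering
            }
            dist[i] = d;
            order[i] = i;
        }
        // Insertion sort of indices by distance; fine for a few dozen characters.
        for (int i = 1; i < n; i++) {
            int idx = order[i];
            int j = i - 1;
            while (j >= 0 && dist[order[j]] > dist[idx]) {
                order[j + 1] = order[j];
                j--;
            }
            order[j + 1] = idx;
        }
        return order;
    }
}
```

The first index in the returned array identifies the best match; the following entries are useful as fallbacks when the top candidate is rejected by later checks.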