Java OCR Wiki

Status: Alpha

Brought to you by: ko5tik, mrwwhitney, roncemer

image recognition using moments

Text recognition using invariant moments

Image formats and base filters

JavaOCR is optimised for performance and a low memory footprint. Images are monochrome, and pixel data is stored in linear arrays (byte, integer and float pixels are supported). Several image objects can be created over the same pixel data with different offsets and sizes without copying it (chiseling): each image object looks up pixels in the shared array using its origin offset and scan length. This feature allows easy creation of image processing pipelines.
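
The lookup by origin offset and scan length can be illustrated with a minimal sketch. The class ByteImageView and its methods are hypothetical names chosen for this example and are not the actual JavaOCR image classes; only the idea of sharing one pixel array between views is taken from the description above.

```java
// Hypothetical sketch, not the JavaOCR API: a rectangular view over shared pixel data.
final class ByteImageView {
    final byte[] pixels;   // shared linear pixel array, never copied
    final int origin;      // array index of this view's top-left pixel
    final int scanLength;  // array distance between vertically adjacent pixels
    final int width;
    final int height;

    ByteImageView(byte[] pixels, int origin, int scanLength, int width, int height) {
        this.pixels = pixels;
        this.origin = origin;
        this.scanLength = scanLength;
        this.width = width;
        this.height = height;
    }

    // Allocates a new image that owns a fresh pixel array.
    static ByteImageView create(int width, int height) {
        return new ByteImageView(new byte[width * height], 0, width, width, height);
    }

    // Reads a pixel as an unsigned value in the range 0..255.
    int get(int x, int y) {
        return pixels[origin + y * scanLength + x] & 0xFF;
    }

    void put(int x, int y, int value) {
        pixels[origin + y * scanLength + x] = (byte) value;
    }

    // Creates a sub-image over the same array; only origin and size change.
    ByteImageView chisel(int x, int y, int w, int h) {
        if (x < 0 || y < 0 || w < 0 || h < 0 || x + w > width || y + h > height) {
            throw new IllegalArgumentException("sub-image exceeds image bounds");
        }
        return new ByteImageView(pixels, origin + y * scanLength + x, scanLength, w, h);
    }
}
```

Because the sub-image keeps the scan length of its parent, a write through the sub-image is immediately visible in the parent, which is what makes pipelines without copying possible.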

Image objects provide iteration routines for image filtering, and filter objects do the actual work. Some filters do not modify images, some work in place, and some write their results to separate images. Basic filters include thresholding, lookup, range, histogram processing, etc. See the sources in: core/src/main/java/net/sourceforge/javaocr/filter/
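
As an illustration of this split between iteration and filtering, here is a minimal sketch of an in-place threshold filter. FilterSketch, filterInPlace and threshold are hypothetical names for this example only; the real filter classes live in the source directory named above and may be organised differently.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch, not the JavaOCR API: iteration is separate from the per-pixel work.
final class FilterSketch {

    // Iteration routine: walks a rectangular region of a linear pixel array
    // and replaces every pixel with the filter result (in-place processing).
    static void filterInPlace(int[] pixels, int origin, int scanLength,
                              int width, int height, IntUnaryOperator filter) {
        for (int y = 0; y < height; y++) {
            int row = origin + y * scanLength;
            for (int x = 0; x < width; x++) {
                pixels[row + x] = filter.applyAsInt(pixels[row + x]);
            }
        }
    }

    // Filter object: maps values above the limit to "above", all others to "below".
    static IntUnaryOperator threshold(int limit, int above, int below) {
        return value -> value > limit ? above : below;
    }
}
```

A call such as FilterSketch.filterInPlace(pixels, 0, w, w, h, FilterSketch.threshold(127, 255, 0)) would binarise a whole w x h image with a fixed threshold.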

Binarisation

Prior to text recognition, the incoming image has to be converted to a black and white binary image. There are many possible filters, ranging from the simplest (and thus fastest) threshold, through adaptive threshold, up to sophisticated Sauvola filters.

Sauvola binarisation is computationally intensive (it is a windowed filter), but it also provides a kind of high-pass and low-pass filtering, which can be tuned through the filter parameters for optimal binarisation results (the implementation is located in: core/src/main/java/net/sourceforge/javaocr/filter/SauvolaBinarisationFilter.java).

This filter is tunable by weight (determines how strongly the variance over the window contributes to the computation) and window size (determines the computation window; since integral images are used for the computation, overall processing time is independent of window size. I use 50x50 on Android phones):

    // process image has borders of proper size (half window)
    processImage = new PixelImage(w + SAUVOLA_WINDOW, h + SAUVOLA_WINDOW);
    // return image is chiseled out of process image (less borders)
    returnImage = (PixelImage) processImage.chisel(SAUVOLA_WINDOW / 2, SAUVOLA_WINDOW / 2, w, h);

    sauvolaBinarisationFilter = new SauvolaBinarisationFilter(above, below, processImage, 256, SAUVOLA_WEIGHT, SAUVOLA_WINDOW);

This sample demonstrates preparing the Sauvola filter for image processing. Note that the result image (returnImage) is cut out of the filter's destination image: for performance reasons, results of window filters are not computed for image borders of the respective size.
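
To show why integral images make the cost independent of window size, the following sketch computes a window threshold with the commonly published Sauvola formula T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation over the window, k is the weight and R is the assumed dynamic range of the standard deviation. This is an illustration under those assumptions; SauvolaSketch is a hypothetical class, and the parameters of the real SauvolaBinarisationFilter may be defined differently.

```java
// Hypothetical sketch of Sauvola thresholding with integral images, not the JavaOCR class.
final class SauvolaSketch {

    // Builds an integral image of size (w + 1) x (h + 1): the entry at (x + 1, y + 1)
    // holds the sum of all values (or squared values) in the rectangle [0..x] x [0..y].
    static long[] integral(int[] px, int w, int h, boolean squared) {
        long[] sum = new long[(w + 1) * (h + 1)];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                long v = px[y * w + x];
                rowSum += squared ? v * v : v;
                sum[(y + 1) * (w + 1) + x + 1] = sum[y * (w + 1) + x + 1] + rowSum;
            }
        }
        return sum;
    }

    // Sum over the window x0 <= x < x1, y0 <= y < y1 using four lookups,
    // regardless of how large the window is.
    static long windowSum(long[] sum, int w, int x0, int y0, int x1, int y1) {
        int s = w + 1;
        return sum[y1 * s + x1] - sum[y0 * s + x1] - sum[y1 * s + x0] + sum[y0 * s + x0];
    }

    // Sauvola threshold for one window; k is the weight, r the assumed dynamic range.
    static double threshold(long[] sums, long[] squares, int w,
                            int x0, int y0, int x1, int y1, double k, double r) {
        double n = (double) (x1 - x0) * (y1 - y0);
        double mean = windowSum(sums, w, x0, y0, x1, y1) / n;
        double variance = Math.max(0.0, windowSum(squares, w, x0, y0, x1, y1) / n - mean * mean);
        return mean * (1.0 + k * (Math.sqrt(variance) / r - 1.0));
    }
}
```

A pixel would then be classified by comparing its value with the threshold of the window centred on it. Pixels closer to the edge than half a window have no complete window, which is the reason for the border handling in the sample above.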

Location of text and characters

JavaOCR has no direct support for locating lines of text and individual characters (volunteers are welcome). The Android demos use an aiming aid to help the user position the camera properly. In case you are sure that the text lines are aligned horizontally, you may use integral images to determine the position of text lines and for further glyph extraction.
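
One possible way to locate horizontally aligned text lines is a row projection: count the ink in every pixel row and treat each run of non-empty rows as a line. The sketch below sums rows directly for brevity (the same row sums could be read from an integral image). LineLocatorSketch is a hypothetical helper and not part of JavaOCR; it assumes a binarised image in which ink pixels are 1 and background pixels are 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JavaOCR: finds text lines by row projection.
final class LineLocatorSketch {

    // Returns one {firstRow, endRowExclusive} pair per run of rows containing ink.
    static List<int[]> findLines(int[] ink, int w, int h) {
        List<int[]> lines = new ArrayList<>();
        int start = -1; // first row of the line currently being scanned, -1 if none
        for (int y = 0; y < h; y++) {
            int rowInk = 0;
            for (int x = 0; x < w; x++) {
                rowInk += ink[y * w + x];
            }
            if (rowInk > 0 && start < 0) {
                start = y;                       // a line begins
            } else if (rowInk == 0 && start >= 0) {
                lines.add(new int[]{start, y});  // a line ends at the first empty row
                start = -1;
            }
        }
        if (start >= 0) {
            lines.add(new int[]{start, h});      // line touches the bottom edge
        }
        return lines;
    }
}
```

Applying the same idea to the columns of each located line gives candidate glyph boundaries. Noisy images would need a minimum ink count per row rather than a strict zero test.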

Invariant moments

Invariant moments computed over an image have properties useful for image recognition, most notably independence from translation, rotation and scaling of the image. A carefully chosen set of moments (just a vector of float values) can be matched against known characters using standard cluster analysis techniques. Hu moments are the most popular and best known set of moments used for image recognition.
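
As a small worked example, the first two Hu moments can be computed from normalised central moments: with eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), they are phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4 * eta11^2. The sketch below is an independent illustration of these standard formulas. HuMomentsSketch is a hypothetical class, not the JavaOCR feature extractor, and it assumes a binarised glyph with ink pixels set to 1.

```java
// Hypothetical sketch of the first two Hu moments, not the JavaOCR implementation.
final class HuMomentsSketch {

    static double[] firstTwoHuMoments(int[] ink, int w, int h) {
        // Raw moments give the mass and centroid of the glyph.
        double m00 = 0, m10 = 0, m01 = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = ink[y * w + x];
                m00 += v;
                m10 += x * v;
                m01 += y * v;
            }
        }
        if (m00 == 0) {
            throw new IllegalArgumentException("image contains no ink");
        }
        double cx = m10 / m00;
        double cy = m01 / m00;

        // Central moments are taken relative to the centroid (translation invariance).
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = ink[y * w + x];
                double dx = x - cx;
                double dy = y - cy;
                mu20 += dx * dx * v;
                mu02 += dy * dy * v;
                mu11 += dx * dy * v;
            }
        }

        // Normalisation for p + q = 2 divides by mu00^2 (scale invariance).
        double norm = m00 * m00;
        double eta20 = mu20 / norm;
        double eta02 = mu02 / norm;
        double eta11 = mu11 / norm;

        double phi1 = eta20 + eta02;
        double phi2 = (eta20 - eta02) * (eta20 - eta02) + 4 * eta11 * eta11;
        return new double[]{phi1, phi2};
    }
}
```

Moving the same glyph to a different position in the image leaves both values unchanged, which is the property that makes such a vector usable as a feature for matching.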

Training

Once you have extracted image features (with invariant moments or by other means), you have just a vector of floating point numbers. It represents a point in some space and can be matched to a specific character class. To perform matching you will need:

defined clusters corresponding to individual characters (there can be more than one cluster for a given character, say for different fonts)
a metric function to determine the distance between a point and a cluster (the simplest example would be the Euclidean metric; JavaOCR provides the Mahalanobis distance, which delivers better results)
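
The two metrics mentioned above can be sketched as follows. DistanceSketch is a hypothetical helper and not the JavaOCR metric classes; the Mahalanobis variant assumes that the inverse covariance matrix of the cluster has already been computed during training.

```java
// Hypothetical sketch of the two distance metrics, not the JavaOCR classes.
final class DistanceSketch {

    // Euclidean distance between a feature vector and a cluster centre.
    static double euclidean(double[] x, double[] mean) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mahalanobis distance: sqrt((x - mean)^T * S^-1 * (x - mean)),
    // where inverseCovariance is S^-1 for the cluster (assumed precomputed).
    static double mahalanobis(double[] x, double[] mean, double[][] inverseCovariance) {
        int n = x.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            d[i] = x[i] - mean[i];
        }
        double sum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum += d[i] * inverseCovariance[i][j] * d[j];
            }
        }
        return Math.sqrt(sum);
    }
}
```

With an identity matrix as inverse covariance both functions return the same value. The Mahalanobis distance differs by scaling each direction according to how widely the cluster's samples are spread, so a feature with large natural variation does not dominate the result.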

Clusters are established by a learning process. First you have to collect samples for the desired characters / fonts / resolutions. You should have a reasonable number of samples (you decide what is reasonable here; I try to have about 200 per character). JavaOCR provides a demo application for sampling with an Android phone (demos/sampler/ subdirectory).

Once you have samples you can start training. demos/trainer contains a sample Java application which you can use as a template. Training is a simple process:

extract image features (it must be the same feature set you will use in recognition)
build initial clusters for each character
use cluster analysis to build the real clusters used in recognition (with some arcane mathematical methods; a sample is available)
perform cluster matching to determine the quality of the solution
adjust features and clustering algorithms, rinse, repeat until you get acceptable quality
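
The step of building an initial cluster can be pictured as summarising all feature vectors collected for one character by their mean and spread. The sketch below is only an illustration of that idea: ClusterStatsSketch is a hypothetical class, it uses the population variance per feature, and it is not the cluster analysis performed by the trainer demo.

```java
// Hypothetical sketch, not the trainer demo: summary statistics of one character's samples.
final class ClusterStatsSketch {
    final double[] mean;      // cluster centre, one entry per feature
    final double[] variance;  // population variance per feature

    ClusterStatsSketch(double[][] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        int dims = samples[0].length;
        mean = new double[dims];
        variance = new double[dims];

        // First pass: mean of every feature.
        for (double[] sample : samples) {
            for (int i = 0; i < dims; i++) {
                mean[i] += sample[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            mean[i] /= samples.length;
        }

        // Second pass: average squared deviation from the mean.
        for (double[] sample : samples) {
            for (int i = 0; i < dims; i++) {
                double d = sample[i] - mean[i];
                variance[i] += d * d;
            }
        }
        for (int i = 0; i < dims; i++) {
            variance[i] /= samples.length;
        }
    }
}
```

A feature whose variance within one character is large compared with the distance between the means of different characters is a weak discriminator, which is one practical signal for the "adjust features" step.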

After you have achieved an acceptable configuration, you can serialise the cluster objects to JSON (or by other means) and use them in real-world applications. There will be one or more clusters for every trained character.

Matching

Once you have feature extractors and configured cluster objects, matching is easy:

obtain a single character glyph (binarise the image, slice it up)
compute the desired image moments
compute the "distance" between the feature vector and all the clusters
order the clusters by distance to the feature vector, throw away poorly matching clusters

Now you have a list of clusters (corresponding to characters) ordered by probability. Take the first one; this is most probably your result.
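
The last two steps above can be sketched as a ranking function. MatcherSketch, Cluster and Match are hypothetical names for this example, the Euclidean distance is used only to keep the sketch short, and maxDistance is an assumed cut-off for discarding poor matches; the real JavaOCR cluster objects and metrics may look different.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of ranking clusters by distance, not the JavaOCR matcher.
final class MatcherSketch {

    static final class Cluster {
        final char character;
        final double[] center;

        Cluster(char character, double[] center) {
            this.character = character;
            this.center = center;
        }
    }

    static final class Match {
        final char character;
        final double distance;

        Match(char character, double distance) {
            this.character = character;
            this.distance = distance;
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns candidate characters ordered from best to worst match,
    // dropping every cluster farther away than maxDistance.
    static List<Match> match(double[] features, List<Cluster> clusters, double maxDistance) {
        List<Match> result = new ArrayList<>();
        for (Cluster cluster : clusters) {
            double d = distance(features, cluster.center);
            if (d <= maxDistance) {
                result.add(new Match(cluster.character, d));
            }
        }
        result.sort(Comparator.comparingDouble((Match m) -> m.distance));
        return result;
    }
}
```

An empty result means that no trained cluster was close enough, which a caller might treat as "not a known character" instead of forcing a guess.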
