Emilio - 2008-06-26

Hi there

I'm trying to use tesseract to read identity documents. These documents have a standard area called ICAO, in which the characters are equally spaced and clearly printed.
I'm using 300 dpi images.
These ICAO areas typically consist of two or three lines and around 30-60 columns. The characters are only capital letters, numbers and the "<" symbol.

Here is a sample:

http://content.answers.com/main/content/wp/en-commons/thumb/f/f9/250px-MustermannPA.jpg

When I run tesseract on an image of three lines and 30 columns (90 characters), around 95% of the characters in the result are correct.

What I'm missing is a way to tell the program things like: "ok, only capital letters and numbers, equally spaced characters, no empty spaces at all".

I've checked the man page on the Linux command line, but it only gives the instructions to run the program, with no options at all.

Does tesseract have any kind of extra options?

I guess even something like a verbose mode could be useful to me.

If not, could you give me any suggestions on how to take advantage of the special characteristics of the ICAO areas (equally spaced characters, only capital letters + numbers + "<"...)?
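
For what it's worth, the only thing I can do on my side right now is check the output after the fact. Below is a rough sketch in Python of that kind of check, assuming the recognized text is already available as a string. The function name check_icao_text and its parameters are my own invention for illustration, not anything tesseract provides, and it only flags violations of the layout; it doesn't correct them.

```python
# Characters allowed in the ICAO area: capital letters, digits and "<".
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<")


def check_icao_text(text, expected_lines, expected_columns):
    """Return a list of (line_number, problem) pairs for OCR output
    that breaks the expected ICAO layout. Line number 0 means the
    problem concerns the block as a whole.

    Hypothetical helper: it only inspects the text, it does not call
    the OCR engine in any way.
    """
    problems = []
    # Ignore completely blank lines, but keep spaces inside lines,
    # because a space is itself an error in this area.
    lines = [ln for ln in text.splitlines() if ln.strip()]

    if len(lines) != expected_lines:
        problems.append(
            (0, "expected %d lines, got %d" % (expected_lines, len(lines)))
        )

    for number, line in enumerate(lines, start=1):
        if len(line) != expected_columns:
            problems.append(
                (number, "expected %d columns, got %d"
                 % (expected_columns, len(line)))
            )
        bad = sorted(set(ch for ch in line if ch not in ALLOWED))
        if bad:
            problems.append(
                (number, "unexpected characters: %r" % "".join(bad))
            )

    return problems
```

Calling check_icao_text(ocr_output, 3, 30) on the three-line, 30-column case would return an empty list when everything fits the layout, and otherwise tell me which lines to look at. What I'd really like is for the recognizer itself to use these constraints, rather than me filtering afterwards.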

Thanks in advance

     cacamara