runholen
2005-05-11
I have a question about gocr performance.
I'm trying to use gocr to automatically obtain invoice information.
I have been informed that the invoices will all be in Courier12, which
should closely resemble the OCR-B standard.
However, when I tried gocr on a test image, I did not get good results.
I created a test image in Courier12 containing the following text:
This is a test!!!! ??? 1234567890 1 2 3 4 5 6 7 8 9 0
55.60 57,10 kr abcdefghijklmnopqrstuvwxyz
gocr gave the following output:
Thls ls a testl I I I 111 1234567890 1 2 3 4 5 6 7 8 9 O
55.60 57,10 kr abcdefghl_klmnopqrstu_xyx
I am quite satisfied with the output for numbers, but with a total of
5 characters misinterpreted, plus the wrong symbols, it will be
hard to do text recognition.
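To make the misreads easier to pin down, here is a minimal sketch that lists where the gocr output differs from the text I typed. It only uses Python's standard difflib module; the function name diff_chars and the constants EXPECTED and ACTUAL are my own, not part of gocr, and the alignment is whatever difflib finds, so the segments it reports may not map one-to-one onto individual characters.

```python
from difflib import SequenceMatcher

# The text typed into the test image and what gocr returned (copied from above).
EXPECTED = [
    "This is a test!!!! ??? 1234567890 1 2 3 4 5 6 7 8 9 0",
    "55.60 57,10 kr abcdefghijklmnopqrstuvwxyz",
]
ACTUAL = [
    "Thls ls a testl I I I 111 1234567890 1 2 3 4 5 6 7 8 9 O",
    "55.60 57,10 kr abcdefghl_klmnopqrstu_xyx",
]


def diff_chars(expected, actual):
    """Return (expected_segment, actual_segment) pairs where the strings differ."""
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep replace, delete and insert segments only
    ]


if __name__ == "__main__":
    for line_no, (exp, act) in enumerate(zip(EXPECTED, ACTUAL), start=1):
        for wanted, got in diff_chars(exp, act):
            print(f"line {line_no}: expected {wanted!r}, got {got!r}")
```

Running it prints one line per differing segment, which makes it easy to see which letters and symbols are affected.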
Can I somehow configure gocr to recognize Courier12 more accurately? If so, how do I do this?
Regards,
Runar Holen