The log file below is the result of training with an image containing "slashed" zeros (a zero with a diagonal line through it, to differentiate it from an upper-case O). If I edit out the diagonal, there are no errors in tesseract.log, but recognition of zero and O is unreliable, even with a line in eng.unicharambigs.
How can I get tesseract to accept the slashed zero? So far I have converted the image to black text on a white background and scaled it up to approx. 300 dpi.
----------- tesseract.log -------------------------
Found fonts:
Tesseract Open Source OCR Engine with Leptonica
APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: More than one block??
APPLY_BOXES: FATALITY - 0 labelled samples of "0 " - target is 2:
APPLY_BOXES:
Boxes read from boxfile: 226
Initially labelled blobs: 224 in 4 rows
Box failures detected: 2
Duped blobs for rebalance: 0
"0" has fewest samples: 0
Total unlabelled words: 0
Final labelled words: 224
Generating training data
TRAINING … Font name = IA
Generated training data for 224 blobs
See tif image at:
http://www.flickr.com/photos/59351419@N05/5434403800/
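
To find which box file entries the two FAILURE lines refer to, the coordinates can be pulled out of the log and compared with the box file by hand. This is only a minimal sketch: it assumes the FAILURE lines always have the form shown in the log above, and the function name `failed_boxes` is made up for illustration. It returns the four numbers exactly as printed in the log and does not interpret the `1/51/0` field.

```python
import re

# Matches the part of a FAILURE line up to "FAILURE!", so it works
# whether or not the terminal wrapped the rest of the message.
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile \S+ \(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def failed_boxes(log_text):
    """Return the four coordinates, as printed, of every box reported as FAILURE."""
    return [tuple(int(v) for v in m.groups()) for m in FAILURE_RE.finditer(log_text)]


# The two failing lines copied from the tesseract.log above.
sample_log = """\
APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no bl
obs or blobs in multiple rows
APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no bl
obs or blobs in multiple rows
"""

for box in failed_boxes(sample_log):
    print(box)
```

In practice the whole tesseract.log would be read from disk and passed in place of `sample_log`.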