Training Won't Accept 'Slashed' Zero

JM56
2011-02-10
2013-04-25

    The log file below is the result of training with an image containing "slashed" zeros (a zero with a diagonal line through it, to distinguish it from an uppercase O). If I edit out the diagonal, there are no errors in tesseract.log, but recognition of zero versus O is unreliable, even with an entry in eng.unicharambigs.

    How can I get Tesseract to accept the slashed zero? So far I have converted the image to black text on a white background and scaled it up to approximately 300 dpi.

    ----------- tesseract.log -------------------------
    Found fonts:
    Tesseract Open Source OCR Engine with Leptonica
    APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no blobs or blobs in multiple rows
    APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no blobs or blobs in multiple rows
    APPLY_BOXES: More than one block??
    APPLY_BOXES: FATALITY - 0 labelled samples of "0 " - target is 2:
    APPLY_BOXES:
       Boxes read from boxfile:     226
       Initially labelled blobs:    224 in 4 rows
       Box failures detected:            2
       Duped blobs for rebalance:     0
       "0" has fewest samples:     0
                    Total unlabelled words:        0
                    Final labelled words:        224
    Generating training data
    TRAINING … Font name = IA
    Generated training data for 224 blobs

    See the TIFF image at:

    http://www.flickr.com/photos/59351419@N05/5434403800/
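One way to narrow this down, offered as a minimal sketch rather than an actual Tesseract tool: the two APPLY_BOXES failures each report box coordinates, and a short script can match them back to lines in the box file. Because the log also reports 0 labelled samples of "0", checking whether the failing boxes are the ones labelled 0 seems like a reasonable first step. The script assumes the usual box-file layout of `symbol left bottom right top [page]` per line, and that the log prints each box as ((left,bottom),(right,top)). The helper names and sample lines are hypothetical.

```python
# Hypothetical diagnostic: match APPLY_BOXES failure coordinates to box-file lines.
# Assumes each box-file line is "symbol left bottom right top [page]".
from typing import Iterable, List, NamedTuple, Set, Tuple

Coords = Tuple[int, int, int, int]  # (left, bottom, right, top)


class Box(NamedTuple):
    line_no: int  # 1-based line number in the box file
    symbol: str
    left: int
    bottom: int
    right: int
    top: int

    def coords(self) -> Coords:
        return (self.left, self.bottom, self.right, self.top)


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse box-file lines, skipping blank or malformed ones."""
    boxes = []
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.split()
        if len(fields) < 5:
            continue  # blank line or too few fields
        try:
            left, bottom, right, top = (int(v) for v in fields[1:5])
        except ValueError:
            continue  # non-numeric coordinates: not a box line
        boxes.append(Box(line_no, fields[0], left, bottom, right, top))
    return boxes


def boxes_for(boxes: List[Box], symbol: str) -> List[Box]:
    """All boxes labelled with the given symbol, e.g. "0"."""
    return [b for b in boxes if b.symbol == symbol]


def failed_boxes(boxes: List[Box], failures: Set[Coords]) -> List[Box]:
    """Boxes whose coordinates exactly match a failure reported in the log."""
    return [b for b in boxes if b.coords() in failures]


if __name__ == "__main__":
    # Hypothetical sample lines; for a real run, pass the lines of your .box file,
    # e.g. open(path, encoding="utf-8").read().splitlines()
    sample = [
        "0 2295 326 2323 370 0",
        "A 100 326 128 370 0",
        "0 2289 137 2317 181 0",
    ]
    # Coordinates copied from the two FAILURE lines in tesseract.log,
    # assumed to be in (left, bottom, right, top) order.
    failures = {(2295, 326, 2323, 370), (2289, 137, 2317, 181)}
    boxes = parse_box_lines(sample)
    print("boxes labelled '0':", [b.line_no for b in boxes_for(boxes, "0")])
    for b in failed_boxes(boxes, failures):
        print(f"failure -> line {b.line_no}: {b.symbol!r} {b.coords()}")
```

If both failures land on lines labelled 0, comparing those boxes with the slashed zeros in the image may show whether each box actually encloses the whole glyph, diagonal included.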