
jTessBoxEditor - Shape Clustering and MF Training: Bad Properties

2017-03-29
  • Pedro Correia

    Pedro Correia - 2017-03-29

    Hi there,
    I have a directory with a TIFF image of a book page and its corresponding .box file. I've been trying to train a model with the "Train with Existing Box" option, but it always produces the following output:

    ** Shape Clustering **
    [/usr/local/bin/shapeclustering, -F, chp.font_properties, -U, unicharset, chp.cu.exp0.tr]
    Reading chp.cu.exp0.tr ...
    Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
    Bad properties for index 4, char R: 0,255 0,255 0,0 0,0 0,0
    Bad properties for index 5, char E: 0,255 0,255 0,0 0,0 0,0
    ...
    Bad properties for index 55, char V: 0,255 0,255 0,0 0,0 0,0
    Bad properties for index 56, char ”: 0,255 0,255 0,0 0,0 0,0
    Building master shape table
    
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    ...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances...
    Stopped with 0 merged, min dist 999.000000
    Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
    Stopped with 0 merged, min dist 0.040323
    Master shape_table:Number of shapes = 54 max unichars = 1 number with multiple unichars = 0
    
    ** MF Training **
    [/usr/local/bin/mftraining, -F, chp.font_properties, -U, unicharset, -O, chp.unicharset, chp.cu.exp0.tr]
    Read shape table shapetable of 54 shapes
    Reading chp.cu.exp0.tr ...
    Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
    Bad properties for index 4, char R: 0,255 0,255 0,0 0,0 0,0
    ...
    Bad properties for index 6, char F: 0,255 0,255 0,0 0,0 0,0
    Bad properties for index 56, char ”: 0,255 0,255 0,0 0,0 0,0
    Warning: no protos/configs for Joined in CreateIntTemplates()
    Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
    Done!
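
    To see at a glance which characters triggered the warning, the lines can be pulled out of the log with a short script. This is a minimal Python sketch that assumes the warning keeps exactly the format shown above (an index, the character, then five space-separated min,max pairs); `BadProperties` and `parse_bad_properties` are hypothetical helpers, not part of Tesseract or jTessBoxEditor.

```python
import re
from typing import List, NamedTuple, Tuple

# Matches warnings such as:
#   Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
# Assumption: exactly five space-separated "min,max" pairs follow the char.
BAD_PROPS = re.compile(
    r"^Bad properties for index (\d+), char (\S+): (\d+,\d+(?: \d+,\d+){4})$"
)


class BadProperties(NamedTuple):
    index: int
    char: str
    pairs: List[Tuple[int, int]]


def parse_bad_properties(log_text: str) -> List[BadProperties]:
    """Collect every 'Bad properties' warning found in a training log."""
    results = []
    for line in log_text.splitlines():
        m = BAD_PROPS.match(line.strip())
        if m is None:
            continue  # not a 'Bad properties' line
        pairs = [
            (int(lo), int(hi))
            for lo, hi in (p.split(",") for p in m.group(3).split())
        ]
        results.append(BadProperties(int(m.group(1)), m.group(2), pairs))
    return results


# Excerpt of the shapeclustering output above.
log = """Reading chp.cu.exp0.tr ...
Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 56, char ”: 0,255 0,255 0,0 0,0 0,0
Building master shape table"""

for warning in parse_bad_properties(log):
    print(warning.index, warning.char, warning.pairs)
```

    Lines that don't match the pattern (such as "Reading ..." or "Building master shape table") are simply skipped.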
    

    Checking the unicharset file, I noticed that it looks strange:

    57
    NULL 0 NULL 0
    Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ]
    |Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0    # Broken
    P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # P [50 ]
    R 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # R [52 ]
    E 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # E [45 ]
    F 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # F [46 ]
    A 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # A [41 ]
    C 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # C [43 ]
    T 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # T [54 ]
    O 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # O [4f ]
    2 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # 2 [32 ]
    N 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # N [4e ]
    ...
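
    The ten comma-separated numbers on each entry are the same values that the "Bad properties" warnings print as five min,max pairs, and every character carries identical values. The Python sketch here is only an illustration: it assumes the numbers are min/max pairs for bottom, top, width, bearing and advance (my reading of the Tesseract 3.x unicharset format, not something the output above confirms), and `parse_metrics`, `as_pairs` and `PLACEHOLDER` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Assumed meaning of the ten comma-separated numbers in a Tesseract 3.x
# unicharset entry: min/max pairs for bottom, top, width, bearing, advance.
METRIC_NAMES = ("bottom", "top", "width", "bearing", "advance")

# The values carried by every entry in the listing above.
PLACEHOLDER = (0, 255, 0, 255, 0, 0, 0, 0, 0, 0)


def parse_metrics(line: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (unichar, metrics) for a unicharset entry, or None for lines
    without a metrics field (the leading count, 'NULL 0 NULL 0')."""
    fields = line.split()
    if len(fields) < 3 or fields[2].count(",") != 9:
        return None
    return fields[0], tuple(int(v) for v in fields[2].split(","))


def as_pairs(metrics: Tuple[int, ...]) -> Dict[str, Tuple[int, int]]:
    """Group the ten numbers into the five (min, max) pairs the warning prints."""
    return {
        name: (metrics[2 * i], metrics[2 * i + 1])
        for i, name in enumerate(METRIC_NAMES)
    }


unicharset = """57
NULL 0 NULL 0
P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # P [50 ]
R 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # R [52 ]"""

for line in unicharset.splitlines():
    parsed = parse_metrics(line)
    if parsed is not None and parsed[1] == PLACEHOLDER:
        print(parsed[0], as_pairs(parsed[1]))
```

    Each flagged entry prints the same five pairs as the warning: (0, 255) for bottom and top, and (0, 0) for width, bearing and advance.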
    

    However, when I use the "validate" option, the model seems to work, since it successfully OCRs any image. So I'm wondering: is this something I should be concerned about?

    I've also noticed that the same thing happens when I train a model with the images in samples/vie.

    Thanks in advance!

     
  • Quan Nguyen

    Quan Nguyen - 2017-03-30

    From my personal experience, the latest Tesseract training executables produce more warnings, some of which can be safely ignored. If the generated traineddata file works for you, you need not be concerned.

     
  • Pedro Correia

    Pedro Correia - 2017-04-03

    Thanks a lot again (you had already helped me out on GitHub). You're awesome!

     
