Hi there,
I have a directory containing a TIFF image of a book page and its corresponding .box file. I've been trying to train a model with the "Train with Existing Box" option, but it always produces the following output:
** Shape Clustering **
[/usr/local/bin/shapeclustering, -F, chp.font_properties, -U, unicharset, chp.cu.exp0.tr]
Reading chp.cu.exp0.tr ...
Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char R: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char E: 0,255 0,255 0,0 0,0 0,0
...
Bad properties for index 55, char V: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 56, char ”: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
**...**
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
Stopped with 0 merged, min dist 0.040323
Master shape_table:Number of shapes = 54 max unichars = 1 number with multiple unichars = 0
** MF Training **
[/usr/local/bin/mftraining, -F, chp.font_properties, -U, unicharset, -O, chp.unicharset, chp.cu.exp0.tr]
Read shape table shapetable of 54 shapes
Reading chp.cu.exp0.tr ...
Bad properties for index 3, char P: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char R: 0,255 0,255 0,0 0,0 0,0
**...**
Bad properties for index 6, char F: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 56, char ”: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
Checking the unicharset file, I noticed that it looks strange:
57
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Broken
P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # P [50 ]
R 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # R [52 ]
E 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # E [45 ]
F 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # F [46 ]
A 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # A [41 ]
C 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # C [43 ]
T 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # T [54 ]
O 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # O [4f ]
2 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # 2 [32 ]
N 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # N [4e ]
...
However, when I use the "validate" option, the model seems to work, because it successfully OCRs any image. So I'm wondering: is this actually something to be concerned about?
I've also noticed that the same thing happens when I try to train a model with the images in samples/vie.
Thanks in advance!
In my experience, the latest Tesseract training executables produce more warnings than before, and some of them can be safely ignored. If the generated traineddata file is working for you, you need not be concerned.
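If you want to see how many entries are affected, a small script can list them. In the output posted above, the characters reported with "Bad properties" are the ones whose third unicharset field is still 0,255,0,255,0,0,0,0,0,0, so the sketch below simply flags those lines. It is only a rough check under that assumption; the function name `find_default_entries` and the `DEFAULT_METRICS` constant are made up for this example and are not part of Tesseract.

```python
# Placeholder value seen on every flagged entry in the unicharset posted above.
# Treating it as "properties not set" is an assumption based on that output.
DEFAULT_METRICS = "0,255,0,255,0,0,0,0,0,0"


def find_default_entries(unicharset_text):
    """Return the characters whose third field equals the placeholder value."""
    flagged = []
    lines = unicharset_text.splitlines()
    for line in lines[1:]:  # the first line holds the entry count
        fields = line.split()
        if len(fields) >= 3 and fields[2] == DEFAULT_METRICS:
            flagged.append(fields[0])
    return flagged


# A few lines copied from the unicharset in the question.
sample = """57
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Broken
P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # P [50 ]
R 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # R [52 ]
"""

print(find_default_entries(sample))
```

To run it on a real file, read the file as UTF-8 text and pass the contents to `find_default_entries`. The result only tells you which entries carry the placeholder; whether that matters still comes down to whether the trained model recognizes your pages correctly.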
Thanks a lot again (you had already helped me out on GitHub). You're awesome!