shapeclustering should not be used except for the Indic languages.
shapeclustering -F font_properties -U unicharset lang.fontname.exp0.tr lang.fontname.exp1.tr ...
shapeclustering creates a master shape table by shape clustering and writes it to a file named shapetable.
Source: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
Is there an option to NOT run the shapeclustering step when running training?
Many people found shapeclustering helped improve the quality of generated traineddata files; therefore, it was decided that the step is always included in the training process in jTessBoxEditor, even for non-Indic languages. To not run it would require a code change to the program.
ok. Thanks.
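For anyone running the training tools by hand instead of through jTessBoxEditor, the optional step could be sketched roughly as below. This is only an illustration, not jTessBoxEditor's code: the function name shapeclustering_command and its skip parameter are hypothetical, and the default file names font_properties and unicharset are simply taken from the wiki command quoted above.

```python
from typing import List, Optional, Sequence


def shapeclustering_command(
    tr_files: Sequence[str],
    font_properties: str = "font_properties",
    unicharset: str = "unicharset",
    skip: bool = False,
) -> Optional[List[str]]:
    """Build the argument list for the shapeclustering step.

    Hypothetical helper: returns None when the caller asks to skip the
    step, otherwise the same command shape as the one quoted from the wiki.
    """
    if skip:
        # Caller opted out (e.g. for a non-Indic language).
        return None
    if not tr_files:
        raise ValueError("at least one .tr file is required")
    return ["shapeclustering", "-F", font_properties, "-U", unicharset, *tr_files]
```

If the result is not None, it can be passed to subprocess.run(cmd, check=True) from the directory that holds the .tr files; if it is None, the script just moves on to the next training step.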