Hi,
There is an image that uses an italicized Unicode font, so I cannot convert it to text. Is there any way that I can add a new font to VietOCR?
Generally, all the standard .traineddata files include italic font style. If need be, you can train Tesseract and add the generated .traineddata to VietOCR's tessdata folder.
Thank you, but I'm confused and really do not understand much about it. Can you give me more details?
Please be more specific. What are you confused about? Please attach your image, if possible.
You can use jTessBoxEditor to assist you with the training.
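Once training has produced a .traineddata file, installing it comes down to copying that file into VietOCR's tessdata folder, as mentioned earlier in the thread. A minimal sketch of that step follows; the function name `install_traineddata` and the paths in the commented usage are hypothetical, and the real location of the tessdata folder depends on where VietOCR is installed.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata: Path, tessdata_dir: Path) -> Path:
    """Copy a trained language file into a tessdata folder and return the new path."""
    if traineddata.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {traineddata.name}")
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")
    target = tessdata_dir / traineddata.name
    shutil.copy2(traineddata, target)  # keeps the original file in place
    return target


# Hypothetical usage; adjust both paths to your own setup:
# install_traineddata(Path("output/vie.traineddata"), Path("VietOCR/tessdata"))
```

After the copy, restarting VietOCR should let it pick up the new language file from the tessdata folder.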
I tried to convert this image, but VietOCR does not work.
You need to scan your image better: 300 DPI, TIFF or PNG image format, not JPEG.
And that font is not supported. You'll need to train Tesseract for it. What's the name of the font?
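If you want to check a scan against the requirements above before running OCR, something like the following could help. It is only a sketch: the names `png_dpi` and `scan_warnings` are made up for illustration, it reads the resolution from a PNG's pHYs chunk using the standard library only, and it does not handle TIFF metadata.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
LOSSLESS_SUFFIXES = {".tif", ".tiff", ".png"}


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if it is absent."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and length == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # unit 1 means pixels per metre
                return None
            return (round(x_ppu * 0.0254), round(y_ppu * 0.0254))
        if ctype in (b"IDAT", b"IEND"):
            break  # pHYs, when present, comes before the image data
        pos += 12 + length  # length field + type + body + CRC
    return None


def scan_warnings(filename: str, dpi, min_dpi: int = 300) -> list:
    """List the ways a scan falls short of the advice given in this thread."""
    warnings = []
    if Path(filename).suffix.lower() not in LOSSLESS_SUFFIXES:
        warnings.append("save the scan as TIFF or PNG, not JPEG")
    if dpi is None:
        warnings.append(f"resolution unknown; rescan at {min_dpi} DPI")
    elif min(dpi) < min_dpi:
        warnings.append(f"resolution {min(dpi)} DPI is below {min_dpi} DPI")
    return warnings
```

For a PNG on disk, `scan_warnings(name, png_dpi(Path(name).read_bytes()))` returns an empty list when the scan meets both requirements.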
I'm not sure which font it is. Can you teach me how to train Tesseract for a font?
Put in the effort to look around and find out what font it is. It will help you create the TIFF/Box files used in training.
The training procedure was already mentioned in previous posts -- you'll need to read through it.
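To illustrate why the font name matters: in the Tesseract 3.x training procedure, each TIFF/Box pair shares a name of the form [lang].[fontname].exp[num]. The sketch below builds such names; the helper `training_basename` is made up for illustration, and it assumes the 3.x naming convention applies to the Tesseract version your VietOCR uses.

```python
import re


def training_basename(lang: str, font: str, exp: int = 0) -> str:
    """Build the '[lang].[fontname].exp[num]' stem shared by a TIFF/Box pair."""
    # Dots separate the fields, so strip spaces, dots and other punctuation
    # from the font name before using it.
    clean_font = re.sub(r"[^A-Za-z0-9]", "", font)
    if not lang or not clean_font:
        raise ValueError("language code and font name are both required")
    return f"{lang}.{clean_font}.exp{exp}"


# Example with an assumed font name; replace it with the font you identify.
stem = training_basename("vie", "Times New Roman Italic")
tiff_name = stem + ".tif"
box_name = stem + ".box"
```

jTessBoxEditor can then be used to create and correct the box file that goes with the TIFF image.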