I'm trying to read text from a file. The text is plain English, but I'm not sure of the font. It is not properly aligned, and some of the characters overlap. When I run the code I get "Empty page!!". Could you please help? I'm new to OCR.
Any OCR engine would have difficulty handling CAPTCHA.
You may get better results by converting the image to grayscale or thresholding it to monochrome, rescaling it to 300 DPI, and trying different page segmentation modes.
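To make the preprocessing suggestion concrete, here is a minimal sketch in plain Python. It assumes the image is already loaded as rows of 8-bit grayscale values and that you know its current DPI. The function names, the fixed threshold of 128, and the nearest-neighbour resampling are illustrative assumptions, not anything the OCR engine requires. In practice an imaging library would do this work with better resampling.

```python
TARGET_DPI = 300  # resolution suggested in the reply above


def scale_factor(current_dpi, target_dpi=TARGET_DPI):
    """Return the factor by which to resize an image to reach target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def to_monochrome(gray_rows, threshold=128):
    """Binarize 8-bit grayscale rows: values >= threshold become 255, others 0.

    The threshold of 128 is an assumed starting point; tune it per image.
    """
    return [[255 if p >= threshold else 0 for p in row] for row in gray_rows]


def rescale_nearest(rows, factor):
    """Resize rows of pixels by `factor` using nearest-neighbour sampling."""
    src_h, src_w = len(rows), len(rows[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            rows[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 grayscale image scanned at 150 DPI.
    image = [[10, 200], [128, 127]]
    factor = scale_factor(150)
    print(rescale_nearest(to_monochrome(image), factor))
```

The order shown here (threshold, then rescale) is only one option; rescaling first and thresholding afterwards can preserve thin strokes better on some images, so it is worth trying both.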
Could you please tell me what a page segmentation mode is? Is there an example that shows how to use it?
As seen in the code, we have only one eng.traineddata file available. Does it cover all font types (Arial, Times New Roman, etc.) and overlapped characters, including bold and italic?
Also, please let me know how to create a traineddata file for other languages or fonts.
Appreciate your help on this.
You can check the project's documentation for information about PSM (page segmentation modes). The mode names describe what each one does.
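As a rough illustration of the modes, the sketch below lists the PSM values as recent versions of the Tesseract command-line tool describe them and builds a command line for a chosen mode. The helper `tesseract_command` and the file name `captcha.png` are hypothetical, and the `--psm` spelling is the one used by Tesseract 4 and later (3.x releases used `-psm`, and older releases may not offer every mode). If you call Tesseract through a wrapper library, look for its equivalent setting instead.

```python
# Page segmentation modes as described by recent Tesseract releases.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
    11: "Sparse text: find as much text as possible in no particular order",
    12: "Sparse text with OSD",
    13: "Raw line: treat the image as a single text line, bypassing Tesseract-specific hacks",
}


def tesseract_command(image_path, psm, lang="eng"):
    """Build an argument list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base tells tesseract to print the recognized text.
    """
    if psm not in PSM_MODES:
        raise ValueError(f"unknown page segmentation mode: {psm}")
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    # For a short, single-line image, modes 7 and 8 are natural ones to try.
    for mode in (7, 8):
        print(PSM_MODES[mode])
        print(" ".join(tesseract_command("captcha.png", mode)))
```

The argument list can be passed to `subprocess.run` if Tesseract is installed. Trying a few modes on the same image is usually the quickest way to see which one matches its layout.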
eng.traineddata covers basic fonts and styles. You can unpack the file or check the Tesseract Wiki for details about the language data and the training process.