VietOCR

Forum: Help

Creator: mourinhojosee

Created: 2014-04-15

Updated: 2014-04-20

mourinhojosee - 2014-04-15

I am trying to extract text from a scanned PDF using VietOCR. I downloaded the Arabic language package, but the output is still wrong: I get Arabic characters, but about 20% of the words contain errors.
How can I resolve this? I also tried writing a small Java tool with tess4j, but I got the same results.
Thank you for any help.
Mourinho

Quan Nguyen - 2014-04-20

The quality of the image plays an important part in the quality of the output text. You may need to improve the scanning (300 DPI, grayscale or B/W, for example), preprocess the image (Improve Quality), tweak the Tesseract engine, and, lastly, perform post-OCR corrections.

If the font does not resemble the supported fonts, you may need to consider training Tesseract to recognize that font.
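
For the Java tool mentioned in the question, the grayscale / B/W preprocessing suggested above could be sketched roughly as follows. This is only an illustration using the standard `java.awt.image` classes, not what VietOCR's "Improve Quality" feature does internally. The class name `OcrPreprocess`, its method names, and the choice of a global Otsu threshold are assumptions made for this example.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; not part of VietOCR or tess4j.
final class OcrPreprocess {

    // Convert any image to 8-bit grayscale using the common luminance weights.
    static BufferedImage toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int lum = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                gray.getRaster().setSample(x, y, 0, lum);
            }
        }
        return gray;
    }

    // Pick a global threshold with Otsu's method (maximize between-class variance).
    // Pixels with a value <= the returned threshold form the dark class.
    static int otsuThreshold(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        long[] hist = new long[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                hist[gray.getRaster().getSample(x, y, 0)]++;
            }
        }
        long total = (long) w * h;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (double) i * hist[i];
        }
        long darkCount = 0;
        double darkSum = 0, best = -1;
        int threshold = 0;
        for (int t = 0; t < 256; t++) {
            darkCount += hist[t];
            if (darkCount == 0) continue;      // no dark pixels yet
            long lightCount = total - darkCount;
            if (lightCount == 0) break;        // everything is in the dark class
            darkSum += (double) t * hist[t];
            double meanDark = darkSum / darkCount;
            double meanLight = (sumAll - darkSum) / lightCount;
            double between = (double) darkCount * lightCount
                    * (meanDark - meanLight) * (meanDark - meanLight);
            if (between > best) {
                best = between;
                threshold = t;
            }
        }
        return threshold;
    }

    // Map pixels above the threshold to white (255) and the rest to black (0).
    static BufferedImage binarize(BufferedImage gray, int threshold) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = gray.getRaster().getSample(x, y, 0);
                out.getRaster().setSample(x, y, 0, v > threshold ? 255 : 0);
            }
        }
        return out;
    }
}
```

A page image loaded with `javax.imageio.ImageIO.read` could be passed through `toGray`, `otsuThreshold`, and `binarize` before being handed to the OCR engine. A single global threshold is a simplification; unevenly lit or low-contrast scans may need an adaptive method, and rescanning at 300 DPI is still likely to help more than any software cleanup.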

Last edit: Quan Nguyen 2014-04-20
