
VietOCR

Help
2014-04-15
2014-04-20
  • mourinhojosee

    mourinhojosee - 2014-04-15

    I am trying to extract text from a scanned PDF using VietOCR. I downloaded the Arabic language package, but the output is still wrong: I get Arabic characters, but about 20% of the words contain errors.
    How can I resolve this? I also tried writing a small Java tool with Tess4J, but I got the same results.
    Thank you for any help.
    Mourinho

     
  • Quan Nguyen

    Quan Nguyen - 2014-04-20

    The quality of the image plays an important part in the quality of the output text. You may need to improve the scanning (300 DPI, grayscale or B/W, for example), preprocess the image (Improve Quality), tweak the Tesseract engine, and lastly, perform post-OCR corrections.

    If the font does not resemble the supported fonts, you may need to consider training Tesseract to recognize that font.
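
As an illustration of the "grayscale or B/W" preprocessing step, here is a minimal sketch in plain Java (standard library only) that converts a scanned page to black and white before it is passed to the OCR engine. The class name `OcrPreprocess`, the method `binarize`, and the fixed threshold are assumptions made for this example; they are not part of VietOCR or Tess4J, and an adaptive threshold may work better on unevenly lit scans.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of VietOCR or Tess4J.
public class OcrPreprocess {

    /**
     * Returns a black-and-white copy of the source image.
     * Pixels whose luminance is at or above the threshold (0-255) become white,
     * all others become black.
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (ITU-R BT.601 weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                // Write an opaque white or opaque black pixel.
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A threshold of 128 is only a starting guess. The resulting image can be saved with `javax.imageio.ImageIO.write` and then opened in VietOCR or handed to the Java tool for recognition.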

     

    Last edit: Quan Nguyen 2014-04-20
