Failing to read text from image, says Empty page!!

  • Quan Nguyen

    Quan Nguyen - 2012-11-22

    Any OCR engine would have difficulty handling CAPTCHA.

    You may have better success rates after converting the image to grayscale or thresholding it to monochrome, rescaling it to 300 DPI, and trying different page segmentation modes.
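    As a rough illustration of the preprocessing above, here is a minimal sketch that uses only the standard JDK imaging classes. The class name `OcrPreprocess`, the fixed threshold of 128, and the assumed source resolution of 100 DPI are placeholders for illustration only; the right values depend on your images, and the result would still need to be passed to whatever OCR call you are using.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of any OCR library.
final class OcrPreprocess {

    /** Rescales the image by the given factor and returns it as 8-bit grayscale. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return out;
    }

    /** Binarizes with a fixed threshold: darker pixels become black, the rest white. */
    static BufferedImage toMonochrome(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int gr = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (0..255).
                int lum = (299 * r + 587 * gr + 114 * b) / 1000;
                out.setRGB(x, y, lum < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a real image: left half black, right half white.
        BufferedImage src = new BufferedImage(10, 4, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 4; y++) {
            for (int x = 5; x < 10; x++) {
                src.setRGB(x, y, 0xFFFFFF);
            }
        }
        double assumedSourceDpi = 100.0;           // assumption: adjust to your input
        double factor = 300.0 / assumedSourceDpi;  // scale up to roughly 300 DPI
        BufferedImage mono = toMonochrome(scale(src, factor), 128);
        System.out.println(mono.getWidth() + "x" + mono.getHeight());
    }
}
```

    A fixed threshold is the simplest choice and may not suit noisy or unevenly lit images; an adaptive method could work better there.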

    Last edit: Quan Nguyen 2012-11-22
  • Lakshmana Kumar Yeddu

    Hi Quan,

    Would you please tell me what a page segmentation mode is? Is there an example that shows how to use it?
    As seen in the code, we have only one eng.traineddata file available. Does it cover all font types (Arial, Times New Roman, etc.) and overlapping characters, including bold and italic?
    Also, please let me know how to create a traineddata file for other languages or fonts.

    Appreciate your help on this.


  • Quan Nguyen

    Quan Nguyen - 2012-11-26

    Hi Lakshman,

    You can check the project's documentation for info about PSM -- their names literally describe what each mode does.
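    For reference, here is a small sketch of the page segmentation modes as a Java enum. The numeric values and descriptions follow Tesseract's `PageSegMode` enumeration for modes 0 to 10; newer Tesseract releases add more. The enum itself and its `fromValue` helper are hypothetical and only for illustration, not the API of any wrapper library.

```java
// Illustrative listing of Tesseract page segmentation modes (values 0..10).
enum PageSegMode {
    OSD_ONLY(0, "Orientation and script detection only"),
    AUTO_OSD(1, "Automatic page segmentation with orientation and script detection"),
    AUTO_ONLY(2, "Automatic page segmentation, but no OSD or OCR"),
    AUTO(3, "Fully automatic page segmentation, but no OSD"),
    SINGLE_COLUMN(4, "Assume a single column of text of variable sizes"),
    SINGLE_BLOCK_VERT_TEXT(5, "Assume a single uniform block of vertically aligned text"),
    SINGLE_BLOCK(6, "Assume a single uniform block of text"),
    SINGLE_LINE(7, "Treat the image as a single text line"),
    SINGLE_WORD(8, "Treat the image as a single word"),
    CIRCLE_WORD(9, "Treat the image as a single word in a circle"),
    SINGLE_CHAR(10, "Treat the image as a single character");

    final int value;
    final String description;

    PageSegMode(int value, String description) {
        this.value = value;
        this.description = description;
    }

    /** Looks up a mode by its numeric value; throws if the value is unknown here. */
    static PageSegMode fromValue(int value) {
        for (PageSegMode mode : values()) {
            if (mode.value == value) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown page segmentation mode: " + value);
    }

    public static void main(String[] args) {
        // Print every mode so the numbers can be matched to their meaning.
        for (PageSegMode mode : values()) {
            System.out.println(mode.value + " = " + mode.description);
        }
    }
}
```

    For a short image such as a CAPTCHA, the single-line (7) or single-word (8) modes are probably the first ones worth trying, though that is a guess rather than a guarantee. If you are using Tess4J, the numeric value is what you would pass to `setPageSegMode(int)`.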

    eng.traineddata covers basic fonts and styles. You can unpack the file or check the Tesseract Wiki for details about the language data and the training process.


