OCR for Devanagari

This project shares training sources and traineddata files for the Devanagari script, for use with Tesseract OCR.

Sanskrit/Hindi Traineddata

Please note that Tesseract 4.0.0-alpha with the LSTM engine gives better results for Hindi and other Indian languages.

See some OCR evaluation reports at:

My Traineddata for Sanskrit/Hindi for specific Devanagari fonts

Devanagari Unicode Fonts

Sanskrit2003

Siddhanta, Chandas, Uttara

Santidev OT

Nakula, Sahadeva

Annapoorna SIL

Lohit Devanagari

FreeSerif with XeLatex

Google Devanagari Fonts

Software Used

The following software packages and utilities were used for this project.

Tesseract OCR 3.05 dev

Tesseract OCR 3.02

TIF/Box File Generator and Box Editors

OCR Evaluation Tools

  • Fork of the UNLV/ISRI OCR evaluation tools, as modified by Nick White for Unicode
    (the 'ocr-evaluation-tools' package from http://ancientgreekocr.org/)

https://github.com/Shreeshrii/ocr-evaluation-tools
https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools/archive-tarball/master
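
To give a rough idea of what a character accuracy figure in such a report means, here is a minimal Python sketch. It is not the code of the ISRI tools; it only assumes the usual definition, accuracy = (n - errors) / n, where n is the length of the ground truth and errors is the edit distance to the OCR output. The function names are hypothetical, and the comparison works on Unicode code points, so a Devanagari conjunct or a consonant with a matra counts as several characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: (n - errors) / n, with n = len(ground_truth).

    The value can be negative when the OCR output contains more errors
    than the ground truth has characters.
    """
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = levenshtein(ground_truth, ocr_output)
    return (len(ground_truth) - errors) / len(ground_truth)
```

For real evaluations, the tools linked above should be preferred, since they also produce per-character confusion statistics that this sketch does not attempt.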

Use the VietOCR GUI for Tesseract with these traineddata files to OCR text.

https://sourceforge.net/projects/vietocr/files/vietocr.net/
https://sourceforge.net/projects/vietocr/files/vietocr/
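
As an alternative to the GUI, the traineddata files can also be used from the Tesseract command line. The sketch below builds such a command in Python. It makes several assumptions: the `tesseract` executable is on the PATH, the flags follow Tesseract 4 syntax (`-l`, `--tessdata-dir`, `--oem`), and the language codes `san` and `hin` stand in for whatever the traineddata file is actually named. The file names and the helper functions are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(image: str,
                            output_base: str,
                            lang: str = "san",
                            tessdata_dir: Optional[str] = None,
                            oem: Optional[int] = None) -> List[str]:
    """Build the argument list for a Tesseract run.

    lang must match the traineddata file name without its extension,
    e.g. "san" for san.traineddata (an assumed name).
    """
    cmd = ["tesseract", image, output_base]
    if tessdata_dir is not None:
        # Folder containing the *.traineddata files
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang]
    if oem is not None:
        # In Tesseract 4, --oem 1 selects the LSTM engine
        cmd += ["--oem", str(oem)]
    return cmd


def run_ocr(image: str, output_base: str, **options) -> None:
    """Run Tesseract; the text is written to <output_base>.txt."""
    subprocess.run(build_tesseract_command(image, output_base, **options),
                   check=True)
```

A call such as `run_ocr("page.tif", "page", lang="hin", tessdata_dir="./tessdata")` would then write the recognized text to `page.txt`, provided the matching traineddata file exists in that folder.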