Download Latest Version char_test_train.zip (21.1 MB)
Name                         Modified     Size      Downloads / Week
Documentation and papers     2013-10-02
Data For Training Testing    2013-10-02
README.txt                   2013-10-02   1.4 kB
tool_document_samples.zip    2013-09-09   50.3 MB
Totals: 4 Items                           50.3 MB   0

The "Documentation and papers" folder contains a few related papers and instructions for using the tool. To cite this work, please use the following articles:

"Devanagari OCR using a recognition driven segmentation framework and stochastic language models", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju. IJDAR, 2009, Volume: 12, Pg.: 123-138.

"Design and Comparison of Segmentation Driven and Recognition Driven Devanagari OCR", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju. International Workshop on Document Image Analysis and Libraries, 2006, Pg.: 96-102.

"A Framework for Creation of Multi-Lingual OCR Datasets", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju, Ramanaprasad Vemulapati. Symposium on Document Image Understanding Technology, 2003, Pg.: 189-196.

The folder "Data For Training Testing" contains character images. Each image is annotated with the Unicode code point of its character, converted from hexadecimal to decimal. For instance, images of the vowel "a", which has the Unicode code point 0905, are annotated as 2309 in the file and folder names. Similarly, the consonant "ka" has the code point 0915 and is annotated as 2325. The relevant Unicode chart is located here: www.unicode.org/charts/PDF/U0900.pdf
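
The decimal annotation is simply the hexadecimal code point read as a base-16 number. The following Python sketch illustrates the conversion in both directions. The function names are made up for this example and are not part of the tool.

```python
import unicodedata


def codepoint_to_label(hex_codepoint: str) -> int:
    """Convert a hexadecimal Unicode code point (e.g. "0905") to its decimal label."""
    return int(hex_codepoint, 16)


def label_to_codepoint(label: int) -> str:
    """Convert a decimal label (e.g. 2309) back to a 4-digit hexadecimal code point."""
    return f"{label:04X}"


def label_to_name(label: int) -> str:
    """Look up the official Unicode character name for a decimal label."""
    return unicodedata.name(chr(label))


# The two examples given in the README.
print(codepoint_to_label("0905"))  # 2309
print(codepoint_to_label("0915"))  # 2325
print(label_to_codepoint(2325))    # 0915
print(label_to_name(2309))         # DEVANAGARI LETTER A
print(label_to_name(2325))         # DEVANAGARI LETTER KA
```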

tool_document_samples.zip: This contains a few grayscale images scanned at 300 dpi. Each TIFF image has a corresponding XML ground-truth file. The ground-truth file contains the bounding box of each word, along with the ITRANS transliteration and Unicode representation of the word.
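
As a starting point for exploring the samples, the sketch below pairs each TIFF image with its XML file and lists the element names found in that file. It makes no claim about the actual XML schema, which is not described here. It assumes that an image and its ground truth share the same base name, that the ground-truth file uses a lowercase .xml extension, and that the archive has been extracted to a local folder. The function names are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def pair_images_with_groundtruth(folder):
    """Return (image, xml) path pairs, assuming both share the same base name."""
    pairs = []
    for image in sorted(Path(folder).rglob("*")):
        if image.suffix.lower() in {".tif", ".tiff"}:
            xml_path = image.with_suffix(".xml")  # assumed naming convention
            if xml_path.exists():
                pairs.append((image, xml_path))
    return pairs


def element_tags(xml_path):
    """Return the sorted set of element names in an XML file, to inspect its schema."""
    root = ET.parse(xml_path).getroot()
    return sorted({element.tag for element in root.iter()})


# Example usage, with "tool_document_samples" as an assumed extraction folder:
# for image, xml_path in pair_images_with_groundtruth("tool_document_samples"):
#     print(image.name, element_tags(xml_path))
```

Listing the element names first lets you find where the bounding boxes, ITRANS strings, and Unicode strings are stored before writing a parser for them.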
Source: README.txt, updated 2013-10-02