Devanagari OCR - Browse Files at SourceForge.net

Name	Modified	Size	Downloads / Week
Documentation and papers	2013-10-02		0
Data For Training Testing	2013-10-02		2
README.txt	2013-10-02	1.4 kB	0
tool_document_samples.zip	2013-09-09	50.3 MB	0
Totals: 4 Items		50.3 MB	2

The "Documentation and papers" folder contains a few related papers and instructions on using the tool. Please cite this work using the following articles:

"Devanagari OCR using a recognition driven segmentation framework and stochastic language models", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju, IJDAR, 2009, Volume: 12, Pg.: 123–138

"Design and Comparison of Segmentation Driven and Recognition Driven Devanagari OCR", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju, International Workshop on Document Image Analysis and Libraries, 2006, Pg.: 96–102

"A Framework for Creation of Multi-Lingual OCR Datasets", Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju, Ramanaprasad Vemulapati, Symposium on Document Image Understanding Technology, 2003, Pg.: 189–196

The folder "Data For Training Testing" contains character images. Each image is annotated with the Unicode code point of its character, converted from hexadecimal to decimal. For instance, images of the vowel "a", whose Unicode code point is 0905, are annotated as 2309 in the file and folder names. Similarly, the consonant "ka" has the Unicode code point 0915 and is annotated as 2325. The relevant Unicode chart is located here: www.unicode.org/charts/PDF/U0900.pdf
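
The mapping between the decimal annotations and Unicode characters can be sketched in Python as follows. This is only an illustration of the arithmetic described above; the function names label_to_char and char_to_label are hypothetical and are not part of the tool.

```python
def label_to_char(label: str) -> str:
    """Turn a decimal annotation such as '2309' into its Unicode character."""
    return chr(int(label))


def char_to_label(ch: str) -> str:
    """Turn a single character into the decimal annotation used in the dataset."""
    return str(ord(ch))


# The two examples from the README: 2309 is U+0905 and 2325 is U+0915.
for label in ("2309", "2325"):
    code_point = int(label)
    print(label, f"U+{code_point:04X}", label_to_char(label))
```

Going the other way, char_to_label("\u0915") gives "2325", which is the annotation quoted above for "ka".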

tool_document_samples.zip: This contains a few grayscale images scanned at 300 dpi. Each TIFF image has a corresponding XML groundtruth file. The groundtruth file contains the bounding box of each word, along with its ITRANS transliteration and Unicode representation.
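
Once the archive is extracted, each image can be matched to its groundtruth file. The sketch below assumes that an image and its XML file share the same base name and directory, and that images use a .tif or .tiff extension; the README does not state either convention, so check both against the actual archive. The function name pair_with_groundtruth and the file names in the example are hypothetical.

```python
from pathlib import PurePath


def pair_with_groundtruth(filenames):
    """Pair each TIFF name with an XML name that has the same path and base name.

    Assumption (not stated in the README): 'page.tif' goes with 'page.xml'.
    Images without a matching XML file are paired with None.
    """
    # Index the XML files by their path with the extension removed.
    xml_by_base = {
        str(PurePath(name).with_suffix("")): name
        for name in filenames
        if PurePath(name).suffix.lower() == ".xml"
    }
    pairs = []
    for name in sorted(filenames):
        path = PurePath(name)
        if path.suffix.lower() in (".tif", ".tiff"):
            pairs.append((name, xml_by_base.get(str(path.with_suffix("")))))
    return pairs


# Made-up file names, used only to show the pairing.
example = ["samples/page1.tif", "samples/page1.xml", "samples/page2.tiff"]
for image, groundtruth in pair_with_groundtruth(example):
    print(image, "->", groundtruth)
```

The function works on a list of names rather than reading the disk, so it can be fed the output of a directory listing or of a zip archive's name list.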

Source: README.txt, updated 2013-10-02

Devanagari OCR Files

Devanagari Optical Character Recognition, Annotation tool
