Tesseract is an open source optical character recognition (OCR) engine and command line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract puts greater focus on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns.
Tesseract can recognize over 100 languages out of the box and can be trained to recognize others. It supports various output formats, including plain text, HTML (hOCR), PDF and more. It also has Unicode (UTF-8) support.
Features
- OCR engine and command line program
- Line recognition and character pattern recognition
- Unicode (UTF-8) support
- Recognizes more than 100 languages, and can be trained to recognize others
- Supports various output formats
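The language and output format features listed above map onto the command line form `tesseract imagename outputbase -l lang format`. The following is a minimal Python sketch of driving that command, not an official wrapper. The function names `build_tesseract_command` and `run_ocr` and the file names are hypothetical, and the sketch assumes the `tesseract` executable is installed and on the PATH along with the language data for the requested language.

```python
import shutil
import subprocess

# Output format names accepted as the trailing argument of the tesseract CLI.
# Only a handful of common ones are listed in this sketch.
SUPPORTED_FORMATS = {"txt", "hocr", "pdf", "tsv"}


def build_tesseract_command(image_path, output_base, lang="eng", fmt="txt"):
    """Build the argument list for one tesseract invocation (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {fmt}")
    # Shape: tesseract <image> <output base name> -l <language code> <format>
    return ["tesseract", image_path, output_base, "-l", lang, fmt]


def run_ocr(image_path, output_base, lang="eng", fmt="txt"):
    """Run tesseract and return the name of the file it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang, fmt),
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    # tesseract appends the format's extension to the output base name.
    return f"{output_base}.{fmt}"
```

For example, `run_ocr("scan.png", "scan", lang="deu", fmt="pdf")` would ask Tesseract to read a German-language image and write a searchable PDF. Keeping command construction separate from execution makes the arguments easy to inspect without Tesseract installed.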
License
Apache License V2.0
User Reviews
- Enjoy this project for my mission
- Brilliant. Worked properly first time. Great code.
- Very good OCR project!
- Wow, good OCR. The release files are much older than those at http://code.google.com/p/tesseract-ocr/. I packaged Tesseract with gImageReader: http://sourceforge.net/projects/gimagereader/
- How do I install it on Windows XP?