Tesseract is an open source optical character recognition (OCR) engine and command line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract places greater focus on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns.
Tesseract can recognize more than 100 languages out of the box and can be trained to recognize others. It supports various output formats, including plain text, HTML (hOCR), PDF, and more. It also supports Unicode (UTF-8).
Features
- OCR engine and command line program
- Line recognition and character pattern recognition
- Unicode (UTF-8) support
- Recognizes more than 100 languages, and can be trained to recognize others
- Supports various output formats
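The features above map onto a handful of command line options. The following is a minimal Python sketch, not an official wrapper: `build_tesseract_command` is a hypothetical helper, and `scan.png` and `scan` are placeholder names. It assumes a Tesseract 4 or later install, where `--oem 0` selects the legacy character-pattern engine and `--oem 1` the line recognizer. The legacy engine may also require language data that includes the legacy model.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    output_format: Optional[str] = None,
    legacy_engine: bool = False,
) -> List[str]:
    """Assemble an argument list for the tesseract CLI (hypothetical helper)."""
    # Basic form: tesseract <image> <output base name> [options] [config]
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    # Assumption: Tesseract 4+ engine modes (0 = legacy, 1 = line recognizer).
    cmd += ["--oem", "0" if legacy_engine else "1"]
    if output_format:
        # Output config names such as "pdf", "hocr", or "txt" come last.
        cmd.append(output_format)
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with a real image path.
    cmd = build_tesseract_command("scan.png", "scan", lang="eng", output_format="pdf")
    print(" ".join(cmd))
    # Only invoke the binary when both it and the input image are present.
    if shutil.which("tesseract") and os.path.exists("scan.png"):
        subprocess.run(cmd, check=True)
```

With `output_format="pdf"`, Tesseract writes the result to the output base name plus a `.pdf` extension. Omitting the format falls back to plain text.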
License
Apache License V2.0