Tesseract is an open source optical character recognition (OCR) engine and command-line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract focuses more on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns.

Tesseract can recognize more than 100 languages out of the box and can be trained to recognize others. It supports various output formats, including plain text, HTML (hOCR), and PDF. It also supports Unicode (UTF-8).
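
As a rough illustration of how the language and output format options fit together on the command line, the following Python sketch assembles a tesseract invocation. It assumes Tesseract 4 or later is installed and on the PATH. The helper names build_tesseract_command and run_ocr, and the file names in the usage note, are hypothetical and are not part of Tesseract itself.

```python
import shutil
import subprocess

# Output config names accepted by the tesseract CLI, mapped to the
# file extension tesseract appends to the output base name.
OUTPUT_CONFIGS = {"txt": ".txt", "hocr": ".hocr", "pdf": ".pdf", "tsv": ".tsv"}


def build_tesseract_command(image_path, output_base, lang="eng", fmt="txt", oem=None):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt}")
    # Several languages can be combined with "+", for example "eng+deu".
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if oem is not None:
        # 0 = legacy character-pattern engine, 1 = LSTM line recognizer,
        # 3 = default, based on what is available.
        cmd += ["--oem", str(oem)]
    cmd.append(fmt)
    return cmd


def run_ocr(image_path, output_base, **kwargs):
    """Run tesseract and return the path of the file it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, **kwargs)
    subprocess.run(cmd, check=True)
    return output_base + OUTPUT_CONFIGS[kwargs.get("fmt", "txt")]
```

For example, run_ocr("scan.png", "out", lang="deu", fmt="pdf") would be expected to write a searchable PDF to out.pdf, provided the German language data is installed. Building the argument list separately from running it keeps the command easy to inspect before anything is executed.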

Features

  • OCR engine and command line program
  • Line recognition and character pattern recognition
  • Unicode (UTF-8) support
  • Recognizes more than 100 languages, and can be trained to recognize others
  • Supports various output formats

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

C++

Related Categories

C++ Image Recognition Software, C++ OCR Software

Registered

2020-05-04