Open Source OCR Engine
Awesome multilingual OCR toolkits based on PaddlePaddle
Contexts Optical Compression
Open source semantic search and text analytics for large document sets
OCR software, free and offline
Crowdsourcing platform for full text transcription and tagging
A framework to enable multimodal models to operate a computer
OCRmyPDF adds an OCR text layer to scanned PDF files
Enhances Tesseract OCR output using LLMs (local or API)
Cross-platform software for text translation and recognition
Accurate × Fast × Comprehensive
Visual Causal Flow
A simple tool for reading in poorly redacted documents
OCR expert VLM powered by Hunyuan's native multimodal architecture
The media player for language learning, with dual subtitles
An on-premises, OCR-free unstructured data extraction
A ranked list of awesome machine learning Python libraries
JavaScript OCR and text extraction for images and PDFs
Assist in organizing your piles of documents
A Python application to add watermarks (text or image) to PDF files
Scan Tailor Experimental is an interactive post-processing tool
Your Private Offline Translator
Command-line toolset for extracting text from files
ITTT is a free tool designed to scan and extract text from images.