Document (PDF, Word, PPTX ...) extraction and parsing API
Did you say you like data?
Extract one-time password (OTP) secrets from QR codes
Read and extract text and other content from PDFs in C#
WindowTextExtractor allows you to get text from any OS
PDFsam, a desktop application to split, merge, mix, rotate PDF files
Comprehensive Gradio WebUI for audio processing
JavaScript OCR and text extraction for images and PDFs
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
A pure-python PDF library capable of splitting, merging, cropping
LLM
OCR model for complex documents with layout-aware structured outputs
Library for OCR-related tasks powered by Deep Learning
OCR software, free and offline
A cross-platform software for text translation and recognition
Image Toolbox is a powerful picture editor, which can crop
Contexts Optical Compression
A simple native web interface that uses ChatTTS to synthesize text
A fast, helpful, and open-source document parser
Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML
A Python tool to help extract information from structured PDFs
Open source semantic search and text analytics for large document sets
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Python bindings for MuPDF's rendering library.
CLI tool to extract (meta)data from PDF and manipulate PDF files