Document (PDF, Word, PPTX ...) extraction and parsing API
Did you say you like data?
Extract one-time password (OTP) secrets from QR codes
Read and extract text and other content from PDFs in C#
Comprehensive Gradio WebUI for audio processing
PDFsam, a desktop application to split, merge, mix, rotate PDF files
JavaScript OCR and text extraction for images and PDFs
WindowTextExtractor allows you to get text from any OS
Library for OCR-related tasks powered by Deep Learning
A pure-Python PDF library capable of splitting, merging, cropping
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
LLM
OCR model for complex documents with layout-aware structured outputs
A cross-platform software for text translation and recognition
OCR software, free and offline
A Python tool to help extract information from structured PDFs
Image Toolbox is a powerful picture editor, which can crop
The Refactoring library based on the Refactoring book
Open source semantic search and text analytics for large document sets
Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML
Contexts Optical Compression
Extract the main article from a given URL with Node.js
Handwritten Text Recognition (HTR) system implemented with TensorFlow
A modular graph-based Retrieval-Augmented Generation (RAG) system
A fast, helpful, and open-source document parser