A high-quality tool for converting PDF to Markdown and JSON
Get your documents ready for gen AI
Multilingual Document Layout Parsing in a Single Vision-Language Model
Contexts Optical Compression
An on-premises, OCR-free unstructured data extraction
Open source semantic search and text analytics for large document sets
A Repo For Document AI
OCR software, free and offline
Library for OCR-related tasks powered by Deep Learning
Canvas-based WYSIWYG rich text editor with advanced layout tools
Map location picker component for Android
Enhances Tesseract OCR output using LLMs (local or API)
The SILE Typesetter — Simon’s Improved Layout Engine
OCR model for complex documents with layout-aware structured outputs
Accurate × Fast × Comprehensive
R Markdown Résumés and CVs
Extract and convert data from any document: images, PDFs, Word docs
Open-source Python 3 tool for recognizing layouts, tables, and math
Assist in organizing your piles of documents
OCR expert VLM powered by Hunyuan's native multimodal architecture
Collabora Online is a collaborative online office suite
CLI tool to extract (meta)data from PDF and manipulate PDF files
Video translation and dubbing tool powered by LLMs
PDF Parser for AI-ready data. Automate PDF accessibility
Apache OpenNLP