A high-quality tool for converting PDF to Markdown and JSON
Get your documents ready for gen AI
Multilingual Document Layout Parsing in a Single Vision-Language Model
On-premises, OCR-free unstructured data extraction
Contexts Optical Compression
Open source semantic search and text analytics for large document sets
A Repo For Document AI
OCR software, free and offline
Canvas-based WYSIWYG rich text editor with advanced layout tools
The SILE Typesetter — Simon’s Improved Layout Engine
OCR model for complex documents with layout-aware structured outputs
Map location picker component for Android
Library for OCR-related tasks powered by Deep Learning
Assist in organizing your piles of documents
Accurate × Fast × Comprehensive
Enhances Tesseract OCR output using LLMs (local or API)
R Markdown Résumés and CVs
Extract and convert data from any document: images, PDFs, Word docs
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-source Python 3 tool for recognizing layouts, tables, and math
CLI tool to extract (meta)data from PDF and manipulate PDF files
PDF Parser for AI-ready data. Automate PDF accessibility
Collabora Online is a collaborative online office suite
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Apache OpenNLP