A high-quality tool for converting PDF to Markdown and JSON
Get your documents ready for gen AI
Multilingual Document Layout Parsing in a Single Vision-Language Model
On-premises, OCR-free unstructured data extraction
Contexts Optical Compression
Open source semantic search and text analytics for large document sets
A Repo For Document AI
OCR software, free and offline
Canvas-based WYSIWYG rich text editor with advanced layout tools
OCR model for complex documents with layout-aware structured outputs
Map location picker component for Android
The SILE Typesetter — Simon’s Improved Layout Engine
Accurate × Fast × Comprehensive
Enhances Tesseract OCR output using LLMs (local or API)
Library for OCR-related tasks powered by Deep Learning
Open-Source Python3 tool for recognizing layouts, tables, and math
R Markdown Résumés and CVs
Extract and convert data from any document, images, PDFs, Word docs
OCR expert VLM powered by Hunyuan's native multimodal architecture
Assist in organizing your piles of documents
CLI tool to extract (meta)data from PDF and manipulate PDF files
PDF Parser for AI-ready data. Automate PDF accessibility
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A versatile toolkit for PDF manipulation
Apache OpenNLP