Contexts Optical Compression
Accurate × Fast × Comprehensive
PDF to Markdown with vision models
Visual Causal Flow
Awesome multilingual OCR toolkits based on PaddlePaddle
Convert AI papers to GUI
A framework to enable multimodal models to operate a computer
Enhances Tesseract OCR output using LLMs (local or API)
In-depth tutorials on LLMs, RAGs and real-world AI agent applications
Scientific PDF paper translation with formatting preserved
OCR expert VLM powered by Hunyuan's native multimodal architecture
Get your documents ready for gen AI
OpenRecall is a fully open-source, privacy-first alternative
A Repo For Document AI
A simple tool for reading in poorly redacted documents
Document content and metadata extraction microservice
AI tool for automating desktop tasks via natural language input
An on-premises, OCR-free unstructured data extraction
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Python application to add watermarks (text or image) to PDF files
FaceOnLive Open KYC: Streamlining Identity Verification with AI
Img2Txt - Extract Text From Images using AI