A high-quality tool for converting PDF to Markdown and JSON
LLM framework for document understanding and semantic retrieval
Get your documents ready for gen AI
AI-powered document analysis and tagging for Paperless-ngx
Open Source Document Management System for Digital Archives
Document (PDF, Word, PPTX ...) extraction and parsing API
A Repo For Document AI
An on-premises, OCR-free unstructured data extraction
Multilingual Document Layout Parsing in a Single Vision-Language Model
Multi-tool for semantic search
Structured data extraction and instruction calling with ML, LLM
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
An Open-Source Toolkit for General-OCR Research and Applications
Document content and metadata extraction microservice
A community-supported supercharged version of paperless
The official implementation of RAPTOR
File Parser optimised for LLM Ingestion with no loss
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Parse files for optimal RAG
A high-quality PDF to Markdown tool based on a large language model
Research-oriented chatbot framework
Generate audiobooks from EPUBs, PDFs and text with captions
A Model Context Protocol (MCP) server implementation
Unified framework for building enterprise RAG pipelines
Library for OCR-related tasks powered by Deep Learning