A community-supported, supercharged version of Paperless
Open Source Document Management System for Digital Archives
Python tool for converting files and office documents to Markdown
Small Python-GTK application to merge or split PDFs
The awesome document factory
An open-source RAG-based tool for chatting with your documents
Package for converting and rendering markdown documents in TeX
Python bindings for MuPDF's rendering library
Interact with your documents using the power of GPT
Generate audiobooks from EPUBs, PDFs and text with captions
A full spaCy pipeline and models for scientific/biomedical documents
JupyterLab extension for live editing of LaTeX documents
An AI personal assistant for your digital brain
A repo for Document AI
Library for OCR-related tasks powered by Deep Learning
ktrain is a Python library that makes deep learning AI more accessible
The ChatGPT Retrieval Plugin lets you easily find personal documents
Python scraper based on AI
Neural Search
ContextGem: Effortless LLM extraction from documents
OCRmyPDF adds an OCR text layer to scanned PDF files
Open source libraries and APIs to build custom preprocessing pipelines
Powerful and highly extensible command-line based document
File Parser optimised for LLM Ingestion with no loss
Haystack is an open source NLP framework to interact with your data