A repo for Document AI
Open source semantic search and text analytics for large document sets
Readest is a modern, feature-rich ebook reader
A free, open source, and extensible speech-to-text application
Easy-to-use and powerful NLP library with an awesome model zoo
Agent harness to make your slop code well-engineered and beautiful
The most accurate natural language detection library for Rust
Generate audiobooks from EPUBs, PDFs and text with captions
A high-quality PDF-to-Markdown tool based on large language models
Screenshot, text selection, OCR, AI, and translation software
Easy-to-use and high-performance NLP and LLM framework
Efficient multilingual NLP and text segmentation in Go
Apache OpenNLP
Enhances Tesseract OCR output using LLMs (local or API)
Open source libraries and APIs to build custom preprocessing pipelines
A very simple framework for state-of-the-art NLP
AI-powered tool for generating, optimizing, and translating subtitles
OCR software, free and offline
Easily compute CLIP embeddings and build a CLIP retrieval system
A fast, helpful, and open-source document parser
Python binding to the Apache Tika™ REST services
Automatic Speech Recognition with Word-level Timestamps
Audiocraft is a library for audio processing and generation
Advanced NLP with spaCy: A free online course
An opinionated CLI to transcribe audio files with Whisper on-device