Provides line-oriented text file editing capabilities
Document (PDF, Word, PPTX, ...) extraction and parsing API
OCRmyPDF adds an OCR text layer to scanned PDF files
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Comprehensive Gradio WebUI for audio processing
Python library and CLI tool to interface with Google Translate
Ready-to-use OCR with 80+ supported languages
TTS with Kokoro and ONNX Runtime
A text-to-speech, speech-to-text and speech-to-speech library
Python binding to the Apache Tika™ REST services
EPUB to audiobook converter, optimized for Audiobookshelf
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
Speech recognition module for Python
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Generate blog articles from video or audio
Controllable and fast Text-to-Speech for over 7000 languages
Concatenate a directory full of files into a single prompt
Create prompt-friendly codebase digests from any Git repository URL
An open-source implementation of CLIP
Document content and metadata extraction microservice
Automatically translates the text of a video based on a subtitle file
Parse files for optimal RAG
Powerful Android AI agent with tools, automation, and Linux shell
An opinionated CLI to transcribe audio files with Whisper on-device
Context-aware desktop AI assistant that understands screen content