A text-to-speech, speech-to-text and speech-to-speech library
GUI for a Vocal Remover that uses Deep Neural Networks
The most powerful and modular diffusion model GUI, API, and backend
Generate audiobooks from e-books, voice cloning & 1107+ languages
Generate audiobooks from e-books
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Interface for OuteTTS models
AI video generator optimized for low VRAM and older GPUs
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Synchronized Translation for Videos
Fast Stable Diffusion on CPU and AI PC
Real-World Centric Foundation GUI Agents
Framework and no-code GUI for fine-tuning LLMs
A Web UI for easy subtitle generation using the Whisper model
An opinionated CLI to transcribe audio files with Whisper on-device
Free, high-quality text-to-speech API endpoint to replace OpenAI
A sound cloning tool with a web interface, using your voice
Unofficial Python API and agentic skill for Google NotebookLM
Python library and CLI tool to interface with Google Translate
EPUB to audiobook converter, optimized for Audiobookshelf
Software that uses AI to perform real-time voice conversion
A simple screen-parsing tool toward a pure vision-based GUI agent
Label Studio is a multi-type data labeling and annotation tool
Unified web UI for training and running open models locally
A simple native web interface that uses ChatTTS to synthesize speech from text