A text-to-speech, speech-to-text and speech-to-speech library
GUI for a Vocal Remover that uses Deep Neural Networks
The most powerful and modular diffusion model GUI, API and backend
A single Gradio + React WebUI with extensions for ACE-Step
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
Generate audiobooks from e-books
Clone a voice in 5 seconds to generate arbitrary speech in real-time
AI video generator optimized for low VRAM and older GPUs
Interface for OuteTTS models
Code for openai.fm, a demo for the OpenAI Speech API
Stable Diffusion for real-time music generation (web app)
Synchronized Translation for Videos
Real-World Centric Foundation GUI Agents
Fast Stable Diffusion on CPU and AI PC
Self-hosted AI audio transcription
Generate music based on natural language prompts using LLMs
Framework and no-code GUI for fine-tuning LLMs
Open source text-to-speech tool, supports extra-long text
Free, high-quality text-to-speech API endpoint to replace OpenAI
A native desktop GUI for Claude Code
A Web UI for easy subtitle generation using the Whisper model
A sound cloning tool with a web interface, using your voice
Unofficial Python API and agentic skill for Google NotebookLM
The Python library for real-time communication
Speech recognition for your site