TTS-WebUI is a unified Gradio + React web interface that brings together a large ecosystem of text-to-speech, voice conversion, and audio generation models under a single UI. It supports a wide range of models such as Bark, MusicGen, Tortoise, RVC, StyleTTS2, ParlerTTS, CosyVoice, XTTSv2, Stable Audio, SeamlessM4T, and many others, exposing them as interchangeable backends for speech and music synthesis. The project provides an installer that sets up Conda, Python environments, and all necessary dependencies, so users can focus on experimenting with voices instead of managing tooling. It offers both a Gradio backend and an optional React frontend, which can be accessed on separate ports and even run inside Docker for more reproducible deployments. An extension system lets you enable extra models and tools, install community extensions from a catalog, and manage them via a dedicated GUI or CLI extension manager.
Features
- Unified Gradio + React interface for many TTS, VC, and audio generation models
- One-click installer and Docker images for streamlined local setup
- Extension marketplace with GUI/CLI tools for installing and managing add-ons
- OpenAI-compatible TTS API endpoints for integration with external chat and RP tools
- Support for a large catalog of speech, music, separation, and enhancement models
- Integrations with tools like SillyTavern and Open WebUI via simple HTTP configuration
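
As a rough illustration of the OpenAI-compatible endpoint listed above, the sketch below builds and sends a request in the shape of OpenAI's `POST /v1/audio/speech` call (JSON body with `model`, `input`, `voice`, and `response_format`). The base URL, port, model name, and voice name are placeholders assumed for this example, not values confirmed by the project; substitute whatever your own installation reports. The helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder address; check your TTS-WebUI instance for the real host and port.
BASE_URL = "http://localhost:7778/v1"


def build_speech_request(text, model="placeholder-model", voice="placeholder-voice",
                         base_url=BASE_URL, response_format="mp3"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    The model and voice defaults are placeholders, not real identifiers.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example usage (requires a running server at BASE_URL):
# synthesize("Hello from TTS-WebUI", out_path="hello.mp3")
```

Because the request follows the OpenAI shape, a client that lets you override its OpenAI base URL can usually be pointed at the same address instead of using hand-written HTTP code.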