OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation, so you can focus on integrating speech into your application rather than wiring up low-level engines.

The project supports multiple backends, including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, vLLM, and a JavaScript interface via Transformers.js. This allows it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm GPUs, Vulkan-capable GPUs, and Apple Metal.

It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.
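Conceptually, saving and reusing a speaker profile is a JSON round trip: encode the profile once, persist it, and load it back in later sessions. The sketch below illustrates that idea with invented field names; OuteTTS defines its own profile schema and builds profiles from audio through its Interface, so treat this as a shape illustration only.

```python
import json
import os
import tempfile

# Hypothetical speaker profile; the field names here are invented for
# demonstration and do not match OuteTTS's actual JSON schema.
profile = {
    "name": "example_speaker",
    "text": "transcript of the reference clip",
    "audio_tokens": [101, 205, 317],  # placeholder for encoded audio features
}

def save_speaker(profile: dict, path: str) -> None:
    """Persist a profile as JSON so it can be reused across sessions."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(profile, f)

def load_speaker(path: str) -> dict:
    """Load a previously saved profile back into memory."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "speaker.json")
save_speaker(profile, path)
restored = load_speaker(path)
print(restored["name"])  # example_speaker
```

Because the profile is plain JSON, it can be versioned, shared, and reloaded on any backend without re-encoding the reference audio.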
Features
- Multi-backend support for llama.cpp, Transformers, ExLlamaV2, vLLM and Transformers.js
- Hardware-aware install recipes for CUDA, ROCm, Vulkan, Metal and CPU-only setups
- High-level Interface API for configuring models, generating speech and saving audio
- Speaker profile system for creating, saving and reusing custom voices from audio samples
- Cross-lingual synthesis that preserves the reference speaker’s accent, emotion and style
- Integration hooks for third-party runtimes such as MLX-Audio, llama.cpp and KoboldCpp
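The multi-backend design behind a high-level interface like this can be sketched as a dispatch table: each backend registers one generation callable, and callers see a single `generate()` method regardless of the runtime underneath. The class and backend names below are illustrative, not OuteTTS's actual implementation.

```python
from typing import Callable, Dict

# Stand-in backend functions; a real backend would return audio, not a string.
def _llamacpp_generate(text: str) -> str:
    return f"[llama.cpp audio for: {text}]"

def _transformers_generate(text: str) -> str:
    return f"[transformers audio for: {text}]"

# One entry per supported runtime; adding a backend means adding one callable.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "llamacpp": _llamacpp_generate,
    "transformers": _transformers_generate,
}

class Interface:
    """Resolves a backend once at construction; callers use generate() only."""

    def __init__(self, backend: str):
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend: {backend}")
        self._generate = BACKENDS[backend]

    def generate(self, text: str) -> str:
        return self._generate(text)

iface = Interface("llamacpp")
print(iface.generate("Hello"))  # [llama.cpp audio for: Hello]
```

Keeping backend selection out of the call path is what lets the same application code run unchanged on CPU, CUDA, ROCm, Vulkan or Metal setups.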