A text-to-speech, speech-to-text and speech-to-speech library
SOTA discrete acoustic codec models at 40 or 75 tokens per second
Generate audiobooks from e-books, voice cloning & 1107+ languages
Synchronized translation for videos
Generate audiobooks from EPUBs, PDFs and text with captions
A nearly-live implementation of OpenAI's Whisper
Comprehensive Gradio WebUI for audio processing
Instant voice cloning by MIT and MyShell. Audio foundation model
Free, high-quality text-to-speech API endpoint to replace OpenAI
Automatically translates the text of a video based on a subtitle file
EPUB to audiobook converter, optimized for Audiobookshelf
Offline text-to-speech synthesis for Python
Interface for OuteTTS models
MARS5 speech model (TTS) from CAMB.AI
Use Microsoft Edge's online text-to-speech service from Python
SOTA open-source TTS
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
A voice cloning tool with a web interface, using your voice
Towards Human-Sounding Speech
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Industrial-level controllable zero-shot text-to-speech system
Controllable & emotion-expressive zero-shot TTS
The official Python SDK for the ElevenLabs API
Converts text to speech in real time