A nearly-live implementation of OpenAI's Whisper
Controllable & emotion-expressive zero-shot TTS
Qwen3-TTS is an open-source series of TTS models
A high-quality, rapid TTS voice cloning model
Foundational model for human-like, expressive TTS
StreamSpeech is a seamless model for offline speech recognition
Framework for building neural networks
MARS5 speech model (TTS) from CAMB.AI
Converts text to speech in real time
Python library and CLI tool to interface with Google Translate
Generate audiobooks from e-books
Virtual AI anchor that combines state-of-the-art technology
A text-to-speech, speech-to-text and speech-to-speech library
One-click deployment (including offline integration package)
Official MiniMax Model Context Protocol (MCP) server
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
Singing Voice Synthesis via Shallow Diffusion Mechanism