Multilingual large voice generation model, providing inference
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Foundational model for human-like, expressive TTS
Toolkit for conversational AI
A TTS that fits in your CPU (and pocket)
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
A fast TTS architecture with conditional flow matching
SOTA discrete acoustic codec models with 40/75 tokens per second
Build Vision Agents quickly with any model or video provider
An Open Source text-to-speech system built by inverting Whisper
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Official MiniMax Model Context Protocol (MCP) server
Automatically translates the text of a video based on a subtitle file
A lightweight text-to-speech model with zero-shot voice cloning
StreamSpeech is a seamless model for offline speech recognition
Scalable generative AI framework built for researchers and developers
Interface for OuteTTS models
MARS5 speech model (TTS) from CAMB.AI
Converts text to speech in real time
Real-time voice interactive digital human
Long-form streaming TTS system for multi-speaker dialogue generation
Spark-TTS Inference Code
Framework for building neural networks
Controllable & emotion-expressive zero-shot TTS
Controllable and fast Text-to-Speech for over 7000 languages