Industrial-level controllable zero-shot text-to-speech system
Speech-AI-Forge is a project developed around TTS generation models
Foundational model for human-like, expressive TTS
Multi-lingual large voice generation model, providing inference
A generative speech model for daily dialogue
Official MiniMax Model Context Protocol (MCP) server
Free, high-quality text-to-speech API endpoint to replace OpenAI
Synchronized Translation for Videos
The official Python SDK for the ElevenLabs API
Build Vision Agents quickly with any model or video provider
Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model
A text-to-speech, speech-to-text and speech-to-speech library
End-to-end speech processing toolkit
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Toolkit for conversational AI
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Long-form streaming TTS system for multi-speaker dialogue generation
Spark-TTS Inference Code
A lightweight text-to-speech model with zero-shot voice cloning
Framework for building neural networks
StreamSpeech is a seamless model for offline speech recognition
Controllable & emotion-expressive zero-shot TTS
SOTA discrete acoustic codec models with 40/75 tokens per second
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue