Capable of understanding text, audio, vision, video
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
State-of-the-art TTS model under 25MB
The official Python SDK for the ElevenLabs API
Offline inference engine for art, real-time voice conversations
Generate audiobooks from e-books, voice cloning & 1107+ languages
Qwen3-ASR is an open-source series of ASR models
Synchronized Translation for Videos
MARS5 speech model (TTS) from CAMB.AI
Open-source framework for intelligent speech interaction
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Framework for building real-time voice and multimodal AI agents
Official MiniMax Model Context Protocol (MCP) server
EPUB to audiobook converter, optimized for Audiobookshelf
A nearly live implementation of OpenAI's Whisper
MOSS-TTS-Nano is an open-source multilingual tiny speech generation
AI-powered tool for generating, optimizing, and translating subtitles
Multilingual large voice generation model, providing inference
SOTA open-source TTS
Real-time voice interactive digital human
Controllable and fast text-to-speech for over 7000 languages
A simple native web interface that uses ChatTTS to synthesize text
SOTA discrete acoustic codec models with 40/75 tokens per second
Open-source industrial-grade ASR models
Management of Yandex Station and other smart home devices