Voice Recognition to Text Tool
Towards Human-Sounding Speech
Open-source multi-speaker long-form text-to-speech model
Free, high-quality text-to-speech API endpoint to replace OpenAI
Capable of understanding text, audio, vision, video
State-of-the-art TTS model under 25MB
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
The official Python SDK for the ElevenLabs API
Offline inference engine for art, real-time voice conversations
Generate audiobooks from e-books, voice cloning & 1107+ languages
Qwen3-ASR is an open-source series of ASR models
Synchronized Translation for Videos
MARS5 speech model (TTS) from CAMB.AI
Open-source framework for intelligent speech interaction
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Framework for building real-time voice and multimodal AI agents
Official MiniMax Model Context Protocol (MCP) server
EPUB to audiobook converter, optimized for Audiobookshelf
A nearly-live implementation of OpenAI's Whisper
MOSS-TTS-Nano is an open-source multilingual tiny speech generation
AI-powered tool for generating, optimizing, and translating subtitles
A speech-text foundation model for real time dialogue
State-of-the-art open-source TTS
Multilingual large voice generation model, providing inference
Real-time voice interactive digital human