Open-source multi-speaker long-form text-to-speech model
Framework for building realtime multimodal voice AI agent apps
Speakr is a personal, self-hosted web application
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Converts text to speech in real time
Synchronized Translation for Videos
A nearly live implementation of OpenAI's Whisper
Voice Recognition to Text Tool
Audio foundation model excelling in audio understanding
Official MiniMax Model Context Protocol (MCP) server
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Generate audiobooks from e-books, voice cloning & 1107+ languages
Offline inference engine for art, real-time voice conversations
Persian NLP Toolkit
A simple native web interface that uses ChatTTS to synthesize text
A speech-text foundation model for real-time dialogue
AI-powered tool for generating, optimizing, and translating subtitles
A sound cloning tool with a web interface, using your voice
EPUB to audiobook converter, optimized for Audiobookshelf
Controllable and fast Text-to-Speech for over 7000 languages
SOTA discrete acoustic codec models with 40/75 tokens per second
An opinionated CLI to transcribe audio files with Whisper on-device
Clone a voice in 5 seconds to generate arbitrary speech in real time
Framework for building real-time voice and multimodal AI agents
Interface for OuteTTS models