Clone a voice in 5 seconds to generate arbitrary speech in real time
Fast and accurate automatic speech recognition (ASR) for edge devices
Comprehensive Gradio WebUI for audio processing
Minimalistic audiobook player
Framework for building real-time voice and multimodal AI agents
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP
The behavior guidance framework for customer-facing LLM agents
Build voice-based LLM agents. Modular + open source
Real-time voice interactive digital human
A high-quality rapid TTS voice cloning model
Self-hosted, you-owned Grok Companion
Fast multimodal LLM for real-time voice interaction and AI apps
Just 1 minute of voice data can be enough to train a good TTS model
The Python library for real-time communication
Open-source framework for conversational voice AI agents
Software that uses AI to perform real-time voice conversion
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25MB
Large Audio Language Model built for natural interactions
A lightweight text-to-speech model with zero-shot voice cloning
PersonaPlex code
AI teacher that lives as a buddy next to your cursor
In-App assistant SDK to build a multimodal conversational UX for iOS
Code for openai.fm, a demo for the OpenAI Speech API