Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
The Triton Inference Server provides an optimized cloud
Free, high-quality text-to-speech API endpoint to replace OpenAI
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Self-hosted game stream host for Moonlight
Swing Music is a beautiful, self-hosted music player
Streaming Real-time Audio-Driven Avatar Generation
A lightweight text-to-speech model with zero-shot voice cloning
A speech-text foundation model for real-time dialogue
Automated Music Discovery and Collection Manager
Document Image Parsing via Heterogeneous Anchor Prompting
MOSS‑TTS Family open‑source speech and sound generation model
Pure Python FFmpeg-based live video / audio streaming to YouTube
Capable of understanding text, audio, vision, video
Oobabooga - The definitive Web UI for local AI, with powerful features
Data manipulation and transformation for audio signal processing
The official Python SDK for the ElevenLabs API
WhatsApp MCP server enabling AI access to chats and messaging
Tokenizer-Free TTS for Multilingual Speech Generation
StreamSpeech is a seamless model for offline speech recognition
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A nearly-live implementation of OpenAI's Whisper
Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop
Towards Human-Sounding Speech