Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
The Triton Inference Server provides an optimized cloud
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Free, high-quality text-to-speech API endpoint to replace OpenAI
Convert files and web content into clean, usable Markdown easily
The Python library for real-time communication
A lightweight text-to-speech model with zero-shot voice cloning
Document Image Parsing via Heterogeneous Anchor Prompting
MOSS‑TTS Family open‑source speech and sound generation model
A React-based starter app for using the Live API over WebSockets
Capable of understanding text, audio, vision, video
Oobabooga - The definitive Web UI for local AI, with powerful features
The official Python SDK for the ElevenLabs API
Data manipulation and transformation for audio signal processing
Access to Anthropic's safety-first language model APIs
WhatsApp MCP server enabling AI access to chats and messaging
StreamSpeech is a seamless model for offline speech recognition
Open source text-to-speech tool, supports extra-long text
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Tokenizer-Free TTS for Multilingual Speech Generation
An HTML5 video player with a parser that saves traffic
A nearly-live implementation of OpenAI's Whisper
A 0.1B Omni model trained from scratch
Towards Human-Sounding Speech