Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
The Triton Inference Server provides an optimized cloud
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Free, high-quality text-to-speech API endpoint to replace OpenAI
A lightweight text-to-speech model with zero-shot voice cloning
Document Image Parsing via Heterogeneous Anchor Prompting
MOSS‑TTS Family open‑source speech and sound generation model
Capable of understanding text, audio, vision, video
Oobabooga - The definitive Web UI for local AI, with powerful features
Data manipulation and transformation for audio signal processing
The official Python SDK for the ElevenLabs API
WhatsApp MCP server enabling AI access to chats and messaging
StreamSpeech is a seamless model for offline speech recognition
Tokenizer-Free TTS for Multilingual Speech Generation
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Towards Human-Sounding Speech
A nearly-live implementation of OpenAI's Whisper
A 0.1B Omni model trained from scratch
Converts text to speech in real time
One-click deployment (including offline integration package)
Python library for building agents that leverages Google Antigravity
Execute SQL queries and manage databases seamlessly with Timeplus
MOSS-TTS-Nano is an open-source multilingual tiny speech generation
Provides convenient access to the Anthropic REST API from any Python 3