GLM-4-Voice | End-to-End Chinese-English Conversational Model
Industrial-level controllable zero-shot text-to-speech system
MOSS-TTS Family of open-source speech and sound generation models
Qwen3-TTS is an open-source series of TTS models
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
State-of-the-art TTS model under 25 MB
Controllable & emotion-expressive zero-shot TTS
Open-source framework for intelligent speech interaction
Repository for the Qwen2-Audio chat and pretrained large audio language models
A GPT-4o-Level MLLM for Vision, Speech and Multimodal Live Streaming
Long-form streaming TTS system for multi-speaker dialogue generation
Open Source Speech Language Model
Multi-modal large language model designed for audio understanding
Qwen3-ASR is an open-source series of ASR models
LLM-based, reinforcement-learning audio editing model
Chat & pretrained large audio language model proposed by Alibaba Cloud
A Conversational Speech Generation Model
Dia-1.6B generates lifelike English dialogue and vocal expressions