Qwen3-TTS is an open-source series of TTS models
Open Source Speech Language Model
Industrial-level controllable zero-shot text-to-speech system
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Long-form streaming TTS system for multi-speaker dialogue generation
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Controllable & emotion-expressive zero-shot TTS
Open-source multi-speaker long-form text-to-speech model
Capable of understanding text, audio, vision, and video
State-of-the-art TTS model under 25MB
Qwen3-ASR is an open-source series of ASR models
Open-source framework for intelligent speech interaction
Open-source industrial-grade ASR models
LLM-based reinforcement-learning audio editing model
Audio foundation model excelling in audio understanding
Repository for the Qwen2-Audio chat and pretrained large audio language models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Multi-modal large language model designed for audio understanding
A Conversational Speech Generation Model
Official Python inference and LoRA trainer package
FAIR Sequence Modeling Toolkit 2
Chat and pretrained large audio language model proposed by Alibaba Cloud
Open-weight, large-scale hybrid-attention reasoning model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)