Open-source framework for intelligent speech interaction
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
A Family of Open-Source Music Foundation Models
Multimodal Diffusion with Representation Alignment
Multimodal-Driven Architecture for Customized Video Generation
Capable of understanding text, audio, vision, and video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Systematic Framework for Interactive World Modeling
Controllable & emotion-expressive zero-shot TTS
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25 MB
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Dia-1.6B generates lifelike English dialogue and vocal expressions