Open Source Speech Language Model
Multimodal embedding and reranking models built on Qwen3-VL
The most powerful local music generation model
Controllable & emotion-expressive zero-shot TTS
Qwen2.5-VL is the multimodal large language model series
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Systematic Framework for Interactive World Modeling
General-purpose image editing model that delivers high-fidelity
Accurate × Fast × Comprehensive
Open-source multi-speaker long-form text-to-speech model
Fast Stable Diffusion on CPU and AI PC
Capable of understanding text, audio, vision, video
Long-form streaming TTS system for multi-speaker dialogue generation
Bidirectional token-classification model for identifiable info
HY-Motion model for 3D character animation generation
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large language model & vision-language model based on Linear Attention
Visual Causal Flow
Diffusion Transformer with Fine-Grained Chinese Understanding
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models
Qwen3-ASR is an open-source series of ASR models
Generate Any 3D Scene in Seconds
Unified Multimodal Understanding and Generation Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming