Controllable & emotion-expressive zero-shot TTS
The most powerful local music generation model
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Open Source Speech Language Model
Multimodal embedding and reranking models built on Qwen3-VL
Capable of understanding text, audio, vision, and video
High-Resolution Image Synthesis with Latent Diffusion Models
A Systematic Framework for Interactive World Modeling
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Open-source multi-speaker long-form text-to-speech model
General-purpose image editing model that delivers high-fidelity
Accurate × Fast × Comprehensive
tiktoken is a fast BPE tokeniser for use with OpenAI's models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large language model and vision-language model based on Linear Attention
Fast Stable Diffusion on CPUs and AI PCs
Long-form streaming TTS system for multi-speaker dialogue generation
Bidirectional token-classification model for detecting identifiable information
HY-Motion model for 3D character animation generation
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models
Diffusion Transformer with Fine-Grained Chinese Understanding
Unified Multimodal Understanding and Generation Models
Qwen3-ASR is an open-source series of ASR models
Visual Causal Flow
Chinese and English multimodal conversational language model