Generating Immersive, Explorable, and Interactive 3D Worlds
GLM-4-Voice | End-to-End Chinese-English Conversational Model
The most powerful local music generation model
Multimodal embedding and reranking models built on Qwen3-VL
Controllable & emotion-expressive zero-shot TTS
High-Resolution Image Synthesis with Latent Diffusion Models
State-of-the-art (SoTA) pre-trained text-to-video model
Fast stable diffusion on CPU and AI PC
Qwen2.5-VL is a multimodal large language model series
Open-source multi-speaker long-form text-to-speech model
HY-Motion model for 3D character animation generation
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Qwen3-ASR is an open-source series of ASR models
Capable of understanding text, audio, images, and video
Long-form streaming TTS system for multi-speaker dialogue generation
Chinese and English multimodal conversational language model
Bidirectional token-classification model for identifiable information
Visual Causal Flow
Large language model and vision-language model based on linear attention
General-purpose image editing model that delivers high-fidelity edits
Diffusion Transformer with Fine-Grained Chinese Understanding
LLM-based audio editing model trained with reinforcement learning
Phi-3.5 for Mac: Locally-run Vision and Language Models
Generate Any 3D Scene in Seconds
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models