State-of-the-art (SoTA) text-to-video pre-trained model
Multimodal Diffusion with Representation Alignment
Accurate × Fast × Comprehensive
A Systematic Framework for Interactive World Modeling
Code for Mesh R-CNN, ICCV 2019
Qwen3-Coder is the code version of Qwen3
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Renderer for the harmony response format to be used with gpt-oss
Revolutionizing Database Interactions with Private LLM Technology
Video Object and Interaction Deletion
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
High-Resolution Image Synthesis with Latent Diffusion Models
GPT4V-level open-source multi-modal model based on Llama3-8B
Programmatic access to the AlphaGenome model
Achieving a 3x+ generation speedup on reasoning tasks
Easy Docker setup for Stable Diffusion with user-friendly UI
Repo for SeedVR2 & SeedVR
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Generating Immersive, Explorable, and Interactive 3D Worlds
Bidirectional token-classification model for identifiable info
Inference script for Oasis 500M
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Math-specific large language models in our Qwen2 series