Image generation model with single-stream diffusion transformer
Qwen-Image is a powerful image generation foundation model
GLM-Image: Auto-regressive Model for Dense-Knowledge and High-Fidelity Image Generation
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Official inference repo for FLUX.2 models
A Powerful Native Multimodal Model for Image Generation
Official DeiT repository
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Models for object and human mesh reconstruction
Official inference repo for FLUX.1 models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Diffusion Transformer with Fine-Grained Chinese Understanding
A Unified Framework for Text-to-3D and Image-to-3D Generation
Chat & pretrained large vision-language model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Collection of Gemma 3 variants that are trained for performance on medical text and image comprehension
Generating Immersive, Explorable, and Interactive 3D Worlds
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image (see the usage sketch after this list)
High-Resolution 3D Asset Generation with Large-Scale Diffusion Models
RGBD video generation model conditioned on camera input
Code for running inference with the SAM 3D Body model (3DB)
Reference PyTorch implementation and models for DINOv3
Towards Real-World Vision-Language Understanding
Multimodal-Driven Architecture for Customized Video Generation
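For the CLIP entry above, the following is a minimal sketch of the "predict the most relevant text snippet given an image" workflow. It assumes the openai/clip package is installed and uses a placeholder image path ("image.png") and illustrative candidate captions; it mirrors zero-shot prediction rather than any fine-tuned setup.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image path and candidate text snippets (illustrative only)
image = preprocess(Image.open("image.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate snippet
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# The snippet with the highest probability is the most relevant one
print("Label probs:", probs)
```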