Official inference repo for FLUX.1 models
CLIP: Predict the most relevant text snippet given an image
Official inference repo for FLUX.2 models
Models for object and human mesh reconstruction
A Powerful Native Multimodal Model for Image Generation
Wan2.1: Open and Advanced Large-Scale Video Generative Model
High-Resolution Image Synthesis with Latent Diffusion Models
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Let's make video diffusion practical
Code for running inference with the SAM 3D Body Model (3DB)
A Customizable Image-to-Video Model based on HunyuanVideo
A SOTA open-source image editing model
Collection of Gemma 3 variants that are trained for performance
Recovering the Visual Space from Any Views
Diffusion Transformer with Fine-Grained Chinese Understanding
Advancing Open-source World Models
Code for running inference and finetuning with the SAM 3 model
Official implementation of DreamCraft3D
Accurate × Fast × Comprehensive
A 0.1B Omni model trained from scratch
Implementation of "MobileCLIP", CVPR 2024
PyTorch code and models for the DINOv2 self-supervised learning method
Code for Mesh R-CNN, ICCV 2019
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Powerful open source image generation model