Wan2.1: Open and Advanced Large-Scale Video Generative Model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Wan2.2: Open and Advanced Large-Scale Video Generative Model
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A Powerful Native Multimodal Model for Image Generation
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Tooling for the Common Objects In 3D dataset
Text and image to video generation: CogVideoX and CogVideo
Generating Immersive, Explorable, and Interactive 3D Worlds
Capable of understanding text, audio, vision, video
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
RGBD video generation model conditioned on camera input
Implementation of "MobileCLIP" (CVPR 2024)
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
PyTorch code and models for the DINOv2 self-supervised learning method
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Official DeiT repository
Official repo for consistency models
A latent text-to-image diffusion model
Large-scale autoregressive pixel model for image generation by OpenAI