Image generation model with single-stream diffusion transformer
Wan2.1: Open and Advanced Large-Scale Video Generative Model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
An easy 1-click way to create beautiful artwork on your PC using AI
Wan2.2: Open and Advanced Large-Scale Video Generative Model
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A Powerful Native Multimodal Model for Image Generation
Text and image to video generation: CogVideoX and CogVideo
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Tooling for the Common Objects In 3D dataset
Generating Immersive, Explorable, and Interactive 3D Worlds
Capable of understanding text, audio, vision, video
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
RGBD video generation model conditioned on camera input
Pure C inference for the Flux 2 image generation model
Implementation of "MobileCLIP" (CVPR 2024)
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
PyTorch code and models for the DINOv2 self-supervised learning method
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Official DeiT repository
Official repo for consistency models
A latent text-to-image diffusion model