Image generation model with single-stream diffusion transformer
Synthesizing and manipulating 2048x1024 images with conditional GANs
Modular AI image and video generation web UI with extensible tools
A Powerful Native Multimodal Model for Image Generation
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
RGBD video generation model conditioned on camera input
Easily compute CLIP embeddings and build a CLIP retrieval system
Implementation of "MobileCLIP" (CVPR 2024)
A Pioneering Open-Source Alternative to GPT-4o
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
PyTorch code and models for the DINOv2 self-supervised learning method
Any model. Any hardware. Zero compromise
PyTorch code and models for V-JEPA self-supervised learning from video
Official DeiT repository
Learning a multi-scale deep model for correcting over- and under-exposed images
Code release for "Detecting Twenty-thousand Classes
Official repo for consistency models
A latent text-to-image diffusion model
A collection of efficient approximate nearest neighbor search algorithms
A real-time approach for mapping all human pixels of 2D RGB images
VGGFace2 Dataset for Face Recognition
CLIP ViT-bigG/14: Zero-shot image-text model trained on LAION-2B