Official DeiT repository
ICLR 2024 Spotlight: curation/training code, metadata, and distribution
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Reference PyTorch implementation and models for DINOv3
PyTorch code and models for the DINOv2 self-supervised learning method
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
Tooling for the Common Objects In 3D dataset
Contexts Optical Compression
Foundational Models for State-of-the-Art Speech and Text Translation
PyTorch implementation of MAE
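MAE's central idea is to mask a large random fraction of image patches (typically 75%) and train the model to reconstruct them. A minimal sketch of that patch-masking step in plain Python; the function name, seed, and patch count are illustrative, not taken from the repository:

```python
import random

def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Randomly split patch indices into a visible set (fed to the
    encoder) and a masked set (reconstructed by the decoder)."""
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)                        # per-sample random shuffle
    num_keep = int(num_patches * (1 - mask_ratio))
    keep = sorted(ids[:num_keep])           # visible patches
    masked = sorted(ids[num_keep:])         # patches to reconstruct
    return keep, masked

# 14x14 = 196 patches for a 224px ViT with 16px patches
keep, masked = random_masking(196)
print(len(keep), len(masked))               # 49 147
```

Because the encoder only ever sees the kept 25% of patches, pre-training is substantially cheaper than running a ViT on full images.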
Vision-language-action model for robot control via images and text
CLIP ViT-bigG/14: Zero-shot image-text model trained on LAION-2B
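Zero-shot classification with a CLIP-style model reduces to comparing a normalized image embedding against normalized text embeddings of class prompts. A self-contained numpy sketch of that scoring step; the embeddings here are toy 4-d values standing in for real CLIP outputs, and the function name and temperature are illustrative:

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Score one image embedding against class-prompt text embeddings:
    cosine similarity, temperature-scaled, then softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)      # scaled cosine similarities
    e = np.exp(logits - logits.max())       # numerically stable softmax
    return e / e.sum()

# toy embeddings standing in for the model's projected features
image = np.array([1.0, 0.0, 0.0, 0.0])
texts = np.array([[0.9, 0.1, 0.0, 0.0],     # "a photo of a cat"
                  [0.0, 1.0, 0.0, 0.0]])    # "a photo of a dog"
probs = zero_shot_probs(image, texts)
```

Because only the prompt texts change per task, the same frozen image and text encoders can classify against any label set without retraining.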
Small 3B-parameter base multimodal model, ideal for custom AI on edge hardware
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Metric monocular depth estimation (vision model)