Open source framework for deep learning on satellite and aerial imagery
Implementation of Vision Transformer, a simple way to achieve SOTA
Training data (data labeling, annotation, workflow) for all data types
Official DeiT repository
Medical imaging toolkit for deep learning
Open Source Differentiable Computer Vision Library
Fast image augmentation library and an easy-to-use wrapper
The open-source tool for building high-quality datasets
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Hub of ready-to-use datasets for ML models
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Deep learning library
Automate browser-based workflows with LLMs and Computer Vision
Reference PyTorch implementation and models for DINOv3
The largest collection of PyTorch image encoders / backbones
[CVPR 2025 Best Paper Award] VGGT
PyTorch code and models for the DINOv2 self-supervised learning
Qwen2.5-VL is the multimodal large language model series
Tooling for the Common Objects In 3D dataset
Contexts Optical Compression
Benchmarking Multimodal Agents for Open-Ended Tasks
OCR expert VLM powered by Hunyuan's native multimodal architecture
Automate native Android apps with AI using accessibility APIs
An open-source end-to-end VLM-based GUI agent