Open source framework for deep learning on satellite and aerial imagery
Implementation of Vision Transformer, a simple way to achieve SOTA
Enable AI to control your desktop, mobile and HMI devices
Build Vision Agents quickly with any model or video provider
Phi-3.5 for Mac: Locally-run Vision and Language Models
Open Source Differentiable Computer Vision Library
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
3D reconstruction software
Witness the aha moment of VLM with less than $3
Visual Instruction Tuning: Large Language-and-Vision Assistant
The repository provides code for running inference with SAM 2
Fast image augmentation library and an easy-to-use wrapper
A lightweight vision library for performing large object detection
Medical imaging toolkit for deep learning
ICLR2024 Spotlight: curation/training code, metadata, distribution
YOLOv5 is the world's most loved vision AI
"Big Model" trains a visual multimodal VLM with 26M parameters
A fast, powerful, and simple hierarchical vision transformer
Datasets, transforms and models specific to Computer Vision
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Making large AI models cheaper, faster and more accessible
Training data (data labeling, annotation, workflow) for all data types
An open phone agent model & framework
Implements weak-to-strong learning for training stronger ML models
Hub of ready-to-use datasets for ML models