PyTorch code and models for VJEPA2 self-supervised learning from video
Overcoming Data Limitations for High-Quality Video Diffusion Models
CLIP + FFT/DWT/RGB = text to image/video
CoTracker is a model for tracking any point (pixel) on a video
Visual localization made easy with hloc
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
Python version of ffplay with built-in AI
The official PyTorch implementation of our paper
A real-time approach for mapping all human pixels of 2D RGB images
We estimate dense, flicker-free, geometrically consistent depth
Efficient 3D human pose estimation in video using 2D keypoint
Resources about activity recognition
High performance image processing library in C++
Data useful for testing autonomous navigation algorithms