TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models, TAPIR, BootsTAPIR, and the latest TAPNext, combine matching with temporal refinement or next-token-style propagation to achieve state-of-the-art accuracy and speed on TAP-Vid. RoboTAP demonstrates how TAPIR-style tracks can drive real-world robot manipulation via efficient imitation, and ships with a dataset of annotated robotics videos. The repo provides JAX and PyTorch checkpoints, Colab demos, and a live real-time demo that runs on a GPU, letting you select and track points interactively.
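As a rough illustration of how the checkpoints are meant to be used, here is a minimal PyTorch sketch assuming the TAPIR interface shown in the repo's Torch demo notebook; the module path `tapnet.torch.tapir_model`, the constructor arguments, the checkpoint filename, and the output dictionary keys are assumptions to verify against the current demos, not a definitive API reference.

```python
# Minimal inference sketch (assumed interface; check the repo's Torch demo).
import torch
from tapnet.torch import tapir_model  # assumed module path

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint filename; download links are listed in the repo README.
model = tapir_model.TAPIR(pyramid_level=1)
model.load_state_dict(torch.load("checkpoints/bootstapir_checkpoint_v2.pt"))
model = model.to(device).eval()

# Video as floats in [-1, 1], shape [batch, frames, height, width, 3].
video = torch.zeros(1, 24, 256, 256, 3, device=device)

# Query points as (frame index t, y, x) in pixels, shape [batch, num_points, 3].
query_points = torch.tensor([[[0, 128.0, 128.0], [0, 64.0, 200.0]]], device=device)

with torch.no_grad():
    outputs = model(video, query_points)

tracks = outputs["tracks"]            # [batch, num_points, frames, 2], (x, y) per frame
occluded = outputs["occlusion"] > 0   # logits; assumed convention: sigmoid(logit) > 0.5 = occluded
print(tracks.shape, occluded.shape)
```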
## Features
- Clear coordinate conventions and standardized metrics for fair, reproducible comparisons (a metric sketch follows this list)
- Training and evaluation pipelines, plus Kubric utilities for generating point tracks
- Colab notebooks and an offline/online real-time demo for quick experimentation
- RoboTAP benchmark and clustering demo for robotics manipulation from point tracks
- High-performance models including TAPIR, BootsTAPIR, and TAPNext with JAX and PyTorch checkpoints
- TAP-Vid and TAPVid-3D datasets and evaluation metrics for point tracking
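To make the metrics bullet concrete, below is a small self-contained NumPy sketch of the TAP-Vid position-accuracy metric (< delta^x_avg): the fraction of visible points tracked to within a pixel threshold of ground truth, averaged over the thresholds {1, 2, 4, 8, 16} at 256x256 resolution. The function name and toy data are invented for illustration; the repo's own evaluation code is the reference implementation and may differ in query modes and edge-case handling.

```python
# Illustrative re-implementation of TAP-Vid position accuracy (< delta^x_avg).
import numpy as np

def position_accuracy(gt_tracks, pred_tracks, gt_visible,
                      thresholds=(1, 2, 4, 8, 16)):
    """gt_tracks, pred_tracks: [num_points, num_frames, 2] in (x, y) pixels
    at 256x256 resolution; gt_visible: [num_points, num_frames] bool."""
    # Euclidean error per point per frame.
    err = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)
    accs = []
    for thr in thresholds:
        within = (err < thr) & gt_visible
        # Fraction of visible ground-truth points tracked within the threshold.
        accs.append(within.sum() / max(gt_visible.sum(), 1))
    return float(np.mean(accs))

# Toy usage: two points tracked over three frames with half-pixel error.
gt = np.array([[[10.0, 10.0], [12.0, 11.0], [14.0, 12.0]],
               [[50.0, 60.0], [50.0, 61.0], [51.0, 62.0]]])
pred = gt + np.array([0.5, -0.5])
vis = np.ones(gt.shape[:2], dtype=bool)
print(position_accuracy(gt, pred, vis))  # 1.0: every error is under 1 pixel
```

Average Jaccard, the benchmark's headline metric, extends this idea by additionally requiring correct occlusion predictions.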