PyTorchVideo is a deep learning library for video understanding. It provides modular components and pretrained models for tasks such as action recognition, video classification, detection, and self-supervised learning. It integrates with PyTorch and PyTorch Lightning and offers flexible APIs for building and training spatiotemporal networks. The library includes efficient implementations of state-of-the-art architectures such as SlowFast, X3D, and MViT, suited to both research prototyping and production inference. It supports video I/O pipelines, data augmentation, distributed training, and mixed-precision computation for large-scale experiments. PyTorchVideo can also be combined with other Meta AI tools such as Detectron2 and PyTorch3D for multimodal video analysis. It is designed to speed up research and deployment by serving as a unified framework for reproducible, high-performance video AI development.
Features
- Modular library for video understanding with PyTorch integration
- Pretrained models for action recognition, detection, and classification
- Efficient data loaders and augmentation pipelines for large datasets
- Optimized implementations of SlowFast, X3D, and MViT architectures
- Distributed training, mixed precision, and production-ready inference tools
- Compatibility with Detectron2 and PyTorch3D for multimodal workflows
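
Example

As a rough illustration of the kind of data preparation a SlowFast-style model needs, the sketch below picks evenly spaced frame indices from a clip and splits them into a dense "fast" pathway and a sparser "slow" pathway. It uses only the Python standard library and is not PyTorchVideo's own API: the function names `uniform_temporal_indices` and `pack_pathways`, and the default ratio `alpha=4`, are assumptions made for this example. In a real pipeline the same idea would be applied to video tensors rather than lists of indices.

```python
from typing import List, Sequence, Tuple


def uniform_temporal_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return num_samples indices evenly spaced over [0, num_frames - 1].

    Hypothetical helper: integer floor arithmetic is used so the result
    does not depend on floating-point rounding.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


def pack_pathways(frames: Sequence[int], alpha: int = 4) -> Tuple[List[int], List[int]]:
    """Split sampled frames into (slow, fast) pathways.

    The fast pathway keeps every sampled frame; the slow pathway keeps
    roughly one frame out of every `alpha` (an assumed ratio).
    """
    fast = list(frames)
    positions = uniform_temporal_indices(len(fast), len(fast) // alpha)
    slow = [fast[p] for p in positions]
    return slow, fast


if __name__ == "__main__":
    # A 64-frame clip, subsampled to 32 frames for the fast pathway.
    sampled = uniform_temporal_indices(num_frames=64, num_samples=32)
    slow, fast = pack_pathways(sampled, alpha=4)
    print("fast pathway frames:", len(fast))
    print("slow pathway frames:", len(slow))
```

Both pathways cover the same time span of the clip. The slow pathway simply sees fewer frames, which is what lets a two-pathway network spend less computation on it.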