The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and reuse its pre-trained backbones for downstream tasks. It also integrates Gradient Blending, a method for fusing audio and visual modalities (available only in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a useful reference for early large-scale video model training, transfer learning, and multimodal integration strategies that influenced later architectures such as SlowFast and X3D.
Features
- Implements R(2+1)D and MCx models for efficient spatiotemporal video representation learning
- Enables reproducibility of FAIR’s published video understanding research
- Built with both Caffe2 and PyTorch backends for flexibility
- Supports Gradient Blending for audio-visual fusion (Caffe2 only)
- Provides models pre-trained on IG-65M, one of the largest weakly-supervised video datasets
- Includes CSN (Channel-Separated Networks) for computationally efficient video recognition
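As a rough illustration of the R(2+1)D idea listed above: a (2+1)D block replaces one t x d x d 3D convolution with a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The R(2+1)D paper picks the number of intermediate channels so that the factorized block has about the same parameter count as the full 3D convolution. The sketch below only does that arithmetic. The function names are hypothetical and are not taken from the VMZ code, and bias terms are ignored.

```python
def mid_channels(n_in, n_out, t=3, d=3):
    """Intermediate channel count M for a (2+1)D block, chosen so that
    its parameter count approximately matches a full t x d x d 3D conv.
    Hypothetical helper for illustration; not part of VMZ."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def params_3d(n_in, n_out, t=3, d=3):
    """Weights in a full 3D convolution (bias ignored)."""
    return t * d * d * n_in * n_out


def params_2plus1d(n_in, n_out, t=3, d=3):
    """Weights in the factorized block: a 1 x d x d spatial conv into M
    channels, then a t x 1 x 1 temporal conv into n_out (bias ignored)."""
    m = mid_channels(n_in, n_out, t, d)
    return d * d * n_in * m + t * m * n_out


if __name__ == "__main__":
    for n_in, n_out in [(64, 64), (64, 128)]:
        print(n_in, n_out,
              mid_channels(n_in, n_out),
              params_3d(n_in, n_out),
              params_2plus1d(n_in, n_out))
```

Because M is rounded down, the factorized block never has more weights than the 3D convolution it replaces. What it adds is an extra nonlinearity between the spatial and temporal steps.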
License
Apache License V2.0