The codebase was designed to help researchers and practitioners quickly reproduce FAIR's results and use robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, a training method for audio-visual models that weights per-modality losses so that fusing modalities does not degrade accuracy through overfitting (available in the Caffe2 implementation only). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal integration strategies that influenced later architectures such as SlowFast and X3D.
Features
- Implements R(2+1)D and MCx models for efficient spatiotemporal video representation learning
- Enables reproducibility of FAIR’s published video understanding research
- Built with both Caffe2 and PyTorch backends for flexibility
- Supports Gradient Blending for audio-visual fusion (Caffe2 only)
- Provides models pre-trained on IG-65M, one of the largest weakly-supervised video datasets
- Includes CSN (Channel-Separated Networks) for computationally efficient video recognition
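To make the efficiency claims above concrete, the sketch below counts convolution weights for one block under three designs: a full t×d×d 3D convolution, an R(2+1)D factorization (a 1×d×d spatial conv followed by a t×1×1 temporal conv, with the intermediate width chosen so the parameter count roughly matches the 3D conv, as described in the R(2+1)D paper), and a channel-separated block in the spirit of CSN (a 1×1×1 pointwise conv followed by a t×d×d depthwise conv). This is a minimal illustration, not code from VMZ: the function names are hypothetical, and bias and batch-norm parameters are ignored.

```python
def conv3d_params(n_in: int, n_out: int, t: int, d: int) -> int:
    """Weights of a full t x d x d 3D convolution (bias ignored)."""
    return n_in * n_out * t * d * d


def r2plus1d_mid_channels(n_in: int, n_out: int, t: int, d: int) -> int:
    """Intermediate width M chosen so the (2+1)D block has about as many
    weights as the full 3D conv: M = floor(t*d^2*N_in*N_out / (d^2*N_in + t*N_out))."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def r2plus1d_params(n_in: int, n_out: int, t: int, d: int) -> int:
    """Weights of a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    m = r2plus1d_mid_channels(n_in, n_out, t, d)
    spatial = n_in * m * d * d   # 1 x d x d conv: n_in -> m channels
    temporal = m * n_out * t     # t x 1 x 1 conv: m -> n_out channels
    return spatial + temporal


def channel_separated_params(n_in: int, n_out: int, t: int, d: int) -> int:
    """Weights of a 1 x 1 x 1 pointwise conv plus a t x d x d depthwise conv
    (one filter per channel), roughly an interaction-preserved CSN block."""
    pointwise = n_in * n_out      # channel interactions only
    depthwise = n_out * t * d * d # spatiotemporal filtering, no channel mixing
    return pointwise + depthwise


if __name__ == "__main__":
    # A typical 64 -> 64 channel block with a 3 x 3 x 3 kernel.
    args = (64, 64, 3, 3)
    print("3D conv:         ", conv3d_params(*args))
    print("(2+1)D mid width:", r2plus1d_mid_channels(*args))
    print("(2+1)D conv:     ", r2plus1d_params(*args))
    print("channel-sep conv:", channel_separated_params(*args))
```

For this 64-channel example the (2+1)D block keeps the same weight budget as the 3D conv (it spends it on an extra nonlinearity between the spatial and temporal steps), while the channel-separated block uses far fewer weights because the depthwise conv does no cross-channel mixing.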
License
Apache License V2.0