MoCo v3 is a PyTorch reimplementation of Momentum Contrast v3 (MoCo v3), Facebook AI Research's self-supervised learning framework for visual representation learning with ResNet and Vision Transformer (ViT) backbones. The original implementation was written in TensorFlow for TPUs; this version reproduces the paper's results on GPUs while offering an accessible and scalable PyTorch interface. MoCo v3 combines contrastive learning with transformer architectures and introduces training-stability improvements for self-supervised ViTs, achieving strong linear-probing and end-to-end fine-tuning performance on ImageNet benchmarks. The repository supports multi-node distributed training, automatic mixed precision, and linear scaling of learning rates for large-batch regimes. It also includes scripts for self-supervised pretraining, linear classification, and fine-tuning within the DeiT framework.
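The linear scaling rule mentioned above grows the effective learning rate in proportion to the global batch size relative to a reference batch. A minimal sketch of the idea; the function name, base rate, and reference batch of 256 are illustrative assumptions, not this repository's exact defaults:

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch: int = 256) -> float:
    """Linear scaling rule: the learning rate grows proportionally with batch size."""
    return base_lr * batch_size / reference_batch

# Example: a base rate of 0.3 at batch 256 becomes 4.8 at a global batch of 4096.
print(scaled_lr(0.3, 4096))  # 4.8
```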
Features
- PyTorch implementation of self-supervised MoCo v3 for ResNet and ViT models (the core contrastive objective is sketched after this list)
- Integrated scripts for self-supervised pretraining, linear evaluation, and DeiT fine-tuning
- Configurable via command-line flags, with hyperparameters and batch settings that scale to large-batch training
- Supports large-scale multi-GPU distributed training with automatic mixed precision (see the training-step sketch below)
- Compatible with ImageNet and standard vision benchmarks for transfer learning
- Achieves strong ImageNet results (e.g., 74.6% linear top-1 with ResNet-50 and 83.2% end-to-end fine-tuned top-1 with ViT-B)
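The core contrastive objective referenced in the first feature is an InfoNCE loss between two augmented views, with matching query/key pairs in a batch serving as positives. The sketch below follows the loss described in the MoCo v3 paper, including its temperature and 2τ scaling; the function name and the single-GPU simplification (no cross-process key gathering) are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q: torch.Tensor, k: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """InfoNCE loss: positives sit on the diagonal of the q-k similarity matrix.

    q, k: [batch, dim] features from the query encoder and momentum encoder.
    In multi-GPU training the keys would first be gathered across processes.
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.T / tau                             # pairwise cosine similarities
    labels = torch.arange(q.size(0), device=q.device)  # positive index = diagonal
    return F.cross_entropy(logits, labels) * (2 * tau)

# The full objective is symmetrized over the two augmented views:
# loss = contrastive_loss(q1, k2) + contrastive_loss(q2, k1)
```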
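The mixed-precision training mentioned in the features can be pictured as a standard `torch.cuda.amp` loop. This is a hedged sketch, not the repository's actual training code: `train_step` and the assumption that the model returns its own loss are illustrative.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

def train_step(model: torch.nn.Module, x1: torch.Tensor, x2: torch.Tensor,
               optimizer: torch.optim.Optimizer, scaler: GradScaler) -> float:
    """One mixed-precision optimization step (illustrative, not the repo's loop)."""
    optimizer.zero_grad()
    with autocast():               # forward pass in float16 where numerically safe
        loss = model(x1, x2)       # assume the model returns the contrastive loss
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales gradients, then runs the optimizer
    scaler.update()                # adapt the loss scale for the next iteration
    return loss.item()
```

For the multi-node case, the model would additionally be wrapped in `torch.nn.parallel.DistributedDataParallel` so each process holds one GPU's replica.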