ConvNeXt V2 is an evolution of the ConvNeXt architecture that co-designs convolutional networks alongside self-supervised learning. It introduces a fully convolutional masked autoencoder (FCMAE) framework, in which parts of the image are masked and the network reconstructs the missing content, marrying convolutional inductive bias with powerful pretraining. A key innovation is a new Global Response Normalization (GRN) layer added to the ConvNeXt backbone, which enhances feature competition across channels. The result is a convnet that competes strongly with transformer architectures on recognition benchmarks while remaining efficient and hardware-friendly.

The repository provides official PyTorch implementations for multiple model sizes (Atto, Femto, Pico, up through Huge), conversion from JAX weights, code for pretraining and fine-tuning, and pretrained checkpoints. It supports both self-supervised pretraining and supervised fine-tuning.
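The GRN layer can be sketched in a few lines of PyTorch. The sketch below follows the formulation published in the ConvNeXt V2 paper (global L2 aggregation over spatial dimensions, divisive normalization across channels, and a zero-initialized learnable affine transform with a residual connection); it is illustrative, not code copied from the repository, and assumes the channels-last `(N, H, W, C)` layout used inside ConvNeXt blocks.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization (sketch of the paper's formulation).

    Expects channels-last input of shape (N, H, W, C).
    """
    def __init__(self, dim):
        super().__init__()
        # Zero initialization makes GRN an identity mapping at the start
        # of training, so it can be dropped into a pretrained backbone.
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):
        # Global feature aggregation: per-channel L2 norm over H and W
        gx = torch.sqrt((x ** 2).sum(dim=(1, 2), keepdim=True))  # (N,1,1,C)
        # Feature normalization: divide by the mean response across channels,
        # so channels compete with one another
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)
        # Feature calibration with a residual connection
        return self.gamma * (x * nx) + self.beta + x
```

Because `gamma` and `beta` start at zero, the layer initially passes its input through unchanged and learns the strength of the normalization during training.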
Features
- Fully convolutional masked autoencoder pretraining (FCMAE)
- Global Response Normalization (GRN) to improve channel competition
- Multiple model sizes (Atto, Femto, Pico, Tiny, Base, Large, Huge)
- Support for self-supervised and supervised learning pipelines
- Pretrained checkpoints (converted from JAX) and PyTorch implementation
- Training and evaluation scripts covering both pretraining and fine-tuning
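To make the FCMAE idea concrete, here is a minimal sketch of per-sample random patch masking, the first step of masked-autoencoder pretraining. The helper name `random_mask` and the default mask ratio of 0.6 (the ratio reported in the ConvNeXt V2 paper) are assumptions for illustration, not the repository's API.

```python
import torch

def random_mask(batch, num_patches, mask_ratio=0.6):
    """Generate per-sample random binary masks (1 = masked patch).

    Returns a (batch, num_patches) float tensor; the network only sees
    the visible (0) patches and must reconstruct the masked (1) ones.
    """
    num_masked = int(mask_ratio * num_patches)
    # Random permutation of patch indices for each sample
    ids = torch.rand(batch, num_patches).argsort(dim=1)
    mask = torch.zeros(batch, num_patches)
    # Mark the first num_masked indices of each permutation as masked
    mask.scatter_(1, ids[:, :num_masked], 1.0)
    return mask
```

During FCMAE pretraining a mask like this is broadcast over the feature map, and the reconstruction loss is computed only on the masked positions.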