Classy Vision is a PyTorch-based framework for large-scale training and deployment of state-of-the-art image and video classification models. Developed by Facebook Research, it is an end-to-end system that simplifies training at scale and reduces the redundancy and friction of moving from research to production. Unlike computer vision libraries that provide only modular components, Classy Vision offers a complete, unified framework with distributed training, reproducible experiments, and flexible configuration tools. It is built for performance and scalability (for example, training ResNet-50 on ImageNet in minutes) while remaining accessible to both researchers and production engineers. The library integrates with PyTorch Hub for access to pretrained models and supports elastic training through PyTorch Elastic, which makes distributed training robust to changes in cluster resources and to hardware failures.
Features
- End-to-end PyTorch framework for large-scale image and video classification
- Modular design for fast setup, flexible configuration, and easy customization
- High-performance distributed training with demonstrated scaling efficiency
- Seamless PyTorch Hub integration for pretrained model access and fine-tuning
- Elastic training support with PyTorch Elastic for resource-adaptive training
- AWS integration for large-scale experiments and smooth research-to-production transition
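
As a rough illustration of the configuration-driven workflow described above, the sketch below keeps an experiment definition as plain data and derives a fine-tuning variant from it. It uses only the Python standard library. The key names and values are assumptions chosen for illustration rather than a verified Classy Vision schema, and `with_overrides` is a hypothetical helper, not part of the library.

```python
import json

# Hypothetical experiment config. The keys loosely mirror the general shape of
# a task config (name, epochs, loss, model, optimizer) but are assumptions.
config = {
    "name": "classification_task",
    "num_epochs": 90,
    "loss": {"name": "CrossEntropyLoss"},
    "model": {"name": "resnet", "num_blocks": [3, 4, 6, 3]},
    "optimizer": {"name": "sgd", "lr": 0.1, "momentum": 0.9},
}


def with_overrides(base, overrides):
    """Return a new dict with nested overrides applied; base is never mutated."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in nested sections are preserved.
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Derive a shorter fine-tuning run with a lower learning rate.
fine_tune = with_overrides(config, {"num_epochs": 10, "optimizer": {"lr": 0.01}})

# Serializing with sorted keys gives a stable artifact to store next to results,
# which helps when an experiment needs to be reproduced later.
serialized = json.dumps(fine_tune, indent=2, sort_keys=True)
```

Keeping the experiment as data rather than code means a variant run is a small, reviewable override instead of a copy of the training script.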