FairScale
PyTorch extensions for high performance and large scale training
FairScale is a collection of PyTorch performance and scaling primitives that pioneered many of the ideas now used for large-model training. It introduced Fully Sharded Data Parallel (FSDP) style techniques that shard model parameters, gradients, and optimizer states across ranks to fit bigger models into the same memory budget. The library also provides pipeline parallelism, activation checkpointing, mixed precision, optimizer state sharding (OSS), and auto-wrapping policies that reduce boilerplate in complex distributed setups. Its components are modular, so teams can adopt just the sharding optimizer or the pipeline engine without rewriting their training loop. ...