Megatron
Ongoing research on training transformer models at scale
...We developed efficient, model-parallel (tensor, sequence, and pipeline) and multi-node pre-training of transformer-based models such as GPT, BERT, and T5 using mixed precision.
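To make the tensor (model) parallelism idea concrete, below is a minimal sketch in PyTorch: each rank holds a vertical slice of a linear layer's weight matrix, computes a partial output, and the shards are gathered into the full activation. The class name `ColumnParallelLinear` echoes the layer of the same name in Megatron, but this is a simplified, illustrative reimplementation under stated assumptions, not the library's code (which adds fused kernels, sequence parallelism, and custom gradient handling).

```python
# Minimal sketch of tensor (model) parallelism: each rank owns a slice of
# the weight, computes a partial result, and an all-gather reassembles the
# full output. Illustrative only -- not Megatron's actual implementation.
import torch
import torch.distributed as dist


class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0, "output dim must divide across ranks"
        # Each rank owns out_features / world output columns of the weight.
        self.weight = torch.nn.Parameter(
            torch.empty(out_features // world, in_features)
        )
        torch.nn.init.xavier_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local partial result: shape (batch, out_features / world).
        y_local = x @ self.weight.t()
        # Gather every rank's shard and concatenate along the feature dim.
        shards = [torch.empty_like(y_local) for _ in range(dist.get_world_size())]
        dist.all_gather(shards, y_local)
        return torch.cat(shards, dim=-1)


if __name__ == "__main__":
    # Single-process demo on the "gloo" backend; in real training this runs
    # across many GPUs/nodes (e.g. launched with torchrun).
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)
    layer = ColumnParallelLinear(in_features=16, out_features=32)
    out = layer(torch.randn(4, 16))
    print(out.shape)  # torch.Size([4, 32])
```

Splitting by output columns means no communication is needed before the matmul; the cost is one all-gather per layer, which pipeline parallelism then amortizes across micro-batches in the full system.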
Megatron is also used in NeMo Megatron, a framework to help enterprises overcome the challenges of building and training sophisticated natural language processing models with billions or even trillions of parameters.
Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.