Megatron-LM
Ongoing research training transformer models at scale
...The repository provides both a reference training implementation and Megatron Core, a composable library of high-performance building blocks for custom large-model training pipelines. It supports advanced parallelism strategies, including tensor, pipeline, data, expert, and context parallelism, enabling training across massive multi-GPU, multi-node clusters. Mixed-precision options such as FP16, BF16, FP8, and FP4 maximize throughput and memory efficiency on modern hardware. Megatron-LM is widely used in research and industry for pretraining GPT-, BERT-, and T5-style models as well as multimodal models, with tooling for checkpoint conversion and interoperability with Hugging Face. ...
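To give a feel for what tensor parallelism means, here is a minimal conceptual sketch in plain Python: a linear layer's weight matrix is split column-wise across workers, each worker computes only its shard of the output, and the shards are concatenated (the "all-gather"). This is an illustration of the idea, not Megatron's actual API; all names are hypothetical.

```python
# Conceptual sketch of tensor (column) parallelism, NOT Megatron's API.
# A weight matrix is split by columns; each "worker" multiplies the input
# by its shard, and the partial outputs are concatenated.

def matmul(x, w):
    """Multiply a row vector x (list) by a matrix w (list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Split matrix w into `parts` column shards (assumes even division)."""
    n = len(w[0]) // parts
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]

x = [1.0, 2.0, 3.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]

# Serial reference: the full matmul on one "device".
full = matmul(x, w)

# Tensor-parallel: each "worker" holds only its column shard of w.
shards = split_columns(w, parts=2)
partials = [matmul(x, shard) for shard in shards]   # one matmul per worker
gathered = [v for part in partials for v in part]   # the all-gather step

assert gathered == full
```

In a real deployment each shard lives on a different GPU and the concatenation is a collective communication, but the arithmetic identity is the same.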
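The motivation for mixed precision can be shown with a small standard-library experiment: tiny FP32 gradients round to zero in FP16, which is why FP16 training typically pairs reduced-precision storage with loss scaling. This is a numeric illustration of the underflow problem, not Megatron's implementation; the scale value is an arbitrary example.

```python
# Why FP16 mixed-precision training commonly uses loss scaling
# (numeric illustration only, not Megatron's implementation).
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

grad = 1e-8                      # a tiny gradient, common late in training

# Stored directly in FP16, it underflows to zero and the update is lost.
assert to_fp16(grad) == 0.0

# With loss scaling, the value is scaled up before the FP16 round trip
# and divided back out in higher precision, so the information survives.
scale = 1024.0                   # example scale factor
recovered = to_fp16(grad * scale) / scale
assert recovered != 0.0
assert abs(recovered - grad) / grad < 0.01   # within 1% of the true gradient
```

BF16 sidesteps this particular underflow (it keeps FP32's exponent range at the cost of mantissa precision), which is one reason frameworks expose both formats.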