Release Notes – Release 2.3

Key Features and Enhancements

  • [PyTorch] Sped up import of the transformer_engine module by switching to lazy compilation of functions with torch.compile.
  • [PyTorch] Enabled FP8 weights when using FSDP.
  • [C][PyTorch] Added support for the Float8 block scaling recipe, as used in the DeepSeek-V3 paper, for Hopper GPUs.
  • [PyTorch] Made miscellaneous fixes to reduce CPU overhead.
  • [PyTorch] Added support for CPU offloading for activation tensors when using FP8 attention.
  • [PyTorch] Enabled MXFP8 recipe for the GroupedLinear module.
  • [PyTorch] Added support for decoupling the weight gradient (wgrad) computation from the backward function of Transformer Engine modules. Users can trigger the wgrad computation separately, which gives them finer-grained control over when weight gradients are computed and supports certain advanced parallelism/overlap schemes.
  • [PyTorch] Added support for staggered application of RoPE (rotary position embedding) to a sequence of inputs in a batch, depending on their starting positions.
  • [All] Added support for RTX 5090.
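
The decoupled weight gradient feature listed above can be illustrated with a toy example. The following is a minimal pure-Python sketch of the general idea, not Transformer Engine's API: the class `DelayedWgradLinear` and its method `backward_wgrad` are hypothetical names chosen for illustration. The backward pass returns the input gradient immediately and only records the work needed for the weight gradient, which the caller runs later.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


class DelayedWgradLinear:
    """Toy linear layer y = x @ W (hypothetical, not Transformer Engine's API)."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight
        self.weight_grad: Optional[Matrix] = None
        self._saved_input: Optional[Matrix] = None
        self._pending_wgrad: Optional[Callable[[], Matrix]] = None

    def forward(self, x: Matrix) -> Matrix:
        self._saved_input = x
        return matmul(x, self.weight)

    def backward(self, grad_out: Matrix) -> Matrix:
        # The input gradient (dgrad) is returned right away because the
        # previous layer needs it to continue backpropagation.
        x = self._saved_input
        if x is None:
            raise RuntimeError("forward() must run before backward()")
        # The weight gradient (wgrad) is only recorded here, not computed.
        self._pending_wgrad = lambda: matmul(transpose(x), grad_out)
        return matmul(grad_out, transpose(self.weight))

    def backward_wgrad(self) -> None:
        # The caller decides when to pay for the weight gradient, for
        # example while communication for another layer is in flight.
        if self._pending_wgrad is None:
            raise RuntimeError("backward() must run before backward_wgrad()")
        self.weight_grad = self._pending_wgrad()
        self._pending_wgrad = None


layer = DelayedWgradLinear(weight=[[1.0, 2.0], [3.0, 4.0]])
out = layer.forward([[1.0, 1.0]])
grad_in = layer.backward([[1.0, 0.0]])
wgrad_before = layer.weight_grad  # still None: wgrad has not run yet
layer.backward_wgrad()            # weight gradient is computed only now
```

In a real training loop, the gap between `backward` and `backward_wgrad` is where other work, such as communication or the backward pass of another pipeline stage, could be scheduled.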

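As a rough illustration of the staggered RoPE item above, the sketch below applies rotary position embedding to a batch in which each sequence begins at its own starting position. It is a pure-Python sketch under stated assumptions, not Transformer Engine's implementation: the function names, the interleaved (even, odd) pairing convention, and the base of 10000 are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def rope_rotate(vec: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("embedding dimension must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        # Each pair uses its own frequency; the angle grows with the position.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def apply_staggered_rope(
    batch: Sequence[Sequence[Sequence[float]]],
    start_positions: Sequence[int],
) -> List[List[List[float]]]:
    """Apply RoPE so that token t of sequence b is rotated as position start_positions[b] + t."""
    if len(batch) != len(start_positions):
        raise ValueError("need one starting position per sequence")
    return [
        [rope_rotate(token, start + t) for t, token in enumerate(seq)]
        for seq, start in zip(batch, start_positions)
    ]


token = [1.0, 0.0, 0.5, -0.5]
# A full four-token sequence starting at position 0 ...
full = apply_staggered_rope([[token] * 4], start_positions=[0])[0]
# ... and a two-token chunk that starts at position 2.
tail = apply_staggered_rope([[token] * 2], start_positions=[2])[0]
```

Here `tail` matches the last two tokens of `full`, which is the property that matters when sequences in a batch resume from different offsets.
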
Fixed Issues

  • [PyTorch] Fixed a numerical bug when using the custom DDP from megatron-core.
  • [PyTorch] Fixed a crash when using the checkpoint method for activation recompute on non-Transformer Engine modules.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

  • [JAX] Praxis layers have been removed, as PAXML is no longer supported.

Deprecated Features

  • Installing Transformer Engine now requires the --no-build-isolation flag, both when installing the PyPI package and when building from source. Support for installations with build isolation will be removed in a future release.
  • [PyTorch] CPU offloading of weight tensors is deprecated.
Source: README.md, updated 2025-04-28