Name | Modified | Size | Downloads / Week |
---|---|---|---|
NVIDIA Megatron Core 0.9.0 source code.tar.gz | 2024-10-24 | 2.7 MB | |
NVIDIA Megatron Core 0.9.0 source code.zip | 2024-10-24 | 3.3 MB | |
README.md | 2024-10-24 | 703 Bytes | |
Totals: 3 Items | | 6.0 MB | 0 |
- Uneven pipeline parallelism
  - Enable pipeline parallelism where the first and last ranks have fewer transformer layers than the intermediate ranks
- Per-layer CUDA graph support for GPT training with Transformer Engine modules
- Enable different tensor-parallel (TP) sizes for the vision encoder
- Enable pipeline parallelism for T5 and LLaVA models
- Support multi-tile, multi-image input in LLaVA models
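The uneven split described above (fewer layers on the first and last pipeline ranks, which also host the embedding and loss computation) can be sketched in plain Python. This is a conceptual illustration only; the function name and arguments are hypothetical, not Megatron Core's API:

```python
def split_layers(num_layers, pp_size, first, last):
    """Distribute transformer layers across pipeline stages, giving the
    first and last ranks fewer layers than the intermediate ranks.

    Returns a list with the layer count for each pipeline stage."""
    middle = num_layers - first - last
    # Intermediate stages must divide the remaining layers evenly.
    assert pp_size > 2 and middle % (pp_size - 2) == 0
    per_mid = middle // (pp_size - 2)
    return [first] + [per_mid] * (pp_size - 2) + [last]

# e.g. 32 layers over 4 stages, with 4 layers on the first and last rank:
split_layers(32, 4, first=4, last=4)  # -> [4, 12, 12, 4]
```

With an even split, the first and last stages would carry the same layer count as the rest on top of the embedding and loss work, making them the pipeline bottleneck; shifting layers toward the intermediate ranks rebalances per-stage compute.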
- MoE
  - FP8 support
  - Runtime upcycling support
  - Dispatcher implementation optimizations
  - Shared expert support with overlapping optimizations
  - Qwen model support
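The dispatcher mentioned above is the component that routes each token to its top-k experts and groups tokens per expert before the expert MLPs run. A toy sketch of that permutation step, in plain Python rather than Megatron Core's fused-kernel implementation:

```python
def dispatch(scores, top_k):
    """Toy MoE dispatcher: for each token, pick its top_k experts by router
    score and group token indices per expert. The optimized dispatchers do
    this permutation with fused kernels and all-to-all communication.

    scores: list of per-token router score lists, shape [num_tokens][num_experts]
    Returns {expert_index: [token indices routed to that expert]}."""
    num_experts = len(scores[0])
    buckets = {e: [] for e in range(num_experts)}
    for tok, row in enumerate(scores):
        top = sorted(range(num_experts), key=lambda e: row[e], reverse=True)[:top_k]
        for e in top:
            buckets[e].append(tok)
    return buckets

# Two tokens, three experts, top-2 routing:
dispatch([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]], top_k=2)
# -> {0: [0], 1: [0, 1], 2: [1]}
```

A shared expert, by contrast, processes every token regardless of the router's decision; its output is added to the routed experts' output, which is why it can be overlapped with the dispatch communication.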
- Known issues
  - When sequence parallelism is enabled, dropout in the transformer block's forward pass does not use the appropriate RNG context.
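Why the RNG context matters: under sequence parallelism each rank holds a different shard of the sequence, so its dropout mask must come from a rank-specific RNG stream rather than the stream shared by all ranks (which is reserved for replicated activations). A minimal sketch of the intended behavior, using a plain Python RNG rather than Megatron's CUDA RNG tracker:

```python
import random

def dropout_mask(seq_len, p, base_seed, rank):
    """Sketch: draw a dropout keep-mask for this rank's sequence shard from
    a per-rank RNG stream, so shards on different ranks get independent
    masks. The function and seeding scheme are illustrative, not Megatron
    Core's implementation."""
    rng = random.Random(base_seed + rank)  # rank-specific stream
    return [0 if rng.random() < p else 1 for _ in range(seq_len)]
```

If the shared (non-rank-offset) stream were used instead, every rank would draw the identical mask for a *different* slice of the sequence, which is the class of bug the known issue describes.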