Megatron - Browse /core_v0.13.0 at SourceForge.net

Name	Modified	Size	Downloads / Week
NVIDIA Megatron Core 0.13.0 source code.tar.gz	2025-07-25	9.3 MB	0
NVIDIA Megatron Core 0.13.0 source code.zip	2025-07-25	10.2 MB	1
README.md	2025-07-25	7.4 kB	0
Totals: 3 Items		19.6 MB	1

Features
Inference
- Add async support for DynamicInferenceEngine (MR !3187)
- Pad input tensors and enable FP8 weights for FP8 inference (MR !3341)
- Force inference to always gather logits with tensor parallelism (MR !3442)
- Multi batch size CUDA Graphs for Dynamic Inference (MR !3402)
Post-training
- ModelOpt updates (MR !3268)
  - Add speculative decoding AR validation feature
  - Add DeepSeek and Qwen model configs
Performance
- ModelCommProcessGroup integration (MR !3391)
- Add HyperCommGrid: N-Dimensional Communication Grid for Model Parallelism (MR !3398)
  - Flexible creation and management of communication groups
- Add support for Spike No More embedding initializations and weight decay skipping (MR !3500)
Model support
- Add MiMo video VLM train example (MR !3543)
- Add AVLM for MIMO (MR !3624)
Ease of use
- Add uv support for source installs (MR !3615)
- Automated weekly prereleases (MR !3574)
Bug fixes
- Use mscale_all_dim for softmax_factor (MR !2800)
- Fix FP8 param blockwise scaling unit test (MR !3480)
- Fix unit test blockwise scaling (MR !3491)
- Optimize prefill for token-less requests (MR !3499)
- Add default values for Fp8Padding and Fp8Unpadding (MR !3501)
- Fix CUDA graph logic for flexible pp layout (MR !3505)
- Load FP8 models with strict=False (MR !3508)
- Skip rope check for torch < 1.4.0 (MR !3528)
- Disable Apex tests for stability (MR !3539)
- Fix typo in parallel_state expert parallelism (MR !3548)
- Guard modelopt on macOS (MR !3549)
- Retry on CUDA function failure (MR !3554)
- Fix NCCL mem pool creation error (MR !3557)
- Fix get_rotary_seq_len return type (MR !3559)
- Retry on CUDA function failure (MR !3560)
- Fix NCCL allocator attribute error (MR !3565)
- Ensure multi-prompt inference works (MR !3568)
- Fix MD5 on FIPS systems (MR !3577)
- Fix dynamic context and inference bugs (MR !3582)
- Fix TE version for interleaved fused RoPE (MR !3586)
- Fix MTP with MoE and TP logging (MR !3594)
- Guard TE import fix (MR !3596)
- Add assertion for NCCL UB case (MR !3599)
- Remove Encoder PP related functions (MR !3604)
- Fix segfaults in tests (MR !3605)
- Fix TE error in distributed optimizer (MR !3625)
- Remove redundant barrier in checkpoint flow (MR !3626)
- Support VPP MTP, fix logging (MR !3630)
- Retry mechanism for free(): invalid pointer errors (MR !3632)
- Fix test_replication.py issues (MR !3633)
- Fix typo in parallel_state (MR !3634)
- Fix CUDA graph logic determination (MR !3635)
- Fix TE installation error (MR !3636)
- Ensure correct sharding type in local tests (MR !3643)
- Fix cudagraphed backward buffer reuse for last layer (MR !3645)
- Set default for packed_seq_params in get_rotary_seq_len (MR !3651)
- Fix dynamic example script errors (MR !3653)
- Guard TE import fix (MR !3666)
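
The notes above do not say how the MD5 fix for FIPS systems (MR !3577) was implemented. As a hedged illustration of the general problem only: on FIPS-enabled hosts, a plain `hashlib.md5()` call can raise an error because MD5 is not an approved algorithm, and Python 3.9+ lets callers mark the digest as non-security use. The helper name `content_fingerprint` is hypothetical and is not part of Megatron Core.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return an MD5 hex digest used only as a content fingerprint.

    Assumption: the hash is not used for any security purpose, so it is
    safe to pass usedforsecurity=False (available in Python 3.9+). On
    FIPS-enabled builds this flag allows MD5 to be used where a plain
    hashlib.md5() call may be rejected.
    """
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


if __name__ == "__main__":
    # Hypothetical payload; any bytes object works.
    print(content_fingerprint(b"megatron-core 0.13.0"))
```

If a digest is needed for integrity guarantees rather than simple fingerprinting, a FIPS-approved algorithm such as SHA-256 (`hashlib.sha256`) is the usual alternative.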
Known issues

Source: README.md, updated 2025-07-25

Megatron Files

Ongoing research training transformer models at scale
