Name | Modified | Size
---|---|---
NVIDIA Neural Modules 2.4.0 source code.tar.gz | 2025-07-25 | 66.0 MB
NVIDIA Neural Modules 2.4.0 source code.zip | 2025-07-25 | 70.3 MB
README.md | 2025-07-25 | 23.0 kB
Highlights
- Collections:
  - Speech
    - Batched beam search for transducers (RNN-T and TDT); see the decoding sketch below
    - Buffered/streaming inference for RNN-T/TDT, plus batched decoding support in cache-aware streaming
    - CTC batched beam search with GPU-based language models (GPU-LM)
    - Key fixes
      - Punctuation marks included in timestamps
      - Correct timestamps when CUDA graphs are enabled
      - Correct masking of `<pad>` tokens in AED inference
      - TDT streaming inference fix
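To make the decoding items above concrete, here is a minimal sketch of switching a NeMo transducer model to beam-search decoding and transcribing with timestamps. The checkpoint name is an illustrative example, and the exact decoding-strategy key for the new batched beam search is not given in these notes, so plain `beam` is used as a stand-in; check the NeMo ASR documentation for the strings shipped with 2.4.0.

```python
import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Assumed example checkpoint; any RNN-T/TDT model works the same way.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_transducer_large"
)

# Select beam-search decoding through the decoding config; the 2.4 batched
# beam search is enabled the same way (strategy key per the release docs).
decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "beam"      # stand-in for the batched beam strategy
    decoding_cfg.beam.beam_size = 4
asr_model.change_decoding_strategy(decoding_cfg)

# Transcribe with timestamps (punctuation and CUDA-graph timestamp handling
# are among the fixes listed above).
hyps = asr_model.transcribe(["sample.wav"], timestamps=True)
print(hyps[0].text)
print(hyps[0].timestamp["word"][:3])
```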
  - LLM (see the checkpoint-import sketch below)
    - Qwen3 235B-A22B, performance optimized
    - DeepSeek V3, performance optimized
    - Gemma 3 support from Google
    - Embedding and reranker models
  - MM (multimodal)
    - Llama 4
    - AVLM
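The newly supported LLM families are used through the NeMo 2.0 `nemo.collections.llm` API. The sketch below shows the general checkpoint-import pattern with a Llama 3 model/config pair as a stand-in, since the exact config class names for Gemma 3, Qwen3, and DeepSeek V3 are not listed in these notes; substitute the corresponding classes from the release documentation.

```python
from nemo.collections import llm

if __name__ == "__main__":
    # Stand-in model/config pair; swap in the config class for the family you need.
    model = llm.LlamaModel(config=llm.Llama3Config8B())

    # Convert a Hugging Face checkpoint into the NeMo 2.0 checkpoint format
    # so it can be fine-tuned or evaluated with NeMo recipes.
    llm.import_ckpt(model=model, source="hf://meta-llama/Meta-Llama-3-8B")
```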
- Training performance (speed)
  - NVLink SHARP + InfiniBand SHARP for DP/FSDP communication on H100 and B200
  - MXFP8 with TP communication overlap
  - MXFP8 with reduced memory allocation
  - FP8 sub-channel recipe (128x128 blocks for weights, 1x128 for activations); see the scaling sketch after this list
  - cuDNN fused attention for MLA (both Hopper and Blackwell)
  - Advanced custom asymmetric pipelining (for MTP, the loss function, and embeddings)
  - BF16 optimizer for model memory savings
  - CUDA graph fix for fine-tuning benchmarks
  - CUDA graph support for Llama 4
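The FP8 sub-channel recipe scales weights per 128x128 block and activations per 1x128 block. Below is an illustrative, PyTorch-only sketch of that blockwise scaling so the shapes are concrete; it is not the Transformer Engine implementation NeMo uses, and the scale convention (multiply by `FP8_MAX / amax`) is one of several equivalent choices.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3


def blockwise_fp8(x: torch.Tensor, block_rows: int, block_cols: int):
    """Quantize a 2-D tensor to FP8 with one scale per (block_rows x block_cols) block."""
    rows, cols = x.shape
    assert rows % block_rows == 0 and cols % block_cols == 0, "pad to a block multiple first"

    # View as (row_blocks, block_rows, col_blocks, block_cols) and take the
    # absolute max of every block to derive its scale.
    blocks = x.reshape(rows // block_rows, block_rows, cols // block_cols, block_cols)
    amax = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp_min(1e-12)
    scale = FP8_MAX / amax

    q = (blocks * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q.reshape(rows, cols), scale.squeeze(1).squeeze(-1)


# 128x128 blocks for the weight, 1x128 "sub-channel" blocks for the activation.
w = torch.randn(256, 256)
a = torch.randn(8, 256)
wq, w_scales = blockwise_fp8(w, 128, 128)   # w_scales: (2, 2)
aq, a_scales = blockwise_fp8(a, 1, 128)     # a_scales: (8, 2)
print(wq.dtype, w_scales.shape, a_scales.shape)
```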