NVIDIA Neural Modules (NeMo) 2.4.0, released 2025-07-25

Highlights

  • Collections
    • Speech
      • Batched beam search for transducers (RNN-T and TDT)
      • RNN-T/TDT buffered/streaming inference, plus batched decoding support in cache-aware streaming
      • Support for CTC batched beam search with GPU-LM
      • Key fixes
        • Punctuation marks in timestamps
        • Timestamps with CUDA graphs enabled
        • Masking of <pad> tokens in AED inference
        • TDT streaming inference
    • LLM
      • Qwen 3 235B-A22B, performance-optimized
      • DeepSeek V3, performance-optimized
      • Gemma 3 support from Google
      • Embedding and reranker models
    • Multimodal (MM)
      • Llama 4
      • AVLM
  • Training performance (speed)
    • NVLink SHARP and InfiniBand SHARP for DP/FSDP communication on H100 and B200
    • MXFP8 with TP communication overlap
    • MXFP8 with reduced memory allocation
    • FP8 sub-channel recipe (128x128 blocks for weights, 1x128 for activations)
    • cuDNN fused attention for MLA (on both Hopper and Blackwell)
    • Advanced custom asymmetric pipelining (for MTP, the loss function, and embeddings)
    • BF16 optimizer for model memory savings
    • CUDA graph fix for fine-tuning benchmarks
    • CUDA graph support for Llama 4
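Several of the performance items above involve block-wise ("sub-channel") FP8 scaling. As a rough, hypothetical illustration of what that recipe means (this is a toy sketch, not NeMo's or Transformer Engine's implementation), the snippet below computes one scale factor per 128x128 weight block and per 1x128 activation block, so that each block independently fills the FP8 E4M3 dynamic range (maximum representable magnitude 448):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format


def blockwise_scales(x, block_rows, block_cols, fp8_max=FP8_E4M3_MAX):
    """Compute one scale factor per (block_rows x block_cols) block of x.

    Multiplying a block by its scale maps the block's absolute maximum
    onto fp8_max. A real recipe would also perform the FP8 cast and
    carry the scales through the GEMM; here we only show the scaling.
    """
    rows, cols = x.shape
    scales = np.zeros((rows // block_rows, cols // block_cols))
    for i in range(0, rows, block_rows):
        for j in range(0, cols, block_cols):
            block = x[i:i + block_rows, j:j + block_cols]
            amax = np.abs(block).max()
            scales[i // block_rows, j // block_cols] = fp8_max / max(amax, 1e-12)
    return scales


rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix
A = rng.standard_normal((8, 256)).astype(np.float32)    # toy activations

w_scales = blockwise_scales(W, 128, 128)  # 128x128 weight blocks -> shape (2, 2)
a_scales = blockwise_scales(A, 1, 128)    # 1x128 activation blocks -> shape (8, 2)
```

The finer 1x128 granularity for activations reflects that activation outliers tend to be concentrated per token, while weights are better behaved and tolerate coarser 128x128 blocks.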

Detailed Changelogs

ASR

Changelog

  • ci: Fix ASR container by @ko3n1g :: PR: #13288
  • Set L2_Segmentation_Tool_Parallel_ctc_segmentation test to be optional by @chtruong814 :: PR: #13296
  • Revert "WebDataset URL refactoring" by @ko3n1g :: PR: #13421
  • Update flagged docs links by @erastorgueva-nv :: PR: #13391
  • [Docs] Fix incorrectly formatted reference tags by @erastorgueva-nv :: PR: #13445
  • Update CP by @pablo-garay :: PR: #13532
  • Tdt buffered inference fix by @hainan-xv :: PR: #13500
  • Fix transcribe when nbest hypotheses are returned by @lilithgrigoryan :: PR: #13540
  • Set ASR test to be optional by @chtruong814 :: PR: #13633
  • Enabling chunked inference for AED models in asr_evaluator by @melllinia :: PR: #13674
  • Ko3n1g/chore/asr only by @ko3n1g :: PR: #13704
  • decompressing joblib file before checking it by @Ssofja :: PR: #13732
  • Revert "decompressing joblib file before checking it (#13732)" by @chtruong814 :: PR: #13791
  • Punctuation Marks in Timestamps by @monica-sekoyan :: PR: #13353
  • AIStore with Webdataset by @monica-sekoyan :: PR: #13604
  • Update to add default for dataclass variables by @nithinraok :: PR: #13814
  • This PR addresses known security issues by @Ssofja :: PR: #13804
  • remove model_stride var by @nithinraok :: PR: #13867
  • add CTC batched beam search by @lilithgrigoryan :: PR: #13337
  • Clean up streaming ASR script and tests by @artbataev :: PR: #13894
  • add NGPU-LM fusion during CTC greedy by @lilithgrigoryan :: PR: #13917
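The batched beam search work above (PRs #13337 and #13917) batches the search across utterances on the GPU and can fuse an external LM. The core loop it accelerates can be sketched in a deliberately simplified, single-utterance form. This is an illustrative toy, not NeMo's implementation: real CTC prefix search tracks blank-ending and non-blank-ending scores separately, which this sketch collapses into one score per prefix.

```python
import math


def ctc_beam_search_toy(log_probs, beam_size=4, blank=0):
    """Simplified CTC beam search over one utterance.

    log_probs: one list of per-token log-probabilities per time frame;
    index `blank` is the CTC blank. Repeated tokens are collapsed
    greedily, a simplification of full CTC prefix search.
    """
    beams = {(): 0.0}  # collapsed prefix -> best log-score so far
    for frame in log_probs:
        candidates = {}
        for prefix, score in beams.items():
            for token, lp in enumerate(frame):
                if token == blank or (prefix and prefix[-1] == token):
                    new_prefix = prefix  # blank or repeat: no new label
                else:
                    new_prefix = prefix + (token,)
                new_score = score + lp
                if new_score > candidates.get(new_prefix, -math.inf):
                    candidates[new_prefix] = new_score
        # prune to the `beam_size` highest-scoring prefixes
        beams = dict(sorted(candidates.items(), key=lambda kv: -kv[1])[:beam_size])
    best_prefix, best_score = max(beams.items(), key=lambda kv: kv[1])
    return list(best_prefix), best_score


# blank=0; a 3-frame, 3-token toy distribution favoring labels 1 then 2
frames = [[-5.0, -0.1, -5.0], [-5.0, -0.1, -5.0], [-5.0, -5.0, -0.1]]
labels, score = ctc_beam_search_toy(frames)  # labels == [1, 2]
```

The batched GPU version keeps the same per-prefix bookkeeping but lays the beams for all utterances out in tensors so every hypothesis expansion is a single kernel launch rather than a Python loop.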

TTS

Changelog

  • Revert "WebDataset URL refactoring" by @ko3n1g :: PR: #13421
  • Update flagged docs links by @erastorgueva-nv :: PR: #13391
  • [Docs] Fix incorrectly formatted reference tags by @erastorgueva-nv :: PR: #13445
  • Update CP by @pablo-garay :: PR: #13532
  • fix: vpp stage refactoring to match mcore by @ZhiyuLi-Nvidia :: PR: #13673
  • AIStore with Webdataset by @monica-sekoyan :: PR: #13604

NLP / NMT

Changelog

  • Migrate Hyena to Megatron inference_context by @cspades :: PR: #13436
  • Update CP by @pablo-garay :: PR: #13532
  • fix broken links by @dimapihtar :: PR: #13544
  • Add nlp import checks by @thomasdhc :: PR: #13563
  • PTQ model support, quant_cfg, and documentation updates by @janekl :: PR: #13519
  • feat - GPTSFTChatDataset alignment with OpenAI Messages, compatibility with packed sequences by @soluwalana :: PR: #13367
  • fix: vpp stage refactoring to match mcore by @ZhiyuLi-Nvidia :: PR: #13673
  • Fix resume with MegatronPretrainingBatchSampler by @ashors1 :: PR: #13565
  • Punctuation Marks in Timestamps by @monica-sekoyan :: PR: #13353
  • Revert `Adding more doc-strings to megatron_parallel.py #12767` by @ko3n1g :: PR: #13824
  • reasoning model evaluation mmlu gpqa by @ruchaa-apte :: PR: #13880
  • Remove unused DynamicRetrievalServer and Bert dataset loader classes by @dimapihtar :: PR: #14209
  • Huvu/avlm qafix cherrypick from by @huvunvidia :: PR: #14253

Export

Changelog

  • Improve Nemo2Exporter for Models Using Custom Modelling Files on HF by @suiyoubi :: PR: #13400
  • Adding more export tests by @oyilmaz-nvidia :: PR: #13410
  • Add Warning to Export when output_path exists by @suiyoubi :: PR: #13465
  • Move libsox-fmt-all from Dockerfile.ci.export_deploy to Dockerfile.ci by @chtruong814 :: PR: #13452
  • ci: Remove trt-llm breakpoint by @ko3n1g :: PR: #13499
  • Add Qwen2VL export_ckpt by @AtsunoriFujita :: PR: #13398
  • Add MLlama export_ckpt by @AtsunoriFujita :: PR: #13346
  • Update vLLMExporter to use vLLM V1 by @janekl :: PR: #13498
  • Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) by @janekl :: PR: #13697
  • Fix Qwen3 export + misc by @cuichenx :: PR: #13679
  • Extra int cast for successful tracing during ONNX export by @janekl :: PR: #13782
  • FP8 lora export by @cuichenx :: PR: #13748
  • Add PEFT export check by @cuichenx :: PR: #13835
  • Update llm api import_ckpt/export_ckpt docstring by @meatybobby :: PR: #13714
  • Use modelopt export and disable dataset calibration for weight only PTQ by @jenchen13 :: PR: #13756

Bugfixes

Changelog

  • [automodel] move liger kernel patching by @akoumpa :: PR: #13579

Uncategorized

Changelog

  • build: various bumps by @ko3n1g :: PR: #13285
  • ci: Fixes to selective triggering by @ko3n1g :: PR: #13287
  • ci: Set timeout by @ko3n1g :: PR: #13294
  • Set L2_NeMo_2_T5_Pretraining test as optional by @chtruong814 :: PR: #13282
  • Add test environment approval step for CI by @chtruong814 :: PR: #13297
  • update num nodes in deepseek v3 finetune recipe by @cuichenx :: PR: #13314
  • ci: Increase cache pool by @ko3n1g :: PR: #13306
  • Rename adam_with_cosine_annealing as adam since cosine LR is not setup by @ShriyaRishab :: PR: #13315
  • ci: Update test queue bot to not assume a workflow is launched from a PR by @chtruong814 :: PR: #13318
  • Fix TE pytorch attention doc link by @thomasdhc :: PR: #13327
  • ci: Add all recent buildcaches to update-buildcache job by @ko3n1g :: PR: #13289
  • Fix neva notebook by @yaoyu-33 :: PR: #13334
  • Fix transformer offline for CI/CD llama4 tests by @yaoyu-33 :: PR: #13339
  • [automodel] convert lm head to full tensor before passing to lce by @yuanzhedong :: PR: #13319
  • ci: No dups in queue by @ko3n1g :: PR: #13352
  • ci(hotfix): VLM CPU unit tests by @ko3n1g :: PR: #13348
  • vLLM==0.8.5 update by @janekl :: PR: #13350
  • ci: Allow bypassing approval by @ko3n1g :: PR: #13365
  • Avoid the need to specify optional attributes for lhotse/nemo reader functions by @pzelasko :: PR: #13307
  • ci: Fix selective-triggering for non-PR events by @ko3n1g :: PR: #13374
  • ci: Revert `no-concurrency-group-on-main` by @ko3n1g :: PR: #13375
  • ci: Improve no-fail-fast mechanism by @ko3n1g :: PR: #13370
  • 2d buckets estimation fix by @monica-sekoyan :: PR: #13377
  • ci: Fix scheduled runs by @ko3n1g :: PR: #13378
  • Ko3n1g/ci/fix nightly runs by @ko3n1g :: PR: #13382
  • [automodel] fix none issue in dataset for qwen model by @yuanzhedong :: PR: #13311
  • update table by @akoumpa :: PR: #13397
  • Improve test coverage for audio modules by @anteju :: PR: #13333
  • Disable failing maxine loss test by @anteju :: PR: #13361
  • Ko3n1g/ci/no notification on cancel by @ko3n1g :: PR: #13403
  • document fp8_recipe by @akoumpa :: PR: #13405
  • Weekly bump main by @ko3n1g :: PR: #13408
  • Handle boolean args for performance scripts and log received config by @guyueh1 :: PR: #13291
  • [automodel] add FirstRankPerNode by @akoumpa :: PR: #13373
  • tests: Disable flaky audio test by @ko3n1g :: PR: #13429
  • ci: Disable flaky audio test by @ko3n1g :: PR: #13435
  • Fix loss compute and reduction by @xrennvidia :: PR: #13295
  • ci: Skip link check on github links by @chtruong814 :: PR: #13425
  • Add NCCL cfg interface to perf scripts by @erhoo82 :: PR: #13407
  • ci: Success only if `Run CICD` label attached by @ko3n1g :: PR: #13430
  • ci: Add tests to selective triggering by @ko3n1g :: PR: #13404
  • ci: Remove jq by @ko3n1g :: PR: #13440
  • ci: Fix deps tree for tests by @ko3n1g :: PR: #13443
  • Ko3n1g/ci/fix dependency tree by @ko3n1g :: PR: #13448
  • Adding additional unit tests for the deploy module by @pthombre :: PR: #13411
  • [Audio] fix a flaky test (and also make some tests run faster) by @racoiaws :: PR: #13439
  • [automodel] ignore tail padding in TPS calculation by @akoumpa :: PR: #13329
  • Ko3n1g/ci/selective triggering 3 by @ko3n1g :: PR: #13460
  • ci: Disable broken neva tests by @ko3n1g :: PR: #13461
  • fix speechlm data module by @stevehuang52 :: PR: #13362
  • ci: Enter queue only with passing linting by @ko3n1g :: PR: #13462
  • Adding tests for Schroedinger Bridge model by @nasretdinovr :: PR: #13401
  • add more detailed description by @dimapihtar :: PR: #13464
  • [Audio] tests for score-based and flow matching enhancement models by @racoiaws :: PR: #13406
  • Use expandable cuda memory segmentation by @erhoo82 :: PR: #13418
  • Fix llava tokenizer caused nan issue by @yaoyu-33 :: PR: #13466
  • Remove cuda method from ModelPT by @erastorgueva-nv :: PR: #13394
  • Fix BNR 2 unit test + input, case where input length was not specified by @nitin9252 :: PR: #13467
  • ci: Do not run any tests if no match is found by @ko3n1g :: PR: #13479
  • Ko3n1g/ci/selective triggering 4 by @ko3n1g :: PR: #13489
  • Fix typo in the performance script by @youngeunkwon0405 :: PR: #13487
  • ci: No runs on main by @ko3n1g :: PR: #13490
  • ci: Upload on schedule by @ko3n1g :: PR: #13491
  • ci: Run selective triggering on dockerfiles and dependencies by @ko3n1g :: PR: #13493
  • [automodel] fallback FP8 + LCE -> FP8 + CE by @akoumpa :: PR: #13349
  • Update changelog for `r2.3.0` by @github-actions[bot] :: PR: #13501
  • Update 2.3.0 changelog by @chtruong814 :: PR: #13504
  • Enabling flash decode for float16 precision only by @pthombre :: PR: #13471
  • Fix changelog formatting by @chtruong814 :: PR: #13505
  • Updating the long context performance number for B200 by @youngeunkwon0405 :: PR: #13468
  • ci: Add more files to filter by @ko3n1g :: PR: #13517
  • Improve error message when HF checkpoint cannot be loaded by @ashors1 :: PR: #13513
  • Add Resume_path to llama_nemotron models by @suiyoubi :: PR: #13515
  • Add Llama4 GHA by @suiyoubi :: PR: #13442
  • add memory profile interface to perf scripts by @erhoo82 :: PR: #13413
  • Add fp8_param argument back to mixed precision plugin for backward compatibility by @guyueh1 :: PR: #13522
  • [automodel] add find_unused_parameters=True for DDP by @akoumpa :: PR: #13366
  • ci: Update success message by @ko3n1g :: PR: #13541
  • [Audio] TransformerUNet: predictive model support added by @nasretdinovr :: PR: #13470
  • Test Hyena mixer CP equivalency by @farhadrgh :: PR: #13330
  • use null tokenizer by @malay-nagda :: PR: #13480
  • ci: Remove optional marker by @ko3n1g :: PR: #13469
  • Update extra_requires and requirements by @thomasdhc :: PR: #13359
  • Fix default config for LlamaNemotron Ultra by @suiyoubi :: PR: #13542
  • [audio] Improve test coverage for audio losses by @anteju :: PR: #13309
  • deepseek finetuning callback error change by @SDcodehub :: PR: #13483
  • ci(fix): Add `__init__` to selective-triggering by @ko3n1g :: PR: #13577
  • nsys profile filename ranks info by @malay-nagda :: PR: #13576
  • chore: Update setup.py by @ko3n1g :: PR: #13566
  • Fix Llama importer by @suiyoubi :: PR: #13583
  • [automodel] fix --mbs/gbs dtype and chat-template by @akoumpa :: PR: #13602
  • Reconfigure 'limit_<train|val>_batches' by @maanug-nv :: PR: #13523
  • ci: Optional speech tests by @ko3n1g :: PR: #13606
  • [Automodel] Fix CP device_mesh issue, use PTL distsampler by @BoxiangW :: PR: #13473
  • [automodel] fix log message by @akoumpa :: PR: #13612
  • Tests for evaluation with NVIDIA Evals Factory by @chtruong814 :: PR: #13627
  • Fix ptl import in notebooks by @maanug-nv :: PR: #13608
  • [automodel] dist.abort -> dist.destroy_process_group by @akoumpa :: PR: #13578
  • Skip eval unit test by @chtruong814 :: PR: #13635
  • Fix image_processor config in Energon path by @AtsunoriFujita :: PR: #13618
  • Add Gemma3 VL model by @xiangxu-google :: PR: #13536
  • Set L2_NeMo_2_EVAL as optional by @chtruong814 :: PR: #13644
  • Update install to use pip install by @thomasdhc :: PR: #13605
  • Multi node settings for evaluation nemo-run script by @athitten :: PR: #13568
  • [Llama4] Fix the missing args in the recipe by @gdengk :: PR: #13649
  • Bump nvidia-modelopt to 0.29.0 by @AAnoosheh :: PR: #13599
  • Update README.md for 25.04 release by @snowmanwwg :: PR: #13654
  • [automodel] consolidate sft peft scripts by @akoumpa :: PR: #13634
  • Qwen3 by @cuichenx :: PR: #13554
  • Set env variables for eval tests by @marta-sd :: PR: #13658
  • build: multimodal-only by @ko3n1g :: PR: #13665
  • [Audio] TransformerUNet: predictive model tests added by @nasretdinovr :: PR: #13648
  • [automodel] consolidate vllm scripts by @akoumpa :: PR: #13670
  • build: Pin transformers by @ko3n1g :: PR: #13675
  • ci: Enable codecov checks by @ko3n1g :: PR: #13497
  • ci: Add `init-file-checker` by @ko3n1g :: PR: #13684
  • Add use_sharp and use user buffer registration args in perf scripts by @youngeunkwon0405 :: PR: #13521
  • Remove is-optional marker for L2_NeMo_2_EVAL by @marta-sd :: PR: #13669
  • gpu type and #devices CLI args by @malay-nagda :: PR: #13620
  • perf scripts updates by @malay-nagda :: PR: #13456
  • Use audio codec without discriminators in SpeechLM2 tests by @pzelasko :: PR: #13711
  • Update changelog for `r2.3.1` by @github-actions[bot] :: PR: #13719
  • Recipe default value fix for Llama4 by @suiyoubi :: PR: #13696
  • build: Lift numba by @ko3n1g :: PR: #13735
  • New key override for timestamps by @melllinia :: PR: #13743
  • Fixed Mllama Energon config by @AtsunoriFujita :: PR: #13574
  • Update convert_to_tarred_audio_dataset.py by @ssh-meister :: PR: #13755
  • Enable dropout recompute in LoRA by @michal2409 :: PR: #13745
  • Address VDR feedback for NeMo FW evaluations by @athitten :: PR: #13701
  • remove blocks unused to increase coverage by @romanbrickie :: PR: #13511
  • Fix Flux Recipe for FSDP/DDP by @suiyoubi :: PR: #13715
  • Try soften protobuf version requirement by @pablo-garay :: PR: #13747
  • Flux FP8 recipe by @Victor49152 :: PR: #13584
  • Gemma3 Fix and Tests by @suiyoubi :: PR: #13661
  • Disable local gradient checker in performance scripts by @erhoo82 :: PR: #13768
  • [Audio] Tests: training for mask, pred and SB models by @nasretdinovr :: PR: #13736
  • Refactor MSC integration in exp manager by @shunjiad :: PR: #13626
  • [fix] vpp error in Gemma3 by @ZhiyuLi-Nvidia :: PR: #13784
  • ci: Ensure approval queue fetches all CICD workflows using pagination by @chtruong814 :: PR: #13798
  • ci: make_request in approval test queue appends next url for status checks only by @chtruong814 :: PR: #13802
  • Remove guard for masking tests and improve coverage by @anteju :: PR: #13787
  • fix: After mcore bump by @ko3n1g :: PR: #13781
  • Fix Gemma3VL training bugs by @sharanmayank :: PR: #13766
  • [NeMo 2.0] Remove the restriction of load_model_state_dict for cfsdp by @shjwudp :: PR: #13512
  • Add option to construct Llama model with Transformer Engine op fuser by @timmoon10 :: PR: #13776
  • [Evaluation] Add support for simple-evals and tasks that require logprobs by @marta-sd :: PR: #13647
  • remove stale section by @akoumpa :: PR: #13759
  • fix moe_router_pre_softmax for Mixtral by @akoumpa :: PR: #13678
  • fix: improve sequence length handling to fix nan in loss when turning on cudagraph by @katec846 :: PR: #13779
  • Gemma3 Energon Dataset by @suiyoubi :: PR: #13813
  • Rectify BLEU evaluation by @ankitapasad :: PR: #13762
  • ci: Moved workflows by @ko3n1g :: PR: #13828
  • ci: Moved templates by @ko3n1g :: PR: #13830
  • [Build] Bump bitsandbytes dependency to 0.45.5 (ubuntu 22.04 compatibility) by @pramodk :: PR: #13789
  • update for `PYTORCH_CUDA_ALLOC_CONF` env var by @malay-nagda :: PR: #13837
  • [Llama4] Enable VLM Dec cudagraph by @gdengk :: PR: #13767
  • Support MSC URL in LLM checkpointing by @shunjiad :: PR: #13805
  • additional metrics by @dimapihtar :: PR: #13754
  • Expand modelopt version range by @chtruong814 :: PR: #13850
  • Alit/nmh4b by @JRD971000 :: PR: #13481
  • [Tutorial] Train your own reasoning model in 48 hours on a single GPU by @Maghoumi :: PR: #13853
  • Enabled C2C-PCie bridge through NCCL by @sanandaraj5597 :: PR: #13621
  • Added safe loading of models by @nithinraok :: PR: #13607
  • Add NemotronH Performance Script by @guyueh1 :: PR: #13528
  • Hyena SE/MR B2B Kernel integration by @farhadrgh :: PR: #13518
  • chore: Destroy buildcache by @ko3n1g :: PR: #13869
  • tests: Fix Qwen test by @ko3n1g :: PR: #13888
  • fix: improve error handling in `is_multistorageclient_url` by @shunjiad :: PR: #13885
  • feat(eval): adds benchmark adapters that allow special reasoning models by @agronskiy :: PR: #13709
  • perf scripts 25.07 refactor by @malay-nagda :: PR: #13875
  • Fix E5 and LlamaEmbedding Conversion by @suiyoubi :: PR: #13890
  • Bug fix for NCCL vars by @sanandaraj5597 :: PR: #13908
  • Reranker Model Support by @suiyoubi :: PR: #13876
  • numa cmd in bash by @malay-nagda :: PR: #13914
  • Fix BERT issue with PP by @suiyoubi :: PR: #13916
  • [Llama4] Fix Vp_stage to enable VP for VLM llama4 by @gdengk :: PR: #13873
  • Enable NVTX profiling in MCore by @minitu :: PR: #13820
  • [Qwen3-MoE] Add Qwen3 MoE perf recipe for 30b and 235b by @gdengk :: PR: #13895
  • lazy import bnbconfig by @akoumpa :: PR: #13919
  • Set TRANSFORMERS_OFFLINE=1 and HF_HUB_OFFLINE=1 in CI tests by @chtruong814 :: PR: #13932
  • [peft] align adapter output shape with wrapped module output shape by @guyueh1 :: PR: #13922
  • [automodel] move only lora adapters to cpu by @akoumpa :: PR: #13931
  • Fix vp_stage not found when fsdp by @gautham-kollu :: PR: #13817
  • Fix single optional import if ModelOpt not installed by @AAnoosheh :: PR: #13923
  • Revert "Set TRANSFORMERS_OFFLINE=1 and HF_HUB_OFFLINE=1 in CI tests" by @chtruong814 :: PR: #13938
  • Enable LoRA for TELinear layers by @cuichenx :: PR: #13929
  • Freeze tags in `r2.4.0` by @github-actions[bot] :: PR: #13945
  • Cherry pick `Use jiwer less than 4.0.0 (13997)` into `r2.4.0` by @ko3n1g :: PR: #13998
  • Cherry pick `Remove container license reference (14010)` into `r2.4.0` by @ko3n1g :: PR: #14017
  • move classes to module to use __target__ feature by @nithinraok :: PR: #14023
  • Cherry pick `bf16 grads for bf16 jobs (14016)` into `r2.4.0` by @ko3n1g :: PR: #14020
  • Cherry pick `Remove nemo1 stable diffusion test (14018)` into `r2.4.0` by @ko3n1g :: PR: #14019
  • Version bump to `2.4.0rc1.dev0` by @github-actions[bot] :: PR: #14047
  • Cherry pick `Fix Loading Custom Quantization Config (13934)` into `r2.4.0` by @ko3n1g :: PR: #13950
  • Cherry pick `[automodel] fix sft notebook (14002)` into `r2.4.0` by @ko3n1g :: PR: #14003
  • Cherry pick `Use average reduction in FSDP grad reduce-scatter when grad dtype is … (13981)` into `r2.4.0` by @ko3n1g :: PR: #14004
  • Cherry pick `GPU memory logging update (13982)` into `r2.4.0` by @ko3n1g :: PR: #14021
  • Cherry pick `Remove kaldiio (14006)` into `r2.4.0` by @ko3n1g :: PR: #14032
  • Cherry pick `Set L2_NeMo_2_Flux_Import_Test to be optional (14056)` into `r2.4.0` by @ko3n1g :: PR: #14058
  • Cherry pick `Bump protobuf to 5.29.5 (14045)` into `r2.4.0` by @ko3n1g :: PR: #14060
  • Cherry pick `Detect hardware before enabling DeepEP (14022)` into `r2.4.0` by @ko3n1g :: PR: #14068
  • Version bump to `2.4.0rc2.dev0` by @github-actions[bot] :: PR: #14115
  • Cherry pick `Fix SFT Dataset Bug (13918)` into `r2.4.0` by @ko3n1g :: PR: #14074
  • Cherry pick `Align adapter shape with base linear output shape (14009)` into `r2.4.0` by @ko3n1g :: PR: #14083
  • Cherry pick `[MoE] Update the fp8 precision interface for llama4 and qwen3 (14094)` into `r2.4.0` by @ko3n1g :: PR: #14104
  • Cherry pick `[Llama4] Tokenizer naming update (14114)` into `r2.4.0` by @ko3n1g :: PR: #14123
  • Cherry pick `Bump to pytorch 25.05 container along with TE update (13899)` into `r2.4.0` by @ko3n1g :: PR: #14145
  • Cherry pick `Perf scripts updates (14005)` into `r2.4.0` by @ko3n1g :: PR: #14129
  • Cherry pick `Remove unstructured (14070)` into `r2.4.0` by @ko3n1g :: PR: #14147
  • Version bump to `2.4.0rc3.dev0` by @github-actions[bot] :: PR: #14165
  • Cherry pick `Add checkpoint info for NIM Embedding Expor Tutorial (14177)` into `r2.4.0` by @ko3n1g :: PR: #14178
  • Cherry pick `Fix dsv3 script (14007)` into `r2.4.0` by @ko3n1g :: PR: #14182
  • Cherry pick `405b perf script updates (14176)` into `r2.4.0` by @chtruong814 :: PR: #14195
  • Cherry pick `Fix nemotronh flops calculator (14161)` into `r2.4.0` by @chtruong814 :: PR: #14202
  • Cherry pick `Add option to disable gloo process groups` (#14156) into `r2.4.0` by @chtruong814 :: PR: #14220
  • Cherry pick `Remove g2p_en (14204)` into `r2.4.0` by @chtruong814 :: PR: #14212
  • Cherry pick `diffusion mock data null args (14173)` into `r2.4.0` by @chtruong814 :: PR: #14217
  • Cherry pick `perf-scripts: Change b200 config to EP8 (14207)` into `r2.4.0` by @chtruong814 :: PR: #14223
  • Cherry pick `Change RerankerSpecter Dataset question key (14200)` into `r2.4.0` by @chtruong814 :: PR: #14224
  • Cherry pick `Fix the forward when final_loss_mask is not present (14201)` into `r2.4.0` by @chtruong814 :: PR: #14225
  • Cherry pick `Fix Llama Nemotron Nano Importer (14222)` into `r2.4.0` by @chtruong814 :: PR: #14226
  • Cherry pick `[automodel] fix loss_mask pad token (14150)` into `r2.4.0` by @chtruong814 :: PR: #14227
  • [Performance script] FSDP-UBR related recipe update (#14208) by @youngeunkwon0405 :: PR: #14233
  • Fix for MCore dist ckpt loading (#14229) by @stevehuang52 :: PR: #14239
  • cherry-pick fix eval beam search ctc script by @lilithgrigoryan :: PR: #14242
  • Cherry pick `Moving export security fixes over here (14254)` into `r2.4.0` by @chtruong814 :: PR: #14261
  • Cherry pick `Confidence fix for tutorial (14250)` into `r2.4.0` by @chtruong814 :: PR: #14266
  • Cherry pick `added new models to documentation (14264)` into `r2.4.0` by @chtruong814 :: PR: #14278
  • Cherry-pick `FIx Flux & Flux_Controlnet initialization issue` (#14263) into `r2.4.0` by @chtruong814 :: PR: #14273
  • Cherry pick `update ffmpeg install (14237)` into `r2.4.0` by @chtruong814 :: PR: #14279
Source: README.md, updated 2025-07-25