NVIDIA Neural Modules (NeMo) 2.4.0, released 2025-07-25

Highlights

  • Collections
    • Speech
      • Batched beam search for transducers (RNN-T and TDT)
      • RNN-T/TDT buffered/streaming inference, plus batched decoding support in cache-aware streaming
      • Support for CTC batched beam search with GPU-LM
      • Key fixes
        • Punctuation marks in timestamps
        • Timestamps with CUDA graphs enabled
        • Masking of <pad> tokens in AED inference
        • TDT streaming inference
    • LLM
      • Qwen 3 235B-A22B, performance-optimized
      • DeepSeek V3, performance-optimized
      • Gemma 3 support from Google
      • Embedding and reranker models
    • Multimodal (MM)
      • Llama 4
      • AVLM
  • Training performance (speed)
    • NVLink SHARP and InfiniBand SHARP for DP/FSDP communication on H100 and B200
    • MXFP8 with TP communication overlap
    • MXFP8 with reduced memory allocation
    • FP8 sub-channel recipe (128x128 blocks for weights, 1x128 for activations)
    • cuDNN fused attention for MLA (on both Hopper and Blackwell)
    • Advanced custom asymmetric pipelining (for MTP, the loss function, and embeddings)
    • BF16 optimizer for model memory savings
    • CUDA graph fix for fine-tuning benchmarks
    • CUDA graph support for Llama 4
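Several of the performance items above involve block-wise ("sub-channel") FP8 scaling. As a rough, hypothetical illustration of what that recipe means (this is a toy sketch, not NeMo's or Transformer Engine's implementation), the snippet below computes one scale factor per 128x128 weight block and per 1x128 activation block, so that each block independently fills the FP8 E4M3 dynamic range (maximum representable magnitude 448):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format


def blockwise_scales(x, block_rows, block_cols, fp8_max=FP8_E4M3_MAX):
    """Compute one scale factor per (block_rows x block_cols) block of x.

    Multiplying a block by its scale maps the block's absolute maximum
    onto fp8_max. A real recipe would also perform the FP8 cast and
    carry the scales through the GEMM; here we only show the scaling.
    """
    rows, cols = x.shape
    scales = np.zeros((rows // block_rows, cols // block_cols))
    for i in range(0, rows, block_rows):
        for j in range(0, cols, block_cols):
            block = x[i:i + block_rows, j:j + block_cols]
            amax = np.abs(block).max()
            scales[i // block_rows, j // block_cols] = fp8_max / max(amax, 1e-12)
    return scales


rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix
A = rng.standard_normal((8, 256)).astype(np.float32)    # toy activations

w_scales = blockwise_scales(W, 128, 128)  # 128x128 weight blocks -> shape (2, 2)
a_scales = blockwise_scales(A, 1, 128)    # 1x128 activation blocks -> shape (8, 2)
```

The finer 1x128 granularity for activations reflects that activation outliers tend to be concentrated per token, while weights are better behaved and tolerate coarser 128x128 blocks.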

Detailed Changelogs

ASR

Changelog

  • ci: Fix ASR container by @ko3n1g :: PR: #13288
  • Set L2_Segmentation_Tool_Parallel_ctc_segmentation test to be optional by @chtruong814 :: PR: #13296
  • Revert "WebDataset URL refactoring" by @ko3n1g :: PR: #13421
  • Update flagged docs links by @erastorgueva-nv :: PR: #13391
  • [Docs] Fix incorrectly formatted reference tags by @erastorgueva-nv :: PR: #13445
  • Update CP by @pablo-garay :: PR: #13532
  • Tdt buffered inference fix by @hainan-xv :: PR: #13500
  • Fix transcribe when nbest hypotheses are returned by @lilithgrigoryan :: PR: #13540
  • Set ASR test to be optional by @chtruong814 :: PR: #13633
  • Enabling chunked inference for AED models in asr_evaluator by @melllinia :: PR: #13674
  • Ko3n1g/chore/asr only by @ko3n1g :: PR: #13704
  • decompressing joblib file before checking it by @Ssofja :: PR: #13732
  • Revert "decompressing joblib file before checking it (#13732)" by @chtruong814 :: PR: #13791
  • Punctuation Marks in Timestamps by @monica-sekoyan :: PR: #13353
  • AIStore with Webdataset by @monica-sekoyan :: PR: #13604
  • Update to add default for dataclass variables by @nithinraok :: PR: #13814
  • This PR addresses known security issues by @Ssofja :: PR: #13804
  • remove model_stride var by @nithinraok :: PR: #13867
  • add CTC batched beam search by @lilithgrigoryan :: PR: #13337
  • Clean up streaming ASR script and tests by @artbataev :: PR: #13894
  • add NGPU-LM fusion during CTC greedy by @lilithgrigoryan :: PR: #13917
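The batched beam search work above (PRs #13337 and #13917) batches the search across utterances on the GPU and can fuse an external LM. The core loop it accelerates can be sketched in a deliberately simplified, single-utterance form. This is an illustrative toy, not NeMo's implementation: real CTC prefix search tracks blank-ending and non-blank-ending scores separately, which this sketch collapses into one score per prefix.

```python
import math


def ctc_beam_search_toy(log_probs, beam_size=4, blank=0):
    """Simplified CTC beam search over one utterance.

    log_probs: one list of per-token log-probabilities per time frame;
    index `blank` is the CTC blank. Repeated tokens are collapsed
    greedily, a simplification of full CTC prefix search.
    """
    beams = {(): 0.0}  # collapsed prefix -> best log-score so far
    for frame in log_probs:
        candidates = {}
        for prefix, score in beams.items():
            for token, lp in enumerate(frame):
                if token == blank or (prefix and prefix[-1] == token):
                    new_prefix = prefix  # blank or repeat: no new label
                else:
                    new_prefix = prefix + (token,)
                new_score = score + lp
                if new_score > candidates.get(new_prefix, -math.inf):
                    candidates[new_prefix] = new_score
        # prune to the `beam_size` highest-scoring prefixes
        beams = dict(sorted(candidates.items(), key=lambda kv: -kv[1])[:beam_size])
    best_prefix, best_score = max(beams.items(), key=lambda kv: kv[1])
    return list(best_prefix), best_score


# blank=0; a 3-frame, 3-token toy distribution favoring labels 1 then 2
frames = [[-5.0, -0.1, -5.0], [-5.0, -0.1, -5.0], [-5.0, -5.0, -0.1]]
labels, score = ctc_beam_search_toy(frames)  # labels == [1, 2]
```

The batched GPU version keeps the same per-prefix bookkeeping but lays the beams for all utterances out in tensors so every hypothesis expansion is a single kernel launch rather than a Python loop.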

TTS

Changelog

  • Revert "WebDataset URL refactoring" by @ko3n1g :: PR: #13421
  • Update flagged docs links by @erastorgueva-nv :: PR: #13391
  • [Docs] Fix incorrectly formatted reference tags by @erastorgueva-nv :: PR: #13445
  • Update CP by @pablo-garay :: PR: #13532
  • fix: vpp stage refactoring to match mcore by @ZhiyuLi-Nvidia :: PR: #13673
  • AIStore with Webdataset by @monica-sekoyan :: PR: #13604

NLP / NMT

Changelog

  • Migrate Hyena to Megatron inference_context by @cspades :: PR: #13436
  • Update CP by @pablo-garay :: PR: #13532
  • fix broken links by @dimapihtar :: PR: #13544
  • Add nlp import checks by @thomasdhc :: PR: #13563
  • PTQ model support, quant_cfg, and documentation updates by @janekl :: PR: #13519
  • feat - GPTSFTChatDataset alignment with OpenAI Messages, compatibility with packed sequences by @soluwalana :: PR: #13367
  • fix: vpp stage refactoring to match mcore by @ZhiyuLi-Nvidia :: PR: #13673
  • Fix resume with MegatronPretrainingBatchSampler by @ashors1 :: PR: #13565
  • Punctuation Marks in Timestamps by @monica-sekoyan :: PR: #13353
  • Revert `Adding more doc-strings to megatron_parallel.py #12767` by @ko3n1g :: PR: #13824
  • reasoning model evaluation mmlu gpqa by @ruchaa-apte :: PR: #13880
  • Remove unused DynamicRetrievalServer and Bert dataset loader classes by @dimapihtar :: PR: #14209
  • Huvu/avlm qafix cherrypick from by @huvunvidia :: PR: #14253

Export

Changelog

  • Improve Nemo2Exporter for Models Using Custom Modelling Files on HF by @suiyoubi :: PR: #13400
  • Adding more export tests by @oyilmaz-nvidia :: PR: #13410
  • Add Warning to Export when output_path exists by @suiyoubi :: PR: #13465
  • Move libsox-fmt-all from Dockerfile.ci.export_deploy to Dockerfile.ci by @chtruong814 :: PR: #13452
  • ci: Remove trt-llm breakpoint by @ko3n1g :: PR: #13499
  • Add Qwen2VL export_ckpt by @AtsunoriFujita :: PR: #13398
  • Add MLlama export_ckpt by @AtsunoriFujita :: PR: #13346
  • Update vLLMExporter to use vLLM V1 by @janekl :: PR: #13498
  • Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) by @janekl :: PR: #13697
  • Fix Qwen3 export + misc by @cuichenx :: PR: #13679
  • Extra int cast for successful tracing during ONNX export by @janekl :: PR: #13782
  • FP8 lora export by @cuichenx :: PR: #13748
  • Add PEFT export check by @cuichenx :: PR: #13835
  • Update llm api import_ckpt/export_ckpt docstring by @meatybobby :: PR: #13714
  • Use modelopt export and disable dataset calibration for weight only PTQ by @jenchen13 :: PR: #13756

Bugfixes

Changelog

  • [automodel] move liger kernel patching by @akoumpa :: PR: #13579

Uncategorized

Changelog

  • build: various bumps by @ko3n1g :: PR: #13285
  • ci: Fixes to selective triggering by @ko3n1g :: PR: #13287
  • ci: Set timeout by @ko3n1g :: PR: #13294
  • Set L2_NeMo_2_T5_Pretraining test as optional by @chtruong814 :: PR: #13282
  • Add test environment approval step for CI by @chtruong814 :: PR: #13297
  • update num nodes in deepseek v3 finetune recipe by @cuichenx :: PR: #13314
  • ci: Increase cache pool by @ko3n1g :: PR: #13306
  • Rename adam_with_cosine_annealing as adam since cosine LR is not setup by @ShriyaRishab :: PR: #13315
  • ci: Update test queue bot to not assume a workflow is launched from a PR by @chtruong814 :: PR: #13318
  • Fix TE pytorch attention doc link by @thomasdhc :: PR: #13327
  • ci: Add all recent buildcaches to update-buildcache job by @ko3n1g :: PR: #13289
  • Fix neva notebook by @yaoyu-33 :: PR: #13334
  • Fix transformer offline for CI/CD llama4 tests by @yaoyu-33 :: PR: #13339
  • [automodel] convert lm head to full tensor before passing to lce by @yuanzhedong :: PR: #13319
  • ci: No dups in queue by @ko3n1g :: PR: #13352
  • ci(hotfix): VLM CPU unit tests by @ko3n1g :: PR: #13348
  • vLLM==0.8.5 update by @janekl :: PR: #13350
  • ci: Allow bypassing approval by @ko3n1g :: PR: #13365
  • Avoid the need to specify optional attributes for lhotse/nemo reader functions by @pzelasko :: PR: #13307
  • ci: Fix selective-triggering for non-PR events by @ko3n1g :: PR: #13374
  • ci: Revert `no-concurrency-group-on-main` by @ko3n1g :: PR: #13375
  • ci: Improve no-fail-fast mechanism by @ko3n1g :: PR: #13370
  • 2d buckets estimation fix by @monica-sekoyan :: PR: #13377
  • ci: Fix scheduled runs by @ko3n1g :: PR: #13378
  • Ko3n1g/ci/fix nightly runs by @ko3n1g :: PR: #13382
  • [automodel] fix none issue in dataset for qwen model by @yuanzhedong :: PR: #13311
  • update table by @akoumpa :: PR: #13397
  • Improve test coverage for audio modules by @anteju :: PR: #13333
  • Disable failing maxine loss test by @anteju :: PR: #13361
  • Ko3n1g/ci/no notification on cancel by @ko3n1g :: PR: #13403
  • document fp8_recipe by @akoumpa :: PR: #13405
  • Weekly bump main by @ko3n1g :: PR: #13408
  • Handle boolean args for performance scripts and log received config by @guyueh1 :: PR: #13291
  • [automodel] add FirstRankPerNode by @akoumpa :: PR: #13373
  • tests: Disable flaky audio test by @ko3n1g :: PR: #13429
  • ci: Disable flaky audio test by @ko3n1g :: PR: #13435
  • Fix loss compute and reduction by @xrennvidia :: PR: #13295
  • ci: Skip link check on github links by @chtruong814 :: PR: #13425
  • Add NCCL cfg interface to perf scripts by @erhoo82 :: PR: #13407
  • ci: Success only if `Run CICD` label attached by @ko3n1g :: PR: #13430
  • ci: Add tests to selective triggering by @ko3n1g :: PR: #13404
  • ci: Remove jq by @ko3n1g :: PR: #13440
  • ci: Fix deps tree for tests by @ko3n1g :: PR: #13443
  • Ko3n1g/ci/fix dependency tree by @ko3n1g :: PR: #13448
  • Adding additional unit tests for the deploy module by @pthombre :: PR: #13411
  • [Audio] fix a flaky test (and also make some tests run faster) by @racoiaws :: PR: #13439
  • [automodel] ignore tail padding in TPS calculation by @akoumpa :: PR: #13329
  • Ko3n1g/ci/selective triggering 3 by @ko3n1g :: PR: #13460
  • ci: Disable broken neva tests by @ko3n1g :: PR: #13461
  • fix speechlm data module by @stevehuang52 :: PR: #13362
  • ci: Enter queue only with passing linting by @ko3n1g :: PR: #13462
  • Adding tests for Schroedinger Bridge model by @nasretdinovr :: PR: #13401
  • add more detailed description by @dimapihtar :: PR: #13464
  • [Audio] tests for score-based and flow matching enhancement models by @racoiaws :: PR: #13406
  • Use expandable cuda memory segmentation by @erhoo82 :: PR: #13418
  • Fix llava tokenizer caused nan issue by @yaoyu-33 :: PR: #13466
  • Remove cuda method from ModelPT by @erastorgueva-nv :: PR: #13394
  • Fix BNR 2 unit test + input, case where input length was not specified by @nitin9252 :: PR: #13467
  • ci: Do not run any tests if no match is found by @ko3n1g :: PR: #13479
  • Ko3n1g/ci/selective triggering 4 by @ko3n1g :: PR: #13489
  • Fix typo in the performance script by @youngeunkwon0405 :: PR: #13487
  • ci: No runs on main by @ko3n1g :: PR: #13490
  • ci: Upload on schedule by @ko3n1g :: PR: #13491
  • ci: Run selective triggering on dockerfiles and dependencies by @ko3n1g :: PR: #13493
  • [automodel] fallback FP8 + LCE -> FP8 + CE by @akoumpa :: PR: #13349
  • Update changelog for `r2.3.0` by @github-actions[bot] :: PR: #13501
  • Update 2.3.0 changelog by @chtruong814 :: PR: #13504
  • Enabling flash decode for float16 precision only by @pthombre :: PR: #13471
  • Fix changelog formatting by @chtruong814 :: PR: #13505
  • Updating the long context performance number for B200 by @youngeunkwon0405 :: PR: #13468
  • ci: Add more files to filter by @ko3n1g :: PR: #13517
  • Improve error message when HF checkpoint cannot be loaded by @ashors1 :: PR: #13513
  • Add Resume_path to llama_nemotron models by @suiyoubi :: PR: #13515
  • Add Llama4 GHA by @suiyoubi :: PR: #13442
  • add memory profile interface to perf scripts by @erhoo82 :: PR: #13413
  • Add fp8_param argument back to mixed precision plugin for backward compatibility by @guyueh1 :: PR: #13522
  • [automodel] add find_unused_parameters=True for DDP by @akoumpa :: PR: #13366
  • ci: Update success message by @ko3n1g :: PR: #13541
  • [Audio] TransformerUNet: predictive model support added by @nasretdinovr :: PR: #13470
  • Test Hyena mixer CP equivalency by @farhadrgh :: PR: #13330
  • use null tokenizer by @malay-nagda :: PR: #13480
  • ci: Remove optional marker by @ko3n1g :: PR: #13469
  • Update extra_requires and requirements by @thomasdhc :: PR: #13359
  • Fix default config for LlamaNemotron Ultra by @suiyoubi :: PR: #13542
  • [audio] Improve test coverage for audio losses by @anteju :: PR: #13309
  • deepseek finetuning callback error change by @SDcodehub :: PR: #13483
  • ci(fix): Add `__init__` to selective-triggering by @ko3n1g :: PR: #13577
  • nsys profile filename ranks info by @malay-nagda :: PR: #13576
  • chore: Update setup.py by @ko3n1g :: PR: #13566
  • Fix Llama importer by @suiyoubi :: PR: #13583
  • [automodel] fix --mbs/gbs dtype and chat-template by @akoumpa :: PR: #13602
  • Reconfigure 'limit_<train|val>_batches' by @maanug-nv :: PR: #13523
  • ci: Optional speech tests by @ko3n1g :: PR: #13606
  • [Automodel] Fix CP device_mesh issue, use PTL distsampler by @BoxiangW :: PR: #13473
  • [automodel] fix log message by @akoumpa :: PR: #13612
  • Tests for evaluation with NVIDIA Evals Factory by @chtruong814 :: PR: #13627
  • Fix ptl import in notebooks by @maanug-nv :: PR: #13608
  • [automodel] dist.abort -> dist.destroy_process_group by @akoumpa :: PR: #13578
  • Skip eval unit test by @chtruong814 :: PR: #13635
  • Fix image_processor config in Energon path by @AtsunoriFujita :: PR: #13618
  • Add Gemma3 VL model by @xiangxu-google :: PR: #13536
  • Set L2_NeMo_2_EVAL as optional by @chtruong814 :: PR: #13644
  • Update install to use pip install by @thomasdhc :: PR: #13605
  • Multi node settings for evaluation nemo-run script by @athitten :: PR: #13568
  • [Llama4] Fix the missing args in the recipe by @gdengk :: PR: #13649
  • Bump nvidia-modelopt to 0.29.0 by @AAnoosheh :: PR: #13599
  • Update README.md for 25.04 release by @snowmanwwg :: PR: #13654
  • [automodel] consolidate sft peft scripts by @akoumpa :: PR: #13634
  • Qwen3 by @cuichenx :: PR: #13554
  • Set env variables for eval tests by @marta-sd :: PR: #13658
  • build: multimodal-only by @ko3n1g :: PR: #13665
  • [Audio] TransformerUNet: predictive model tests added by @nasretdinovr :: PR: #13648
  • [automodel] consolidate vllm scripts by @akoumpa :: PR: #13670
  • build: Pin transformers by @ko3n1g :: PR: #13675
  • ci: Enable codecov checks by @ko3n1g :: PR: #13497
  • ci: Add `init-file-checker` by @ko3n1g :: PR: #13684
  • Add use_sharp and use user buffer registration args in perf scripts by @youngeunkwon0405 :: PR: #13521
  • Remove is-optional marker for L2_NeMo_2_EVAL by @marta-sd :: PR: #13669
  • gpu type and #devices CLI args by @malay-nagda :: PR: #13620
  • perf scripts updates by @malay-nagda :: PR: #13456
  • Use audio codec without discriminators in SpeechLM2 tests by @pzelasko :: PR: #13711
  • Update changelog for `r2.3.1` by @github-actions[bot] :: PR: #13719
  • Recipe default value fix for Llama4 by @suiyoubi :: PR: #13696
  • build: Lift numba by @ko3n1g :: PR: #13735
  • New key override for timestamps by @melllinia :: PR: #13743
  • Fixed Mllama Energon config by @AtsunoriFujita :: PR: #13574
  • Update convert_to_tarred_audio_dataset.py by @ssh-meister :: PR: #13755
  • Enable dropout recompute in LoRA by @michal2409 :: PR: #13745
  • Address VDR feedback for NeMo FW evaluations by @athitten :: PR: #13701
  • remove blocks unused to increase coverage by @romanbrickie :: PR: #13511
  • Fix Flux Recipe for FSDP/DDP by @suiyoubi :: PR: #13715
  • Try soften protobuf version requirement by @pablo-garay :: PR: #13747
  • Flux FP8 recipe by @Victor49152 :: PR: #13584
  • Gemma3 Fix and Tests by @suiyoubi :: PR: #13661
  • Disable local gradient checker in performance scripts by @erhoo82 :: PR: #13768
  • [Audio] Tests: training for mask, pred and SB models by @nasretdinovr :: PR: #13736
  • Refactor MSC integration in exp manager by @shunjiad :: PR: #13626
  • [fix] vpp error in Gemma3 by @ZhiyuLi-Nvidia :: PR: #13784
  • ci: Ensure approval queue fetches all CICD workflows using pagination by @chtruong814 :: PR: #13798
  • ci: make_request in approval test queue appends next url for status checks only by @chtruong814 :: PR: #13802
  • Remove guard for masking tests and improve coverage by @anteju :: PR: #13787
  • fix: After mcore bump by @ko3n1g :: PR: #13781
  • Fix Gemma3VL training bugs by @sharanmayank :: PR: #13766
  • [NeMo 2.0] Remove the restriction of load_model_state_dict for cfsdp by @shjwudp :: PR: #13512
  • Add option to construct Llama model with Transformer Engine op fuser by @timmoon10 :: PR: #13776
  • [Evaluation] Add support for simple-evals and tasks that require logprobs by @marta-sd :: PR: #13647
  • remove stale section by @akoumpa :: PR: #13759
  • fix moe_router_pre_softmax for Mixtral by @akoumpa :: PR: #13678
  • fix: improve sequence length handling to fix nan in loss when turning on cudagraph by @katec846 :: PR: #13779
  • Gemma3 Energon Dataset by @suiyoubi :: PR: #13813
  • Rectify BLEU evaluation by @ankitapasad :: PR: #13762
  • ci: Moved workflows by @ko3n1g :: PR: #13828
  • ci: Moved templates by @ko3n1g :: PR: #13830
  • [Build] Bump bitsandbytes dependency to 0.45.5 (ubuntu 22.04 compatibility) by @pramodk :: PR: #13789
  • update for `PYTORCH_CUDA_ALLOC_CONF` env var by @malay-nagda :: PR: #13837
  • [Llama4] Enable VLM Dec cudagraph by @gdengk :: PR: #13767
  • Support MSC URL in LLM checkpointing by @shunjiad :: PR: #13805
  • additional metrics by @dimapihtar :: PR: #13754
  • Expand modelopt version range by @chtruong814 :: PR: #13850
  • Alit/nmh4b by @JRD971000 :: PR: #13481
  • [Tutorial] Train your own reasoning model in 48 hours on a single GPU by @Maghoumi :: PR: #13853
  • Enabled C2C-PCie bridge through NCCL by @sanandaraj5597 :: PR: #13621
  • Added safe loading of models by @nithinraok :: PR: #13607
  • Add NemotronH Performance Script by @guyueh1 :: PR: #13528
  • Hyena SE/MR B2B Kernel integration by @farhadrgh :: PR: #13518
  • chore: Destroy buildcache by @ko3n1g :: PR: #13869
  • tests: Fix Qwen test by @ko3n1g :: PR: #13888
  • fix: improve error handling in `is_multistorageclient_url` by @shunjiad :: PR: #13885
  • feat(eval): adds benchmark adapters that allow special reasoning models by @agronskiy :: PR: #13709
  • perf scripts 25.07 refactor by @malay-nagda :: PR: #13875
  • Fix E5 and LlamaEmbedding Conversion by @suiyoubi :: PR: #13890
  • Bug fix for NCCL vars by @sanandaraj5597 :: PR: #13908
  • Reranker Model Support by @suiyoubi :: PR: #13876
  • numa cmd in bash by @malay-nagda :: PR: #13914
  • Fix BERT issue with PP by @suiyoubi :: PR: #13916
  • [Llama4] Fix Vp_stage to enable VP for VLM llama4 by @gdengk :: PR: #13873
  • Enable NVTX profiling in MCore by @minitu :: PR: #13820
  • [Qwen3-MoE] Add Qwen3 MoE perf recipe for 30b and 235b by @gdengk :: PR: #13895
  • lazy import bnbconfig by @akoumpa :: PR: #13919
  • Set TRANSFORMERS_OFFLINE=1 and HF_HUB_OFFLINE=1 in CI tests by @chtruong814 :: PR: #13932
  • [peft] align adapter output shape with wrapped module output shape by @guyueh1 :: PR: #13922
  • [automodel] move only lora adapters to cpu by @akoumpa :: PR: #13931
  • Fix vp_stage not found when fsdp by @gautham-kollu :: PR: #13817
  • Fix single optional import if ModelOpt not installed by @AAnoosheh :: PR: #13923
  • Revert "Set TRANSFORMERS_OFFLINE=1 and HF_HUB_OFFLINE=1 in CI tests" by @chtruong814 :: PR: #13938
  • Enable LoRA for TELinear layers by @cuichenx :: PR: #13929
  • Freeze tags in `r2.4.0` by @github-actions[bot] :: PR: #13945
  • Cherry pick `Use jiwer less than 4.0.0 (13997)` into `r2.4.0` by @ko3n1g :: PR: #13998
  • Cherry pick `Remove container license reference (14010)` into `r2.4.0` by @ko3n1g :: PR: #14017
  • move classes to module to use __target__ feature by @nithinraok :: PR: #14023
  • Cherry pick `bf16 grads for bf16 jobs (14016)` into `r2.4.0` by @ko3n1g :: PR: #14020
  • Cherry pick `Remove nemo1 stable diffusion test (14018)` into `r2.4.0` by @ko3n1g :: PR: #14019
  • Version bump to `2.4.0rc1.dev0` by @github-actions[bot] :: PR: #14047
  • Cherry pick `Fix Loading Custom Quantization Config (13934)` into `r2.4.0` by @ko3n1g :: PR: #13950
  • Cherry pick `[automodel] fix sft notebook (14002)` into `r2.4.0` by @ko3n1g :: PR: #14003
  • Cherry pick `Use average reduction in FSDP grad reduce-scatter when grad dtype is … (13981)` into `r2.4.0` by @ko3n1g :: PR: #14004
  • Cherry pick `GPU memory logging update (13982)` into `r2.4.0` by @ko3n1g :: PR: #14021
  • Cherry pick `Remove kaldiio (14006)` into `r2.4.0` by @ko3n1g :: PR: #14032
  • Cherry pick `Set L2_NeMo_2_Flux_Import_Test to be optional (14056)` into `r2.4.0` by @ko3n1g :: PR: #14058
  • Cherry pick `Bump protobuf to 5.29.5 (14045)` into `r2.4.0` by @ko3n1g :: PR: #14060
  • Cherry pick `Detect hardware before enabling DeepEP (14022)` into `r2.4.0` by @ko3n1g :: PR: #14068
  • Version bump to `2.4.0rc2.dev0` by @github-actions[bot] :: PR: #14115
  • Cherry pick `Fix SFT Dataset Bug (13918)` into `r2.4.0` by @ko3n1g :: PR: #14074
  • Cherry pick `Align adapter shape with base linear output shape (14009)` into `r2.4.0` by @ko3n1g :: PR: #14083
  • Cherry pick `[MoE] Update the fp8 precision interface for llama4 and qwen3 (14094)` into `r2.4.0` by @ko3n1g :: PR: #14104
  • Cherry pick `[Llama4] Tokenizer naming update (14114)` into `r2.4.0` by @ko3n1g :: PR: #14123
  • Cherry pick `Bump to pytorch 25.05 container along with TE update (13899)` into `r2.4.0` by @ko3n1g :: PR: #14145
  • Cherry pick `Perf scripts updates (14005)` into `r2.4.0` by @ko3n1g :: PR: #14129
  • Cherry pick `Remove unstructured (14070)` into `r2.4.0` by @ko3n1g :: PR: #14147
  • Version bump to `2.4.0rc3.dev0` by @github-actions[bot] :: PR: #14165
  • Cherry pick `Add checkpoint info for NIM Embedding Expor Tutorial (14177)` into `r2.4.0` by @ko3n1g :: PR: #14178
  • Cherry pick `Fix dsv3 script (14007)` into `r2.4.0` by @ko3n1g :: PR: #14182
  • Cherry pick `405b perf script updates (14176)` into `r2.4.0` by @chtruong814 :: PR: #14195
  • Cherry pick `Fix nemotronh flops calculator (14161)` into `r2.4.0` by @chtruong814 :: PR: #14202
  • Cherry pick `Add option to disable gloo process groups` (#14156) into `r2.4.0` by @chtruong814 :: PR: #14220
  • Cherry pick `Remove g2p_en (14204)` into `r2.4.0` by @chtruong814 :: PR: #14212
  • Cherry pick `diffusion mock data null args (14173)` into `r2.4.0` by @chtruong814 :: PR: #14217
  • Cherry pick `perf-scripts: Change b200 config to EP8 (14207)` into `r2.4.0` by @chtruong814 :: PR: #14223
  • Cherry pick `Change RerankerSpecter Dataset question key (14200)` into `r2.4.0` by @chtruong814 :: PR: #14224
  • Cherry pick `Fix the forward when final_loss_mask is not present (14201)` into `r2.4.0` by @chtruong814 :: PR: #14225
  • Cherry pick `Fix Llama Nemotron Nano Importer (14222)` into `r2.4.0` by @chtruong814 :: PR: #14226
  • Cherry pick `[automodel] fix loss_mask pad token (14150)` into `r2.4.0` by @chtruong814 :: PR: #14227
  • [Performance script] FSDP-UBR related recipe update (#14208) by @youngeunkwon0405 :: PR: #14233
  • Fix for MCore dist ckpt loading (#14229) by @stevehuang52 :: PR: #14239
  • cherry-pick fix eval beam search ctc script by @lilithgrigoryan :: PR: #14242
  • Cherry pick `Moving export security fixes over here (14254)` into `r2.4.0` by @chtruong814 :: PR: #14261
  • Cherry pick `Confidence fix for tutorial (14250)` into `r2.4.0` by @chtruong814 :: PR: #14266
  • Cherry pick `added new models to documentation (14264)` into `r2.4.0` by @chtruong814 :: PR: #14278
  • Cherry-pick `FIx Flux & Flux_Controlnet initialization issue` (#14263) into `r2.4.0` by @chtruong814 :: PR: #14273
  • Cherry pick `update ffmpeg install (14237)` into `r2.4.0` by @chtruong814 :: PR: #14279
Source: README.md, updated 2025-07-25