verl - Browse /v0.7.1 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2026-03-16	53.2 kB	0
v0.7.1 source code.tar.gz	2026-03-16	1.8 MB	0
v0.7.1 source code.zip	2026-03-16	2.8 MB	0
Totals: 3 Items		4.6 MB	0

Highlight

Model Engine

Megatron

Support R3 router replay with vllm and sglang https://github.com/verl-project/verl/pull/4840 https://github.com/verl-project/verl/pull/4986 https://github.com/verl-project/verl/pull/5185
Support MTP training in SFT/RL https://github.com/verl-project/verl/pull/4981 https://github.com/verl-project/verl/pull/4936
LoRA training enhancements with Megatron-Bridge: actor/ref sharing, LoRA adapter-only refit, etc. https://github.com/verl-project/verl/pull/4673 https://github.com/verl-project/verl/pull/4632
Support Qwen3.5 series training with mbridge https://github.com/verl-project/verl/pull/5381

VeOmni

New veomni training backend with FSDP+SP+EP https://github.com/verl-project/verl/pull/4882

torchtitan

New torchtitan training backend with FSDP+TP+PP+CP+EP, roadmap: https://github.com/verl-project/verl/issues/5306

Rollout Engine

vLLM

Separate model runner from training process and refit weights via CUDA IPC https://github.com/verl-project/verl/pull/4280
FP8 rollout enhancement
Upgrade to vllm==0.17.0

SGLang

Support router replay
FP8 rollout enhancement
Upgrade to sglang==0.5.9
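
The rollout engine pins listed above (vllm==0.17.0 and sglang==0.5.9) can be checked against a local environment before launching a job. The following is a minimal sketch, not part of verl: `PINNED` and `check_pins` are hypothetical names introduced here for illustration, and the comparison assumes that any local build suffix (for example `+cu128`) can be ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the highlights above; the dict name is illustrative only.
PINNED = {"vllm": "0.17.0", "sglang": "0.5.9"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cu128" before comparing.
            installed = get_version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means both packages match the pinned versions. Only one of the two rollout engines is usually installed in a given environment, so a "not installed" report for the other is expected.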

TensorRT-LLM

New tensorrt-llm rollout backend, roadmap: https://github.com/verl-project/verl/issues/5042

Checkpoint Engine

Add checkpoint engine manager https://github.com/verl-project/verl/pull/5031
Add hccl, kimi checkpoint engine backend https://github.com/verl-project/verl/pull/4885 https://github.com/verl-project/verl/pull/4954

Trainer

one-step-off/fully async trainer refactor with verl-core
Unify checkpoint engine https://github.com/verl-project/verl/pull/5029
Unify partial rollout agent loop with auto resume https://github.com/verl-project/verl/pull/5487
Ascend NPU support for one-step-off/fully async

What's Changed

[ci] feat: add npu unit test by @yyyy2000 in https://github.com/verl-project/verl/pull/4626
[recipe] fix: workaround for making the one-step off-policy recipe compatible with IPv6 environments on Ascend NPU by @ji-huazhong in https://github.com/verl-project/verl/pull/4782
[fsdp] feat: integrate PrefixGrouper for GRPO training acceleration by @kevssim in https://github.com/verl-project/verl/pull/4368
[rollout] fix: use model_dump() for proper Pydantic serialization in token2text by @yurekami in https://github.com/verl-project/verl/pull/4706
[doc] chore: Change the name of npu unit test workflow by @yyyy2000 in https://github.com/verl-project/verl/pull/4800
[model] feat: support per sample temperature in trainer by @vermouth1992 in https://github.com/verl-project/verl/pull/4787
[tool] fix: add tools in single_turn_agent by @Junxiao-Zhao in https://github.com/verl-project/verl/pull/4798
[recipe] feat: migrate recipe to the dedicated repo verl-recipe as a submodule by @tongyx361 in https://github.com/verl-project/verl/pull/4795
[model] fix: fix temp dtype by @vermouth1992 in https://github.com/verl-project/verl/pull/4813
[vllm, sglang, rollout] fix: Fix a mistake when running run_qwen3_vl-30b-megatron.sh with latest verl and vllm0.12 by @cboss6 in https://github.com/verl-project/verl/pull/4810
[ckpt] feat: add checkpoint-engine abstraction by @wuxibin89 in https://github.com/verl-project/verl/pull/4775
[doc, ci] fix: Update Ascend doc and fix e2e_ascend CI by @FightingZhen in https://github.com/verl-project/verl/pull/4816
[trainer] feat: VeOmniEngine supports qwen3_vl ulysses by @A1waysBeenHere in https://github.com/verl-project/verl/pull/4806
[doc] chore: fix checkpoint engine image link by @wuxibin89 in https://github.com/verl-project/verl/pull/4821
[hardware] fix: automatically set device for SFT case by @A1waysBeenHere in https://github.com/verl-project/verl/pull/4828
[data] feat: TransferQueue - Update TransferQueue version and docs by @0oshowero0 in https://github.com/verl-project/verl/pull/4829
[doc] Update docs about fully_async_policy by @jsfanfanfan in https://github.com/verl-project/verl/pull/4826
[ckpt] fix: FSDP save ckpt after validation by @wdl339 in https://github.com/verl-project/verl/pull/4799
[perf] feat: Add MFU for Qwen3-VL dense by @zhihaofang1017 in https://github.com/verl-project/verl/pull/4753
[tool] fix: avoid nested ToolResponse in SandboxFusionTool by @Winston-Yuan in https://github.com/verl-project/verl/pull/4833
[vllm] fix: fix error in vllm patch for diff vllm version and add ci for moe with fp8 rollout by @Agoniii in https://github.com/verl-project/verl/pull/4824
[algo] feat: add optimal token baseline and variance proxy by @jiawei415 in https://github.com/verl-project/verl/pull/4678
[megatron] fix: Fix error in megatron workers by @zhihaofang1017 in https://github.com/verl-project/verl/pull/4832
[misc] feat: delete unnecessary base class in agent loop worker and vLLMHttpServer by @PeterSH6 in https://github.com/verl-project/verl/pull/4838
[misc] feat: consolidate tensordict before dispatch by @vermouth1992 in https://github.com/verl-project/verl/pull/4830
[training_utils] fix: json encode error in filelogger by @zhuangqh in https://github.com/verl-project/verl/pull/4811
[ckpt] chore: skip saving hf_checkpoint during megatron+lora training & add a separate lora merge script by @Junxiao-Zhao in https://github.com/verl-project/verl/pull/4839
[rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios by @leo-pony in https://github.com/verl-project/verl/pull/4783
[fsdp] feat: add validate process on trainer node when use_trainer_do_validate=True by @chenjiaoAngel in https://github.com/verl-project/verl/pull/4683
[misc] fix: recipe submodule accidentally been removed by @wuxibin89 in https://github.com/verl-project/verl/pull/4843
[worker, training_utils] fix: Engine Metric Aggregation by @JacobHelwig in https://github.com/verl-project/verl/pull/4778
[rollout] fix: configurable agent loop + multimodal data for fully-async by @XChen-Zero in https://github.com/verl-project/verl/pull/4842
[ci] test: switch the vlm rl test case in the npu environment to use the model engine by @ji-huazhong in https://github.com/verl-project/verl/pull/4844
[ckpt] fix: Megatron save ckpt after validation by @wdl339 in https://github.com/verl-project/verl/pull/4841
[megatron] feat: Share actor and ref in LoRA by @HollowMan6 in https://github.com/verl-project/verl/pull/4673
[fsdp, megatron] fix: Engine Rollout Worker LoRA Parameter Update by @JacobHelwig in https://github.com/verl-project/verl/pull/4836
[algo, rollout, sglang] feat: Support router replay with sglang by @moehanabi in https://github.com/verl-project/verl/pull/4840
[perf] feat: Add MFU for Qwen3-VL MoE by @zhihaofang1017 in https://github.com/verl-project/verl/pull/4859
[misc] fix: fix 3d position_ids for train_mini_batch by @wdl339 in https://github.com/verl-project/verl/pull/4860
fix(sft_trainer): Fix global_tokens and total_tokens metrics always showing 0.0 by @khazic in https://github.com/verl-project/verl/pull/4854
[rollout,vllm] feat: support vllm scheduling policy config and generate setting priority by @RobotGF in https://github.com/verl-project/verl/pull/4874
[ckpt] fix: prevent data loss when max_ckpt_to_keep=1 by @jreiml in https://github.com/verl-project/verl/pull/4873
[worker] feat: New engine share actor and ref for LoRA by @HollowMan6 in https://github.com/verl-project/verl/pull/4867
[worker] fix: new engine saves megatron LoRA adapters checkpoints by @HollowMan6 in https://github.com/verl-project/verl/pull/4866
[ckpt] fix: properly handle optimizer offloading for HybridDeviceOptimizer by @jreiml in https://github.com/verl-project/verl/pull/4870
[doc] chore: update verl meetup by @wuxibin89 in https://github.com/verl-project/verl/pull/4884
[vllm] Fix CLI argument serialization for list types by @jreiml in https://github.com/verl-project/verl/pull/4869
[data] fix: build_messages for multi-modal data by @ccilery in https://github.com/verl-project/verl/pull/4864
[rollout] fix: wrong display about Prometheus when using SGLang. by @jsfanfanfan in https://github.com/verl-project/verl/pull/4858
[vllm] fix: pad data_hp to be multiples of block_size by @Agoniii in https://github.com/verl-project/verl/pull/4835
[ci] feat: add ci to automatically submit PR request if precommit fails by @vermouth1992 in https://github.com/verl-project/verl/pull/4878
[doc] chore: async README backticks by @JacobHelwig in https://github.com/verl-project/verl/pull/4898
[doc] fix: correct typo in script comment by @Prozac614 in https://github.com/verl-project/verl/pull/4900
[veomni] refactor: minor refactoring to ensure veomni engine compatibility with forward_only mode by @ji-huazhong in https://github.com/verl-project/verl/pull/4889
[sglang] fix: sglang TP+DP support / port bug by @hustmf in https://github.com/verl-project/verl/pull/4715
Either remove + prefix: 'actor_rollout_ref.model.enable_activation_of… by @Tomsawyerhu in https://github.com/verl-project/verl/pull/4910
[vllm] fix: vllm_config arg gets removed in newer WorkerWrapperBase by @HollowMan6 in https://github.com/verl-project/verl/pull/4915
Correct Attention FLOPS estimation in flops_counter.py by @HaochenYuan in https://github.com/verl-project/verl/pull/4929
[algo, doc] feat: trust region sequence masking - (1) k3 KL avg and (2) veto for max criterion by @szrlee in https://github.com/verl-project/verl/pull/4544
[rollout] feat: use rollout and validate parallel process by @chenjiaoAngel in https://github.com/verl-project/verl/pull/4863
[model] feat: Add qwen3_vl_moe in VL_TYPE2INDEX for image_mask and video_mask computation by @A1waysBeenHere in https://github.com/verl-project/verl/pull/4923
[data] feat: TransferQueue - fix rm_score error of TransferQueue by @baymax591 in https://github.com/verl-project/verl/pull/4928
[vllm] fix: Update get_encoding import for vllm versions 0.13.0 and above by @xhx1022 in https://github.com/verl-project/verl/pull/4934
[ci] fix: Add hydra-core to pre-commit installation by @vermouth1992 in https://github.com/verl-project/verl/pull/4892
[data] feat: TransferQueue - Unify the return of reward by @walterchenchn in https://github.com/verl-project/verl/pull/4902
Revert "Correct Attention FLOPS estimation in flops_counter.py" by @vermouth1992 in https://github.com/verl-project/verl/pull/4937
[misc] fix: Correct Docstring arg in main() (PPO trainer) by @rfy48 in https://github.com/verl-project/verl/pull/4943
[misc] fix: resolve pre-commit hook execution errors by @ji-huazhong in https://github.com/verl-project/verl/pull/4941
[model] fix: qwen3-vl-30b npu_patch fix by @bjf-frz in https://github.com/verl-project/verl/pull/4888
[veomni] feat: support offloading/loading the veomni model/optimizer by @ji-huazhong in https://github.com/verl-project/verl/pull/4916
[rollout,mbridge] feat: add metrics for rollout num preempted and fix mbridge freeze moe by @RobotGF in https://github.com/verl-project/verl/pull/4956
[single_controller] fix: pass max_colocate_count and detached params when merging RayResourcePool by @wdl339 in https://github.com/verl-project/verl/pull/4949
[single_controller] feat: Support dispatch/collect nested tensors with 3 or more dimensions by @JacobHelwig in https://github.com/verl-project/verl/pull/4940
[env] fix: upgrade torch, cudnn and deps versions in vllm image to fix performance issue by @Begunner in https://github.com/verl-project/verl/pull/4960
[training_utils] fix: correctly _resolve_device when not specified by @HollowMan6 in https://github.com/verl-project/verl/pull/4961
[trainer] fix: pass scores device type to group_mean_std call by @HollowMan6 in https://github.com/verl-project/verl/pull/4962
[training_utils] fix: Correct Attention TFLOPS estimation & fix CI by @HaochenYuan in https://github.com/verl-project/verl/pull/4959
[training_utils] A bug that caused device selection in group statistics to fail has been covered by tests. by @JohnConnor123 in https://github.com/verl-project/verl/pull/4967
[doc, data] fix: resolve broken documentation hyperlinks by @aphrodite1028 in https://github.com/verl-project/verl/pull/4970
[sglang, rollout] feat: support sglang as rollout engine in fully async policy by @AniZpZ in https://github.com/verl-project/verl/pull/4191
[megatron] feat: Using MTP in RL Training and Inference by @ArronHZG in https://github.com/verl-project/verl/pull/4936
[megatron] fix: fix megatron sync_weights oom on use_trainer_do_validate mode by @chenjiaoAngel in https://github.com/verl-project/verl/pull/4944
[rollout,vllm] fix: num_preempted metrics fix and typo correction in vllm async server by @RobotGF in https://github.com/verl-project/verl/pull/4976
[ray,rollout,trtllm] feat: Adding tensorrt_llm as new rollout engine by @joyang-nv in https://github.com/verl-project/verl/pull/4665
[data] fix: use lazy import for qwen_vl_utils in vision_utils.py by @Wheeeeeeeeels in https://github.com/verl-project/verl/pull/4991
[recipe,tool] feat: make GSM8K multiturn tool quickstart actually work by @letsgetai in https://github.com/verl-project/verl/pull/4998
[perf] feat: verl profiler system support Agent Loop scenario and integrate torch.profiler by @mengchengTang in https://github.com/verl-project/verl/pull/4320
[vllm] fix: vllm TP+DP support bug by @ccilery in https://github.com/verl-project/verl/pull/4969
[fsdp] fix: use module instead of function for fully_shard_module by @moaead in https://github.com/verl-project/verl/pull/5002
[misc] fix: update version in the main branch by @yyDing1 in https://github.com/verl-project/verl/pull/5006
[ckpt] feat: add Hccl ckpt engine backend by @hanhan-networking in https://github.com/verl-project/verl/pull/4885
[reward] fix: conditionally include reward_extra_keys in meta_info based on rm_scores presence by @none0663 in https://github.com/verl-project/verl/pull/5005
[rollout, vllm, sglang] fix: set default max_model_len by @ji-huazhong in https://github.com/verl-project/verl/pull/5018
[ci] fix: fix ci by @vermouth1992 in https://github.com/verl-project/verl/pull/5022
[ckpt] fix: npu load checkpoint by @Li-Yongwen in https://github.com/verl-project/verl/pull/4938
[megatron] fix: patch mcore for MLA support with flash_attn by @HollowMan6 in https://github.com/verl-project/verl/pull/4931
[BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation by @jianjunzhong in https://github.com/verl-project/verl/pull/4280
[megatron] feat: LoRA adapter only refit (TensorLoRARequest) by @HollowMan6 in https://github.com/verl-project/verl/pull/4632
[veomni] refactor: no longer check the attn_implementation/moe_implementation in VeOmniEngineConfig by @A1waysBeenHere in https://github.com/verl-project/verl/pull/5019
[veomni] feat: support model resharding between veomni and rollout engine by @ji-huazhong in https://github.com/verl-project/verl/pull/5033
[trtllm] fix: Fixes for TRTLLM rollout by @hchings in https://github.com/verl-project/verl/pull/5032
[ci] fix: docker transformers==4.57.6 by @yyyy2000 in https://github.com/verl-project/verl/pull/5053
[rollout] feat: set max_model_len by max_model_len or use max_position_embedding by @RobotGF in https://github.com/verl-project/verl/pull/5052
[ckpt] feat: add CheckpointEngineManager by @wuxibin89 in https://github.com/verl-project/verl/pull/5031
[doc] feat: add npu gspo practice by @wucong25 in https://github.com/verl-project/verl/pull/4988
[ci] chore: move to verl-project by @wuxibin89 in https://github.com/verl-project/verl/pull/5059
[vllm, sglang] feat: opt for FP8 rollout memory by @Agoniii in https://github.com/verl-project/verl/pull/4997
[model] feat: add API to automatically support engine backend by @vermouth1992 in https://github.com/verl-project/verl/pull/5050
[megatron, training_utils] fix: Patch MoEAlltoAllTokenDispatcher.preprocess for router replay by @HollowMan6 in https://github.com/verl-project/verl/pull/4986
[rollout, perf, cfg] fix: Add global step info and support more profile control params for rollout profiling (sglang backend) by @bithighrr in https://github.com/verl-project/verl/pull/5025
[fsdp, megatron] feat: Support fully-async training on Ascend NPU by @acat-rw in https://github.com/verl-project/verl/pull/5043
[doc, trainer] fix: shouldn't use rollout routing replay data for R2 by @HollowMan6 in https://github.com/verl-project/verl/pull/4973
[doc] feat: add dapo multi model optimization practice by @ChibiQuest in https://github.com/verl-project/verl/pull/5044
[ci] chore: fix ci failure by @wuxibin89 in https://github.com/verl-project/verl/pull/5068
[ci] chore: fix npu ci failure by @wucong25 in https://github.com/verl-project/verl/pull/5064
[sglang,ci,doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version for VeRL + Sglang by @xiazhahe in https://github.com/verl-project/verl/pull/5065
[megatron, training_utils] fix: router replay R3 align router replay data with global layer indices by @HollowMan6 in https://github.com/verl-project/verl/pull/5037
[trainer] fix: resolve dataset config in agent loop by @yyDing1 in https://github.com/verl-project/verl/pull/5034
[ckpt,rollout] fix: sleep_replicas before save_ckpt to avoid OOM by @RobotGF in https://github.com/verl-project/verl/pull/5079
[reward, ci] fix: colocate reward model ci break by @yyDing1 in https://github.com/verl-project/verl/pull/5084
[reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True by @none0663 in https://github.com/verl-project/verl/pull/5054
[rollout] fix: fix cpu allocation error in tensorrt_llm rollout manager by @SchumiDing in https://github.com/verl-project/verl/pull/5085
Revert "[reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True" by @wuxibin89 in https://github.com/verl-project/verl/pull/5091
[trtllm] fix: minor fixes to trtllm rollout by @hchings in https://github.com/verl-project/verl/pull/5095
[sglang] feat: add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends) by @xiazhahe in https://github.com/verl-project/verl/pull/5062
[doc] chore: update arch image by @wuxibin89 in https://github.com/verl-project/verl/pull/5106
[rollout] feat: automatically resume generation on abort by @wuxibin89 in https://github.com/verl-project/verl/pull/5071
[sglang, doc] feat: add NPU GRPO training scripts for Qwen3-30B (Megatron/SGLang backends) and update doc by @hustmf in https://github.com/verl-project/verl/pull/5060
[megatron] fix: megatron async save ckpt fix by @Leem-Li in https://github.com/verl-project/verl/pull/5016
[model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends) by @psyloy in https://github.com/verl-project/verl/pull/4984
[data] feat: Add support for Llama3.2-11-b-vision by @SchumiDing in https://github.com/verl-project/verl/pull/5112
[vllm] feat: revert to the default behavior of cudagraph_mode by @vermouth1992 in https://github.com/verl-project/verl/pull/5109
[fsdp] fix: Handle different transformers versions for Vision2Seq models in FSDP model merger by @liangxuZhang in https://github.com/verl-project/verl/pull/5108
[megatron] feat: Support MTP training in SFT by @arvyanh in https://github.com/verl-project/verl/pull/4981
[sglang] fix: update wiki to support speculative decode rollout by @ArronHZG in https://github.com/verl-project/verl/pull/5116
[training_utils] fix: add upcasting for seq_len_effective to avoid potential overflow in calculate_workload by @albertcity in https://github.com/verl-project/verl/pull/5110
[ci] feat: add npu workflow, e2e_sft_llm&model&reward_model_vllm by @yyyy2000 in https://github.com/verl-project/verl/pull/5039
[doc] chore: update readme for SPEAR algorithm by @yuleiqin in https://github.com/verl-project/verl/pull/5124
[vllm] feat: add shared memory support for weight transfer and IPC support checks by @jianjunzhong in https://github.com/verl-project/verl/pull/5089
Revert "[rollout] feat: automatically resume generation on abort" by @PeterSH6 in https://github.com/verl-project/verl/pull/5127
[sglang] fix: skip MoE router layers for FP8 quantization by @eternally-z in https://github.com/verl-project/verl/pull/5122
[vllm] feat: get gpt-oss encoding on demand by @vermouth1992 in https://github.com/verl-project/verl/pull/5131
[rollout,vllm] feat: revert default value of max_num_seqs by @RobotGF in https://github.com/verl-project/verl/pull/5139
[misc] feat: Modify transformers dependency version in requirements by @vermouth1992 in https://github.com/verl-project/verl/pull/5141
[ci] fix: fix ci by downgrade transformers to <5 by @vermouth1992 in https://github.com/verl-project/verl/pull/5143
[misc] chore: rename huggingface-cli to hf to favor transformers v5 by @vermouth1992 in https://github.com/verl-project/verl/pull/5145
[worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode by @RobotGF in https://github.com/verl-project/verl/pull/5144
[megatron] fix: patched_routing accepts arbitrary args by @HollowMan6 in https://github.com/verl-project/verl/pull/5155
Revert "[worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode" by @vermouth1992 in https://github.com/verl-project/verl/pull/5156
[model, algo] feat: implement SAC algorithm and support Pi0.5 model by @Miical in https://github.com/verl-project/verl/pull/5118
[training_utils] fix: Resolved bugs and conflicts in the fully async caused by multiple PRs by @ZLiao097 in https://github.com/verl-project/verl/pull/5100
[reward] feat: split reward loop manager and agent loop manager by @yyDing1 in https://github.com/verl-project/verl/pull/5134
[Doc] feat: update README to add new awesome project RuleReasoner by @jacklanda in https://github.com/verl-project/verl/pull/5157
[trainer] feat: move save_ckpt before update_weights and validate by @RobotGF in https://github.com/verl-project/verl/pull/5137
[ci,doc] feat: Add ascend_ci_guide by @yyyy2000 in https://github.com/verl-project/verl/pull/5163
[rollout] fix: remove dtype cast by @vermouth1992 in https://github.com/verl-project/verl/pull/5117
[hardware] fix: update architecture check and CANN toolkit path retrieval in device.py by @jianjunzhong in https://github.com/verl-project/verl/pull/5142
[vllm] fix: build_app() missing 1 required positional argument: 'supported_tasks' by @HollowMan6 in https://github.com/verl-project/verl/pull/5093
[perf] feat: clear megatron global buffer memory by @wuxibin89 in https://github.com/verl-project/verl/pull/5173
[vllm, rollout] fix: Use different seeds for vllm by @victordion in https://github.com/verl-project/verl/pull/5179
[ci] fix: fully async ci break by @yyDing1 in https://github.com/verl-project/verl/pull/5166
[vllm] feat: make seed configurable and different among replicas by @vermouth1992 in https://github.com/verl-project/verl/pull/5181
[data] fix: keyword video_metadata by @sophiayyya in https://github.com/verl-project/verl/pull/5177
[megatron] fix: checkpoints uses fully_reshardable by default when supported by @HollowMan6 in https://github.com/verl-project/verl/pull/5154
[trtllm, rollout] test: add unittest by @hchings in https://github.com/verl-project/verl/pull/5102
[reward] refactor: migrate all reward managers to the new asynchronous reward manager by @yyDing1 in https://github.com/verl-project/verl/pull/5189
[vllm] fix: handle multimodal inputs correctly in full async mode by @Silas-11 in https://github.com/verl-project/verl/pull/5160
[megatron] feat: fused kernel support for new model engine by @HollowMan6 in https://github.com/verl-project/verl/pull/5191
[fsdp] feat: Merge lora in fsdp training to speed up rollout by @amzfang in https://github.com/verl-project/verl/pull/5115
[megatron] Add Megatron-Bridge support in fully async policy by @eternally-z in https://github.com/verl-project/verl/pull/5196
[perf] fix: infer server profiler args fix by @mengchengTang in https://github.com/verl-project/verl/pull/5121
[doc, perf] feat: add perf_tuning_on_ascend by @tardis-key in https://github.com/verl-project/verl/pull/5104
[ci] feat: add three npu workflow yml test by @daikang6 in https://github.com/verl-project/verl/pull/4978
[vllm] fix: ignore MoE router layers for FP8 quantization by @zpqiu in https://github.com/verl-project/verl/pull/5107
[worker, training_utils] fix: Metric Aggregation Across DP Ranks by @JacobHelwig in https://github.com/verl-project/verl/pull/5203
[megatron] fix: add protections for logits_processor_args.pop("loss_mask"), which may cause the forward_fn of value net collapse by @albertcity in https://github.com/verl-project/verl/pull/5204
[trtllm] fix: reduce peak mem usage during update_weight() by @hchings in https://github.com/verl-project/verl/pull/5212
[algo] feat: support rollout router replay in MegatronEngine by @xhx1022 in https://github.com/verl-project/verl/pull/5185
[trtllm] fix: add synchronization before resume kv_cache to prevent oom in non-leader ranks by @shuyixiong in https://github.com/verl-project/verl/pull/5208
[perf] feat: add images_seqlens on mfu calculation for engine_worker by @alwaysyiyu in https://github.com/verl-project/verl/pull/5207
[reward] fix: Add assert to prevent reward NaN caused by overlong_cfg.len=0 by @ZLiao097 in https://github.com/verl-project/verl/pull/5216
[recipe] refactor: refactor ray trainer for separate recipe use. (fully async / one step off) by @ArronHZG in https://github.com/verl-project/verl/pull/5184
[BREAKING][reward] refactor: remove reward model worker code and invocation by @yyDing1 in https://github.com/verl-project/verl/pull/5194
[fsdp] fix: Support trust_remote_code during FSDP HuggingFace checkpoint save by @thvasilo in https://github.com/verl-project/verl/pull/5200
[worker] feat: Avoid redundant base weight sync when engine doesn't sleep by @JohnConnor123 in https://github.com/verl-project/verl/pull/5147
[ci] chore: fix npu ci by @wucong25 in https://github.com/verl-project/verl/pull/5218
[vllm] fix: apply moe weight loader patch for standard weight loading by @zjchenn in https://github.com/verl-project/verl/pull/5234
[ci] chore: fix npu ci setuptools by @yyyy2000 in https://github.com/verl-project/verl/pull/5238
[ci] chore: fix npu ci setuptools, keep update pip and packaging by @yyyy2000 in https://github.com/verl-project/verl/pull/5239
[reward] fix: preserve input non_tensor_batch in AgentLoopManager when reward_loop_worker_handles is None by @none0663 in https://github.com/verl-project/verl/pull/5195
[perf] fix: fix npu profiling scripts by @tongtong0613 in https://github.com/verl-project/verl/pull/5226
[megatron] feat: use yaml to manage mbridge args by @Kite0011 in https://github.com/verl-project/verl/pull/4584
[algo] feat: reduce routed expert padding via NestedTensor and uint8 dtype by @xhx1022 in https://github.com/verl-project/verl/pull/5240
[ray,trainer] feat: add master port range configuration for port range by @RobotGF in https://github.com/verl-project/verl/pull/5201
[BREAKING][reward] refactor: deprecate batch reward manager by @yyDing1 in https://github.com/verl-project/verl/pull/5237
[fsdp] feat: add script for qwen3next training on npu platform by @zjchenn in https://github.com/verl-project/verl/pull/5236
[doc] fix: Update ascend_sglang_best_practices.rst by @hustmf in https://github.com/verl-project/verl/pull/5261
[vllm, rollout] feat: update abort function with vllm internal pause_generation api by @PeterSH6 in https://github.com/verl-project/verl/pull/5253
[veomni, trainer] feat: add rl support for veomni backend by @ji-huazhong in https://github.com/verl-project/verl/pull/4882
[vllm] fix: run post-load weight processing once after async IPC sync by @zjchenn in https://github.com/verl-project/verl/pull/5235
[doc] chore: version of dapo_multi_model_optimization_practice by @ChibiQuest in https://github.com/verl-project/verl/pull/5263
[rollout] feat: make more rollout flags configurable to trtllm backend by @Superjomn in https://github.com/verl-project/verl/pull/5258
[doc] refactor: update reward documents by @yyDing1 in https://github.com/verl-project/verl/pull/5272
[doc] chore: Ascend retool practice doc by @LeoYao123 in https://github.com/verl-project/verl/pull/5266
[vllm, rollout] fix: auto-downgrade cudagraph_mode to PIECEWISE when DCP is enabled by @Siritao in https://github.com/verl-project/verl/pull/5262
[fsdp, veomni, trainer] fix: restrict npu-patch scope to avoid veomni backend interference by @ji-huazhong in https://github.com/verl-project/verl/pull/5268
[BREAKING][reward] refactor: the full reward configuration by @yyDing1 in https://github.com/verl-project/verl/pull/5255
[ci] chore: delete redundant npu ci by @yyyy2000 in https://github.com/verl-project/verl/pull/5259
[fsdp, megatron] refactor: Refactor Fully Async Implementation via Engine Workers by @ZLiao097 in https://github.com/verl-project/verl/pull/5269
[megatron, model] chore: add example of nemotron nano v3 by @ISEEKYAN in https://github.com/verl-project/verl/pull/5284
[misc] chore: fix veomni_trainer.yaml by @wuxibin89 in https://github.com/verl-project/verl/pull/5285
[megatron] fix: fallback to moe_router_padding_for_fp8 in router replay patch by @xhx1022 in https://github.com/verl-project/verl/pull/5283
[reward] fix: backward compatibility with old reward config by @yyDing1 in https://github.com/verl-project/verl/pull/5287
[reward] fix: reward model args and reward_kwargs bug by @yyDing1 in https://github.com/verl-project/verl/pull/5289
[doc] chore: gspo update config and add version with npu by @chengminhua in https://github.com/verl-project/verl/pull/5279
[fsdp,veomni] fix: remove FSDPUlyssesShardingManager to make eval_mode/train_mode reentrant by @wuxibin89 in https://github.com/verl-project/verl/pull/5305
[veomni] refactor: Modify dp related parameters to align with FSDP backend and remove temporarily unsupported TP/PP/CP parameters by @ChengQianqian in https://github.com/verl-project/verl/pull/5303
[trtllm] feat: use max utilization scheduler by default by @tongyuantongyu in https://github.com/verl-project/verl/pull/5302
[worker, tool] fix: stabilize agent loop extra fields schema by @denismegerle in https://github.com/verl-project/verl/pull/5301
[algo] feat: add NPU SAPO training script for Qwen3-8B (FSDP/vLLM backends) by @Vvictorrrr in https://github.com/verl-project/verl/pull/5257
[fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-8B (FSDP/VLLM backends) by @zhihaofang1017 in https://github.com/verl-project/verl/pull/5250
[fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) by @alwaysyiyu in https://github.com/verl-project/verl/pull/5260
[model,cfg] fix: type annotation for Lora target_modules by @thvasilo in https://github.com/verl-project/verl/pull/5223
[megatron] feat: Support LoRA training with FP16 using Megatron-Bridge. by @xichengpro in https://github.com/verl-project/verl/pull/4648
[ci] fix: main pre-commit by @pengwu22 in https://github.com/verl-project/verl/pull/5318
[misc] refactor: delete remaining batch-mode code in single controller by @ji-huazhong in https://github.com/verl-project/verl/pull/5319
[rollout] fix: make skip rollout compatible with async mode by @ChengQianqian in https://github.com/verl-project/verl/pull/5320
[veomni, trainer] fix: padding pixel value with padding_scale for vl model by @A1waysBeenHere in https://github.com/verl-project/verl/pull/5322
[fsdp,algo] feat: add NVFP4 QAT (Quantization-Aware Training) support by @zhangyimi in https://github.com/verl-project/verl/pull/5190
[docs] Add new awesome work using Verl by @MING-ZCH in https://github.com/verl-project/verl/pull/5328
[vllm] feat: remove workers from vLLMHttpServer by @tongyx361 in https://github.com/verl-project/verl/pull/5330
Revert "[vllm] feat: remove workers from vLLMHttpServer" by @PeterSH6 in https://github.com/verl-project/verl/pull/5333
[misc] refactor: remove deprecated codes by @ji-huazhong in https://github.com/verl-project/verl/pull/5336
[misc] fix: include config files for experimental entrypoints in package data by @guillemgt in https://github.com/verl-project/verl/pull/5343
[ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile by @ji-huazhong in https://github.com/verl-project/verl/pull/5345
Revert "[ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile" by @ji-huazhong in https://github.com/verl-project/verl/pull/5353
[reward] fix: empty class_dict for standalone reward model resource pool by @yyDing1 in https://github.com/verl-project/verl/pull/5348
[trainer] feat: Add Torchtitan as alternative training engine by @acisseJZhong in https://github.com/verl-project/verl/pull/5051
[training_utils] fix: mask out-of-bounds vocab entries fused kernel LCE logsumexp by @EricMarcus-ai in https://github.com/verl-project/verl/pull/5349
[rollout] fix: Include routed_experts in ToolAgentLoop return value to support R3 router replay by @mirrorboat in https://github.com/verl-project/verl/pull/5368
[misc] fix: pass torch dtype when init random model by @HollowMan6 in https://github.com/verl-project/verl/pull/5370
[ci] chore: pin version cupy-cuda12x==13.6.0 by @wuxibin89 in https://github.com/verl-project/verl/pull/5377
[doc] chore: ascend add performance analysis guide and update some version info by @chengminhua in https://github.com/verl-project/verl/pull/5324
[trainer] feat: Support RL trainer with TorchtitanEngine by @acisseJZhong in https://github.com/verl-project/verl/pull/5356
[algo] feat: Exception for agg_loss when dp_size > 1 but global information is absent & fix: correct & consistent loss aggregation for "seq-mean-token-sum-norm" by @tongyx361 in https://github.com/verl-project/verl/pull/5366
[rollout] fix: make run_uvicorn behavior more reliable by @tongyuantongyu in https://github.com/verl-project/verl/pull/5383
[doc] feat: update documentation for The Optimal Token Baseline and Rollout Correction by @jiawei415 in https://github.com/verl-project/verl/pull/5380
[trainer] refactor: remove fsdp_sft_trainer.py by @wuxibin89 in https://github.com/verl-project/verl/pull/5382
[ci] fix: occasional CI failures caused by sglang server port conflicts by @pengwu22 in https://github.com/verl-project/verl/pull/5310
[fsdp] fix: add aggressive_empty_cache at end of init_model to prevent vLLM OOM by @EricMarcus-ai in https://github.com/verl-project/verl/pull/5384
[doc, worker] feat: Enable Megatron-Bridge for MTP by @HollowMan6 in https://github.com/verl-project/verl/pull/5323
[ckpt] feat: add kimi ckpt engine backend by @kip-cxj in https://github.com/verl-project/verl/pull/4954
[misc] feat: ignore pyrightconfig.json to allow users to customize pyrightconfig to fix breaks by @tongyx361 in https://github.com/verl-project/verl/pull/5385
[ci] chore: update triton-ascend and fix npu ut by @yyyy2000 in https://github.com/verl-project/verl/pull/5396
[fsdp, megatron] feat: refactor fully-async and one-step-off training to support multiple checkpoint engine backends by @Shangwei-Li in https://github.com/verl-project/verl/pull/5029
[doc] feat: add fully async and one step off to PR Checklist by @ArronHZG in https://github.com/verl-project/verl/pull/5404
[doc] chore: ascend update gspo optimization practice document by @chengminhua in https://github.com/verl-project/verl/pull/5408
[algo] feat: add DPPO with binary TV or binary KL implementation by @QPHutu in https://github.com/verl-project/verl/pull/5397
[doc] chore: npu best practice doc by @hustmf in https://github.com/verl-project/verl/pull/5415
[algo] fix: seq mean and default scale factor loss_mask.shape[-1] as in seq-mean-token-sum-norm by @tongyx361 in https://github.com/verl-project/verl/pull/5417
[megatron] fix: missing model offload to CPU for forward_only mode by @xhx1022 in https://github.com/verl-project/verl/pull/5406
[megatron] feat: enhance model offloading and loading for frozen parameters by @RobotGF in https://github.com/verl-project/verl/pull/5412
[perf] fix: the overwriting of Torch_profile with multi steps by @Rhetee in https://github.com/verl-project/verl/pull/5395
[trainer] feat: add padding for tensor alignment in preprocess_thd_no_padding function by @RobotGF in https://github.com/verl-project/verl/pull/5410
[tool] fix: handle empty image inputs in ToolAgentLoop by @denismegerle in https://github.com/verl-project/verl/pull/5420
[rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter by @denismegerle in https://github.com/verl-project/verl/pull/5359
[tool] refactor: remove tool schema plumbing from SingleTurnAgentLoop by @denismegerle in https://github.com/verl-project/verl/pull/5425
[misc] feat: Add code for data grouping in no-padding scenario by @Kite0011 in https://github.com/verl-project/verl/pull/5424
[doc] add Dr. MAS to awesome work by @langfengQ in https://github.com/verl-project/verl/pull/5427
[BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout by @wuxibin89 in https://github.com/verl-project/verl/pull/5418
[ci] chore: bump the version of vllm-ascend to v0.11.0 in the ascend dockerfile by @ji-huazhong in https://github.com/verl-project/verl/pull/5431
[doc] chore: fix npu docs by @wucong25 in https://github.com/verl-project/verl/pull/5428
[doc] fix: fix npu retool docs by @LeoYao123 in https://github.com/verl-project/verl/pull/5449
[data] refactor: TransferQueue - retire legacy integration codes by @0oshowero0 in https://github.com/verl-project/verl/pull/5454
[ci] fix: failed trtllm_unit_tests with attribute error by @HollowMan6 in https://github.com/verl-project/verl/pull/5446
[megatron] fix: pass dp_group to rearrange_micro_batches to fix DeepEP timeout by @xhx1022 in https://github.com/verl-project/verl/pull/5451
[rollout] fix: remove unexpected concurrency bound at 1000 by @tongyuantongyu in https://github.com/verl-project/verl/pull/5402
[data] fix: accept jsonl dataset files by @zqzten in https://github.com/verl-project/verl/pull/5456
[single_controller] refactor: use BatchData to simplify concat and chunk in single_controller by @zw0610 in https://github.com/verl-project/verl/pull/5450
[megatron] feat: Support DSA indexer LoRA mappings by @HollowMan6 in https://github.com/verl-project/verl/pull/5462
[doc] fix: fix typo in agentic rl doc by @KevinZeng08 in https://github.com/verl-project/verl/pull/5461
[misc] chore: support transformers 5 by @HollowMan6 in https://github.com/verl-project/verl/pull/5445
[doc] fix: fix dapo multi model practice by @ChibiQuest in https://github.com/verl-project/verl/pull/5453
[trainer] feat: Update trainer API for TorchtitanEngine by @acisseJZhong in https://github.com/verl-project/verl/pull/5457
[rollout] refactor: bucketed transfer utils by @pengwu22 in https://github.com/verl-project/verl/pull/5309
[rollout] feat: update trtllm docker by @Superjomn in https://github.com/verl-project/verl/pull/5386
[doc] fix: fix npu retool doc by @LeoYao123 in https://github.com/verl-project/verl/pull/5467
[ckpt] feat: add mooncake backend by @x1314aq in https://github.com/verl-project/verl/pull/5176
[doc] chore: add ascend backend feature by @wucong25 in https://github.com/verl-project/verl/pull/5466
[megatron] fix: support hybrid dense/MoE models in router replay with PP/VPP by @xhx1022 in https://github.com/verl-project/verl/pull/5452
[megatron] fix: patch support newer mcore version by @HollowMan6 in https://github.com/verl-project/verl/pull/5372
[ci] fix: sanity issue related to Last updated string by @HollowMan6 in https://github.com/verl-project/verl/pull/5477
[rollout] feat: support auto resume on abort in FullyAsyncLLMServerManager by @wuxibin89 in https://github.com/verl-project/verl/pull/5430
[trainer] feat: Support EP with TorchtitanEngine by @acisseJZhong in https://github.com/verl-project/verl/pull/5469
docs: fix typo in kl_penalty docstring by @ZHAOoops in https://github.com/verl-project/verl/pull/5481
[megatron] fix: add FP8 block quantization padding for EngineWorker by @zpqiu in https://github.com/verl-project/verl/pull/5440
[ckpt, model] fix: preserve lora_alpha in model_merger via training meta by @Yatogaii in https://github.com/verl-project/verl/pull/5326
[fsdp,algo] feat: Support QAT (NVFP4) in FSDPEngine for the unified engine_workers architecture by @zhangyimi in https://github.com/verl-project/verl/pull/5411
[doc] feat: add mtp spec log by @ArronHZG in https://github.com/verl-project/verl/pull/5491
[reward] feat: add example scripts for reward model usage by @yyDing1 in https://github.com/verl-project/verl/pull/5486
[BREAKING][trtllm] feat: Add FP8 refit support for trtllm rollout by @shuyixiong in https://github.com/verl-project/verl/pull/5374
[veomni,ci] fix: Modify default setting in veomni test scripts to prevent misunderstanding by @0oshowero0 in https://github.com/verl-project/verl/pull/5484
[ckpt] fix: test issues of kimi and mooncake backend by @x1314aq in https://github.com/verl-project/verl/pull/5500
[doc] chore: update FP8 guide with E2E training section and reorganization by @zpqiu in https://github.com/verl-project/verl/pull/5502
[model,doc] feat: add qwen3 32B megatron 1k to 256k by @ChibiQuest in https://github.com/verl-project/verl/pull/5497
[doc] chore: npu docker support vllm013 by @yyyy2000 in https://github.com/verl-project/verl/pull/5471
[doc] fix: update recipe link to fix 404 not found by @tardis-key in https://github.com/verl-project/verl/pull/5286
[ci] feat: add npu nightly ci by @daikang6 in https://github.com/verl-project/verl/pull/5225
[data] fix: use %-style format placeholders in logger.warning() by @cavities12 in https://github.com/verl-project/verl/pull/5512
[rollout] feat: global request-level load balancer single source routing by @aoshen524 in https://github.com/verl-project/verl/pull/5399
[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in https://github.com/verl-project/verl/pull/5149
Revert "[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout" by @wuxibin89 in https://github.com/verl-project/verl/pull/5525
[ckpt] fix: Fix checkpoint engine backend unset error by @ZLiao097 in https://github.com/verl-project/verl/pull/5473
[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in https://github.com/verl-project/verl/pull/5528
[Megatron] feat: Support routing replay on NPU with performance and compatibility enhancements by @755651978 in https://github.com/verl-project/verl/pull/5298
[rollout] fix: update checkpoint_engine bucket size parameter for Ascend compatibility by @nuerxiati in https://github.com/verl-project/verl/pull/5539
[misc] feat: support dynamic bsz using group size by @Kite0011 in https://github.com/verl-project/verl/pull/5438
[fully_async, one_step_off] feat: support auto resume on abort when using fully_async by @ArronHZG in https://github.com/verl-project/verl/pull/5487
[doc] chore: add note for kimi ckpt engine by @kip-cxj in https://github.com/verl-project/verl/pull/5546
[perf, trainer, training_utils] fix: Try to monitor with mlflow up to 3 times, and avoid duplicate key processing in each step by @sheilaliuxl in https://github.com/verl-project/verl/pull/5548
[trainer] fix: support nsys when using sft_trainer_ray.py by @arvyanh in https://github.com/verl-project/verl/pull/5489
[rollout] fix: reintroduce NCCL_CUMEM_ENABLE for weight synchronization in async rollout environments by @RobotGF in https://github.com/verl-project/verl/pull/5522
[ci] feat: npu nightly ci log is redirected to the specified directory by @daikang6 in https://github.com/verl-project/verl/pull/5557
[ci] fix: sft_trainer_ray ci break by @wuxibin89 in https://github.com/verl-project/verl/pull/5562
[fsdp] fix: wrap embed_tokens/lm_head by name for peft models by @cavities12 in https://github.com/verl-project/verl/pull/5516
[ci] chore: update npu ci to vllm013 by @yyyy2000 in https://github.com/verl-project/verl/pull/5523
[algo] feat: support router replay in MegatronEngine by @xhx1022 in https://github.com/verl-project/verl/pull/5219
[docker] feat: update stable image to vllm==0.17.0, sglang==0.5.9 by @Begunner in https://github.com/verl-project/verl/pull/5542
[megatron, model] feat: qwen3.5 example by @ISEEKYAN in https://github.com/verl-project/verl/pull/5381
[algo] feat: add GDPO (Group reward-Decoupled Normalization Policy Optimization) algorithm by @Rhetee in https://github.com/verl-project/verl/pull/5503
[megatron] feat: model engine support mtp by @ArronHZG in https://github.com/verl-project/verl/pull/5561
[doc] fix: fix te pip install instructions by @TKONIY in https://github.com/verl-project/verl/pull/5501
[rollout] fix: agent loop copy read-only routed_experts before torch conversion by @HollowMan6 in https://github.com/verl-project/verl/pull/5519
[ci] chore: change machine for npu ci by @yyyy2000 in https://github.com/verl-project/verl/pull/5578
[megatron] fix: apply override_transformer_config inside mindspeed engine to avoid conflict with other training engines by @ChengQianqian in https://github.com/verl-project/verl/pull/5589
[rollout] fix: fix some compatibility issues with qwen vl series support of trtllm rollout by @SchumiDing in https://github.com/verl-project/verl/pull/5583
[misc] chore: bump version to 0.7.1 by @wuxibin89 in https://github.com/verl-project/verl/pull/5602