
v0.7 release

Blog post: verl 0.7 release blog

Highlights

Model Engine

  • Integrate Megatron-Bridge and support LoRA/PEFT, see blog post: How We Build Trillion Parameter Reasoning RL with 10% GPUs
  • Support experimental fp8 training for megatron backend
  • Support new models for the megatron backend: GPT-OSS, Qwen3-Next
  • Comprehensive support for the new model engine; the FSDP and Megatron engines are production-ready.
  • Dispatch tensordict with nested tensor instead of padded DataProto
  • Add TrainingWorker, which exposes a Tinker-like API
  • Add VLM support for model engine, SFT and RL trainer
  • Add model engine based critic model
  • Implement ActorRolloutRefWorker on top of TrainingWorker, supporting different backends in one worker
  • Add new VeOmni engine (still in alpha).
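
The padded-to-nested change above trades DataProto-style padding for nested tensors that store only real tokens. A toy illustration in plain Python (not the verl API, and with hypothetical sequence lengths) of why that saves memory with variable-length batches:

```python
# Hypothetical sequence lengths (prompt + response tokens) in one RL batch.
seq_lens = [512, 37, 128, 9]

# Padded batching stores every sequence at the longest length...
padded_tokens = max(seq_lens) * len(seq_lens)
# ...while a nested layout stores only the tokens that actually exist.
nested_tokens = sum(seq_lens)

savings = 1 - nested_tokens / padded_tokens
print(f"padded={padded_tokens} nested={nested_tokens} savings={savings:.0%}")
```

The more skewed the length distribution, the larger the savings, which is why this matters for RL rollouts with a few very long responses.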

Rollout Engine

  • Remove SPMD rollout mode
  • Support blockwise fp8 rollout for vllm and sglang; support online quantization for vllm with torchao
  • Experimental router replay support for vllm
  • Optimize multi-modal data fetch and preprocess, support video input
  • Upgrade to vllm==0.12.0; sglang==0.5.6
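
Blockwise fp8 keeps one scale per block of weights rather than one per tensor, so a single outlier only degrades its own block. A minimal numpy sketch of the idea, with integer rounding standing in for the real fp8 (e4m3) cast and a hypothetical block size; this is not the actual vllm/sglang kernel:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in fp8 e4m3

def blockwise_fp8_quant(x: np.ndarray, block_size: int = 128):
    """Quantize a flat weight vector with one scale per block (sketch)."""
    blocks = x.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0, 1.0, scales)  # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scales

def blockwise_fp8_dequant(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

w = np.linspace(-1.0, 1.0, 256)
q, s = blockwise_fp8_quant(w)
err = np.abs(blockwise_fp8_dequant(q, s) - w).max()
```

Per-block scaling bounds the rounding error by half a scale step within each block.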

Reward

  • Support hybrid reward scenarios, including generative, discriminative, rule-based rewards, and their combinations.
  • Refactor reward models into server mode, supporting both colocated and standalone deployments.
  • Introduce new reward managers for more complex scenarios: a limited mode for request-rate control and a remote mode for CPU-intensive tasks.
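
A hybrid reward mixes rule-based checks with model-based scores. A self-contained sketch with hypothetical function names and weights (not verl's reward-manager API); the model-based term is stubbed so the example runs offline:

```python
def rule_based_reward(response: str) -> float:
    # Illustrative format check: reward responses that contain a boxed answer.
    return 1.0 if "\\boxed{" in response else 0.0

def model_based_reward(response: str) -> float:
    # Stand-in for a generative or discriminative reward-model call;
    # here just a length-based stub for illustration.
    return min(len(response) / 100.0, 1.0)

def hybrid_reward(response: str, w_rule: float = 0.5, w_model: float = 0.5) -> float:
    # Weighted combination of the two reward sources.
    return w_rule * rule_based_reward(response) + w_model * model_based_reward(response)

score = hybrid_reward("The answer is \\boxed{42}")
```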

Algorithm

  • Add CISPO: Clipped IS-weight Policy Optimization
  • Add SAPO: Soft Adaptive Policy Optimization
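
Per the CISPO paper (from MiniMax-M1), the importance-sampling ratio is clipped and treated as a constant (stop-gradient), with the gradient flowing only through the log-probability term. A numpy sketch of the forward loss value with illustrative clipping bounds; in a real autograd framework the clipped ratio would be detached:

```python
import numpy as np

def cispo_loss(logp, old_logp, adv, eps_low=0.2, eps_high=0.2):
    """Forward value of a CISPO-style per-token objective (sketch)."""
    ratio = np.exp(logp - old_logp)  # importance-sampling weight
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # In PyTorch this would be clipped.detach(): no gradient flows through the ratio.
    return float(-(clipped * adv * logp).mean())

logp = np.log(np.array([0.5, 0.5]))
loss = cispo_loss(logp, logp, np.array([1.0, -1.0]))
```

Unlike PPO-style clipping, tokens with large ratios still contribute gradient (scaled by the clipped weight) rather than being dropped entirely.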

Recipe

  • [NEW] VLA: add experimental support for VLA models
  • [NEW] rhymerl: History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
  • TransferQueue: support multiple data partition and optimize tensor zero-copy serialization
  • One-step-off-policy/Fully async: optimize weight synchronization via the checkpoint engine, with bucketing and pipelining support.
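
Bucketed weight synchronization groups parameters into fixed-size buckets so transfers can be pipelined instead of shipping one monolithic blob. A hypothetical greedy bucketing sketch (names and sizes are illustrative, not verl's checkpoint engine):

```python
def make_buckets(param_sizes, bucket_bytes):
    """Greedily pack (name, size) parameters into buckets of at most bucket_bytes.

    A single parameter larger than bucket_bytes still gets its own bucket.
    """
    buckets, current, used = [], [], 0
    for name, size in param_sizes:
        if current and used + size > bucket_bytes:
            buckets.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        buckets.append(current)
    return buckets

params = [("embed", 3), ("ln", 1), ("attn", 5), ("mlp", 4)]
buckets = make_buckets(params, bucket_bytes=6)
```

Each bucket can then be sent while the next one is being packed, overlapping serialization with communication.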

What's Changed

New Contributors

Full Changelog: https://github.com/volcengine/verl/compare/v0.6.1...v0.7.0

Source: README.md, updated 2026-01-05