
Highlights

Model Engine

As noted in https://github.com/volcengine/verl/issues/3624, the model engine is a service that exposes APIs for manipulating a parallel, distributed model from a single controller. This release provides a prototype of this idea with an FSDP + Ulysses backend and a Megatron-Core backend. The implementation lives under https://github.com/volcengine/verl/tree/main/verl/workers/engine. Currently, only the SFT trainer is implemented on top of the model engine; in upcoming releases we will implement the RL trainer on it as well. Please refer to https://verl.readthedocs.io/en/latest/workers/model_engine.html for the design and for instructions on adding more model engine backends.
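To illustrate the single-controller idea, here is a minimal sketch of what a backend-agnostic training engine API could look like. The names (`TrainEngine`, `forward_backward_step`, `optimizer_step`, `sft_step`) and the toy backend are illustrative assumptions, not verl's actual interface:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Batch:
    """Toy stand-in for a training batch (e.g., a DataProto)."""
    labels: list


class TrainEngine(ABC):
    """Hypothetical backend-agnostic engine facade.

    A real backend (FSDP + Ulysses, Megatron-Core, ...) would shard the
    model and run these calls collectively; the controller only sees
    this narrow API.
    """

    @abstractmethod
    def forward_backward_step(self, batch: Batch) -> float: ...

    @abstractmethod
    def optimizer_step(self) -> None: ...


class ToyEngine(TrainEngine):
    """Single-process stand-in: fits a constant predictor by gradient descent."""

    def __init__(self):
        self.weight = 0.0
        self._grad = 0.0

    def forward_backward_step(self, batch: Batch) -> float:
        # Mean squared error of the constant predictor, and its gradient.
        errs = [self.weight - y for y in batch.labels]
        self._grad = 2 * sum(errs) / len(errs)
        return sum(e * e for e in errs) / len(errs)

    def optimizer_step(self) -> None:
        self.weight -= 0.1 * self._grad  # plain SGD with lr=0.1


def sft_step(engine: TrainEngine, batch: Batch) -> float:
    """One SFT trainer loop body, written purely against the engine API."""
    loss = engine.forward_backward_step(batch)
    engine.optimizer_step()
    return loss
```

Because `sft_step` touches only the abstract API, swapping the backend requires no trainer changes, which is the property the model engine design is after.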

Rollout Server

As agentic reinforcement learning emerges as a predominant research area, verl's rollout is transitioning from SPMD mode to server mode, which is more efficient for multi-turn rollout and tool calling. In version 0.6, we made several major changes to the rollout server:

By switching to native server mode, https://github.com/volcengine/verl/pull/3530 adds DP+EP support for large MoE models.

To improve extensibility, https://github.com/volcengine/verl/pull/3285 refactors the BaseRollout interface and deprecates all sharding managers. This refactor ensures the training engine remains agnostic of the inference engine during weight synchronization, making it easier to integrate new inference engines (e.g., TensorRT-LLM) without modifying the training engine.
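The decoupling above can be sketched as follows. This is a simplified, hypothetical rendering of the idea: the rollout interface owns weight synchronization, so the trainer hands over plain named weights and never touches inference-engine internals. The method names (`update_weights`, `generate`) and the `EchoRollout` stand-in are assumptions for illustration, not verl's exact `BaseRollout` signature:

```python
from abc import ABC, abstractmethod


class BaseRollout(ABC):
    """Hypothetical rollout interface: the training engine only produces
    named weights; how they reach the inference engine is hidden here."""

    @abstractmethod
    def update_weights(self, named_weights: dict) -> None:
        """Sync new policy weights into the inference engine."""

    @abstractmethod
    def generate(self, prompts: list) -> list:
        """Produce completions for a batch of prompts."""


class EchoRollout(BaseRollout):
    """Trivial in-process stand-in for an inference server (vLLM, SGLang,
    or a future TensorRT-LLM backend would go behind the same interface)."""

    def __init__(self):
        self.weights = {}

    def update_weights(self, named_weights: dict) -> None:
        # A real backend would copy tensors into the serving engine here.
        self.weights = dict(named_weights)

    def generate(self, prompts: list) -> list:
        return [p + " <gen>" for p in prompts]
```

Integrating a new inference engine then means implementing this interface once, with no changes to the training side.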

Newly Supported Models

  • Qwen3 VL
  • GPT OSS

Algorithm

Recipe

Some awesome recipes have been added in v0.6:

Breaking changes and deprecations

nD Dispatch method

Previously, we implemented a set of predefined dispatch methods, including ONE_TO_ALL, DP_COMPUTE_DATA_PROTO, MEGATRON_COMPUTE_DATA_PROTO, etc. DP_COMPUTE_DATA_PROTO and MEGATRON_COMPUTE_DATA_PROTO are tightly coupled to the underlying distributed strategies, and writing a separate dispatch method for each strategy does not scale. In this release, we propose a new API to unify all distributed strategies. The general steps are:

  • Define device meshes or process groups.
  • Register dispatch and collect info by calling _register_dispatch_collect_info inside the worker.
  • Register methods using @register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name=mesh_name)).

Please refer to https://github.com/volcengine/verl/blob/main/tests/single_controller/test_device_mesh_register.py for an example.
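The steps above can be sketched as a toy single-process model. The function names mirror those in the release notes, but the bodies here are simplifications for illustration only: "dispatch" splits data across the data-parallel shards of a named mesh, and "collect" concatenates the per-shard outputs:

```python
# Registry mapping mesh names to their data-parallel size (step 1 is
# abstracted to just recording a dp size per named mesh).
_MESHES = {}


def _register_dispatch_collect_info(mesh_name: str, dp_size: int) -> None:
    """Step 2: record dispatch/collect info for a named mesh."""
    _MESHES[mesh_name] = dp_size


def make_nd_compute_dataproto_dispatch_fn(mesh_name: str):
    """Build (dispatch, collect) functions bound to a mesh name."""

    def dispatch(data: list) -> list:
        n = _MESHES[mesh_name]
        k = (len(data) + n - 1) // n  # ceil-divide into n shards
        return [data[i * k:(i + 1) * k] for i in range(n)]

    def collect(shards: list) -> list:
        return [x for shard in shards for x in shard]

    return dispatch, collect


def register(dispatch_mode):
    """Step 3: decorate a worker method so the controller shards its
    input, calls each 'rank', and collects the per-rank outputs."""
    dispatch, collect = dispatch_mode

    def deco(fn):
        def wrapped(data):
            return collect([fn(shard) for shard in dispatch(data)])
        return wrapped

    return deco


# Usage: one registration works for any strategy that names its mesh.
_register_dispatch_collect_info("actor", dp_size=2)


@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn("actor"))
def compute(shard):
    return [x * 2 for x in shard]


print(compute([1, 2, 3, 4]))  # -> [2, 4, 6, 8]
```

The point of the unification is visible here: the worker method never encodes a strategy-specific dispatch path; it only names the mesh it computes over.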

ShardingManager

ShardingManager is deprecated and will be removed in the next release.

Important bug fixes

  • Fix a hang when training VLMs (e.g., Qwen VL) on mixed text and image data
  • Fix DataProto `__getstate__` bug

What's Changed

New Contributors

Full Changelog: https://github.com/volcengine/verl/compare/v0.5.0...v0.6.0


Source: README.md, updated 2025-10-15