
Name	Modified	Size
README.md	2025-11-14	20.7 kB
v0.6.1 source code.tar.gz	2025-11-14	2.1 MB
v0.6.1 source code.zip	2025-11-14	3.1 MB
Totals: 3 Items		5.2 MB

Highlights

Trainer

support FP16 training (FSDP/Megatron; on Megatron, dense models only)
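Unlike BF16, FP16 has a narrow exponent range, so FP16 training usually relies on loss scaling to keep small gradients from underflowing to zero. The sketch below shows generic dynamic loss scaling in plain Python. It illustrates the technique only and is not verl's implementation; the class and parameter names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical names, illustration only)."""
    scale: float = 2.0 ** 16
    growth_factor: float = 2.0
    backoff_factor: float = 0.5
    growth_interval: int = 2000
    _good_steps: int = 0

    def scale_loss(self, loss: float) -> float:
        # Multiply the loss so that small FP16 gradients stay representable.
        return loss * self.scale

    def unscale_and_update(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Return unscaled grads, or None if this optimizer step must be skipped."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow (inf/nan): skip the step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        # Unscale with the scale that was actually used for this step.
        unscaled = [g / self.scale for g in scaled_grads]
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of finite gradients: try a larger scale next time.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return unscaled
```

Backing off quickly on overflow and growing only after a long run of clean steps keeps the scale near the largest value that does not overflow.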

Megatron

support 1F1B overlap (1f1b_overlap) and MoE all-to-all overlap (moe_a2a_overlap)
support Qwen3-VL MoE and dense models
support Qwen2.5-VL/Qwen3-VL with context parallelism
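Context parallelism splits a single long sequence across ranks. With causal attention, later tokens attend to more keys, so a naive contiguous split overloads the last rank. A common load-balancing scheme (the one Megatron-Core uses, as far as we know) cuts the sequence into 2 × cp_size chunks and gives rank r chunks r and 2·cp_size − 1 − r. The sketch below shows only that token assignment; the function name is hypothetical and this is not verl's code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cp_shard(tokens: Sequence[T], cp_size: int, cp_rank: int) -> List[T]:
    """Return the tokens owned by `cp_rank` under a 2*cp_size zigzag split."""
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    n_chunks = 2 * cp_size
    if len(tokens) % n_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * cp_size")
    chunk = len(tokens) // n_chunks
    first = cp_rank                 # an "early" chunk (cheap under causal masking)
    second = n_chunks - 1 - cp_rank  # a mirrored "late" chunk (expensive)
    return list(tokens[first * chunk:(first + 1) * chunk]) + list(
        tokens[second * chunk:(second + 1) * chunk]
    )


print(cp_shard(list(range(8)), cp_size=2, cp_rank=0))  # [0, 1, 6, 7]
print(cp_shard(list(range(8)), cp_size=2, cp_rank=1))  # [2, 3, 4, 5]
```

Pairing one cheap early chunk with one expensive late chunk gives every rank roughly the same amount of attention work.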

Rollout

Use vllm and sglang release image as ci base image, upgrade vllm==0.11.0, upgrade sglang==0.5.5
Prometheus monitoring

Algorithm

Rollout Correction: comprehensive overhaul of the rollout correction system with typed configuration, mathematical documentation, and performance optimizations.
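Rollout correction addresses the mismatch between the policy that generated the rollouts and the policy being trained. The sketch below illustrates two generic ingredients, kept separate as in https://github.com/volcengine/verl/pull/3915: token-level importance-sampling weights truncated from above, and a sequence-level rejection test on the geometric-mean ratio. The function names, thresholds, and aggregation choice are our own assumptions, not verl's configuration keys or implementation.

```python
import math
from typing import List, Sequence


def truncated_is_weights(
    train_logp: Sequence[float], rollout_logp: Sequence[float], threshold: float = 2.0
) -> List[float]:
    """Per-token weights pi_train / pi_rollout, capped at `threshold`."""
    return [min(math.exp(lt - lr), threshold) for lt, lr in zip(train_logp, rollout_logp)]


def keep_sequence(
    train_logp: Sequence[float],
    rollout_logp: Sequence[float],
    low: float = 0.5,
    high: float = 2.0,
) -> bool:
    """Reject a sequence whose geometric-mean ratio falls outside [low, high]."""
    n = len(train_logp)
    if n == 0:
        return False
    mean_log_ratio = sum(lt - lr for lt, lr in zip(train_logp, rollout_logp)) / n
    return low <= math.exp(mean_log_ratio) <= high


train = [math.log(0.4), math.log(0.9)]
rollout = [math.log(0.2), math.log(0.9)]
print(truncated_is_weights(train, rollout, threshold=1.5))  # [1.5, 1.0]
print(keep_sequence(train, rollout))  # True
```

Working in log space avoids underflow when multiplying many small token probabilities, and the geometric mean keeps the rejection test independent of sequence length.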

Recipe

Introduce two new experimental recipes, which will be gradually merged into main in future releases.

Fully Async Policy Trainer: a fully asynchronous PPO training system that completely decouples the Trainer and the Rollouter, supporting asynchronous sample generation and training.
TransferQueue Data System: an asynchronous streaming data management system for efficient post-training.
FlowRL: new recipe added in https://github.com/volcengine/verl/pull/3924
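The decoupling of Trainer and Rollouter can be pictured as a producer/consumer pipeline. The asyncio sketch below is a toy model under our own assumptions (a bounded queue, an integer policy version, and a max_staleness cutoff); none of these names come from verl's fully async recipe or from TransferQueue.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    prompt_id: int
    policy_version: int  # version of the weights that generated this sample


async def rollouter(queue: "asyncio.Queue[Optional[Sample]]", num_samples: int, version: List[int]) -> None:
    for i in range(num_samples):
        await asyncio.sleep(0)  # stand-in for generation
        await queue.put(Sample(i, version[0]))  # blocks while the queue is full
    await queue.put(None)  # end-of-stream marker


async def trainer(
    queue: "asyncio.Queue[Optional[Sample]]", batch_size: int, version: List[int], max_staleness: int
) -> Tuple[int, int]:
    batch: List[Sample] = []
    updates = dropped = 0
    while True:
        sample = await queue.get()
        if sample is None:
            break
        if version[0] - sample.policy_version > max_staleness:
            dropped += 1  # too off-policy: discard
            continue
        batch.append(sample)
        if len(batch) == batch_size:
            version[0] += 1  # stand-in for a PPO update plus weight sync
            updates += 1
            batch.clear()
    return updates, dropped


async def main(num_samples: int = 8, batch_size: int = 2, max_staleness: int = 1) -> Tuple[int, int, int]:
    queue: "asyncio.Queue[Optional[Sample]]" = asyncio.Queue(maxsize=4)
    version = [0]
    _, (updates, dropped) = await asyncio.gather(
        rollouter(queue, num_samples, version),
        trainer(queue, batch_size, version, max_staleness),
    )
    return updates, dropped, version[0]


print(asyncio.run(main(max_staleness=100)))  # (4, 0, 4)
```

The bounded queue provides backpressure: when the trainer falls behind, the rollouter waits instead of producing ever-staler samples.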

Important bug fixes

https://github.com/volcengine/verl/pull/3861: fix parameters and optimizer state not being offloaded to CPU when there is no checkpoint
https://github.com/volcengine/verl/pull/4097: fix missing finalize_model_grads_func in megatron model engine

What's Changed

[misc] feat: bump version to 0.7.0.dev by @vermouth1992 in https://github.com/volcengine/verl/pull/3772
[recipe] feat: Add example for gpt-oss training using agent loop by @HJSang in https://github.com/volcengine/verl/pull/3774
[docker] feat: update Dockerfile.rocm7 by @vickytsang in https://github.com/volcengine/verl/pull/3781
[doc] fix: actor_rollout_ref.critic is not correct by @HollowMan6 in https://github.com/volcengine/verl/pull/3778
[misc] fix: SFT E2E CI test failure due to megatron engine by @houminz in https://github.com/volcengine/verl/pull/3786
[recipe] fix: fix the gpt-oss-20b training script for agent loop recipe by @HJSang in https://github.com/volcengine/verl/pull/3793
[doc] chore: add agent loop get started tutorial by @wuxibin89 in https://github.com/volcengine/verl/pull/3790
[vllm] fix: catch exception of vllm async engine by @Yangruipis in https://github.com/volcengine/verl/pull/3789
[trainer] fix: batch size mismatch with n>1 when gen_max for ReMax by @HollowMan6 in https://github.com/volcengine/verl/pull/3779
[trainer] feat: ReMax support using reward model for baseline by @HollowMan6 in https://github.com/volcengine/verl/pull/3780
[megatron] feat: script of qwen3vl 235b by @ISEEKYAN in https://github.com/volcengine/verl/pull/3799
[trainer, recipe] feat: fully async training recipe by @ArronHZG in https://github.com/volcengine/verl/pull/2981
[doc] feat: update fully async experiment message by @ArronHZG in https://github.com/volcengine/verl/pull/3804
[worker] fix: create a new event loop if none exists when building rollouts by @ChangyWen in https://github.com/volcengine/verl/pull/3803
[trainer] fix: address serialization issues when using async reward function and ray ppo trainer by @benprofessionaledition in https://github.com/volcengine/verl/pull/3769
[megatron] fix: fix logits process error when disable pack_seqs by @HaochenYuan in https://github.com/volcengine/verl/pull/3777
[misc] fix: Sanitize MLFlow metric names by @pratik9891 in https://github.com/volcengine/verl/pull/3736
[ci] fix: Install mlflow dependency by @HollowMan6 in https://github.com/volcengine/verl/pull/3817
[rollout, vllm] fix: make LoRA with async vLLM work properly by @listar2000 in https://github.com/volcengine/verl/pull/3821
Revert "[worker] fix: create a new event loop if none exists when building rollouts" by @vermouth1992 in https://github.com/volcengine/verl/pull/3820
[trainer] fix: Add data.seed to config by @HollowMan6 in https://github.com/volcengine/verl/pull/3815
[doc] fix: update install instruction and retool readme by @chenhaiq in https://github.com/volcengine/verl/pull/3824
[algo] fix: remove torch.quantile-based percentile metrics to resolve tensor size limit error by @szrlee in https://github.com/volcengine/verl/pull/3810
[data] feat: filter out malformed data together with long prompts by @HollowMan6 in https://github.com/volcengine/verl/pull/3814
[worker] fix: to create a new event loop if none exists when building rollouts (a safer fix) by @ChangyWen in https://github.com/volcengine/verl/pull/3828
[data, trainer] feat: add support for limiting samples from dataset by @HollowMan6 in https://github.com/volcengine/verl/pull/3812
[model, megatron] feat: Support for Qwen3VL dense models by @HollowMan6 in https://github.com/volcengine/verl/pull/3838
[recipe] fix: Update the grpo training script for gpt-oss models by @HJSang in https://github.com/volcengine/verl/pull/3836
[recipe, rollout] feat: enable gpt-oss training for tool agent; add gpt-oss to retool recipe by @HJSang in https://github.com/volcengine/verl/pull/3837
[data] feat: TransferQueue - An asynchronous streaming data management system by @0oshowero0 in https://github.com/volcengine/verl/pull/3649
[trainer, worker] feat: more flexible and easy-to-use reward model by @yyDing1 in https://github.com/volcengine/verl/pull/3679
[doc] fix: fix async policy message by @ArronHZG in https://github.com/volcengine/verl/pull/3845
[worker] fix: create a new event loop if none exists by @baymax591 in https://github.com/volcengine/verl/pull/3839
[misc] feat: add megatron script for open math reasoning by @vermouth1992 in https://github.com/volcengine/verl/pull/3844
[rollout, vllm] fix: name change for compilation level by @HollowMan6 in https://github.com/volcengine/verl/pull/3848
[trainer] fix: missing offload parameter and optimizer to cpu when no checkpoint by @wuxibin89 in https://github.com/volcengine/verl/pull/3861
[sglang] fix: make sglang wake_up/sleep work in colocate mode by @yyDing1 in https://github.com/volcengine/verl/pull/3860
[doc] feat: add doc for reward loop by @yyDing1 in https://github.com/volcengine/verl/pull/3851
[doc] misc: fix doc: the penalty starts when the response length exceeds max_response_length - overlong_buffer.len by @bzantium in https://github.com/volcengine/verl/pull/3856
[recipe] fix: bugfix of Qwen3 8b/14b DAPO npu script by @acat-rw in https://github.com/volcengine/verl/pull/3858
[BREAKING][misc] feat: Abstract optimizer by @EduardDurech in https://github.com/volcengine/verl/pull/3656
[ci] feat: migrate gpu_unit_tests to volcengine by @vermouth1992 in https://github.com/volcengine/verl/pull/3872
[rollout] fix: Fix gpt-oss training in tool agent by @HJSang in https://github.com/volcengine/verl/pull/3865
[fsdp] fix: fix MoE model error when running with fully async by @chenjiaoAngel in https://github.com/volcengine/verl/pull/3874
[doc] feat: update doc of reward loop by @yyDing1 in https://github.com/volcengine/verl/pull/3880
[perf, data] feat: DP workload balance by @conver334 in https://github.com/volcengine/verl/pull/3605
[ci] fix: gsm8k interaction unit test by @wuxibin89 in https://github.com/volcengine/verl/pull/3888
[model] chore: deprecated legacy code for GRM by @yyDing1 in https://github.com/volcengine/verl/pull/3885
[recipe] fix: Qwen3-vl moe model patch by @leisuzz in https://github.com/volcengine/verl/pull/3878
Add PokeeResearch to README resources by @BillMatrix in https://github.com/volcengine/verl/pull/3892
[misc] feat: read environment for WandB entity (team) name by @BaiqingL in https://github.com/volcengine/verl/pull/3889
[tool] fix: remove duplicate tool initialization by @Tree-Shu-Zhao in https://github.com/volcengine/verl/pull/3893
[rollout] fix: incorrect value assignment while trying to access call_tool_result by @BaiqingL in https://github.com/volcengine/verl/pull/3891
[megatron] fix: VLMs using fused kernels by @HollowMan6 in https://github.com/volcengine/verl/pull/3849
[megatron] fix: mbridge load optimizer dist_ckpt by @ccilery in https://github.com/volcengine/verl/pull/3850
[misc] feat: fix ci break by @wuxibin89 in https://github.com/volcengine/verl/pull/3898
[doc, recipe] feat: update doc of rewardloop and add runnable scripts of fapo by @yyDing1 in https://github.com/volcengine/verl/pull/3900
[doc] chore: update installation scripts to use newer versions by @HollowMan6 in https://github.com/volcengine/verl/pull/3901
[recipe] fix: fix bug of transfer queue runtime env by @baymax591 in https://github.com/volcengine/verl/pull/3904
[doc] fix: formatting issue for kl_ctrl and fused_kernel_options configs by @HollowMan6 in https://github.com/volcengine/verl/pull/3917
[recipe] fix: DAPO using KL in reward by @HollowMan6 in https://github.com/volcengine/verl/pull/3916
[recipe] fix: DAPO add trust_remote_code parameter to tokenizer and processor by @quancs in https://github.com/volcengine/verl/pull/3913
[recipe] fix: Update README with training and backend instructions by @vermouth1992 in https://github.com/volcengine/verl/pull/3929
[recipe] chore: use verl.utils.metric to import reduce_metrics by @HollowMan6 in https://github.com/volcengine/verl/pull/3927
[algo] refactor: Rollout Importance Sampling - Separate IS Weights from Rejection Sampling by @szrlee in https://github.com/volcengine/verl/pull/3915
[trainer, worker] feat: support loading LoRA adapters by @piood in https://github.com/volcengine/verl/pull/3523
[rollout, sglang] fix: correct input length check in sglang_rollout by @triston-lee in https://github.com/volcengine/verl/pull/3935
[rollout, vllm] fix: handle lora request when base_sync_done is false initially by @listar2000 in https://github.com/volcengine/verl/pull/3907
[rollout] fix: Add "non_block" argument compatibility to collective_rpc() by @kevssim in https://github.com/volcengine/verl/pull/3934
[megatron] chore: update mcore docs by @ISEEKYAN in https://github.com/volcengine/verl/pull/3940
[docker] feat: add Ascend dockerfile and image build pipeline by @songyy29 in https://github.com/volcengine/verl/pull/3485
[rollout] fix: Pass tool related extra fields in reward loop by @huaiyizhao in https://github.com/volcengine/verl/pull/3941
[docker] fix: Workaround mount-type=bind issue from scratch in some environments. by @vickytsang in https://github.com/volcengine/verl/pull/3944
[megatron] feat: support async training with megatron and VLM by @ISEEKYAN in https://github.com/volcengine/verl/pull/3846
[data] fix: Pass video metadata to vLLM and support change image_patch_size by @kaln27 in https://github.com/volcengine/verl/pull/3928
[ci] fix: disable docker-build-ascend from running on fork by @HollowMan6 in https://github.com/volcengine/verl/pull/3959
[recipe] chore: entropy removes WANDB_API_KEY in code by @HollowMan6 in https://github.com/volcengine/verl/pull/3956
[Megatron] feat: 1f1b overlap/moe_a2a_overlap by @ISEEKYAN in https://github.com/volcengine/verl/pull/3522
[ci] chore: Rename dockerfile and update e2e_ascend by @FightingZhen in https://github.com/volcengine/verl/pull/3966
[trainer, recipe] feat: Fully Async Policy add Rollout Importance Sampling by @ArronHZG in https://github.com/volcengine/verl/pull/3955
[ci, recipe] fix: add the missing key compute_prox_log_prob to fully_async_ppo_megatron_trainer.yaml by @ji-huazhong in https://github.com/volcengine/verl/pull/3979
[misc] fix: add compileall pre-commit hook checks and improve code quality by @HollowMan6 in https://github.com/volcengine/verl/pull/3946
[doc] feat: add a doc for vllm+megatron training by @techkang in https://github.com/volcengine/verl/pull/3974
[data] feat: passing tool_config to data by @huaiyizhao in https://github.com/volcengine/verl/pull/3950
[worker] fix: Add attn_implementation override support in FSDP workers by @arde171 in https://github.com/volcengine/verl/pull/3978
[trainer] fix: normalize sft loss by num_tokens in global batch by @wuxibin89 in https://github.com/volcengine/verl/pull/3994
[sglang,rollout] fix: sglang port race condition by @theely in https://github.com/volcengine/verl/pull/3977
[BREAKING][megatron] feat: support qwen2.5/3vl with context parallel by @ISEEKYAN in https://github.com/volcengine/verl/pull/3998
[ci] feat: Add weekly scheduled validation workflow for Ascend docker image by @songyy29 in https://github.com/volcengine/verl/pull/3997
[recipe] fix: message extension logic in tool_agent_loop.py by @NIL-zhuang in https://github.com/volcengine/verl/pull/3991
[tool] fix: errors when merging Qwen3-VL-2B (FSDP) by @ieellee in https://github.com/volcengine/verl/pull/3971
[trainer] fix: Handle None sandbox_config in load_reward_manager by @CzsGit in https://github.com/volcengine/verl/pull/4008
[ci] fix: update config in docker_validate_ascend.yml by @FightingZhen in https://github.com/volcengine/verl/pull/4009
[sglang] fix: relocate sglang cache free logic to avoid GPU OOM by @dongju-2 in https://github.com/volcengine/verl/pull/4005
[docker] feat: support Ascend A3 docker image build pipeline, update related documents by @FightingZhen in https://github.com/volcengine/verl/pull/3970
[megatron] fix: pass video data to megatron backend by @ccilery in https://github.com/volcengine/verl/pull/4016
[env] feat: update docker file building schema, from VLLM base images by @ISEEKYAN in https://github.com/volcengine/verl/pull/3937
[rollout, vllm] fix: Fixed the issue of rollout causing OOM in ep > 1 by @echo-rain in https://github.com/volcengine/verl/pull/4007
[recipe] feat: add FlowRL recipe by @Xuekai-Zhu in https://github.com/volcengine/verl/pull/3924
[BREAKING][algo] feat: Rollout Correction for General Off-Policy Problems by @szrlee in https://github.com/volcengine/verl/pull/3984
[doc] feat: render LaTeX in md docs by @tongyx361 in https://github.com/volcengine/verl/pull/4061
[ci] fix: Remove extra pip config in Ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4059
[ci, docker] chore: Update Ascend dockerfile and docs by @FightingZhen in https://github.com/volcengine/verl/pull/4064
[algo] feat: return loss and metrics from policy_loss_fn by @tongyx361 in https://github.com/volcengine/verl/pull/4062
[trainer] fix: prevent ReactAgentLoop infinite recursion by @CzsGit in https://github.com/volcengine/verl/pull/4051
[rollout] fix: Agentloop agent.image_data bug [#4050] by @DBMing in https://github.com/volcengine/verl/pull/4052
[rollout] fix: resolve agent loop config path in multi-node Ray training by @CzsGit in https://github.com/volcengine/verl/pull/4029
Clear up the deepeyes README by @willem-bd in https://github.com/volcengine/verl/pull/4076
[rollout,vllm] fix: custom model config pickle error when trust_remote_code=True by @wuxibin89 in https://github.com/volcengine/verl/pull/4079
[trainer, recipe] feat: Fully Async Policy add Rollout Importance Sampling with Megatron by @lalala-2 in https://github.com/volcengine/verl/pull/4023
[megatron] fix: engine alignment by @ISEEKYAN in https://github.com/volcengine/verl/pull/4097
[trainer, megatron, tool] fix: megatron does not support memory profiler by @maijia-cwh in https://github.com/volcengine/verl/pull/4031
[ci,sglang] feat: update docker file building schema, from sglang base images by @ISEEKYAN in https://github.com/volcengine/verl/pull/4037
[recipe] fix: dynamic recursion_limit and error handling in ReactAgentLoop by @le-czs in https://github.com/volcengine/verl/pull/4102
[doc,algo] feat: Rollout Correction - Fix Metrics, Add Documentation, and Add Batch Normalization by @szrlee in https://github.com/volcengine/verl/pull/4070
[doc] chore: update instructions for enabling AMD MI3xx sleep mode by @HollowMan6 in https://github.com/volcengine/verl/pull/4108
[fsdp] feat: add NPU fusion kernels for Qwen2 and Qwen2.5 dense model. by @ZLiao097 in https://github.com/volcengine/verl/pull/3923
[doc] fix: improve docs clarity, fix IS gradient flow, and optimize memory by @szrlee in https://github.com/volcengine/verl/pull/4105
[megatron] fix: expose moe_aux_loss_coeff and moe_z_loss_coeff to improve MoE load balancing by @Kairosxy in https://github.com/volcengine/verl/pull/4103
[megatron] feat: fp16 training (dense model only) by @ISEEKYAN in https://github.com/volcengine/verl/pull/4086
[ci] feat: Enable python cache in ascend docker build workflow by @FightingZhen in https://github.com/volcengine/verl/pull/4113
[ci] chore: update pip before using it in ascend dockerfile by @FightingZhen in https://github.com/volcengine/verl/pull/4114
[sglang,vllm] feat: use prometheus and grafana to show rollout message by @ArronHZG in https://github.com/volcengine/verl/pull/4088
[worker, trainer, recipe] feat: add FP16 training and inference support by @Xuekai-Zhu in https://github.com/volcengine/verl/pull/4036
[megatron] fix: compatible with older megatron version for NPU CI by @ISEEKYAN in https://github.com/volcengine/verl/pull/4116
[ci, training_utils] fix: get_nccl_backend default to nccl by @HollowMan6 in https://github.com/volcengine/verl/pull/4117
[doc] fix: fix a couple of typos by @jeis4wpi in https://github.com/volcengine/verl/pull/4118
[BREAKING][megatron] chore: set ETP default to null by @HollowMan6 in https://github.com/volcengine/verl/pull/4119
[misc] chore: bump version to 0.6.1 by @wuxibin89 in https://github.com/volcengine/verl/pull/4122
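The SFT change in https://github.com/volcengine/verl/pull/3994 (normalize the loss by the number of tokens in the global batch) matters when micro-batches contain different numbers of tokens. The arithmetic below uses made-up per-token losses and hypothetical helper names to compare the two normalizations; it is a sketch, not verl code.

```python
from typing import List


def mean_of_microbatch_means(microbatches: List[List[float]]) -> float:
    """Average each micro-batch over its own tokens, then average the micro-batches."""
    return sum(sum(mb) / len(mb) for mb in microbatches) / len(microbatches)


def global_token_mean(microbatches: List[List[float]]) -> float:
    """Give every token in the global batch equal weight."""
    total_tokens = sum(len(mb) for mb in microbatches)
    return sum(sum(mb) for mb in microbatches) / total_tokens


def accumulated_global_token_mean(microbatches: List[List[float]]) -> float:
    """Same quantity built micro-batch by micro-batch, as in gradient accumulation:
    each micro-batch divides its summed loss by the *global* token count."""
    total_tokens = sum(len(mb) for mb in microbatches)
    return sum(sum(mb) / total_tokens for mb in microbatches)


# Micro-batch 1: four tokens with loss 1.0; micro-batch 2: one token with loss 3.0.
batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]
print(mean_of_microbatch_means(batches))  # 2.0  (the single token counts as much as four)
print(global_token_mean(batches))  # 1.4  (7.0 total loss / 5 tokens)
```

Dividing by the global token count keeps each token's contribution independent of how the batch happens to be split into micro-batches.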