| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2026-01-18 | 22.1 kB | |
| v0.2.2 source code.tar.gz | 2026-01-18 | 2.9 MB | |
| v0.2.2 source code.zip | 2026-01-18 | 3.2 MB | |
| Totals: 3 Items | | 6.1 MB | 1 |
v0.2.2 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to multiple memory and performance improvements, v0.2.2 adds support for:
- Int4-QAT training
- Full R3 (Rollout Routing Replay) support with DeepEP and MTP
- Dependency upgrades: SGLang v0.5.7 and the Megatron dev branch
What's Changed
- add ckpt load save ci by @lilei199908 in https://github.com/THUDM/slime/pull/1104
- Add --rollout-all-samples-process-path for RLVE by @zhuzilin in https://github.com/THUDM/slime/pull/1107
- feat: support Qwen3 Moe BackEnd Kernel by @attack204 in https://github.com/THUDM/slime/pull/1071
- fix max response/context/prompt len by @lilei199908 in https://github.com/THUDM/slime/pull/1110
- fix max len by @lilei199908 in https://github.com/THUDM/slime/pull/1112
- [docker] remove amem and support deepep + r3 by @zhuzilin in https://github.com/THUDM/slime/pull/1115
- [Fix] Fix early return in init rollout engine by @yitianlian in https://github.com/THUDM/slime/pull/1118
- [Fix] Add sglang patch for weight version update by @yitianlian in https://github.com/THUDM/slime/pull/1119
- fix: improve tokenization by @nanjiangwill in https://github.com/THUDM/slime/pull/1113
- [Feature] Add CI test for weight version update by @yitianlian in https://github.com/THUDM/slime/pull/1120
- [docker] optimize r3 with base64 encode by @zhuzilin in https://github.com/THUDM/slime/pull/1124
- [docker] fix r3 gather buffer by @zhuzilin in https://github.com/THUDM/slime/pull/1129
- [docker] support mtp for r3 by @zhuzilin in https://github.com/THUDM/slime/pull/1131
- [Fix] Fix some bugs in retool example by @yitianlian in https://github.com/THUDM/slime/pull/1130
- Add finalize_model_grads_with_empty_cache by @zhuzilin in https://github.com/THUDM/slime/pull/1133
- Feat: add usage docs for fsdp by @lin0303-siyuan in https://github.com/THUDM/slime/pull/1092
- Reserve more ports for new sglang dp attn impl by @zhuzilin in https://github.com/THUDM/slime/pull/1142
- Blog: fix the path of the Blog's architecture image by @ShanningZhuang in https://github.com/THUDM/slime/pull/1125
- Support async save and add extra save at the end of the training by @zhuzilin in https://github.com/THUDM/slime/pull/1143
- fix: fix GemmeRMSNorm.forward() bug by @nanjiangwill in https://github.com/THUDM/slime/pull/1121
- [WIP][FSDP] Support FSDP for Qwen3Next by @rucnyz in https://github.com/THUDM/slime/pull/1116
- Megatron VLM Support (1/N) by @Zhuohao-Li in https://github.com/THUDM/slime/pull/1123
- Update deprecated huggingface-cli and fix broken links by @Lyken17 in https://github.com/THUDM/slime/pull/1147
- Added FSDP checkpoint handling to convert_torch_dist_to_hf.py by @cklxx in https://github.com/THUDM/slime/pull/1101
- minor fix for megatron compatibility by @zhuzilin in https://github.com/THUDM/slime/pull/1149
- Remove config_mapping to use megatron-bridge by @zhuzilin in https://github.com/THUDM/slime/pull/1166
- Avoids repeated work. by @qqwqqw689 in https://github.com/THUDM/slime/pull/1163
- Make tools/convert_torch_dist_to_hf.py not rely on megatron by @zhuzilin in https://github.com/THUDM/slime/pull/1167
- support converting dpsk mtp layer by @zhuzilin in https://github.com/THUDM/slime/pull/1169
- [FSDP] Add Masked importance sampling by @zijiexia in https://github.com/THUDM/slime/pull/1122
- [TIS/MIS] fix and add better metric by @ChangyiYang in https://github.com/THUDM/slime/pull/1174
- Fix optimizer schedule resume by @lr-tsinghua11 in https://github.com/THUDM/slime/pull/1152
- [docker] upgrade to megatron dev branch by @zhuzilin in https://github.com/THUDM/slime/pull/1153
- Minor fix by @lancerts in https://github.com/THUDM/slime/pull/1165
- Fix forward of Qwen3VLTextRotaryEmbedding in Megatron-Bridge by @zhuzilin in https://github.com/THUDM/slime/pull/1179
- Reuse the text llm config for qwen3 vl models by @zhuzilin in https://github.com/THUDM/slime/pull/1180
- Don't save AutoBridge in args by @zhuzilin in https://github.com/THUDM/slime/pull/1181
- [Fix] Fix port error in PD disaggregation setting by @yitianlian in https://github.com/THUDM/slime/pull/1175
- Fix prompt type bug in generate_with_search within examples/search-r1 by @jiahe7ay in https://github.com/THUDM/slime/pull/1182
- feat: support Qwen3 VL MoE by @nanjiangwill in https://github.com/THUDM/slime/pull/1171
- [Fix] Minor fix by @yitianlian in https://github.com/THUDM/slime/pull/1183
- Set parallel config for megatron bridge by @zhuzilin in https://github.com/THUDM/slime/pull/1184
- Fix tools/convert_hf_to_torch_dist.py by @zhuzilin in https://github.com/THUDM/slime/pull/1186
- Don't calculate entropy grad when coef is 0 by @zhuzilin in https://github.com/THUDM/slime/pull/1185
- Disable routing replay for critic by @zhuzilin in https://github.com/THUDM/slime/pull/1187
- Revert "Don't calculate entropy grad when coef is 0" by @zhuzilin in https://github.com/THUDM/slime/pull/1189
- Fix qwen3next for megatron dev branch by @zhuzilin in https://github.com/THUDM/slime/pull/1190
- fix: fix logging for rollout by @nanjiangwill in https://github.com/THUDM/slime/pull/1188
- sync internal features by @zhuzilin in https://github.com/THUDM/slime/pull/1192
- Fix check_weights api by @zhuzilin in https://github.com/THUDM/slime/pull/1194
- Add --custom-rollout-log-function-path and --custom-eval-rollout-log-function-path by @zhuzilin in https://github.com/THUDM/slime/pull/1196
- [Feature] Add more logging for health monitor by @yitianlian in https://github.com/THUDM/slime/pull/1195
- fix: SFT tools support by @maoquan-ms in https://github.com/THUDM/slime/pull/1198
- [Feature] Change default value of rollout health check by @yitianlian in https://github.com/THUDM/slime/pull/1197
- Megatron VLM Support w/ SFT (2/N) by @Zhuohao-Li in https://github.com/THUDM/slime/pull/1150
- tiny fix for sft script after tokenizer improvement by @Zhuohao-Li in https://github.com/THUDM/slime/pull/1201
- tests: add test for multi turn loss mask by @maoquan-ms in https://github.com/THUDM/slime/pull/1204
- Always pass loss masks to model by @zhuzilin in https://github.com/THUDM/slime/pull/1205
- [on-policy distillation] update reward function to fix potential token mismatches by @ahxt in https://github.com/THUDM/slime/pull/1128
- Add ci for mtp by @zhuzilin in https://github.com/THUDM/slime/pull/1207
- Fix mla tflops by @lilei199908 in https://github.com/THUDM/slime/pull/1209
- update docs by @zhuzilin in https://github.com/THUDM/slime/pull/1211
- update docs by @zhuzilin in https://github.com/THUDM/slime/pull/1214
- [Feature] Support 0.3.0 sglang router for fault tolerance by @yitianlian in https://github.com/THUDM/slime/pull/1215
- sync internal features by @zhuzilin in https://github.com/THUDM/slime/pull/1216
- feat: add custom logic for processing list[list[Sample]] to training data by @nanjiangwill in https://github.com/THUDM/slime/pull/1218
- add int4_quant cuda kernel by @Hyaloid in https://github.com/THUDM/slime/pull/1220
- update doc by @zhuzilin in https://github.com/THUDM/slime/pull/1224
- Improve AMD tutorial with complete model/data setup workflow by @Vivicai1005 in https://github.com/THUDM/slime/pull/1212
- update megatron patch by @zhuzilin in https://github.com/THUDM/slime/pull/1228
- sync from internal by @zhuzilin in https://github.com/THUDM/slime/pull/1229
- fix model saving bug in megatron by @zhuzilin in https://github.com/THUDM/slime/pull/1230
- add new status by @nanjiangwill in https://github.com/THUDM/slime/pull/1219
- update customization docs by @nanjiangwill in https://github.com/THUDM/slime/pull/1233
- Revert data processing of VLM by @zhuzilin in https://github.com/THUDM/slime/pull/1232
- [VLM] optimize VLM processing by @nanjiangwill in https://github.com/THUDM/slime/pull/1234
- feat: add custom pg_loss reducer by @ChangyiYang in https://github.com/THUDM/slime/pull/1235
- fix: typo "sgalng" → "sglang" in ROCm Dockerfiles by @yurekami in https://github.com/THUDM/slime/pull/1282
- sync bugfix from internal by @zhuzilin in https://github.com/THUDM/slime/pull/1284
- sync internal bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1286
- add bshd support by @yueming-yuan in https://github.com/THUDM/slime/pull/1285
- [docker] fix bugs on pd disaggregation and add --disable-draft-cuda-graph by @zhuzilin in https://github.com/THUDM/slime/pull/1288
- Add longest_effective_sample_tokens_per_sec metric by @zhuzilin in https://github.com/THUDM/slime/pull/1291
- [fix] conditionally pass kwargs for megatron-bridge VLM by @yueming-yuan in https://github.com/THUDM/slime/pull/1290
- [VLM] Bugfix: image_patch_size for vision preprocessing by @coding-famer in https://github.com/THUDM/slime/pull/1227
- feat: add --custom-model-provider-path argument by @yurekami in https://github.com/THUDM/slime/pull/1239
- [Feature/Fix] Support IPv6 host resolution and robust URI formatting by @Chen-GX in https://github.com/THUDM/slime/pull/859
- Fix missing trust_remote_code in HfWeightIteratorBridge by @SwordFaith in https://github.com/THUDM/slime/pull/1287
- fix: remove invalid None default and fix misleading underscore variable naming by @lancerts in https://github.com/THUDM/slime/pull/1283
- fix: remove duplicate Megatron-LM installation in build_conda.sh by @yurekami in https://github.com/THUDM/slime/pull/1238
- fix dev megatron ckpt save bugs by @lilei199908 in https://github.com/THUDM/slime/pull/1294
- [Fix] fix image_patch_size in processing_utils by @coding-famer in https://github.com/THUDM/slime/pull/1295
- support hicache for pd disaggregation by @zhuzilin in https://github.com/THUDM/slime/pull/1296
- Optimize data.py for efficient data loading by @ppraneth in https://github.com/THUDM/slime/pull/696
- Auto Sync Code by @miles-code-angel in https://github.com/THUDM/slime/pull/1303
- [VLM] end2end geo3k multi-turn RL of VLM Recipe by @gxlvera in https://github.com/THUDM/slime/pull/1141
- [docker] Fix sglang ima on mtp + pd disaggregation by @zhuzilin in https://github.com/THUDM/slime/pull/1313
- fix: fix processing logic by @nanjiangwill in https://github.com/THUDM/slime/pull/1292
- [BugFix] Delete apply chat template for SFT by @PopSoda2002 in https://github.com/THUDM/slime/pull/1307
- Remove token retrieval test from main. by @qqwqqw689 in https://github.com/THUDM/slime/pull/1243
- update quick start doc by @zijiexia in https://github.com/THUDM/slime/pull/1193
- fix: replace blocking sleep with async sleep and fix file handle leak by @lancerts in https://github.com/THUDM/slime/pull/1200
- Set base_gpu_id for sglang from placement groups by @vpj in https://github.com/THUDM/slime/pull/1306
- [FSDP] Move gptoss scripts by @PopSoda2002 in https://github.com/THUDM/slime/pull/1317
- [refactor] Make sglang_rollout.py shorter and add prefix cached info by @zhuzilin in https://github.com/THUDM/slime/pull/1318
- set spec args for mtp ci by @zhuzilin in https://github.com/THUDM/slime/pull/1322
- Add non_generation_time stat in sample by @zhuzilin in https://github.com/THUDM/slime/pull/1323
- add slime test images ci by @lilei199908 in https://github.com/THUDM/slime/pull/1325
- update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in https://github.com/THUDM/slime/pull/1324
- update sglang to lmsysorg/sglang:nightly-dev-20260103-24c91001 by @lilei199908 in https://github.com/THUDM/slime/pull/1331
- code sync by @miles-code-angel in https://github.com/THUDM/slime/pull/1329
- perf: replace quadratic list flattening with linear chaining in rollout manager by @ppraneth in https://github.com/THUDM/slime/pull/1319
- Implement local GPU ID remapping based on CUDA_VISIBLE_DEVICES for SGLang Engine by @zijiexia in https://github.com/THUDM/slime/pull/1327
- Fix: Remove --apply-chat-template from Qwen3-235B SFT script by @kaysonyu in https://github.com/THUDM/slime/pull/1315
- [Megatron Bridge] Support save hf format model by @coding-famer in https://github.com/THUDM/slime/pull/1289
- update default paths and disable offloading for AMD qwen3-4B training by @Vivicai1005 in https://github.com/THUDM/slime/pull/1225
- [internal sync] reset optimizer state and dynamic global batch size by @zhuzilin in https://github.com/THUDM/slime/pull/1330
- [refactor] minor code refactor for save_hf by @zhuzilin in https://github.com/THUDM/slime/pull/1334
- optimize long prompt filter by @zhuzilin in https://github.com/THUDM/slime/pull/1335
- remove deprecated interface by @zhuzilin in https://github.com/THUDM/slime/pull/1336
- Revert "remove deprecated interface" by @zhuzilin in https://github.com/THUDM/slime/pull/1337
- code cleanup by @zhuzilin in https://github.com/THUDM/slime/pull/1338
- save sglang v0.5.7 patch by @zhuzilin in https://github.com/THUDM/slime/pull/1339
- [Feature] Add CI for fault tolerance by @yitianlian in https://github.com/THUDM/slime/pull/1222
- Patch validate_non_overlapping_shards_metadata to speed up ckpt loading by @zhuzilin in https://github.com/THUDM/slime/pull/1342
- [Feature] Reorganize CI by @yitianlian in https://github.com/THUDM/slime/pull/1343
- [Feature] Option not to save optimizer states to save disk space by @yzlnew in https://github.com/THUDM/slime/pull/1333
- Add clear_num_new_engines and some code cleanup by @zhuzilin in https://github.com/THUDM/slime/pull/1349
- fix get_response_lengths by @zhuzilin in https://github.com/THUDM/slime/pull/1350
- bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1351
- default setting --tool-keys to tools by @UbeCc in https://github.com/THUDM/slime/pull/1352
- code sync by @miles-code-angel in https://github.com/THUDM/slime/pull/1356
- Revert "code sync" by @zhaochenyang20 in https://github.com/THUDM/slime/pull/1357
- update code by @miles-code-angel in https://github.com/THUDM/slime/pull/1358
- [sync] sync internal bugfixes by @zhuzilin in https://github.com/THUDM/slime/pull/1371
- feat: add int4 reinforcement learning training support (Part1) by @GeLee-Q in https://github.com/THUDM/slime/pull/1362
- [docker] Fix mtp r3 and add tilelang by @zhuzilin in https://github.com/THUDM/slime/pull/1380
- [docker] Comment out 'quant weights to fp8 ue8m0' by @zhuzilin in https://github.com/THUDM/slime/pull/1381
- geo3k VLM multi-turn megatron update by @gxlvera in https://github.com/THUDM/slime/pull/1378
- [Doc] Add docs for R2/R3 by @Hecate0821 in https://github.com/THUDM/slime/pull/1382
- [ci] borrow bot-slash-lint.yaml from miles by @zhuzilin in https://github.com/THUDM/slime/pull/1384
- feat: add int4 reinforcement learning training support (Part2) by @fy1214 in https://github.com/THUDM/slime/pull/1172
- feat: add int4 reinforcement learning training support (Part3) by @Gao016 in https://github.com/THUDM/slime/pull/1368
- fix lint by @zhuzilin in https://github.com/THUDM/slime/pull/1385
- VLM Multi-turn, add Megatron in README by @gxlvera in https://github.com/THUDM/slime/pull/1387
- Fix grammar and formatting in README.md by @zhaochenyang20 in https://github.com/THUDM/slime/pull/1388
- [refactor] refactor int4 qat code by @zhuzilin in https://github.com/THUDM/slime/pull/1390
- [1/X] Refactor: unify training backends by general utils, tested Megatron & FSDP alignment by @yueming-yuan in https://github.com/THUDM/slime/pull/1373
- bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1394
- bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1395
- fix ppo utils & mis by @yueming-yuan in https://github.com/THUDM/slime/pull/1396
- feat(examples): add strands-sglang integration for agentic RL with TITO support by @Lawhy in https://github.com/THUDM/slime/pull/1359
- remove swe-agent example by @zhuzilin in https://github.com/THUDM/slime/pull/1397
- fix to support dpsk-v3.2 bf16 weight convert to fp8 by @Gao016 in https://github.com/THUDM/slime/pull/1392
- Only rank0 should call post_process_weights by @zhuzilin in https://github.com/THUDM/slime/pull/1398
- bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1400
- [docker] support mtp in dpsk v3.2 by @zhuzilin in https://github.com/THUDM/slime/pull/1401
- bug fix by @lilei199908 in https://github.com/THUDM/slime/pull/1403
- [style] minor: remove subclass by @yueming-yuan in https://github.com/THUDM/slime/pull/1402
- [doc] suggest pip install -e . --no-deps by @zhuzilin in https://github.com/THUDM/slime/pull/1405
- [docs] add note for cudnn by @zhuzilin in https://github.com/THUDM/slime/pull/1406
- [minor] Delete unused util file by @yueming-yuan in https://github.com/THUDM/slime/pull/1408
- [docker] ignore slime in sglang SafeUnpickler by @zhuzilin in https://github.com/THUDM/slime/pull/1409
- add r3 ci by @lilei199908 in https://github.com/THUDM/slime/pull/1407
- Fix rollout-all-samples by @fzyzcjy in https://github.com/THUDM/slime/pull/1410
- [docs] fix ai generate response by @zijiexia in https://github.com/THUDM/slime/pull/1412
- bugfix on UpdateWeightFromDistributed by @zhuzilin in https://github.com/THUDM/slime/pull/1420
- [docker] add tunable indexer is_neox_style by @zhuzilin in https://github.com/THUDM/slime/pull/1421
- [docker] remove rm /root/.tmux.conf by @zhuzilin in https://github.com/THUDM/slime/pull/1422
- [sync] sync internal feature and bugfix by @zhuzilin in https://github.com/THUDM/slime/pull/1423
- [docker] update stable patches by @zhuzilin in https://github.com/THUDM/slime/pull/1424
- skip logits.div when temp is 1.0 by @zhuzilin in https://github.com/THUDM/slime/pull/1428
- [docker] support offload NSATokenToKVPool by @zhuzilin in https://github.com/THUDM/slime/pull/1429
- [doc] cleanup redundant example and scripts by @zhuzilin in https://github.com/THUDM/slime/pull/1431
- [docs] move low precision example into main doc by @zhuzilin in https://github.com/THUDM/slime/pull/1432
- [docs] move reproducibility to main doc by @zhuzilin in https://github.com/THUDM/slime/pull/1433
- [docs] a bit of additional info for pd disaggregation by @zhuzilin in https://github.com/THUDM/slime/pull/1434
- [docs] add debug suggestion for ima by @zhuzilin in https://github.com/THUDM/slime/pull/1435
- Fix retool example incorrectly handling max_tool_calls by @fzyzcjy in https://github.com/THUDM/slime/pull/1427
- Docs: Add qqr to "Projects Built upon slime" section by @bcol23 in https://github.com/THUDM/slime/pull/1425
- [docs] fix doc by @zhuzilin in https://github.com/THUDM/slime/pull/1436
- Fix Hf model to Mcore checkpoint conversion on AMD gpus by @gramesh-amd in https://github.com/THUDM/slime/pull/279
- [cleanup] clean up utils folder by @zhuzilin in https://github.com/THUDM/slime/pull/1437
- [FSDP][Fix] Fix redundant import by @Hecate0821 in https://github.com/THUDM/slime/pull/1354
- [Fix] Update deprecated sglang ep args in docs and scripts by @coding-famer in https://github.com/THUDM/slime/pull/1344
- Add Qwen3-Coder-30B-A3B-Instruct model script by @maoquan-ms in https://github.com/THUDM/slime/pull/1213
- Revert "[style] minor: remove subclass" by @zhuzilin in https://github.com/THUDM/slime/pull/1441
- [revert] revert the parallel state change by @zhuzilin in https://github.com/THUDM/slime/pull/1442
- [FSDP] remove cp in fsdp by @zhuzilin in https://github.com/THUDM/slime/pull/1443
- [fsdp] remove tis by @zhuzilin in https://github.com/THUDM/slime/pull/1444
- Megatron VLM Support (Qwen2.5-VL series) (3/N) by @Zhuohao-Li in https://github.com/THUDM/slime/pull/1210
- fix the loss mask for mask_offpolicy_in_partial_rollout by @zhuzilin in https://github.com/THUDM/slime/pull/1445
- [Fix] Return origin_samples instead of False in filter_long_prompt by @kaysonyu in https://github.com/THUDM/slime/pull/1438
- [docker] fix sglang streaming output bug by @zhuzilin in https://github.com/THUDM/slime/pull/1446
- [docker] change base image from lmsysorg to slimerl/sglang by @zhuzilin in https://github.com/THUDM/slime/pull/1447
- Fix: Apply loss mask to KL in REINFORCE++ returns calculation by @kaysonyu in https://github.com/THUDM/slime/pull/1372
- [docs] add docs for ppo by @zhuzilin in https://github.com/THUDM/slime/pull/1448
- [release] bump to v0.2.2 by @zhuzilin in https://github.com/THUDM/slime/pull/1345
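The rollout manager change in #1319 ("replace quadratic list flattening with linear chaining") names a general Python pattern. The sketch below illustrates that pattern only; it is not the code from the PR, and the function names and sample data are hypothetical.

```python
from itertools import chain


def flatten_quadratic(groups):
    # sum() creates a fresh list at every step, re-copying everything
    # accumulated so far, so the total work grows quadratically with
    # the number of groups.
    return sum(groups, [])


def flatten_linear(groups):
    # chain.from_iterable visits each group once and copies every item
    # exactly once, so the total work is linear in the number of items.
    return list(chain.from_iterable(groups))


# Hypothetical nested data standing in for grouped rollout samples.
groups = [[1, 2], [3], [], [4, 5, 6]]
assert flatten_quadratic(groups) == flatten_linear(groups)
print(flatten_linear(groups))
```

Both functions return the same flat list; the difference only shows up as a slowdown when the number of groups is large.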
New Contributors
- @attack204 made their first contribution in https://github.com/THUDM/slime/pull/1071
- @lin0303-siyuan made their first contribution in https://github.com/THUDM/slime/pull/1092
- @ShanningZhuang made their first contribution in https://github.com/THUDM/slime/pull/1125
- @rucnyz made their first contribution in https://github.com/THUDM/slime/pull/1116
- @Lyken17 made their first contribution in https://github.com/THUDM/slime/pull/1147
- @cklxx made their first contribution in https://github.com/THUDM/slime/pull/1101
- @qqwqqw689 made their first contribution in https://github.com/THUDM/slime/pull/1163
- @zijiexia made their first contribution in https://github.com/THUDM/slime/pull/1122
- @lr-tsinghua11 made their first contribution in https://github.com/THUDM/slime/pull/1152
- @jiahe7ay made their first contribution in https://github.com/THUDM/slime/pull/1182
- @maoquan-ms made their first contribution in https://github.com/THUDM/slime/pull/1198
- @Hyaloid made their first contribution in https://github.com/THUDM/slime/pull/1220
- @Vivicai1005 made their first contribution in https://github.com/THUDM/slime/pull/1212
- @yurekami made their first contribution in https://github.com/THUDM/slime/pull/1282
- @miles-code-angel made their first contribution in https://github.com/THUDM/slime/pull/1303
- @gxlvera made their first contribution in https://github.com/THUDM/slime/pull/1141
- @vpj made their first contribution in https://github.com/THUDM/slime/pull/1306
- @kaysonyu made their first contribution in https://github.com/THUDM/slime/pull/1315
- @yzlnew made their first contribution in https://github.com/THUDM/slime/pull/1333
- @gramesh-amd made their first contribution in https://github.com/THUDM/slime/pull/279
Full Changelog: https://github.com/THUDM/slime/compare/v0.2.1...v0.2.2