| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2026-05-07 | 19.2 kB | |
| v4.2.0 source code.tar.gz | 2026-05-07 | 15.4 MB | |
| v4.2.0 source code.zip | 2026-05-07 | 16.1 MB | |
| Totals: 3 Items | | 31.5 MB | 0 |
English Version
New Features
- Megatron-SWIFT
  a. Added model_type support: kimi_k25, hy_v3, llava_onevision. (llava_onevision contributed by @randydl)
  b. Added support for GLM-5 shared-parameter MTP, which can be enabled via the `--mtp_shared_weights` argument.
  c. Added support for Qwen3.5 FP8 training. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/models/qwen3_5/fp8.sh
  d. Custom Megatron model documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Custom-Model.html
  e. Added support for controlling whether `decoder_input` stops gradient in the MTP branch (i.e., whether MTP loss can backpropagate gradients through `decoder_input` to Embedding/ViT), configurable via the `--mtp_decoder_input_detach` argument.
  f. `mlp_padding_free` is now compatible with Sequence Parallelism.
  g. Added support for FP8 quantization export via the `megatron export` command. Script reference: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/fp8/quant.sh
  h. Removed dependency compatibility support for megatron-core versions 0.12 - 0.14.
- RL
  a. GKD/OPSD now supports the `generation_batch_size`/`steps_per_generation` parameters.
  b. GKD/OPSD `teacher_server_api` is now compatible with multimodal training.
  c. GKD/OPSD is now compatible with `padding_free`.
  d. Megatron GRPO/GKD weight synchronization now supports syncing LoRA weights only.
  e. Added exception handling to `swift rollout` to prevent silent process hangs.
  f. GRPO `ref_sync_callback` now supports layer-wise gather under ZeRO-3 to avoid OOM.
  g. GRPO TRL dependency upgraded to `>= 0.26`.
- Training
  a. Added support for Qwen3.5 Sequence Parallelism, controllable via the `--sequence_parallel_size` argument. (Contributed by @meichangsu1)
  b. Added support for specifying `loss_scale` directly in the dataset for more flexible loss control. Documentation: https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html#supervised-fine-tuning
  c. Dataset dependency is now compatible with datasets 4.x.
  d. `cached_dataset` is now compatible with the `--truncation_strategy split` strategy.
- Hardware
  a. NPU now supports Qwen3.5 training with transformers/Megatron backends. When using the Megatron backend, the `USE_MCORE_GDN=0` environment variable must be set. (Contributed by @addsubmuldiv, @hazelduan)
  b. Added AMD support documentation: https://swift.readthedocs.io/en/latest/BestPractices/AMD-support.html (Contributed by @Treemann)
  c. Added RL training support for MetaX hardware. (Contributed by @suenphey)
  d. NPU Megatron training is now compatible with megatron-core 0.15.3. (Contributed by @addsubmuldiv)
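The flags and the environment variable named above can be combined into a launch command. The following is a minimal sketch of how that might look from Python, not the project's own launcher. The `swift sft` and `megatron sft` entry points, the `--model` flag, the `MODEL_ID` placeholder, and the value `2` are assumptions for illustration; only `--sequence_parallel_size` and `USE_MCORE_GDN=0` come from these notes.

```python
import os
import shlex
from typing import Mapping, Optional

# Assumed entry points for illustration; these notes themselves only name
# `megatron export` and `swift rollout`.
ENTRY_POINTS = {
    "transformers": ["swift", "sft"],
    "megatron": ["megatron", "sft"],
}


def build_command(backend: str, model: str, flags: Mapping[str, object]) -> list:
    """Return an argv list: entry point, --model, then each flag as `--name value`."""
    argv = [*ENTRY_POINTS[backend], "--model", model]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


def build_env(backend: str, on_npu: bool,
              base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and set USE_MCORE_GDN=0 for Megatron-on-NPU runs."""
    env = dict(os.environ if base_env is None else base_env)
    if backend == "megatron" and on_npu:
        # Required by the release notes when training Qwen3.5 on NPU with Megatron.
        env["USE_MCORE_GDN"] = "0"
    return env


# MODEL_ID is a placeholder, and 2 is an arbitrary example value.
argv = build_command("transformers", "MODEL_ID", {"sequence_parallel_size": 2})
print(shlex.join(argv))  # swift sft --model MODEL_ID --sequence_parallel_size 2
```

The resulting `argv` and the dictionary from `build_env` can be handed to `subprocess.run(argv, env=env)`. Building the environment as a copy keeps the variable scoped to the child process rather than mutating the parent's environment.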
New Models
- Text-only Models
  a. ZhipuAI/GLM-5.1
  b. MiniMax/MiniMax-M2.7
  c. moonshotai/Kimi-K2.6 (text-only)
  d. Tencent-Hunyuan/Hy3-preview
  e. AIDC-AI/Marco-Nano-Instruct series
- Multimodal Models
  a. Qwen/Qwen3.6-35B-A3B, Qwen/Qwen3.6-27B
  b. Qwen3-ASR (Contributed by @xut806)
  c. Added mixed-modality dataset training support for Gemma4 series models.
  d. OpenDataLab/MinerU2.5-Pro-2604-1.2B
  e. OpenBMB/MiniCPM-o-4_5 now supports audio modality. (Contributed by @fanqiNO1)
  f. allenai/Molmo2-4B (Contributed by @Kagura-0001)
What's Changed
- [model] Support GLM-5.1 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9038
- [docs] update readme by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9043
- [docs] update qwen3.5 best practice by @zhangfanTJU in https://github.com/modelscope/ms-swift/pull/9039
- [bugfix] sync template.padding_free with args after prepare_model for… by @yaoruda in https://github.com/modelscope/ms-swift/pull/9031
- [bugfix] fix gemma4 audio batch by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9045
- [megatron] refactor forward_step_helper by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9048
- [megatron] update megatron destroy_process_group by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9052
- feat: add Qwen3-ASR model support (#8118) by @xut806 in https://github.com/modelscope/ms-swift/pull/9034
- [bugfix] fix multi-node server mode weight sync race condition by @sys-reasoner in https://github.com/modelscope/ms-swift/pull/9060
- update qwen_asr by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9061
- [bugfix] fix qwen3_reranker mcore_model_type by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9062
- [bugfix] fix qwen3 omni template by @addsubmuldiv in https://github.com/modelscope/ms-swift/pull/9066
- [docs] add AMD best practices by @Treemann in https://github.com/modelscope/ms-swift/pull/9069
- Update npu mindspeed doc and fix new version mindspeed's cp error by @addsubmuldiv in https://github.com/modelscope/ms-swift/pull/9067
- [bugfix] fix megatron vllm_engine_kwargs & cosine_max_len by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9072
- [bugfix] fix transformers generate default top_k by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9071
- [model] support MinerU2.5-Pro by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9074
- [bugfix] fix megatron pt by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9075
- [model] Support minimax 2.7 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9079
- [bugfix] fix gemma4 31b by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9080
- [bugfix] fix vllm (0.19.0) qwen3_5 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9086
- [bugfix] fix gemma4 zero3 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9083
- [bugfix] fix gemma4 system by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9089
- [bugfix] fix bge-m3 reranker by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9091
- remove prompt id for megatron grpo by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9094
- [docs] update npu docs en by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9097
- [metax] support pynccl communicator in vllm by @suenphey in https://github.com/modelscope/ms-swift/pull/9090
- [bugfix] fix megatron finetune by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9099
- [grpo] set default load_format auto by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9100
- update qr code by @tastelikefeet in https://github.com/modelscope/ms-swift/pull/9109
- Optimize weight synchronization for LoRA adapter weights by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9077
- support gemma4 vllm multi-modal inference by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9105
- [bugfix] fix gptq transformers>=5.0 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9042
- [bugfix] Fix gemma4 image template by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9115
- fix bugs by @hpsun1109 in https://github.com/modelscope/ms-swift/pull/9120
- [bugfix] fix vit_gc by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9125
- [megatron] support qwen3.5 fp8 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9106
- fix chunked data slicing in multi-turn GRPO by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9128
- [model] support qwen3.6 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9129
- [bugfix] fix vllm mtp by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9138
- Update shell by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9140
- [model] Support Marco by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9137
- [bugfix] fix opsd transformer generate by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9145
- [megatron] mtp_decoder_input_detach by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9146
- [Feature] Add Molmo2 support (image + video inference, LoRA SFT) by @Kagura-0001 in https://github.com/modelscope/ms-swift/pull/9063
- [docs] update docs by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9148
- [megatron] support mtp_shared_weights by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9151
- update swift image 4.1 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9153
- feat: support audio input for minicpm-o-4_5 by @fanqiNO1 in https://github.com/modelscope/ms-swift/pull/9147
- [model] update minicpmo 4_5 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9159
- [bugfix] fix mtp keys by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9163
- [bugfix] fix qwen3_omni infer by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9164
- feat(qwen): add sequence parallel support for Qwen3.5 linear attention by @meichangsu1 in https://github.com/modelscope/ms-swift/pull/9162
- [bugfix] fix qwen3_5 sp compat transformers 5.5.4 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9165
- Fix megatron save oom by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9166
- [bugfix] fix eval loss denominator under sequence_parallel by @YarivColbeci in https://github.com/modelscope/ms-swift/pull/9152
- Add sequence parallel compatibility with transformers >= 5.4.0 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9167
- [bugfix] fix megatron minimax save hang by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9171
- [bugfix] fix optimizer deepspeed by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9173
- [bugfix] Fix megatron save_total_limit & pp by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9175
- [bugfix] fix docs by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9176
- [bugfix] fix qwen3 omni audio 30s by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9182
- [bugfix] fix grpo generate by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9183
- [model] support Qwen/Qwen3.6-27B by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9184
- [bugfix] fix qwen3.5 sp by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9189
- [compat] compat peft 0.19 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9192
- [trainer] optimize use_logits_to_keep by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9194
- [bugfix] fix seq_cls zero3 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9190
- npu qwen3.5 megatron padding_free fix by @addsubmuldiv in https://github.com/modelscope/ms-swift/pull/9196
- Support zero3 hierarchical gather in the ref sync callback by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9170
- support multi-modal training for gkd teacher api by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9197
- Fixing issue with video loading for Gemma 4 with relative paths by @perone in https://github.com/modelscope/ms-swift/pull/9201
- [bugfix] fix CI by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9209
- NPU patch FLA by @hazelduan in https://github.com/modelscope/ms-swift/pull/9195
- [bugfix] fix cache_dataset truncation_strategy by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9210
- support truncation_strategy split & cached_dataset (qwen3.5) by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9211
- [docs] update swift image 4.1.3 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9213
- [dataset] support "loss_scale" in dataset by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9214
- [model] support hy3 preview by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9198
- [bugfix] fix agent_template test by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9215
- [model] support kimi k2.6 (only text) by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9186
- [template] remove template remove_response by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9217
- support fa4 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9218
- [bugfix] fix ignore_data_skip by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9220
- Npu patcher refactor by @addsubmuldiv in https://github.com/modelscope/ms-swift/pull/9223
- [bugfix] Fix lora llm resume from checkpoint by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9225
- [model] refactor ling model_type by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9232
- fix(rollout): add exception capture and non-blocking poll in by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9229
- fix(colocate): vllm triton error by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9233
- [model] support gemma4 mixed data by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9180
- [bugfix] cached_dataset reduce disk usage by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9242
- [docs] fix docs by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9244
- [docs] update wechat by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9247
- Document Qwen3.5 FLA patch for NPU support by @hazelduan in https://github.com/modelscope/ms-swift/pull/9237
- [docs] update megatron docs by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9249
- [docs] update docs by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9250
- Update datasets requirements by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9252
- [megatron] remove megatron core 0.12-0.14 by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9260
- update mlp_padding_free by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9262
- Document FLA/MindSpeed replacement for Qwen3.5 on NPU by @hazelduan in https://github.com/modelscope/ms-swift/pull/9238
- Npu doc update by @addsubmuldiv in https://github.com/modelscope/ms-swift/pull/9245
- [bugfix] fix tool_call loss_scale by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9266
- update requirements by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9275
- [bugfix] fix qwen3_5 template by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9279
- [Bug fix] Adapt SwiftMixin.create_optimizer signature for transformers >= 4.40 by @ys2025-AI in https://github.com/modelscope/ms-swift/pull/9281
- [bugfix] fix grpo rollout step by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9264
- [bugfix] fix create_optimizer by @Jintao-Huang in https://github.com/modelscope/ms-swift/pull/9282
- [gkd] support buffers & fix some bugs by @hjh0119 in https://github.com/modelscope/ms-swift/pull/9278
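Two changes in this release tighten dependency requirements: compatibility with megatron-core 0.12 - 0.14 is removed, and GRPO now requires TRL >= 0.26. A quick check before upgrading might look like the sketch below. It is an illustration, not a tool shipped with the project. The distribution names `trl` and `megatron-core` are assumed to match the installed package metadata, and the version parsing is deliberately simple, comparing only the leading numeric components.

```python
import re
from importlib import metadata
from typing import Optional


def version_tuple(version: str) -> tuple:
    """Leading numeric components only, e.g. '0.15.3' -> (0, 15, 3)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def trl_ok(version: str) -> bool:
    """True when the major.minor version satisfies TRL >= 0.26."""
    return version_tuple(version)[:2] >= (0, 26)


def megatron_core_dropped(version: str) -> bool:
    """True when the version falls in the 0.12 - 0.14 range no longer supported."""
    return (0, 12) <= version_tuple(version)[:2] <= (0, 14)


def installed(dist_name: str) -> Optional[str]:
    """Installed version string, or None when the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def preflight() -> list:
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    trl = installed("trl")  # assumed distribution name
    if trl is not None and not trl_ok(trl):
        problems.append(f"trl {trl} is older than 0.26 (needed for GRPO)")
    mcore = installed("megatron-core")  # assumed distribution name
    if mcore is not None and megatron_core_dropped(mcore):
        problems.append(f"megatron-core {mcore} is in the dropped 0.12 - 0.14 range")
    return problems


for problem in preflight():
    print(problem)
```

Packages that are not installed are skipped rather than reported, since TRL matters only for GRPO and megatron-core only for the Megatron backend.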
New Contributors
- @zhangfanTJU made their first contribution in https://github.com/modelscope/ms-swift/pull/9039
- @yaoruda made their first contribution in https://github.com/modelscope/ms-swift/pull/9031
- @xut806 made their first contribution in https://github.com/modelscope/ms-swift/pull/9034
- @sys-reasoner made their first contribution in https://github.com/modelscope/ms-swift/pull/9060
- @Treemann made their first contribution in https://github.com/modelscope/ms-swift/pull/9069
- @suenphey made their first contribution in https://github.com/modelscope/ms-swift/pull/9090
- @Kagura-0001 made their first contribution in https://github.com/modelscope/ms-swift/pull/9063
- @fanqiNO1 made their first contribution in https://github.com/modelscope/ms-swift/pull/9147
- @YarivColbeci made their first contribution in https://github.com/modelscope/ms-swift/pull/9152
- @perone made their first contribution in https://github.com/modelscope/ms-swift/pull/9201
- @hazelduan made their first contribution in https://github.com/modelscope/ms-swift/pull/9195
- @ys2025-AI made their first contribution in https://github.com/modelscope/ms-swift/pull/9281
Full Changelog: https://github.com/modelscope/ms-swift/compare/v4.1.0...v4.2.0