SWIFT LLM v4.2.0
Name                         Modified     Size
README.md                    2026-05-07   19.2 kB
v4.2.0 source code.tar.gz    2026-05-07   15.4 MB
v4.2.0 source code.zip       2026-05-07   16.1 MB
Totals: 3 items                           31.5 MB   (0 downloads/week)

New Features

  1. Megatron-SWIFT
     a. Added model_type support: kimi_k25, hy_v3, llava_onevision. (llava_onevision contributed by @randydl)
     b. Added support for GLM-5 shared-parameter MTP, which can be enabled via the --mtp_shared_weights argument (see the first sketch after this list).
     c. Added support for Qwen3.5 FP8 training. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/models/qwen3_5/fp8.sh
     d. Custom Megatron model documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Custom-Model.html
     e. Added support for controlling whether decoder_input is detached (gradient stopped) in the MTP branch, i.e., whether the MTP loss can backpropagate gradients through decoder_input to the Embedding/ViT, configurable via the --mtp_decoder_input_detach argument.
     f. mlp_padding_free is now compatible with sequence parallelism.
     g. Added support for FP8 weight quantization export via the megatron export command. Script reference: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/fp8/quant.sh
     h. Removed dependency compatibility support for megatron-core versions 0.12 - 0.14.
  2. RL
     a. GKD/OPSD now supports the generation_batch_size/steps_per_generation parameters (see the GKD sketch after this list).
     b. GKD/OPSD teacher_server_api is now compatible with multimodal training.
     c. GKD/OPSD is now compatible with padding_free.
     d. Megatron GRPO/GKD weight synchronization now supports syncing LoRA weights only.
     e. Added exception handling to swift rollout to prevent silent process hangs.
     f. GRPO ref_sync_callback now supports layer-wise gather under ZeRO-3 to avoid OOM.
     g. GRPO TRL dependency upgraded to >= 0.26.
  3. Training
     a. Added support for Qwen3.5 sequence parallelism, controllable via the --sequence_parallel_size argument. (Contributed by @meichangsu1)
     b. Added support for specifying loss_scale directly in the dataset for more flexible loss control (see the dataset sketch after this list). Documentation: https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html#supervised-fine-tuning
     c. The datasets dependency is now compatible with 4.x versions.
     d. cached_dataset is now compatible with the --truncation_strategy split strategy.
  4. Hardware
     a. NPU now supports Qwen3.5 training with the transformers/Megatron backends. When using the Megatron backend, the USE_MCORE_GDN=0 environment variable must be set (see the NPU sketch after this list). (Contributed by @addsubmuldiv, @hazelduan)
     b. Added AMD support documentation: https://swift.readthedocs.io/en/latest/BestPractices/AMD-support.html (Contributed by @Treemann)
     c. Added RL training support for MetaX hardware. (Contributed by @suenphey)
     d. NPU Megatron training is now compatible with megatron-core 0.15.3. (Contributed by @addsubmuldiv)
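
The following sketch illustrates items 1b/1e: combining the two new MTP switches in a Megatron-SWIFT launch. Only --mtp_shared_weights and --mtp_decoder_input_detach are the flags introduced in this release; the subcommand, checkpoint, dataset, and flag value format are illustrative assumptions rather than a verified recipe.

    # Hypothetical GLM-5 launch: enable shared-parameter MTP and let the MTP loss
    # backpropagate through decoder_input to the Embedding/ViT (detach disabled).
    megatron sft \
        --load GLM-5-mcore \
        --dataset my_dataset.jsonl \
        --mtp_shared_weights true \
        --mtp_decoder_input_detach false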
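
For item 2a, a minimal sketch of a GKD run that sets the newly supported generation scheduling parameters. The rlhf entry point and the student/teacher arguments are assumptions about the usual swift rlhf layout; only generation_batch_size and steps_per_generation are the parameters named above.

    # Hypothetical GKD launch; the two generation scheduling flags are the point here.
    swift rlhf \
        --rlhf_type gkd \
        --model my-student-model \
        --teacher_model my-teacher-model \
        --dataset my_dataset.jsonl \
        --generation_batch_size 64 \
        --steps_per_generation 4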
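
For item 3b, a dataset sketch showing where a per-message loss_scale might live. The "messages" layout follows the standard custom-dataset format; the placement and name of the loss_scale field here are an assumption, so check the linked Custom-dataset documentation for the authoritative schema.

    # Hypothetical JSONL row with an explicit loss_scale on the assistant turn.
    row='{"messages": [{"role": "user", "content": "What is 2 + 2?"}, {"role": "assistant", "content": "4", "loss_scale": 2.0}]}'
    printf '%s\n' "$row" > train.jsonl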
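
For item 4a, a sketch of launching Qwen3.5 training on NPU with the Megatron backend. The environment variable is the one named above; the subcommand and the checkpoint/dataset arguments are placeholders.

    # Disable the MCore GDN path on NPU, as required for Qwen3.5 with the Megatron backend.
    USE_MCORE_GDN=0 megatron sft \
        --load Qwen3.5-mcore \
        --dataset my_dataset.jsonl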

New Models

  1. Text-only Models
     a. ZhipuAI/GLM-5.1
     b. MiniMax/MiniMax-M2.7
     c. moonshotai/Kimi-K2.6 (text-only)
     d. Tencent-Hunyuan/Hy3-preview
     e. AIDC-AI/Marco-Nano-Instruct series
  2. Multimodal Models
     a. Qwen/Qwen3.6-35B-A3B, Qwen/Qwen3.6-27B
     b. Qwen3-ASR (Contributed by @xut806)
     c. Added mixed-modality dataset training support for Gemma4 series models.
     d. OpenDataLab/MinerU2.5-Pro-2604-1.2B
     e. OpenBMB/MiniCPM-o-4_5 now supports the audio modality. (Contributed by @fanqiNO1)
     f. allenai/Molmo2-4B (Contributed by @Kagura-0001)

What's Changed

New Contributors

Full Changelog: https://github.com/modelscope/ms-swift/compare/v4.1.0...v4.2.0

Source: README.md, updated 2026-05-07