LLaMA-Factory v0.9.3: Llama4, Gemma3, Qwen3, InternVL3, Qwen2.5-Omni (released 2025-06-16)

New features

  • 🔥 InternVL2.5/InternVL3 model by @Kuangdd01 in [#7258]
  • 🔥 Qwen2.5-Omni model by @Kuangdd01 in [#7537]
  • 🔥 Llama 4 and Gemma 3 multimodal models by @hiyouga in [#7273] and [#7611]
  • 🔥 Official GPU Docker image by @yzoaim in [#8181]
  • 🔥 SGLang inference by @Qiaolin-Yu and @jhinpan in [#7278]
  • GLM-4-0414 and GLM-Z1 model by @zRzRzRzRzRzRzR in [#7695]
  • Kimi-VL model by @Kuangdd01 in [#7719]
  • Qwen3 model by @hiyouga in [#7885]
  • MiMo and MiMo-VL model by @Kuangdd01 in [#7946] [#8249]
  • SmolLM/SmolLM2 model by @akshatsehgal in [#8050] [#8220]
  • MiniCPM4 model by @LDLINGLINGLING in [#8314]
  • Mistral-Small-3.1 model by @Kuangdd01 in [#8335]
  • Add scripts/eval_bleu_rouge.py by @SnowFox4004 in [#7419]
  • Add Muon optimizer by @tianshijing in [#7749]
  • Support video/audio inference with vLLM by @hiyouga in [#7566]
  • Support S3/GCS cloud data by @erictang000 in [#7567]
  • Support vLLM-ascend by @leo-pony in [#7739]
  • Support OmegaConf by @hiyouga in [#7793]
  • Support early-stopping by @hiyouga in [#7797]
  • Add enable_thinking argument for reasoning models by @hiyouga in [#7928]
  • PyTorch-elastic and fault-tolerant launch by @hubutui in [#8286]
  • Length Desensitization DPO (LD-DPO) by @amangup in [#8362]
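Among these features, early stopping ([#7797]) is simple enough to picture in isolation. The sketch below is a generic patience-based stopper in plain Python, assuming evaluation loss is the monitored metric; it illustrates the idea only and is not LLaMA-Factory's actual callback.

```python
class EarlyStopping:
    """Stop training when the monitored metric stops improving.

    Illustrative sketch only: real trainers usually wire this logic
    into an evaluation callback rather than a standalone class.
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # evaluations to tolerate without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, eval_loss: float) -> bool:
        """Record one evaluation result; return True when training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

Calling `step()` after each evaluation returns `True` once the loss has failed to improve for `patience` consecutive evaluations.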

New models

  • Base models
    • SmolLM/SmolLM2 (135M/360M/1.7B) 📄
    • Qwen3 Base (0.6B/1.7B/4B/8B/14B/30B) 📄
    • Gemma 3 (1B/4B/12B/27B) 📄🖼️
    • MedGemma (4B) 📄🩺
    • MiMo Base (7B) 📄
    • Seed-Coder Base (8B) 📄⌨️
    • Mistral-Small-3.1 Base (24B) 📄🖼️
    • GLM-4-0414 Base (32B) 📄
    • Llama 4 (109B/492B) 📄🖼️
  • Instruct/Chat models
    • SmolLM/SmolLM2 Instruct (135M/360M/1.7B) 📄🤖
    • MiniCPM4 (0.5B/8B) 📄🤖
    • Qwen3 (0.6B/1.7B/4B/8B/14B/32B/30B/235B) 📄🤖🧠
    • Gemma 3 Instruct (1B/4B/12B/27B) 📄🤖🖼️
    • InternVL2.5/3 Instruct/MPO (1B/2B/8B/14B/38B/78B) 📄🤖🖼️
    • Qwen2.5-Omni (3B/7B) 📄🤖🖼️🔈
    • MedGemma Instruct (4B/27B) 📄🤖🩺
    • MiMo SFT/RL (7B) 📄🤖
    • MiMo-VL SFT/RL (7B) 📄🤖🖼️
    • Hunyuan Instruct (7B) 📄🤖
    • Seed-Coder Instruct/Reasoning (8B) 📄🤖🧠⌨️
    • GLM-4-0414/GLM-Z1 Instruct (9B/32B) 📄🤖🧠
    • DeepSeek-R1-0528 (8B/671B) 📄🤖🧠
    • Kimi-VL Instruct/Thinking (17B) 📄🤖🧠🖼️
    • Mistral-Small-3.1 Instruct (24B) 📄🤖🖼️
    • Qwen2.5-VL Instruct (32B) 📄🤖🖼️
    • Llama 4 Instruct (109B/492B) 📄🤖🖼️

New datasets

  • Preference datasets
    • COIG-P (zh) 📄

Bug fixes

  • Fix add new tokens by @flashJd in [#7253]
  • Fix ultrachat_200k dataset by @felladrin in [#7259]
  • Add efficient 4D attention mask for neat packing by @BlackWingedKing in [#7272]
  • Fix WSD lr scheduler by @x22x22 in [#7304]
  • Fix position ids in neat packing by @BlackWingedKing in [#7318]
  • Fix proxy setting in webui by @taoharry in [#7332]
  • Improve entrypoint by @ENg-122 in [#7345]
  • Fix ray destroy process group by @erictang000 in [#7395]
  • Fix SGLang dependencies by @guoquan in [#7432]
  • Upgrade docker package version by @rumichi2210 in [#7442]
  • Update liger kernel for qwen2.5-vl by @xiaosu-zhu in [#7453]
  • Fix lora on quant models by @GuoCoder in [#7456]
  • Enable liger kernel for gemma3 by @kennylam777 in [#7462]
  • Enable liger kernel for paligemma by @eljandoubi in [#7466]
  • Add Swanlab lark notification by @Xu-pixel in [#7481]
  • Fix gemma3 use cache attribute by @ysjprojects in [#7500]
  • Fix pixtral plugin by @Kuangdd01 in [#7505]
  • Fix KTO mismatch pair strategy by @himalalps in [#7509]
  • Support dataset_shards by @aliencaocao in [#7530]
  • Fix qwen2.5omni plugin by @Kuangdd01 in [#7573] [#7578] [#7883]
  • Fix ppo trainer by @gechengze in [#7576]
  • Fix workflow by @Shawn-Tao in [#7635]
  • Support qwen2.5omni audio+video2text by @Kuangdd01 in [#7638]
  • Upgrade deps for SGLang by @adarshxs in [#7639]
  • Allow ray env setting by @erictang000 in [#7647]
  • Fix CUDA warning on Intel XPUs by @jilongW in [#7655]
  • Fix liger kernel patch by @danny980521 in [#7660]
  • Fix rocm dockerfile by @fluidnumerics-joe in [#7725]
  • Fix qwen2vl with neat packing by @GeoffreyChen777 in [#7754]
  • Fix a constant by @AlphaBladez in [#7765]
  • Fix autogptq for Gemma by @ddddng in [#7786]
  • Fix internvl models by @Kuangdd01 in [#7801] [#7803] [#7817] [#8129]
  • Fix DeepSpeed ZeRO3 on moe models by @hiyouga in [#7826] [#7879]
  • Fix gradient checkpoint func for vit by @hiyouga in [#7830]
  • Support S3 ray storage by @erictang000 in [#7854]
  • Fix Kimi-VL attention by @Kuangdd01 in [#7867]
  • Fix minicpm-o vllm inference by @hiyouga in [#7870]
  • Unfreeze multimodal projector in freeze training by @zhaop-l in [#7872]
  • Fix Qwen2.5-omni plugin by @hiyouga in [#7875] [#7962]
  • Add warp support link by @ericdachen in [#7887]
  • Replace eos token for base model by @hiyouga in [#7911]
  • Add eval_on_each_dataset arg by @hiyouga in [#7912]
  • Fix qwen3 loss by @hiyouga in [#7923] [#8109]
  • Add repetition_penalty to api by @wangzhanxd in [#7958]
  • Add graphgen to readme by @tpoisonooo in [#7974]
  • Support video params in vllm batch infer by @Kuangdd01 in [#7992]
  • Fix tool formatter by @yunhao-tech in [#8000]
  • Fix kimi vl plugin by @hiyouga in [#8015]
  • Support batch preprocess in vllm batch infer by @Shawn-Tao in [#8051]
  • Support loading remote folder by @erictang000 in [#8078]
  • Fix video utils import by @Kuangdd01 in [#8077]
  • Fix SGLang LoRA inference by @Kiko-RWan in [#8067]
  • Fix cli by @Wangbiao2 in [#8095]
  • Fix pretrain workflow by @SunnyHaze in [#8099]
  • Fix rope args for yarn by @piamo in [#8101]
  • Add no build isolation in installing by @hiyouga in [#8103]
  • Switch to GPTQModel and deprecate AutoGPTQ by @hiyouga in [#8108]
  • Support llama3 parallel function call by @hiyouga in [#8124]
  • Add data_shared_file_system by @hiyouga in [#8179]
  • Fix load remote files by @youngwookim in [#8183]
  • Fix dataset info by @Muqi1029 in [#8197]
  • Fix qwen2.5 omni merge script by @Kuangdd01 in [#8227] [#8293]
  • Add unittest for VLM save load by @Kuangdd01 in [#8248]
  • Add tag in swanlab by @Zeyi-Lin in [#8258]
  • Support input video frames by @Kuangdd01 in [#8264]
  • Fix empty template by @hiyouga in [#8312]
  • Support full-finetuning with unsloth by @Remorax in [#8325]
  • Add awesome work by @MING-ZCH in [#8333]
  • Release v0.9.3 by @hiyouga in [#8386]
  • Fix qwen2vl position ids by @hiyouga in [#8387]
  • Fix vlm utils by @hiyouga in [#8388]
  • Fix [#3802] [#4443] [#5548] [#6236] [#6322] [#6432] [#6708] [#6739] [#6881] [#6919] [#7080] [#7105] [#7119] [#7225] [#7267] [#7327] [#7389] [#7416] [#7427] [#7428] [#7443] [#7447] [#7454] [#7490] [#7501] [#7502] [#7513] [#7520] [#7541] [#7545] [#7552] [#7563] [#7598] [#7600] [#7613] [#7636] [#7678] [#7680] [#7687] [#7688] [#7730] [#7743] [#7772] [#7791] [#7800] [#7816] [#7829] [#7845] [#7865] [#7874] [#7889] [#7905] [#7906] [#7907] [#7909] [#7916] [#7918] [#7919] [#7939] [#7953] [#7965] [#7990] [#8008] [#8056] [#8061] [#8066] [#8069] [#8087] [#8091] [#8092] [#8096] [#8097] [#8111] [#8119] [#8147] [#8166] [#8169] [#8174] [#8182] [#8189] [#8223] [#8241] [#8247] [#8253] [#8294] [#8309] [#8324] [#8326] [#8332]
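
Several of the fixes above ([#7272], [#7318], [#7754], [#8387]) concern attention masks and position ids under "neat packing", where multiple short samples are packed into one row: each sample's positions must restart at zero so rotary embeddings treat samples independently, and attention must be block-diagonal so tokens never attend across sample boundaries. The sketch below illustrates the idea in plain Python (the real implementation builds an efficient 4D mask tensor, not nested lists):

```python
def packed_position_ids(seq_lens):
    """Position ids restart at 0 for each sample packed into the row."""
    pos = []
    for n in seq_lens:
        pos.extend(range(n))
    return pos

def block_diagonal_causal_mask(seq_lens):
    """Boolean mask: token i may attend to token j only when both lie in
    the same packed sample and j is not in the future (causal)."""
    total = sum(seq_lens)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for n in seq_lens:
        for i in range(start, start + n):
            for j in range(start, i + 1):  # causal within this sample only
                mask[i][j] = True
        start += n
    return mask
```

For two packed samples of lengths 3 and 2, `packed_position_ids([3, 2])` yields `[0, 1, 2, 0, 1]`, and the mask confines each token's attention to earlier tokens of its own sample.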

Full Changelog: https://github.com/hiyouga/LLaMA-Factory/compare/v0.9.2...v0.9.3

Source: README.md, updated 2025-06-16