Name | Modified | Size
---|---|---
lmdeploy-0.9.0+cu118-cp38-cp38-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp38-cp38-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp39-cp39-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp39-cp39-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp310-cp310-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp310-cp310-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp311-cp311-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp311-cp311-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp312-cp312-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp312-cp312-win_amd64.whl | 2025-06-19 | 26.4 MB
README.md | 2025-06-19 | 8.4 kB
v0.9.0 source code.tar.gz | 2025-06-19 | 1.2 MB
v0.9.0 source code.zip | 2025-06-19 | 1.9 MB
Totals: 13 items | | 586.6 MB
## What's Changed

### 🚀 Features
- LMDeploy Distserve by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3304
- allow api server terminated through requests from clients by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3533
- support update params for pytorch backend from api server by @irexyc in https://github.com/InternLM/lmdeploy/pull/3535
- support eplb for Qwen3-MoE by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3582
- support update params for turbomind backend by @irexyc in https://github.com/InternLM/lmdeploy/pull/3566
- Quantize Qwen3 MoE bf16 model to fp8 model at runtime by @grimoire in https://github.com/InternLM/lmdeploy/pull/3631
- [Feat]: Support internvl3-8b-hf by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3633 (usage sketch after this list)
- Add FP8 MoE for turbomind by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3601
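
Among the features above, the HF-style InternVL3 support (PR 3633) plugs into the existing VLM pipeline API. A minimal sketch, assuming an HF-format InternVL3-8B checkpoint; the model id and image URL below are illustrative and not taken from the PR:

```python
# Minimal VLM pipeline sketch for an HF-style InternVL3 checkpoint.
# The model id and image URL are assumptions for illustration only.
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL3-8B-hf',
                backend_config=PytorchEngineConfig())  # HF-style VLMs run on the pytorch backend
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```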
### 💥 Improvements
- reduce ray memory usage by @grimoire in https://github.com/InternLM/lmdeploy/pull/3487
- use dlblas by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3489
- internlm3 dense fp8 by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3527
- random pad input ids by @grimoire in https://github.com/InternLM/lmdeploy/pull/3530
- ray nsys profile support by @grimoire in https://github.com/InternLM/lmdeploy/pull/3448
- update blockedfp8 scale name by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3532
- start engine loop on server startup event by @grimoire in https://github.com/InternLM/lmdeploy/pull/3523
- update two microbatch by @SHshenhao in https://github.com/InternLM/lmdeploy/pull/3540
- [ascend] set transdata dynamic shape true by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3531
- ray safe exit by @grimoire in https://github.com/InternLM/lmdeploy/pull/3545
- support update params with dp=1 for pytorch engine by @irexyc in https://github.com/InternLM/lmdeploy/pull/3562
- Skip dp dummy input forward by @grimoire in https://github.com/InternLM/lmdeploy/pull/3552
- Unlock mutual exclusivity of arguments `tool-call-parser` and `reasoning-parser` by @jingyibo123 in https://github.com/InternLM/lmdeploy/pull/3550
- perform torch.cuda.empty_cache() after conversion by @bltcn in https://github.com/InternLM/lmdeploy/pull/3570
- pipeline warmup by @irexyc in https://github.com/InternLM/lmdeploy/pull/3548
- Launch multiple api servers for dp > 1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3414
- support awq for Qwen2.5-VL by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3559
- support qwen3 /think & /no_think & enable_thinking parameter by @BUJIDAOVS in https://github.com/InternLM/lmdeploy/pull/3564 (usage sketch after this list)
- Eplb by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3572
- Update benchmark by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3578
- block output when prefetch next forward inputs. by @grimoire in https://github.com/InternLM/lmdeploy/pull/3573
- support both eplb and microbatch simultaneously by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3591
- Add log_file and set loglevel in launch_servers by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3596
- add migration flow control by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3599
- sampling on the tokenizer's vocab by @grimoire in https://github.com/InternLM/lmdeploy/pull/3604
- update deepgemm version by @grimoire in https://github.com/InternLM/lmdeploy/pull/3606
- [Ascend] set default distributed backend as ray for ascend device by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3603
- Blocked fp8 tma by @grimoire in https://github.com/InternLM/lmdeploy/pull/3470
- [PDDisaggregation] Async migration by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3610
- move dp loop to model agent by @grimoire in https://github.com/InternLM/lmdeploy/pull/3598
- update some logs of proxy_server and pt engine by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3621
- improve loading model performance by shuffling the weight files by @irexyc in https://github.com/InternLM/lmdeploy/pull/3625
- add benchmark scripts about pipeline api and inference engines according to the config file by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3622
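
The Qwen3 thinking-mode switches (PR 3564) are exposed as the /think and /no_think soft switches in the prompt and as an enable_thinking request parameter. A minimal sketch against an OpenAI-compatible `lmdeploy serve api_server` endpoint; the `extra_body` field name is an assumption inferred from the PR title:

```python
# Sketch: toggling Qwen3 thinking mode through lmdeploy's OpenAI-compatible server.
# Assumes a Qwen3 model is already being served, e.g.:
#   lmdeploy serve api_server Qwen/Qwen3-8B
from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='EMPTY')
model_name = client.models.list().data[0].id

resp = client.chat.completions.create(
    model=model_name,
    # "/no_think" is the in-prompt soft switch; enable_thinking is an assumed
    # request-level parameter named after the PR title.
    messages=[{'role': 'user', 'content': 'What is a KV cache? /no_think'}],
    extra_body={'enable_thinking': False},
)
print(resp.choices[0].message.content)
```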
### 🐞 Bug fixes
- [ascend] fix recompile on different rank by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3513
- fix attention sm86 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3519
- fix stopwords kv cache by @grimoire in https://github.com/InternLM/lmdeploy/pull/3494
- [bug fix] fix PD Disaggregation in DSV3 by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3547
- fix proxy server heart beat by @irexyc in https://github.com/InternLM/lmdeploy/pull/3543
- fix dp>1 tp=1 ep=1 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3555
- fix mixtral on new transformers by @grimoire in https://github.com/InternLM/lmdeploy/pull/3580
- [Fix]: reset step after eviction by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3589
- fix parsing dynamic rope param failed by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3575
- Fix batch infer for gemma3vl by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3592
- Fix symbol error when dlBLAS is not imported by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3597
- read distributed envs by @grimoire in https://github.com/InternLM/lmdeploy/pull/3600
- fix side-effect caused by PR 3590 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3608
- fix bug in qwen2 by @LKJacky in https://github.com/InternLM/lmdeploy/pull/3614
- fix awq kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3618
- fix flash mla interface by @grimoire in https://github.com/InternLM/lmdeploy/pull/3617
- add sampling_vocab_size by @irexyc in https://github.com/InternLM/lmdeploy/pull/3607
- fix for default quant by @grimoire in https://github.com/InternLM/lmdeploy/pull/3640
- Fix log file env in ray worker by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3624
- fix qwen3 chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3641
- fix vlm runtime quant by @grimoire in https://github.com/InternLM/lmdeploy/pull/3644
- Fix 'Namespace' object has no attribute 'num_tokens_per_iter' when serving by gradio by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3647
- Synchronize weight processing by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3649
- Fix zero scale in fp8 quantization by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3652
### 🌐 Other
- update doc for ascend 300I Duo docker image by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3526
- simulate EPLB for benchmark only by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3490
- [ci] add test workflow for 3090 machine by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3561
- [ci] fix transformers version in prtest by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3584
- [Misc] minor api_server and tm loader, and upgrade docformatter to resolve lint error by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3590
- [ci] add qwen3 models into testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3593
- update Dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3634
- check in lmdeploy-builder on cuda 12.4 and 12.8 platform by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3630
- fix blocked fp8 overflow by @grimoire in https://github.com/InternLM/lmdeploy/pull/3650
- Bump version to v0.9.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3609
## New Contributors
- @JimyMa made their first contribution in https://github.com/InternLM/lmdeploy/pull/3304
- @jingyibo123 made their first contribution in https://github.com/InternLM/lmdeploy/pull/3550
- @bltcn made their first contribution in https://github.com/InternLM/lmdeploy/pull/3570
- @BUJIDAOVS made their first contribution in https://github.com/InternLM/lmdeploy/pull/3564
- @LKJacky made their first contribution in https://github.com/InternLM/lmdeploy/pull/3614
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.8.0...v0.9.0