Name | Modified | Size
---|---|---
lmdeploy-0.9.0+cu118-cp38-cp38-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp38-cp38-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp39-cp39-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp39-cp39-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp310-cp310-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp310-cp310-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp311-cp311-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp311-cp311-win_amd64.whl | 2025-06-19 | 26.4 MB
lmdeploy-0.9.0+cu118-cp312-cp312-manylinux2014_x86_64.whl | 2025-06-19 | 90.3 MB
lmdeploy-0.9.0+cu118-cp312-cp312-win_amd64.whl | 2025-06-19 | 26.4 MB
README.md | 2025-06-19 | 8.4 kB
v0.9.0 source code.tar.gz | 2025-06-19 | 1.2 MB
v0.9.0 source code.zip | 2025-06-19 | 1.9 MB
Totals: 13 items | | 586.6 MB
## What's Changed

### 🚀 Features
- LMDeploy Distserve by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3304
- allow api server terminated through requests from clients by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3533
- support update params for pytorch backend from api server by @irexyc in https://github.com/InternLM/lmdeploy/pull/3535
- support eplb for Qwen3-MoE by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3582
- support update params for turbomind backend by @irexyc in https://github.com/InternLM/lmdeploy/pull/3566
- Quantize Qwen3 MoE bf16 model to fp8 model at runtime by @grimoire in https://github.com/InternLM/lmdeploy/pull/3631
- [Feat]: Support internvl3-8b-hf by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3633 (usage sketch after this list)
- Add FP8 MoE for turbomind by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3601
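
Among the features above, the HF-style InternVL3 support (PR 3633) plugs into the existing VLM pipeline API. A minimal sketch, assuming an HF-format InternVL3-8B checkpoint; the model id and image URL below are illustrative and not taken from the PR:

```python
# Minimal VLM pipeline sketch for an HF-style InternVL3 checkpoint.
# The model id and image URL are assumptions for illustration only.
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL3-8B-hf',
                backend_config=PytorchEngineConfig())  # HF-style VLMs run on the pytorch backend
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```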
### 💥 Improvements
- reduce ray memory usage by @grimoire in https://github.com/InternLM/lmdeploy/pull/3487
- use dlblas by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3489
- internlm3 dense fp8 by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3527
- random pad input ids by @grimoire in https://github.com/InternLM/lmdeploy/pull/3530
- ray nsys profile support by @grimoire in https://github.com/InternLM/lmdeploy/pull/3448
- update blockedfp8 scale name by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3532
- start engine loop on server startup event by @grimoire in https://github.com/InternLM/lmdeploy/pull/3523
- update two microbatch by @SHshenhao in https://github.com/InternLM/lmdeploy/pull/3540
- [ascend] set transdata dynamic shape true by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3531
- ray safe exit by @grimoire in https://github.com/InternLM/lmdeploy/pull/3545
- support update params with dp=1 for pytorch engine by @irexyc in https://github.com/InternLM/lmdeploy/pull/3562
- Skip dp dummy input forward by @grimoire in https://github.com/InternLM/lmdeploy/pull/3552
- Unlock mutual exclusivity of arguments `tool-call-parser` and `reasoning-parser` by @jingyibo123 in https://github.com/InternLM/lmdeploy/pull/3550
- perform torch.cuda.empty_cache() after conversion by @bltcn in https://github.com/InternLM/lmdeploy/pull/3570
- pipeline warmup by @irexyc in https://github.com/InternLM/lmdeploy/pull/3548
- Launch multiple api servers for dp > 1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3414
- support awq for Qwen2.5-VL by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3559
- support qwen3 /think & /no_think & enable_thinking parameter by @BUJIDAOVS in https://github.com/InternLM/lmdeploy/pull/3564 (usage sketch after this list)
- Eplb by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3572
- Update benchmark by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3578
- block output when prefetch next forward inputs. by @grimoire in https://github.com/InternLM/lmdeploy/pull/3573
- support both eplb and microbatch simultaneously by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3591
- Add log_file and set loglevel in launch_servers by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3596
- add migration flow control by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3599
- sampling on the tokenizer's vocab by @grimoire in https://github.com/InternLM/lmdeploy/pull/3604
- update deepgemm version by @grimoire in https://github.com/InternLM/lmdeploy/pull/3606
- [Ascend] set default distributed backend as ray for ascend device by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3603
- Blocked fp8 tma by @grimoire in https://github.com/InternLM/lmdeploy/pull/3470
- [PDDisaggregation] Async migration by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3610
- move dp loop to model agent by @grimoire in https://github.com/InternLM/lmdeploy/pull/3598
- update some logs of proxy_server and pt engine by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3621
- improve loading model performance by shuffling the weight files by @irexyc in https://github.com/InternLM/lmdeploy/pull/3625
- add benchmark scripts about pipeline api and inference engines according to the config file by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3622
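
The Qwen3 thinking-mode switches (PR 3564) are exposed as the /think and /no_think soft switches in the prompt and as an enable_thinking request parameter. A minimal sketch against an OpenAI-compatible `lmdeploy serve api_server` endpoint; the `extra_body` field name is an assumption inferred from the PR title:

```python
# Sketch: toggling Qwen3 thinking mode through lmdeploy's OpenAI-compatible server.
# Assumes a Qwen3 model is already being served, e.g.:
#   lmdeploy serve api_server Qwen/Qwen3-8B
from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='EMPTY')
model_name = client.models.list().data[0].id

resp = client.chat.completions.create(
    model=model_name,
    # "/no_think" is the in-prompt soft switch; enable_thinking is an assumed
    # request-level parameter named after the PR title.
    messages=[{'role': 'user', 'content': 'What is a KV cache? /no_think'}],
    extra_body={'enable_thinking': False},
)
print(resp.choices[0].message.content)
```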
### 🐞 Bug fixes
- [ascend] fix recompile on different rank by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3513
- fix attention sm86 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3519
- fix stopwords kv cache by @grimoire in https://github.com/InternLM/lmdeploy/pull/3494
- [bug fix] fix PD Disaggregation in DSV3 by @JimyMa in https://github.com/InternLM/lmdeploy/pull/3547
- fix proxy server heart beat by @irexyc in https://github.com/InternLM/lmdeploy/pull/3543
- fix dp>1 tp=1 ep=1 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3555
- fix mixtral on new transformers by @grimoire in https://github.com/InternLM/lmdeploy/pull/3580
- [Fix]: reset step after eviction by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3589
- fix parsing dynamic rope param failed by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3575
- Fix batch infer for gemma3vl by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3592
- Fix symbol error when dlBLAS is not imported by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3597
- read distributed envs by @grimoire in https://github.com/InternLM/lmdeploy/pull/3600
- fix side-effect caused by PR 3590 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3608
- fix bug in qwen2 by @LKJacky in https://github.com/InternLM/lmdeploy/pull/3614
- fix awq kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3618
- fix flash mla interface by @grimoire in https://github.com/InternLM/lmdeploy/pull/3617
- add sampling_vocab_size by @irexyc in https://github.com/InternLM/lmdeploy/pull/3607
- fix for default quant by @grimoire in https://github.com/InternLM/lmdeploy/pull/3640
- Fix log file env in ray worker by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3624
- fix qwen3 chat template by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3641
- fix vlm runtime quant by @grimoire in https://github.com/InternLM/lmdeploy/pull/3644
- Fix 'Namespace' object has no attribute 'num_tokens_per_iter' when serving by gradio by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3647
- Synchronize weight processing by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3649
- Fix zero scale in fp8 quantization by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3652
### 🌐 Other
- update doc for ascend 300I Duo docker image by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3526
- simulate EPLB for benchmark only by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3490
- [ci] add test workflow for 3090 machine by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3561
- [ci] fix transformers version in prtest by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3584
- [Misc] minor api_server and tm loader, and upgrade docformatter to resolve lint error by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3590
- [ci] add qwen3 models into testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3593
- update Dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3634
- check in lmdeploy-builder on cuda 12.4 and 12.8 platform by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3630
- fix blocked fp8 overflow by @grimoire in https://github.com/InternLM/lmdeploy/pull/3650
- Bump version to v0.9.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3609
## New Contributors
- @JimyMa made their first contribution in https://github.com/InternLM/lmdeploy/pull/3304
- @jingyibo123 made their first contribution in https://github.com/InternLM/lmdeploy/pull/3550
- @bltcn made their first contribution in https://github.com/InternLM/lmdeploy/pull/3570
- @BUJIDAOVS made their first contribution in https://github.com/InternLM/lmdeploy/pull/3564
- @LKJacky made their first contribution in https://github.com/InternLM/lmdeploy/pull/3614
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.8.0...v0.9.0