Name | Modified | Size | Downloads / Week |
---|---|---|---|
lmdeploy-0.9.1+cu118-cp38-cp38-manylinux2014_x86_64.whl | 2025-07-10 | 90.4 MB | |
lmdeploy-0.9.1+cu118-cp38-cp38-win_amd64.whl | 2025-07-10 | 26.4 MB | |
lmdeploy-0.9.1+cu118-cp39-cp39-manylinux2014_x86_64.whl | 2025-07-10 | 90.4 MB | |
lmdeploy-0.9.1+cu118-cp39-cp39-win_amd64.whl | 2025-07-10 | 26.4 MB | |
lmdeploy-0.9.1+cu118-cp310-cp310-manylinux2014_x86_64.whl | 2025-07-10 | 90.4 MB | |
lmdeploy-0.9.1+cu118-cp310-cp310-win_amd64.whl | 2025-07-10 | 26.4 MB | |
lmdeploy-0.9.1+cu118-cp311-cp311-manylinux2014_x86_64.whl | 2025-07-10 | 90.5 MB | |
lmdeploy-0.9.1+cu118-cp311-cp311-win_amd64.whl | 2025-07-10 | 26.4 MB | |
lmdeploy-0.9.1+cu118-cp312-cp312-manylinux2014_x86_64.whl | 2025-07-10 | 90.5 MB | |
lmdeploy-0.9.1+cu118-cp312-cp312-win_amd64.whl | 2025-07-10 | 26.4 MB | |
README.md | 2025-07-04 | 3.1 kB | |
v0.9.1 source code.tar.gz | 2025-07-04 | 1.3 MB | |
v0.9.1 source code.zip | 2025-07-04 | 1.9 MB | |
Totals: 13 Items | | 587.4 MB | 1 |
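
The wheel names in the table follow the standard wheel filename layout (`name-version-python tag-abi tag-platform tag.whl`), so the right file for a given interpreter can be picked mechanically. The following is a minimal sketch using only the standard library; the helper names `parse_wheel_filename`, `current_python_tag`, and `pick_wheel` are illustrative and not part of lmdeploy or pip, and the parser assumes no optional build tag is present, which holds for the files listed here.

```python
import sys
from typing import Iterable, NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its five dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Assumption: no optional build tag, as in the table above.
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return WheelTags(*parts)


def current_python_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def pick_wheel(filenames: Iterable[str], python_tag: str,
               platform_prefix: str) -> Optional[str]:
    """Return the first wheel matching the Python tag and platform prefix."""
    for filename in filenames:
        tags = parse_wheel_filename(filename)
        if (tags.python_tag == python_tag
                and tags.platform_tag.startswith(platform_prefix)):
            return filename
    return None


wheels = [
    "lmdeploy-0.9.1+cu118-cp310-cp310-manylinux2014_x86_64.whl",
    "lmdeploy-0.9.1+cu118-cp310-cp310-win_amd64.whl",
    "lmdeploy-0.9.1+cu118-cp311-cp311-win_amd64.whl",
]
# Pick the Linux wheel for CPython 3.10 from the sample list.
linux_cp310 = pick_wheel(wheels, "cp310", "manylinux")
```

In practice `pick_wheel(wheels, current_python_tag(), "win")` or `"manylinux"` would select the file for the running interpreter; the `+cu118` suffix in the version is a local version label indicating a CUDA 11.8 build.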
What's Changed
🚀 Features
- feature: enable tool_call and reasoning_content parsing for qwen3 by @ywx217 in https://github.com/InternLM/lmdeploy/pull/3615
- Support Mooncake migration backend for PD disaggregation by @Risc-lt in https://github.com/InternLM/lmdeploy/pull/3620
- Support loading fused MoE weights by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3672
- Separate api_server and pytorch engine into different processes by @grimoire in https://github.com/InternLM/lmdeploy/pull/3627
- add reward model api by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3665
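
To illustrate what `reasoning_content` parsing in the first feature above means in general terms, here is a minimal sketch that separates a model's reasoning from its final answer. It assumes the model wraps its reasoning in `<think>...</think>` tags and that the full reply is available at once; the function `split_reasoning` is hypothetical and is not lmdeploy's parser, which may differ (for example, by handling streamed output).

```python
from typing import Optional, Tuple


def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> Tuple[Optional[str], str]:
    """Return (reasoning_content, content) from one complete model reply.

    The tag names are an assumption about the model's output format.
    """
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1 or end < start:
        # No well-formed reasoning block: treat everything as the answer.
        return None, text.strip()
    reasoning = text[start + len(open_tag):end].strip()
    content = (text[:start] + text[end + len(close_tag):]).strip()
    return reasoning, content


reasoning, answer = split_reasoning("<think>2+2 is 4</think>The answer is 4.")
```

A server exposing an OpenAI-style chat API would typically return the two parts in separate fields of the response message, so clients can show or hide the reasoning independently of the answer.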
💥 Improvements
- [ascend] import patch at initialization time by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3662
- [ascend] use custom transdata in python kernel by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3671
- move import transformers in patch by @grimoire in https://github.com/InternLM/lmdeploy/pull/3660
- set ray envs by @grimoire in https://github.com/InternLM/lmdeploy/pull/3643
- raise ImportError when ep is enabled and dlblas is not installed by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3636
- Reduce sampling memory usage by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3666
🐞 Bug fixes
- fix dockerfile by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3657
- Fix top-p only sampling with padded vocab size by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3661
- fix pt engine stop & cancel by @irexyc in https://github.com/InternLM/lmdeploy/pull/3681
- Fix bf16 to numpy conversion by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3686
- disable torch.compile in cuda graph runner by @grimoire in https://github.com/InternLM/lmdeploy/pull/3691
- fix reward model api by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3703
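
The top-p fix listed above concerns padded vocabulary sizes. The details of that PR are not reproduced here, so the following is only a generic illustration of why padding matters: when the logits tensor is wider than the real vocabulary, the extra slots must be excluded before the nucleus is computed, otherwise a padded index could be sampled. The function `top_p_sample` and its parameters are hypothetical and do not reflect lmdeploy's kernels.

```python
import math
import random
from typing import List, Optional


def top_p_sample(logits: List[float], vocab_size: int, top_p: float,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token id with nucleus (top-p) sampling, ignoring padding.

    `logits` may be longer than `vocab_size`; entries at index
    >= vocab_size are treated as padding and never sampled.
    """
    rng = rng or random.Random()
    valid = logits[:vocab_size]  # drop padded entries up front
    peak = max(valid)
    exps = [math.exp(x - peak) for x in valid]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most probable tokens whose mass reaches top_p.
    order = sorted(range(vocab_size), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights over the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Real vocabulary of 3 tokens, padded to 5 slots with large junk values.
padded_logits = [2.0, 1.0, 0.0, 50.0, 50.0]
token = top_p_sample(padded_logits, vocab_size=3, top_p=0.5)
```

Without the slice to `vocab_size`, the two padded slots would dominate the softmax in this example and the sampler would return an id outside the real vocabulary.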
📚 Documentation
- add reward model documents by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3706
🌐 Other
- upgrade torch and triton by @grimoire in https://github.com/InternLM/lmdeploy/pull/3677
- support do_preprocess=False for chat.completions by @irexyc in https://github.com/InternLM/lmdeploy/pull/3645
- [ci] change flash attention installation in PR test by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3688
- fix profile_throughput.py by @irexyc in https://github.com/InternLM/lmdeploy/pull/3692
- fix profile_generation.py by @irexyc in https://github.com/InternLM/lmdeploy/pull/3707
- update dlblas version in dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3711
- bump version to v0.9.1 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3685
New Contributors
- @ywx217 made their first contribution in https://github.com/InternLM/lmdeploy/pull/3615
- @Risc-lt made their first contribution in https://github.com/InternLM/lmdeploy/pull/3620
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.9.0...v0.9.1