## What's Changed
### 🚀 Features
- [Ascend] support qwen3.5 35BA3B by @wanfengcxz in https://github.com/InternLM/lmdeploy/pull/4485
- feat: Add TurboQuant (quant_policy=42) support for KV Cache Quantization by @windreamer in https://github.com/InternLM/lmdeploy/pull/4510
- [refactor] [api_server] [2/N] improve tool parsers by abstracting xml parser by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4548
- feat(turbomind): integrate cublasGemmGroupedBatchedEx for Qwen3.5 MoE inference on Blackwell GPUs with memory copy optimizations by @hd9568 in https://github.com/InternLM/lmdeploy/pull/4490
- feat: add Anthropic-compatible serving endpoints by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4538
- Support InternS2 Preview by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4575
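Among the features above, PR #4538 adds Anthropic-compatible serving endpoints. As a rough orientation, the Anthropic Messages API expects a request body with `model`, `max_tokens`, and a `messages` list; the sketch below only builds that payload shape, and the server address, route, and model name are assumptions, not lmdeploy's documented interface:

```python
import json

# Hypothetical address and route; the exact path exposed by the
# Anthropic-compatible endpoints may differ.
API_URL = "http://localhost:23333/v1/messages"


def build_messages_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a request body in the Anthropic Messages API shape."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }


payload = build_messages_payload("internlm3-8b-instruct", "Hello!")
body = json.dumps(payload)  # what an HTTP client would POST to API_URL
print(body)
```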
### 💥 Improvements
- lmdeploy support kernel block size by @Tsundoku958 in https://github.com/InternLM/lmdeploy/pull/4421
- Reject requests on stale session or sleeping engine by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4496
- Add modern logging utils by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4486
- refine dlinfer update_weights by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4519
- feat(serve): expose repetition n-gram params on OpenAI routes by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4522
- Refactor step inputs by @grimoire in https://github.com/InternLM/lmdeploy/pull/4504
- fix lite module for transformers>=5.0 by @43758726 in https://github.com/InternLM/lmdeploy/pull/4488
- [refactor] [api_server] [1/N] Improve reasoning and tool-call parsers by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4468
- fix: prevent prefill starvation under high decode load by @grimoire in https://github.com/InternLM/lmdeploy/pull/4532
- Mixed modality by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4531
- optimize get_sorted_idx in moe by @grimoire in https://github.com/InternLM/lmdeploy/pull/4529
- Map user-input session_id to internal session_id to maintain session identity by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4523
- support more message item types by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4501
- add explicit trust_remote_code controls to resolve the security issue by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4511
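The session-identity change in PR #4523 maps a user-supplied `session_id` to an internal one so that repeated requests with the same user id keep hitting the same session. A minimal sketch of that mapping idea (class and method names here are hypothetical, not lmdeploy's actual implementation):

```python
import itertools


class SessionMapper:
    """Illustrative mapping of user-facing session ids to internal ids."""

    def __init__(self):
        self._counter = itertools.count()      # source of fresh internal ids
        self._user_to_internal = {}            # user id -> internal id

    def resolve(self, user_session_id):
        # A known user id keeps its internal id (session identity is
        # maintained); an unseen id gets a fresh internal one.
        if user_session_id not in self._user_to_internal:
            self._user_to_internal[user_session_id] = next(self._counter)
        return self._user_to_internal[user_session_id]


mapper = SessionMapper()
first = mapper.resolve("user-42")
again = mapper.resolve("user-42")   # same internal id as `first`
other = mapper.resolve("user-7")    # a different internal id
```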
### 🐞 Bug fixes
- [ascend] fix prefix caching by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4448
- fix update params by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4514
- fix ray mem leak by @grimoire in https://github.com/InternLM/lmdeploy/pull/4487
- Fix mtp by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4517
- fix kernel-block-size by @grimoire in https://github.com/InternLM/lmdeploy/pull/4521
- fix: use `is not None` check for seed to prevent seed=0 being silently ignored by @kuishou68 in https://github.com/InternLM/lmdeploy/pull/4526
- Fix qwen35 dp by @grimoire in https://github.com/InternLM/lmdeploy/pull/4535
- Fix mtp for rl by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4520
- cancel request and block new inputs when sleeping by @grimoire in https://github.com/InternLM/lmdeploy/pull/4541
- Fix mp engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4540
- Fix cache sizing and cache block layout edge cases by @grimoire in https://github.com/InternLM/lmdeploy/pull/4552
- Fix qwen3.5-moe mtp with tp>1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4568
- block_offsets padding 0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4569
- hotfix: resolve test issues for v0.13.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4571
- ResponseParser forgets to strip `<think>` tag in non-stream mode by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4576
- yield error when prompt processing suffers exception by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4574
- Fix the reprefill of evicted seqs with invalid draft tokens by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4564
- Support mtp fp8 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4572
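The seed fix in PR #4526 addresses a classic Python pitfall: a truthiness check treats `seed=0` the same as no seed at all, because `0` is falsy. A minimal illustration of the bug and the fix:

```python
def apply_seed_buggy(seed=None):
    # Truthiness check: seed=0 is falsy, so it is silently ignored.
    if seed:
        return f"seeded with {seed}"
    return "unseeded"


def apply_seed_fixed(seed=None):
    # Identity check: only a genuinely absent seed is skipped.
    if seed is not None:
        return f"seeded with {seed}"
    return "unseeded"


print(apply_seed_buggy(0))   # "unseeded" -- the bug
print(apply_seed_fixed(0))   # "seeded with 0"
```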
### 🌐 Other
- Use env LMDEPLOY_FP32_MAMBA_SSM_DTYPE to control the dtype of recurrent state by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4518
- add tool and reasoning test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4388
- update h config and add glm4.7 mtp test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4424
- [ci] change test whl into python 312 and use test images by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4513
- [Misc] fix typos in turbomind.py and model.py by @ZhijunLStudio in https://github.com/InternLM/lmdeploy/pull/4543
- [Misc] fix mutable default arguments by @ZhijunLStudio in https://github.com/InternLM/lmdeploy/pull/4544
- Add docker/Dockerfile_patch; minor tweaks in messages.py and setup.py. by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4546
- remove barely used skills and checkin docker-build skill by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4560
- bump version to v0.13.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4549
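The `LMDEPLOY_FP32_MAMBA_SSM_DTYPE` switch in PR #4518 follows the usual environment-variable pattern for opting into a higher-precision recurrent state. A hedged sketch of how such a flag is typically read (the accepted values and the fallback dtype shown here are assumptions, not the PR's exact semantics):

```python
import os


def recurrent_state_dtype() -> str:
    """Pick the mamba SSM recurrent-state dtype from the environment.

    Treating '1' as "enable fp32" is illustrative; consult PR #4518 for
    the flag's actual accepted values and default.
    """
    if os.environ.get("LMDEPLOY_FP32_MAMBA_SSM_DTYPE", "0") == "1":
        return "float32"
    return "float16"


os.environ["LMDEPLOY_FP32_MAMBA_SSM_DTYPE"] = "1"
print(recurrent_state_dtype())  # float32
```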
## New Contributors
- @kuishou68 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4526
- @ZhijunLStudio made their first contribution in https://github.com/InternLM/lmdeploy/pull/4543
- @hd9568 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4490
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.12.3...v0.13.0