## What's Changed
### 🚀 Features
- [Ascend] support qwen3.5 35BA3B by @wanfengcxz in https://github.com/InternLM/lmdeploy/pull/4485
- feat: Add TurboQuant (quant_policy=42) support for KV Cache Quantization by @windreamer in https://github.com/InternLM/lmdeploy/pull/4510
- [refactor] [api_server] [2/N] improve tool parsers by abstracting xml parser by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4548
- feat(turbomind): integrate cublasGemmGroupedBatchedEx for Qwen3.5 MoE inference on Blackwell GPUs with memory copy optimizations by @hd9568 in https://github.com/InternLM/lmdeploy/pull/4490
- feat: add Anthropic-compatible serving endpoints by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4538
- Support InternS2 Preview by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4575
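Among the features above, PR #4538 adds Anthropic-compatible serving endpoints. As a rough orientation, the Anthropic Messages API expects a request body with `model`, `max_tokens`, and a `messages` list; the sketch below only builds that payload shape, and the server address, route, and model name are assumptions, not lmdeploy's documented interface:

```python
import json

# Hypothetical address and route; the exact path exposed by the
# Anthropic-compatible endpoints may differ.
API_URL = "http://localhost:23333/v1/messages"


def build_messages_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a request body in the Anthropic Messages API shape."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }


payload = build_messages_payload("internlm3-8b-instruct", "Hello!")
body = json.dumps(payload)  # what an HTTP client would POST to API_URL
print(body)
```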
### 💥 Improvements
- lmdeploy support kernel block size by @Tsundoku958 in https://github.com/InternLM/lmdeploy/pull/4421
- Reject requests on stale session or sleeping engine by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4496
- Add modern logging utils by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/4486
- refine dlinfer update_weights by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4519
- feat(serve): expose repetition n-gram params on OpenAI routes by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4522
- Refactor step inputs by @grimoire in https://github.com/InternLM/lmdeploy/pull/4504
- fix lite module for transformers>=5.0 by @43758726 in https://github.com/InternLM/lmdeploy/pull/4488
- [refactor] [api_server] [1/N] Improve reasoning and tool-call parsers by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4468
- fix: prevent prefill starvation under high decode load by @grimoire in https://github.com/InternLM/lmdeploy/pull/4532
- Mixed modality by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4531
- optimize get_sorted_idx in moe by @grimoire in https://github.com/InternLM/lmdeploy/pull/4529
- Map user-input session_id to internal session_id to maintain session identity by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4523
- support more message item types by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4501
- add explicit trust_remote_code controls to resolve the security issue by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4511
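The session-identity change in PR #4523 maps a user-supplied `session_id` to an internal one so that repeated requests with the same user id keep hitting the same session. A minimal sketch of that mapping idea (class and method names here are hypothetical, not lmdeploy's actual implementation):

```python
import itertools


class SessionMapper:
    """Illustrative mapping of user-facing session ids to internal ids."""

    def __init__(self):
        self._counter = itertools.count()      # source of fresh internal ids
        self._user_to_internal = {}            # user id -> internal id

    def resolve(self, user_session_id):
        # A known user id keeps its internal id (session identity is
        # maintained); an unseen id gets a fresh internal one.
        if user_session_id not in self._user_to_internal:
            self._user_to_internal[user_session_id] = next(self._counter)
        return self._user_to_internal[user_session_id]


mapper = SessionMapper()
first = mapper.resolve("user-42")
again = mapper.resolve("user-42")   # same internal id as `first`
other = mapper.resolve("user-7")    # a different internal id
```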
### 🐞 Bug fixes
- [ascend] fix prefix caching by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/4448
- fix update params by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/4514
- fix ray mem leak by @grimoire in https://github.com/InternLM/lmdeploy/pull/4487
- Fix mtp by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4517
- fix kernel-block-size by @grimoire in https://github.com/InternLM/lmdeploy/pull/4521
- fix: use `is not None` check for seed to prevent seed=0 being silently ignored by @kuishou68 in https://github.com/InternLM/lmdeploy/pull/4526
- Fix qwen35 dp by @grimoire in https://github.com/InternLM/lmdeploy/pull/4535
- Fix mtp for rl by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4520
- cancel request and block new inputs when sleeping by @grimoire in https://github.com/InternLM/lmdeploy/pull/4541
- Fix mp engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4540
- Fix cache sizing and cache block layout edge cases by @grimoire in https://github.com/InternLM/lmdeploy/pull/4552
- Fix qwen3.5-moe mtp with tp>1 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4568
- block_offsets padding 0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/4569
- hotfix: resolve test issues for v0.13.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4571
- ResponseParser forgets to strip `<think>` tag in non-stream mode by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4576
- yield error when prompt processing suffers exception by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4574
- Fix the reprefill of evicted seqs with invalid draft tokens by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4564
- Support mtp fp8 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/4572
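The seed fix in PR #4526 addresses a classic Python pitfall: a truthiness check treats `seed=0` the same as no seed at all, because `0` is falsy. A minimal illustration of the bug and the fix:

```python
def apply_seed_buggy(seed=None):
    # Truthiness check: seed=0 is falsy, so it is silently ignored.
    if seed:
        return f"seeded with {seed}"
    return "unseeded"


def apply_seed_fixed(seed=None):
    # Identity check: only a genuinely absent seed is skipped.
    if seed is not None:
        return f"seeded with {seed}"
    return "unseeded"


print(apply_seed_buggy(0))   # "unseeded" -- the bug
print(apply_seed_fixed(0))   # "seeded with 0"
```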
### 🌐 Other
- Use env LMDEPLOY_FP32_MAMBA_SSM_DTYPE to control the dtype of recurrent state by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4518
- add tool and reasoning test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4388
- update h config and add glm4.7 mtp test by @littlegy in https://github.com/InternLM/lmdeploy/pull/4424
- [ci] change test whl into python 312 and use test images by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/4513
- [Misc] fix typos in turbomind.py and model.py by @ZhijunLStudio in https://github.com/InternLM/lmdeploy/pull/4543
- [Misc] fix mutable default arguments by @ZhijunLStudio in https://github.com/InternLM/lmdeploy/pull/4544
- Add docker/Dockerfile_patch; minor tweaks in messages.py and setup.py. by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4546
- remove barely used skills and checkin docker-build skill by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4560
- bump version to v0.13.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/4549
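The `LMDEPLOY_FP32_MAMBA_SSM_DTYPE` switch in PR #4518 follows the usual environment-variable pattern for opting into a higher-precision recurrent state. A hedged sketch of how such a flag is typically read (the accepted values and the fallback dtype shown here are assumptions, not the PR's exact semantics):

```python
import os


def recurrent_state_dtype() -> str:
    """Pick the mamba SSM recurrent-state dtype from the environment.

    Treating '1' as "enable fp32" is illustrative; consult PR #4518 for
    the flag's actual accepted values and default.
    """
    if os.environ.get("LMDEPLOY_FP32_MAMBA_SSM_DTYPE", "0") == "1":
        return "float32"
    return "float16"


os.environ["LMDEPLOY_FP32_MAMBA_SSM_DTYPE"] = "1"
print(recurrent_state_dtype())  # float32
```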
## New Contributors
- @kuishou68 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4526
- @ZhijunLStudio made their first contribution in https://github.com/InternLM/lmdeploy/pull/4543
- @hd9568 made their first contribution in https://github.com/InternLM/lmdeploy/pull/4490
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.12.3...v0.13.0