| Name | Modified | Size |
|---|---|---|
| lmdeploy-0.8.0+cu118-cp38-cp38-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp38-cp38-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp39-cp39-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp39-cp39-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp310-cp310-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp310-cp310-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp311-cp311-manylinux2014_x86_64.whl | 2025-05-04 | 101.6 MB |
| lmdeploy-0.8.0+cu118-cp311-cp311-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp312-cp312-manylinux2014_x86_64.whl | 2025-05-04 | 101.6 MB |
| lmdeploy-0.8.0+cu118-cp312-cp312-win_amd64.whl | 2025-05-04 | 25.8 MB |
| README.md | 2025-05-04 | 10.3 kB |
| v0.8.0 source code.tar.gz | 2025-05-04 | 1.2 MB |
| v0.8.0 source code.zip | 2025-05-04 | 1.8 MB |

Totals: 13 items, 639.8 MB
## What's Changed
### 🚀 Features
- Torch dp support by @grimoire in https://github.com/InternLM/lmdeploy/pull/3207
- Add deep gemm with tma pre allocated by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3287
- Add mixed DP + TP by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3229
- Add Qwen3 and Qwen3MoE by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3305
- [ascend] support multi nodes on ascend device by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/3260
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3315
- [ascend]support deepseekv2 by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3206
- add deepep by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3313
- support ascend w8a8 graph_mode by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3267
- support all2all ep by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3370
- optimize ep in decoding stage by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3383
- Warmup deepgemm by @grimoire in https://github.com/InternLM/lmdeploy/pull/3387
- support Llama4 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3408
- add two-microbatch support by @SHshenhao in https://github.com/InternLM/lmdeploy/pull/3381
- Support phi4 mini by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3467
- [Dlinfer][Ascend] support 310P by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3484
- support qwen3 fp8 by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3505
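
For orientation, the sketch below shows how the Qwen3 / Qwen3MoE support listed above (#3305, #3315, #3505) is typically exercised through the offline pipeline API. This is a minimal sketch, not a tested configuration: the model ID is a placeholder and the engine options are assumptions for illustration.

```python
# Minimal sketch: offline inference with a Qwen3 model via the PyTorch engine.
# The model ID below is a placeholder; substitute a Qwen3 checkpoint you have.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'Qwen/Qwen3-8B',                            # assumed model identifier
    backend_config=PytorchEngineConfig(tp=1),   # single-GPU tensor parallelism
)

responses = pipe(['Summarize mixture-of-experts routing in one sentence.'])
print(responses[0].text)
```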
### 💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3283
- add env var to control timeout by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3291
- refactor attn param by @irexyc in https://github.com/InternLM/lmdeploy/pull/3164
- Verbose log by @grimoire in https://github.com/InternLM/lmdeploy/pull/3329
- optimize mla, remove load `v` by @grimoire in https://github.com/InternLM/lmdeploy/pull/3334
- support dp decoding with cudagraph by @grimoire in https://github.com/InternLM/lmdeploy/pull/3311
- optimize quant-fp8 kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3345
- refactor dlinfer rope by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3367
- Add AIOHTTP_TIMEOUT env var for proxy server by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3355
- disable sync batch on dp eager mode by @grimoire in https://github.com/InternLM/lmdeploy/pull/3382
- fix for deepgemm update by @grimoire in https://github.com/InternLM/lmdeploy/pull/3380
- Add string before hash tokens in blocktrie by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3386
- optimize moe get sorted idx by @grimoire in https://github.com/InternLM/lmdeploy/pull/3356
- use half/bf16 lm_head output by @irexyc in https://github.com/InternLM/lmdeploy/pull/3213
- remove ep eager check by @grimoire in https://github.com/InternLM/lmdeploy/pull/3392
- Optimize ascend moe by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3364
- optimize fp8 moe kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3419
- ray async forward execute by @grimoire in https://github.com/InternLM/lmdeploy/pull/3443
- map internvl3 chat template to builtin chat template internvl2_5 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3450
- Refactor turbomind (low-level abstractions) by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3423
- remove rarely used code to improve maintainability by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3462
- optimize sm80 long context by @grimoire in https://github.com/InternLM/lmdeploy/pull/3465
- move partial_json_parser from `serve.txt` to `runtime.txt` by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3493
- support qwen3-dense models awq quantization by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3503
- Optimize MoE gate for Qwen3 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3500
- Pass num_tokens_per_iter and max_prefill_iters params through in `lmdeploy serve api_server` by @josephrocca in https://github.com/InternLM/lmdeploy/pull/3504
- [Dlinfer][Ascend] Optimize performance of 310P device by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3486
- optimize longcontext decoding by @grimoire in https://github.com/InternLM/lmdeploy/pull/3510
- Support min_p in openai completions_v1 by @josephrocca in https://github.com/InternLM/lmdeploy/pull/3506
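
Two serving-side items above are easiest to see with a concrete request: #3504 exposes num_tokens_per_iter and max_prefill_iters on `lmdeploy serve api_server`, and #3506 accepts min_p on the OpenAI-style /v1/completions route. The sketch below assumes a server is already running locally; the CLI flag spellings are inferred from the parameter names and should be checked against `lmdeploy serve api_server --help`.

```python
# Assumed server launch (flag names inferred from the engine parameters):
#   lmdeploy serve api_server <model> --num-tokens-per-iter 8192 --max-prefill-iters 4
import requests

resp = requests.post(
    'http://localhost:23333/v1/completions',   # 23333 is the default api_server port
    json={
        'model': 'Qwen/Qwen3-8B',               # placeholder model name
        'prompt': 'The capital of France is',
        'max_tokens': 16,
        'min_p': 0.05,                          # sampling option added in #3506
    },
    timeout=60,
)
print(resp.json())
```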
### 🐞 Bug fixes
- fix activation grid oversize by @grimoire in https://github.com/InternLM/lmdeploy/pull/3282
- Set ensure_ascii=False for tool calling by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3295
- fix sliding window multi chat by @grimoire in https://github.com/InternLM/lmdeploy/pull/3302
- add `v` check by @grimoire in https://github.com/InternLM/lmdeploy/pull/3307
- Fix Qwen3MoE config parsing by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3336
- Fix finish reasons by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3338
- remove think_end_token_id in streaming content by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3327
- Fix the finish_reason by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3350
- set cmake policy minimum version as 3.5 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3376
- fix dp cudagraph by @grimoire in https://github.com/InternLM/lmdeploy/pull/3372
- fix flashmla eagermode by @grimoire in https://github.com/InternLM/lmdeploy/pull/3375
- close engine after each benchmark-generation iter by @grimoire in https://github.com/InternLM/lmdeploy/pull/3269
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek by @ao-zz in https://github.com/InternLM/lmdeploy/pull/3358
- fix stopping criteria by @grimoire in https://github.com/InternLM/lmdeploy/pull/3384
- support List[dict] prompt input without do_preprocess by @irexyc in https://github.com/InternLM/lmdeploy/pull/3385
- add rayexecutor release timeout by @grimoire in https://github.com/InternLM/lmdeploy/pull/3403
- fix tensor dispatch in dynamo by @wanfengcxz in https://github.com/InternLM/lmdeploy/pull/3417
- fix linting error by upgrade to ubuntu-latest by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3442
- fix awq tp for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3435
- fix mllm testcase fail by @caikun-pjlab in https://github.com/InternLM/lmdeploy/pull/3458
- remove paged attention autotune by @grimoire in https://github.com/InternLM/lmdeploy/pull/3452
- Remove empty prompts in benchmark scripts by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3460
- fix failing to end a session properly by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3471
- fix qwen2.5-vl chat template by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3475
- Align forward arguments of deepgemm blockedf8 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3474
- fix turbomind lib missing to link nccl by exporting nccl path by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3479
- fix dsvl2 no attr config error by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3477
- fix flash attention crash on triton3.1.0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3478
- Fix disorder of ray execution by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3481
- update dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3482
- fix output logprobs by @irexyc in https://github.com/InternLM/lmdeploy/pull/3488
- Fix Qwen2MoE shared expert gate by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3491
- fix replicate kv for qwen3-moe by @grimoire in https://github.com/InternLM/lmdeploy/pull/3499
- fix sampling if data overflow after temperature penalty by @irexyc in https://github.com/InternLM/lmdeploy/pull/3508
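
One fix above (#3385) is clearer with an example: OpenAI-style List[dict] prompts can now be fed to the pipeline even when chat-template preprocessing is skipped. This is a sketch under assumptions: the model ID is a placeholder, and `do_preprocess` is assumed to be the per-call switch involved.

```python
# Sketch: feeding an OpenAI-style messages list to the pipeline (#3385).
from lmdeploy import pipeline

pipe = pipeline('Qwen/Qwen3-8B')   # placeholder model ID

messages = [
    {'role': 'system', 'content': 'You are a concise assistant.'},
    {'role': 'user', 'content': 'Name one benefit of paged KV caches.'},
]

# `do_preprocess` is assumed to be the flag that skips the built-in
# chat-template preprocessing; with the fix, List[dict] input no longer
# errors out when it is disabled.
print(pipe([messages], do_preprocess=False)[0].text)
```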
### 📚 Documentation
- update qwen2.5-vl-32b docs by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3446
### 🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3298
- [ci] add think function testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3299
- merge dev into main by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3348
- [ci] add vl models into pipeline interface testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3374
- merge dev to main branch by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3378
- opt experts memory and permute by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3390
- Revert "opt experts memory and permute" by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3406
- merge dev to main by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3400
- add Hopper GPU dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3415
- optimize internvit by @caikun-pjlab in https://github.com/InternLM/lmdeploy/pull/3433
- fix stop/bad words by @irexyc in https://github.com/InternLM/lmdeploy/pull/3492
- [ci] testcase bugfix and add more models into testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3463
- bump version to v0.8.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3432
## New Contributors
- @zhaochaoxing made their first contribution in https://github.com/InternLM/lmdeploy/pull/3313
- @ao-zz made their first contribution in https://github.com/InternLM/lmdeploy/pull/3358
- @wanfengcxz made their first contribution in https://github.com/InternLM/lmdeploy/pull/3417
- @SHshenhao made their first contribution in https://github.com/InternLM/lmdeploy/pull/3381
- @josephrocca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3504
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.2...v0.8.0