| Name | Modified | Size |
|---|---|---|
| lmdeploy-0.8.0+cu118-cp38-cp38-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp38-cp38-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp39-cp39-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp39-cp39-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp310-cp310-manylinux2014_x86_64.whl | 2025-05-04 | 101.5 MB |
| lmdeploy-0.8.0+cu118-cp310-cp310-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp311-cp311-manylinux2014_x86_64.whl | 2025-05-04 | 101.6 MB |
| lmdeploy-0.8.0+cu118-cp311-cp311-win_amd64.whl | 2025-05-04 | 25.8 MB |
| lmdeploy-0.8.0+cu118-cp312-cp312-manylinux2014_x86_64.whl | 2025-05-04 | 101.6 MB |
| lmdeploy-0.8.0+cu118-cp312-cp312-win_amd64.whl | 2025-05-04 | 25.8 MB |
| README.md | 2025-05-04 | 10.3 kB |
| v0.8.0 source code.tar.gz | 2025-05-04 | 1.2 MB |
| v0.8.0 source code.zip | 2025-05-04 | 1.8 MB |

Totals: 13 items, 639.8 MB
## What's Changed
### 🚀 Features
- Torch dp support by @grimoire in https://github.com/InternLM/lmdeploy/pull/3207
- Add deep gemm with tma pre allocated by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3287
- Add mixed DP + TP by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3229
- Add Qwen3 and Qwen3MoE by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3305
- [ascend] support multi nodes on ascend device by @tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/3260
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3315
- [ascend]support deepseekv2 by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3206
- add deepep by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3313
- support ascend w8a8 graph_mode by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3267
- support all2all ep by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3370
- optimize ep in decoding stage by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3383
- Warmup deepgemm by @grimoire in https://github.com/InternLM/lmdeploy/pull/3387
- support Llama4 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3408
- add two-microbatch support by @SHshenhao in https://github.com/InternLM/lmdeploy/pull/3381
- Support phi4 mini by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3467
- [Dlinfer][Ascend] support 310P by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3484
- support qwen3 fp8 by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3505
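
For orientation, the sketch below shows how the Qwen3 / Qwen3MoE support listed above (#3305, #3315, #3505) is typically exercised through the offline pipeline API. This is a minimal sketch, not a tested configuration: the model ID is a placeholder and the engine options are assumptions for illustration.

```python
# Minimal sketch: offline inference with a Qwen3 model via the PyTorch engine.
# The model ID below is a placeholder; substitute a Qwen3 checkpoint you have.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'Qwen/Qwen3-8B',                            # assumed model identifier
    backend_config=PytorchEngineConfig(tp=1),   # single-GPU tensor parallelism
)

responses = pipe(['Summarize mixture-of-experts routing in one sentence.'])
print(responses[0].text)
```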
### 💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3283
- add env var to control timeout by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3291
- refactor attn param by @irexyc in https://github.com/InternLM/lmdeploy/pull/3164
- Verbose log by @grimoire in https://github.com/InternLM/lmdeploy/pull/3329
- optimize mla, remove load `v` by @grimoire in https://github.com/InternLM/lmdeploy/pull/3334
- support dp decoding with cudagraph by @grimoire in https://github.com/InternLM/lmdeploy/pull/3311
- optimize quant-fp8 kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3345
- refactor dlinfer rope by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in https://github.com/InternLM/lmdeploy/pull/3367
- Add AIOHTTP_TIMEOUT env var for proxy server by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3355
- disable sync batch on dp eager mode by @grimoire in https://github.com/InternLM/lmdeploy/pull/3382
- fix for deepgemm update by @grimoire in https://github.com/InternLM/lmdeploy/pull/3380
- Add string before hash tokens in blocktrie by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3386
- optimize moe get sorted idx by @grimoire in https://github.com/InternLM/lmdeploy/pull/3356
- use half/bf16 lm_head output by @irexyc in https://github.com/InternLM/lmdeploy/pull/3213
- remove ep eager check by @grimoire in https://github.com/InternLM/lmdeploy/pull/3392
- Optimize ascend moe by @yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3364
- optimize fp8 moe kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/3419
- ray async forward execute by @grimoire in https://github.com/InternLM/lmdeploy/pull/3443
- map internvl3 chat template to builtin chat template internvl2_5 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3450
- Refactor turbomind (low-level abstractions) by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3423
- remove rarely used code to improve maintainability by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3462
- optimize sm80 long context by @grimoire in https://github.com/InternLM/lmdeploy/pull/3465
- move partial_json_parser from `serve.txt` to `runtime.txt` by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3493
- support qwen3-dense models awq quantization by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3503
- Optimize MoE gate for Qwen3 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3500
- Pass num_tokens_per_iter and max_prefill_iters params through in `lmdeploy serve api_server` by @josephrocca in https://github.com/InternLM/lmdeploy/pull/3504
- [Dlinfer][Ascend] Optimize performance of 310P device by @JackWeiw in https://github.com/InternLM/lmdeploy/pull/3486
- optimize longcontext decoding by @grimoire in https://github.com/InternLM/lmdeploy/pull/3510
- Support min_p in openai completions_v1 by @josephrocca in https://github.com/InternLM/lmdeploy/pull/3506
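
Two serving-side items above are easiest to see with a concrete request: #3504 exposes num_tokens_per_iter and max_prefill_iters on `lmdeploy serve api_server`, and #3506 accepts min_p on the OpenAI-style /v1/completions route. The sketch below assumes a server is already running locally; the CLI flag spellings are inferred from the parameter names and should be checked against `lmdeploy serve api_server --help`.

```python
# Assumed server launch (flag names inferred from the engine parameters):
#   lmdeploy serve api_server <model> --num-tokens-per-iter 8192 --max-prefill-iters 4
import requests

resp = requests.post(
    'http://localhost:23333/v1/completions',   # 23333 is the default api_server port
    json={
        'model': 'Qwen/Qwen3-8B',               # placeholder model name
        'prompt': 'The capital of France is',
        'max_tokens': 16,
        'min_p': 0.05,                          # sampling option added in #3506
    },
    timeout=60,
)
print(resp.json())
```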
### 🐞 Bug fixes
- fix activation grid oversize by @grimoire in https://github.com/InternLM/lmdeploy/pull/3282
- Set ensure_ascii=False for tool calling by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3295
- fix sliding window multi chat by @grimoire in https://github.com/InternLM/lmdeploy/pull/3302
- add `v` check by @grimoire in https://github.com/InternLM/lmdeploy/pull/3307
- Fix Qwen3MoE config parsing by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3336
- Fix finish reasons by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3338
- remove think_end_token_id in streaming content by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3327
- Fix the finish_reason by @AllentDan in https://github.com/InternLM/lmdeploy/pull/3350
- set cmake policy minimum version as 3.5 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3376
- fix dp cudagraph by @grimoire in https://github.com/InternLM/lmdeploy/pull/3372
- fix flashmla eagermode by @grimoire in https://github.com/InternLM/lmdeploy/pull/3375
- close engine after each benchmark-generation iter by @grimoire in https://github.com/InternLM/lmdeploy/pull/3269
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek by @ao-zz in https://github.com/InternLM/lmdeploy/pull/3358
- fix stopping criteria by @grimoire in https://github.com/InternLM/lmdeploy/pull/3384
- support List[dict] prompt input without do_preprocess by @irexyc in https://github.com/InternLM/lmdeploy/pull/3385
- add rayexecutor release timeout by @grimoire in https://github.com/InternLM/lmdeploy/pull/3403
- fix tensor dispatch in dynamo by @wanfengcxz in https://github.com/InternLM/lmdeploy/pull/3417
- fix linting error by upgrade to ubuntu-latest by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3442
- fix awq tp for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3435
- fix mllm testcase fail by @caikun-pjlab in https://github.com/InternLM/lmdeploy/pull/3458
- remove paged attention autotune by @grimoire in https://github.com/InternLM/lmdeploy/pull/3452
- Remove empty prompts in benchmark scripts by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3460
- fix failing to end a session properly by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3471
- fix qwen2.5-vl chat template by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3475
- Align forward arguments of deepgemm blockedf8 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3474
- fix turbomind lib missing to link nccl by exporting nccl path by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3479
- fix dsvl2 no attr config error by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3477
- fix flash attention crash on triton3.1.0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/3478
- Fix disorder of ray execution by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/3481
- update dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3482
- fix output logprobs by @irexyc in https://github.com/InternLM/lmdeploy/pull/3488
- Fix Qwen2MoE shared expert gate by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/3491
- fix replicate kv for qwen3-moe by @grimoire in https://github.com/InternLM/lmdeploy/pull/3499
- fix sampling if data overflow after temperature penalty by @irexyc in https://github.com/InternLM/lmdeploy/pull/3508
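
One fix above (#3385) is clearer with an example: OpenAI-style List[dict] prompts can now be fed to the pipeline even when chat-template preprocessing is skipped. This is a sketch under assumptions: the model ID is a placeholder, and `do_preprocess` is assumed to be the per-call switch involved.

```python
# Sketch: feeding an OpenAI-style messages list to the pipeline (#3385).
from lmdeploy import pipeline

pipe = pipeline('Qwen/Qwen3-8B')   # placeholder model ID

messages = [
    {'role': 'system', 'content': 'You are a concise assistant.'},
    {'role': 'user', 'content': 'Name one benefit of paged KV caches.'},
]

# `do_preprocess` is assumed to be the flag that skips the built-in
# chat-template preprocessing; with the fix, List[dict] input no longer
# errors out when it is disabled.
print(pipe([messages], do_preprocess=False)[0].text)
```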
### 📚 Documentation
- update qwen2.5-vl-32b docs by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3446
### 🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3298
- [ci] add think function testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3299
- merge dev into main by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3348
- [ci] add vl models into pipeline interface testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3374
- merge dev to main branch by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3378
- opt experts memory and permute by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3390
- Revert "opt experts memory and permute" by @zhaochaoxing in https://github.com/InternLM/lmdeploy/pull/3406
- merge dev to main by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3400
- add Hopper GPU dockerfile by @CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3415
- optimize internvit by @caikun-pjlab in https://github.com/InternLM/lmdeploy/pull/3433
- fix stop/bad words by @irexyc in https://github.com/InternLM/lmdeploy/pull/3492
- [ci] testcase bugfix and add more models into testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3463
- bump version to v0.8.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/3432
## New Contributors
- @zhaochaoxing made their first contribution in https://github.com/InternLM/lmdeploy/pull/3313
- @ao-zz made their first contribution in https://github.com/InternLM/lmdeploy/pull/3358
- @wanfengcxz made their first contribution in https://github.com/InternLM/lmdeploy/pull/3417
- @SHshenhao made their first contribution in https://github.com/InternLM/lmdeploy/pull/3381
- @josephrocca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3504
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.2...v0.8.0