| Name | Modified | Size |
|---|---|---|
| README.md | 2025-09-01 | 5.6 kB |
| v2.1.1 source code.tar.gz | 2025-09-01 | 3.5 MB |
| v2.1.1 source code.zip | 2025-09-01 | 4.4 MB |
| Total: 3 items | | 7.9 MB |
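Documentation
- Added a multi-node tensor-parallel deployment guide
- Updated the best-practice guides for the ERNIE (Wenxin) model series to the latest usage
- Updated the CUDA Graph usage instructions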
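New Features
- Responses now include completion_tokens and prompt_tokens, so the original input and the model's raw output text can be returned
- The completion endpoint now supports the echo parameter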
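As a rough illustration of the echo parameter on the completion endpoint, the sketch below builds an OpenAI-style request body and posts it with the Python standard library. The server address, the `/v1/completions` route, and the model name are assumptions for illustration and are not taken from these notes. The notes also do not say where prompt_tokens and completion_tokens appear in the response, so the sketch simply returns the whole parsed response for inspection.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your own deployment.
BASE_URL = "http://localhost:8000"  # hypothetical server address
MODEL_NAME = "my-model"             # hypothetical model name


def build_completion_request(prompt, echo=True, max_tokens=32):
    """Build the JSON body for an OpenAI-style completion request."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "echo": echo,  # ask the server to echo the prompt back in the output
    }


def send_completion(payload, base_url=BASE_URL):
    """POST the payload to the assumed /v1/completions route; return parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_completion_request("Hello, FastDeploy")
    print(json.dumps(body, indent=2))
    # Uncomment once a server is running at BASE_URL:
    # print(send_completion(body))
```

Keeping payload construction separate from the network call makes it easy to check the request shape without a running server.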
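Bug Fixes
- Fixed LogProb not being returned under V1 KVCache scheduling
- Fixed the chat_template_kwargs parameter not taking effect
- Fixed an EP parallelism issue in mixed-architecture deployments
- Fixed an incorrect output token count in completion endpoint responses
- Fixed the aggregation of logprobs in returned results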
What's Changed
- [Docs] Add Multinode deployment document by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3416
- [docs] cherry-pick update docs by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/3422
- [Docs]update installation readme by @yongqiangma in https://github.com/PaddlePaddle/FastDeploy/pull/3435
- [Docs] release 2.1 by @ming1753 in https://github.com/PaddlePaddle/FastDeploy/pull/3441
- [Docs] Update docs of graph opt backend by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3443
- [Feature] Support logprob in scheduler v1 for release/2.1 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3446
- [Bugfix]fix config bug in dynamic_weight_manager by @gzy19990617 in https://github.com/PaddlePaddle/FastDeploy/pull/3432
- [Feature] Pass through the chat_template_kwargs to the data processing module by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/3469
- [CI] fix run_ci error in release/2.1 by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3499
- [BugFix] fix ep real_bsz by @lizexu123 in https://github.com/PaddlePaddle/FastDeploy/pull/3396
- [Feature] add prompt_tokens and completion_tokens by @memoryCoderC in https://github.com/PaddlePaddle/FastDeploy/pull/3505
- [fix] setting disable_chat_template while passing prompt_token_ids led to response error by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3511
- [Executor] Fixed the issue of CUDA graph execution failure caused by d… by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3512
- [Feature] add tool parser by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/3518
- [BUGFIX] fix ep mixed bug by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3513
- [BugFix] Api server bugs by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3530
- [Feature] Support limit thinking len for text models by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/3527
- [Bug Fix] Close get think_end_id for XPU for now. by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/3563
- [Feature] Support mixed deployment with yiyan adapter by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3533
- [Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3558
- [BugFix] Support echo in the completion endpoint by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3477
- [fix] fix completion stream api output_tokens not in usage by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3588
- [fix] fix ZmqIpcClient.close() error by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3600
- [Bugfix] Correct logprobs aggregation for multiple prompts in /completions endpoint by @sunlei1024 in https://github.com/PaddlePaddle/FastDeploy/pull/3620
- [BugFix] ep mixed mode offline exit failed by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3623
- [Bugfix] Fix a significant performance drop of the 0.3B model on the 2.1 branch by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3624
- [CI] add cleanup logic in release/2.1 workflows by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3655
- [BugFix] fix parameter is 0 by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3663
- [fix] qwen output inconsistency when top_p=0 (#3634) by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3662
- Revert "[BugFix] fix parameter is 0" by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/3681
- [feat] add metrics for yiyan adapter by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3615
- [bugfix]PR3663 parameter is 0 by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3679
- [BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. by @lizexu123 in https://github.com/PaddlePaddle/FastDeploy/pull/3670
- Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER." by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/3719
- [Cherry-Pick] fix the bug when num_key_value_heads < tensor_parallel_size by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3722
- [Optimize] Increase zmq buffer size to prevent apiserver too slowly t… by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3728
- [Fix] Do not drop result when request result slowly by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3704
- [Bug fix] Fix prefix cache in v1 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3710
- [Bug fix] Fix mix deployment perf with yiyan adapter in release21 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3703
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/v2.1.0...v2.1.1