FastDeploy - Browse /v2.1.1 at SourceForge.net

The interactive file manager requires Javascript. Please enable it or use sftp or scp.
You may still browse the files here.

Name	Modified	Size	InfoDownloads / Week
Parent folder
README.md	2025-09-01	5.6 kB	0
v2.1.1 source code.tar.gz	2025-09-01	3.5 MB	0
v2.1.1 source code.zip	2025-09-01	4.4 MB	0
Totals: 3 Items		7.9 MB	0

文档

新增多机张量并行部署文档
文心系列模型最佳实践文档更新到最新用法
更新CUDA Graph使用说明

新增功能

返回结果新增completion_tokens与prompt_tokens，支持返回原始输入与模型原始输出文本
completion接口支持echo参数

Bug修复

修复V1 KVCache调度下LogProb无法返回问题
修复chat_template_kwargs参数无法生效问题
修复混合架构部署下的EP并行问题
修复completion接口返回结果中输出Token计数错误问题
修复logprobs返回结果聚合问题

What's Changed

[Docs] Add Multinode deployment document by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3416
[docs] cherry-pick update docs by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/3422
[Docs]update installation readme by @yongqiangma in https://github.com/PaddlePaddle/FastDeploy/pull/3435
[Docs] release 2.1 by @ming1753 in https://github.com/PaddlePaddle/FastDeploy/pull/3441
[Docs]Updata docs of graph opt backend by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3443
[Feature] Support logprob in scheduler v1 for release/2.1 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3446
[Bugfix]fix config bug in dynamic_weight_manager by @gzy19990617 in https://github.com/PaddlePaddle/FastDeploy/pull/3432
[Feature] Pass through the chat_template_kwargs to the data processing module by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/3469
[CI] fix run_ci error in release/2.1 by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3499
[BugFix] fix ep real_bsz by @lizexu123 in https://github.com/PaddlePaddle/FastDeploy/pull/3396
[Feature] add prompt_tokens and completion_tokens by @memoryCoderC in https://github.com/PaddlePaddle/FastDeploy/pull/3505
[fix] setting disable_chat_template while passing prompt_token_ids led to response error by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3511
[Excutor] Fixed the issue of CUDA graph execution failure caused by d… by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3512
[Feature] add tool parser by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/3518
[BUGFIX] fix ep mixed bug by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3513
[BugFix] Api server bugs by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3530
[Feature] Support limit thinking len for text models by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/3527
[Bug Fix] Close get think_end_id for XPU for now. by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/3563
[Feature] Support mixed deployment with yiyan adapter by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3533
[Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3558
【BugFix】completion接口echo回显支持 by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3477
[fix] fix completion stream api output_tokens not in usage by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3588
[fix] fix ZmqIpcClient.close() error by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3600
[Bugfix] Correct logprobs aggregation for multiple prompts in /completions endpoint by @sunlei1024 in https://github.com/PaddlePaddle/FastDeploy/pull/3620
[BugFix] ep mixed mode offline exit failed by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3623
【Bugfix】修复2.1分支上0.3B模型性能大幅下降 by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3624
[CI] add cleanup logic in release/2.1 workflows by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3655
[BugFix] fix parameter is 0 by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3663
[fix] qwen output inconsistency when top_p=0 (#3634) by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3662
Revert "[BugFix] fix parameter is 0" by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/3681
[feat] add metrics for yiyan adapter by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/3615
[bugfix]PR3663 parameter is 0 by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3679
[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. by @lizexu123 in https://github.com/PaddlePaddle/FastDeploy/pull/3670
Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER." by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/3719
[Cherry-Pick] fix the bug when num_key_value_heads < tensor_parallel_size by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3722
[Optimize] Increase zmq buffer size to prevent apiserver too slowly t… by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/3728
[Fix] Do not drop result when request result slowly by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3704
[Bug fix] Fix prefix cache in v1 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3710
[Bug fix] Fix mix deployment perf with yiyan adapter in release21 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3703

Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/v2.1.0...v2.1.1

Source: README.md, updated 2025-09-01

FastDeploy Files

High-performance Inference and Deployment Toolkit for LLMs and VLMs

文档

新增功能

Bug修复

What's Changed

FastDeploy Files

High-performance Inference and Deployment Toolkit for LLMs and VLMs

Get an email when there's a new version of FastDeploy

文档

新增功能

Bug修复

What's Changed