Local Qwen2-72B-Instruct-GPTQ-Int8 model returns nothing when stream=true

Brought to you by: xjfkkk

#1571 Local Qwen2-72B-Instruct-GPTQ-Int8 model returns nothing when stream=true

Status: open

Owner: nobody

Labels: bug (44)

Updated: 2024-06-27

Created: 2024-06-27

Creator: Anonymous

Private: No

Originally created by: ludevica

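Routine checks

I have confirmed that there is currently no similar issue
I have confirmed that I have upgraded to the latest version
I have read the project README in full, especially the FAQ section
I understand and am willing to follow up on this issue, helping with testing and providing feedback
I understand and accept the above, and I understand that the project maintainers have limited time; issues that do not follow the rules may be ignored or closed directly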

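Problem description
With the local Qwen2-72B-Instruct-GPTQ-Int8 model, a request with stream=true returns nothing, while the same request with stream=false returns normally.
With gpt-3.5-turbo instead, both cases return normally.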
Steps to reproduce
gpt-3.5-turbo as configured in one-api:
curl --location '10.81.1.66:3001/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Authorization: Bearer sk-dyjZYJ8xdzcFPp8y5597E57eA5354a808bE82dC4D1982515' \
--data '{
"model": "gpt-3.5-turbo",
"temperature": 1,
"max_tokens": 512,
"stream": true,
"messages": [
{
"role": "user",
"content": "What is 1+98?"
}
]
}'

qwen2 as configured in one-api:
curl --location '10.81.1.66:3001/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Authorization: Bearer sk-dyjZYJ8xdzcFPp8y5597E57eA5354a808bE82dC4D1982515' \
--data '{
"model": "qwen2-72b-local",
"stream": true,
"messages": [
{
"role": "user",
"content": "What is 1+98?"
}
]
}'

Expected result
Both requests return a normal streaming response.
Related screenshots
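A normal streaming response from an OpenAI-compatible /v1/chat/completions endpoint is a Server-Sent Events stream: a series of `data: {...}` lines, each carrying one chunk, terminated by `data: [DONE]`. To make "no return" concrete when comparing one-api against the direct model endpoint, a minimal bash sketch like the one below could count the data events and check for the terminator. The function name `check_sse_stream` is hypothetical, and it assumes the backend follows the OpenAI streaming format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of one-api): summarize an OpenAI-style SSE stream read on stdin.
# Assumes chunks arrive as "data: ..." lines and the stream ends with "data: [DONE]".
check_sse_stream() {
  local events=0 done_seen=no line
  # "|| [ -n "$line" ]" also processes a final line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                     # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") done_seen=yes ;;     # end-of-stream marker
      data:*) events=$((events + 1)) ;;    # one chunk event
    esac                                   # blank lines, comments, other fields are ignored
  done
  echo "data events: $events, [DONE] seen: $done_seen"
}
```

For example, the qwen2 request from the reproduction steps could be run with curl's `-sN` (`--silent --no-buffer`) flags and piped into this function; a result of `data events: 0, [DONE] seen: no` would match the behaviour described in this issue, whereas a direct request to the model should report one or more events and a `[DONE]`.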

Screenshot above: calling the model directly, without going through one-api, streams output normally. The result is as follows:

Screenshot above: streaming requests to the local Qwen model through one-api return no content.