| Name | Modified | Size |
|---|---|---|
| gpustack-2.0.2-py3-none-any.whl | 2025-12-31 | 12.6 MB |
| README.md | 2025-12-31 | 6.3 kB |
| v2.0.2 source code.tar.gz | 2025-12-31 | 42.3 MB |
| v2.0.2 source code.zip | 2025-12-31 | 42.7 MB |
| **Totals: 4 items** | | **97.6 MB** |
## Model Catalog Updates

- Added: GLM-4.7, DeepSeek-V3.2, DeepSeek-V3.2-Special, DeepSeek-OCR, Z-Image-Turbo, and Qwen-Image-Edit.
- Tuned model deployments for optimized throughput:
  - DeepSeek-V3.2: +57.0% token throughput on the ShareGPT dataset using H200 GPUs, and up to +153.6% in other scenarios. (Detailed report)
  - GLM-4.7: +134.8% token throughput on the ShareGPT dataset using H200 GPUs, and up to +347.2% in other scenarios. (Detailed report)
## Enhancements
- Enhanced sorting and filtering in APIs and UI list pages. (Issues [#1348], [#2589])
- Increased the password length limit. (Issue [#1367])
- Added support for Model Instance Direct Access Mode. (Issue [#3772])
- Added support for predefined worker configurations and external server URLs in cluster configuration. (Issues [#3775], [#3771])
- Improved bootstrap health check timeout handling. (Issue [#3788])
- Enhanced automatic worker IP address selection. (Issue [#3795])
- Added support for shell-like split style in custom backend execution commands. (Issue [#3860])
- Miscellaneous UX improvements. (Issues [#3757], [#3766], [#3824], [#3865], [#3866], [#3885])
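The shell-like split style for custom backend commands means arguments are tokenized with POSIX quoting rules rather than naive whitespace splitting. A minimal illustration using Python's standard `shlex` module (the command string below is hypothetical, not GPUStack's internal handling):

```python
import shlex

# A backend command with a quoted argument that contains spaces.
cmd = 'vllm serve --served-model-name "My Model" --port 8000'

# Naive whitespace splitting breaks the quoted argument apart.
naive = cmd.split()

# Shell-like splitting keeps "My Model" as one argument.
shell_like = shlex.split(cmd)

print(naive)       # ['vllm', 'serve', '--served-model-name', '"My', 'Model"', '--port', '8000']
print(shell_like)  # ['vllm', 'serve', '--served-model-name', 'My Model', '--port', '8000']
```

Quoted paths and values with spaces therefore survive intact when the command is passed to the backend process.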
## Bug Fixes
- Fixed an issue where CIDR entries in `no_proxy` did not work with port forwarding and adding workers to a cluster. (Issue [#1387])
- Fixed compatibility issues with some models supported by MindIE. (Issue [#2016])
- Fixed a bug where backend version updates did not require instance recreation to take effect. (Issue [#2574])
- Fixed an issue where the current OIDC SSO implementation only supported login but not logout. (Issue [#2826])
- Fixed the command length limit for custom backends. (Issue [#3555])
- Fixed an issue preventing the use of the CosyVoice model in the Dify GPUStack plugin. (Issue [#3595])
- Fixed DeepSeek-OCR deployment failures. (Issue [#3683])
- Fixed the use of incorrect container image names. (Issue [#3689])
- Fixed an issue where the `system-default-container-registry` configured on the server did not take effect when adding a worker. (Issue [#3737])
- Fixed an issue where models could not start on 8GB VRAM GPUs. (Issue [#3745])
- Fixed automatic scheduling for multi-GPU inference, which failed due to `vocab_size` not being divisible by `tensor-parallel-size`. (Issue [#3777])
- Fixed a GPUStack Server startup failure with the log `"[INFO] gateway exited with code 1, shutting down all services..."`. (Issue [#3779])
- Fixed restricted access when evaluating the GLM-4.6 model. (Issue [#3780])
- Fixed an issue where the model search list incorrectly reported incompatibility if an empty cluster existed. (Issue [#3790])
- Fixed an error when revoking model permissions while using MySQL. (Issue [#3796])
- Fixed an incorrect HTTP response from the Completion API. (Issue [#3801])
- Fixed an issue where the vLLM backend was not correctly filtered for Ascend 310P NPU, causing a "failed to get vLLM image" error. (Issue [#3802])
- Fixed an incorrect ready status report for the Nvidia container toolkit check in CDI-enabled Docker environments. (Issue [#3808])
- Fixed a failure when upgrading from v0.7.1 to v2.0.1 due to `'OLLAMA_LIBRARY' is not among the defined enum values`. (Issue [#3809])
- Fixed a model deployment error (`libcuda.so.1: cannot open shared object file: No such file or directory`) when the server and worker used the host network on the same host. (Issue [#3810])
- Fixed an abnormal format in the streaming output results when using MindIE on version 2.0.0. (Issue [#3826])
- Fixed an issue where the default value of `mem_fraction_static` in SGLang did not take effect, causing model deployment failures. (Issue [#3831])
- Fixed an issue where `--max-model-len` appeared twice in the vLLM serve command. (Issue [#3835])
- Fixed a Gateway timeout error when accessing models in the playground. (Issue [#3846])
- Fixed an error: `ERROR - Failed to register worker: 1 validation error for WorkerStatusPublic`. (Issue [#3849])
- Fixed an issue where there was no default cluster when setting `--token` for the server. (Issue [#3855])
- Fixed a failure to deploy models in a Kubernetes cluster when using Persistent Volumes (PV) instead of `hostPath` for storage. (Issue [#3876])
- Fixed an error showing "API key not allowed to access model". (Issue [#3897])
- Fixed an incorrect framework name in the inference backend version list. (Issue [#3906])
- Fixed an issue where image generation parameters did not work. (Issue [#3911])
- Fixed an issue where models could not be stopped in some cases. (Issue [#3936])
- Fixed an issue where downloaded model logs in Kubernetes only contained GPUStack logs, missing instance logs. (Issue [#3938])
- Fixed an issue where the message returned when testing the API with curl was missing the `"data"` tag. (Issue [#3950])
- Fixed a startup crash when there was no internet connection. (Issue [#3972])
- Fixed a failure to scale down model instances in some cases. (Issue [#3988])
- Fixed an issue where the load balancer routed traffic to model instances that were not ready. (Issue [#4010])
- Fixed an incorrect time display on the time axis of the system load chart on the monitoring panel. (Issue [#4014])
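Several of the fixes above stem from backend constraints; for instance, [#3777] involves vLLM rejecting configurations where the vocabulary size is not evenly divisible by the tensor-parallel degree. A hedged sketch of how a scheduler might pick a valid degree (the function and its fallback policy are illustrative, not GPUStack's actual implementation):

```python
def pick_tensor_parallel_size(vocab_size: int, gpu_count: int) -> int:
    """Return the largest tensor-parallel size <= gpu_count that
    evenly divides vocab_size (sizes that do not are rejected)."""
    for tp in range(gpu_count, 0, -1):
        if vocab_size % tp == 0:
            return tp
    return 1  # unreachable: tp == 1 always divides vocab_size

# A vocab of 129280 on an 8-GPU node: 8 divides it evenly.
print(pick_tensor_parallel_size(129280, 8))  # 8

# A vocab of 64006 on 4 GPUs: neither 4 nor 3 divides it, so fall back to 2.
print(pick_tensor_parallel_size(64006, 4))  # 2
```

The bug fix ensures automatic scheduling respects this divisibility constraint instead of producing a configuration the backend refuses to start.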
## Built-in Inference Backend Updates

### New Additions

- Added `SGLang 0.5.6.post2` for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
- Added `vLLM 0.13.0` for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
- Added `vLLM 0.12.0` for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
- Added `VoxBox 0.0.21` for CUDA 12.8/12.6.
### Updates

Please force-pull to update existing runner images.

- Updated `MindIE 2.2.rc1/2.1.rc2` for CANN 8.2/8.3.
- Updated `vLLM 0.11.0` for CANN 8.3 (910B/A3).
- Updated `vLLM 0.11.2` for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
- Updated `SGLang 0.5.5.post3` for ROCm 6.4.
### Removals

- Removed `vLLM 0.11.0` support for CANN 8.2 (910B/A3), as the vLLM Ascend plugin released a stable version `v0.11.0` for CANN 8.3.