GPUStack v2.0.2

Model Catalog Updates

  • Added: GLM-4.7, DeepSeek-V3.2, DeepSeek-V3.2-Special, DeepSeek-OCR, Z-Image-Turbo, Qwen-Image-Edit.
  • Tuned Model Deployments for Optimized Throughput:
    • DeepSeek-V3.2: +57.0% token throughput on the ShareGPT dataset using H200 GPUs, and up to +153.6% in other scenarios. (Detailed report)
    • GLM-4.7: +134.8% token throughput on the ShareGPT dataset using H200 GPUs, and up to +347.2% in other scenarios. (Detailed report)

Enhancements

  • Enhanced sorting and filtering in APIs and UI list pages. (Issues [#1348], [#2589])
  • Increased the password length limit. (Issue [#1367])
  • Added support for Model Instance Direct Access Mode. (Issue [#3772])
  • Added support for predefined worker configurations and external server URLs in cluster configuration. (Issues [#3775], [#3771])
  • Improved bootstrap health check timeout handling. (Issue [#3788])
  • Enhanced automatic worker IP address selection. (Issue [#3795])
  • Added support for shell-like split style in custom backend execution commands. (Issue [#3860])
  • Miscellaneous UX improvements. (Issues [#3757], [#3766], [#3824], [#3865], [#3866], [#3885])
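The shell-like split style for custom backend execution commands (#3860) presumably means the command string is tokenized with POSIX shell quoting rules rather than split on whitespace, so quoted arguments survive intact. A minimal sketch of the difference using Python's `shlex`; the `split_command` helper is illustrative, not GPUStack's actual implementation:

```python
import shlex

def split_command(cmd: str) -> list[str]:
    # Hypothetical helper: POSIX shell-style tokenization, so a quoted
    # argument containing spaces stays together as one token.
    return shlex.split(cmd)

cmd = "vllm serve --chat-template '{{ messages }}'"

naive = cmd.split()              # plain whitespace split
shell_like = split_command(cmd)  # shell-like split

# Whitespace splitting breaks the quoted template into two tokens;
# shell-like splitting keeps it whole and strips the surrounding quotes.
print(naive)       # [..., '--chat-template', "'{{", 'messages', "}}'"]
print(shell_like)  # [..., '--chat-template', '{{ messages }}']
```

The same distinction matters for any backend flag that takes a quoted value with embedded spaces.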

Bug Fixes

  • Fixed an issue where CIDR entries in no_proxy did not take effect for port forwarding or when adding workers to a cluster. (Issue [#1387])
  • Fixed compatibility issues with some models supported by MindIE. (Issue [#2016])
  • Fixed a bug where backend version updates did not require instance recreation to take effect. (Issue [#2574])
  • Fixed an issue where the current OIDC SSO implementation only supported login but not logout. (Issue [#2826])
  • Fixed the command length limit for custom backends. (Issue [#3555])
  • Fixed an issue preventing the use of the CosyVoice model in the Dify GPUStack plugin. (Issue [#3595])
  • Fixed DeepSeek-OCR deployment failures. (Issue [#3683])
  • Fixed the use of incorrect container image names. (Issue [#3689])
  • Fixed an issue where the system-default-container-registry configured on the server did not take effect when adding a worker. (Issue [#3737])
  • Fixed an issue where models could not start on 8GB VRAM GPUs. (Issue [#3745])
  • Fixed automatic scheduling for multi-GPU inference, which previously failed when vocab_size was not divisible by tensor-parallel-size. (Issue [#3777])
  • Fixed a GPUStack Server startup failure with the log: "[INFO] gateway exited with code 1, shutting down all services...". (Issue [#3779])
  • Fixed restricted access when evaluating the GLM-4.6 model. (Issue [#3780])
  • Fixed an issue where the model search list incorrectly reported incompatibility if an empty cluster existed. (Issue [#3790])
  • Fixed an error when revoking model permissions while using MySQL. (Issue [#3796])
  • Fixed an incorrect HTTP response from the Completion API. (Issue [#3801])
  • Fixed an issue where the vLLM backend was not correctly filtered for Ascend 310P NPU, causing a "failed to get vLLM image" error. (Issue [#3802])
  • Fixed an incorrect ready status report for the Nvidia container toolkit check in CDI-enabled Docker environments. (Issue [#3808])
  • Fixed a failure when upgrading from v0.7.1 to v2.0.1 caused by the error: "'OLLAMA_LIBRARY' is not among the defined enum values". (Issue [#3809])
  • Fixed a model deployment error ('libcuda.so.1: cannot open shared object file: No such file or directory') when the server and worker used the host network on the same host. (Issue [#3810])
  • Fixed malformed streaming output when using MindIE on version 2.0.0. (Issue [#3826])
  • Fixed an issue where the default value of mem_fraction_static in SGLang did not take effect, causing model deployment failures. (Issue [#3831])
  • Fixed an issue where --max-model-len appeared twice in the vLLM serve command. (Issue [#3835])
  • Fixed a Gateway timeout error when accessing models in the playground. (Issue [#3846])
  • Fixed a worker registration failure with the log: "ERROR - Failed to register worker: 1 validation error for WorkerStatusPublic". (Issue [#3849])
  • Fixed an issue where there was no default cluster when setting --token for the server. (Issue [#3855])
  • Fixed a failure to deploy models in a Kubernetes cluster when using Persistent Volumes (PV) instead of hostPath for storage. (Issue [#3876])
  • Fixed an error showing "API key not allowed to access model". (Issue [#3897])
  • Fixed an incorrect framework name in the inference backend version list. (Issue [#3906])
  • Fixed an issue where image generation parameters did not work. (Issue [#3911])
  • Fixed an issue where models could not be stopped in some cases. (Issue [#3936])
  • Fixed an issue where downloaded model logs in Kubernetes only contained GPUStack logs, missing instance logs. (Issue [#3938])
  • Fixed an issue where the message returned when testing the API with curl was missing the "data" tag. (Issue [#3950])
  • Fixed a startup crash when there was no internet connection. (Issue [#3972])
  • Fixed a failure to scale down model instances in some cases. (Issue [#3988])
  • Fixed an issue where the load balancer routed traffic to model instances that were not ready. (Issue [#4010])
  • Fixed an incorrect time display on the time axis of the system load chart on the monitoring panel. (Issue [#4014])
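The multi-GPU scheduling fix (#3777) relates to a common tensor-parallel constraint: the vocabulary (and attention heads) are sharded across GPUs, so the chosen tensor-parallel degree must divide the relevant dimension evenly. A minimal sketch of picking a valid degree under that divisibility assumption; `pick_tensor_parallel_size` is a hypothetical helper, not GPUStack's scheduler code:

```python
def pick_tensor_parallel_size(vocab_size: int, gpu_count: int) -> int:
    # Hypothetical helper: choose the largest tensor-parallel degree
    # that does not exceed the available GPU count and evenly divides
    # the vocabulary size, falling back to 1 (no sharding).
    for tp in range(gpu_count, 0, -1):
        if vocab_size % tp == 0:
            return tp
    return 1

# A vocab of 129280 on an 8-GPU node: 8 divides it evenly.
print(pick_tensor_parallel_size(129280, 8))  # 8
# A vocab of 151936 on 6 GPUs: 6 and 5 do not divide it, 4 does.
print(pick_tensor_parallel_size(151936, 6))  # 4
```

The bug described in #3777 is what happens when this check is skipped: the scheduler proposes a degree the backend then rejects at load time.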

Built-in Inference Backend Updates

New Additions

  • Added SGLang 0.5.6.post2 for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
  • Added vLLM 0.13.0 for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
  • Added vLLM 0.12.0 for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
  • Added VoxBox 0.0.21 for CUDA 12.8/12.6.

Updates

Please force-pull to update existing runner images.
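Because the runner image tags above are reused across updates, Docker's local cache can keep a stale copy under the same tag; re-running `docker pull` makes Docker re-resolve the tag against the registry and fetch the new digest. A sketch with a hypothetical image name (substitute the runner images your workers actually reference):

```shell
# Hypothetical runner image name/tag; check your deployment for the
# exact images in use. Pulling an existing tag re-checks the registry
# and replaces any stale cached layers with the updated image.
docker pull gpustack/runner:cuda12.8-vllm0.11.2
```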

  • Updated MindIE 2.2.rc1/2.1.rc2 for CANN 8.2/8.3.
  • Updated vLLM 0.11.0 for CANN 8.3 (910B/A3).
  • Updated vLLM 0.11.2 for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
  • Updated SGLang 0.5.5.post3 for ROCm 6.4.

Removals

Source: README.md, updated 2025-12-31