GPUStack v2.0.2

Model Catalog Updates

  • Added: GLM-4.7, DeepSeek-V3.2, DeepSeek-V3.2-Special, DeepSeek-OCR, Z-Image-Turbo, Qwen-Image-Edit.
  • Tuned Model Deployments for Optimized Throughput:
    • DeepSeek-V3.2: +57.0% token throughput on the ShareGPT dataset using H200 GPUs, and up to +153.6% in other scenarios. (Detailed report)
    • GLM-4.7: +134.8% token throughput on the ShareGPT dataset using H200 GPUs, and up to +347.2% in other scenarios. (Detailed report)

Enhancements

  • Enhanced sorting and filtering in APIs and UI list pages. (Issues [#1348], [#2589])
  • Increased the password length limit. (Issue [#1367])
  • Added support for Model Instance Direct Access Mode. (Issue [#3772])
  • Added support for predefined worker configurations and external server URLs in cluster configuration. (Issues [#3775], [#3771])
  • Improved bootstrap health check timeout handling. (Issue [#3788])
  • Enhanced automatic worker IP address selection. (Issue [#3795])
  • Added support for shell-like split style in custom backend execution commands. (Issue [#3860])
  • Miscellaneous UX improvements. (Issues [#3757], [#3766], [#3824], [#3865], [#3866], [#3885])
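The shell-like split style for custom backend execution commands (#3860) presumably means the command string is tokenized with POSIX shell quoting rules rather than split on whitespace, so quoted arguments survive intact. A minimal sketch of the difference using Python's `shlex`; the `split_command` helper is illustrative, not GPUStack's actual implementation:

```python
import shlex

def split_command(cmd: str) -> list[str]:
    # Hypothetical helper: POSIX shell-style tokenization, so a quoted
    # argument containing spaces stays together as one token.
    return shlex.split(cmd)

cmd = "vllm serve --chat-template '{{ messages }}'"

naive = cmd.split()              # plain whitespace split
shell_like = split_command(cmd)  # shell-like split

# Whitespace splitting breaks the quoted template into two tokens;
# shell-like splitting keeps it whole and strips the surrounding quotes.
print(naive)       # [..., '--chat-template', "'{{", 'messages', "}}'"]
print(shell_like)  # [..., '--chat-template', '{{ messages }}']
```

The same distinction matters for any backend flag that takes a quoted value with embedded spaces.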

Bug Fixes

  • Fixed an issue where CIDR entries in no_proxy did not take effect for port forwarding or when adding workers to a cluster. (Issue [#1387])
  • Fixed compatibility issues with some models supported by MindIE. (Issue [#2016])
  • Fixed a bug where backend version updates did not require instance recreation to take effect. (Issue [#2574])
  • Fixed an issue where the current OIDC SSO implementation only supported login but not logout. (Issue [#2826])
  • Fixed the command length limit for custom backends. (Issue [#3555])
  • Fixed an issue preventing the use of the CosyVoice model in the Dify GPUStack plugin. (Issue [#3595])
  • Fixed DeepSeek-OCR deployment failures. (Issue [#3683])
  • Fixed the use of incorrect container image names. (Issue [#3689])
  • Fixed an issue where the system-default-container-registry configured on the server did not take effect when adding a worker. (Issue [#3737])
  • Fixed an issue where models could not start on 8GB VRAM GPUs. (Issue [#3745])
  • Fixed automatic scheduling for multi-GPU inference, which previously failed when vocab_size was not divisible by tensor-parallel-size. (Issue [#3777])
  • Fixed a GPUStack Server startup failure with the log: "[INFO] gateway exited with code 1, shutting down all services...". (Issue [#3779])
  • Fixed restricted access when evaluating the GLM-4.6 model. (Issue [#3780])
  • Fixed an issue where the model search list incorrectly reported incompatibility if an empty cluster existed. (Issue [#3790])
  • Fixed an error when revoking model permissions while using MySQL. (Issue [#3796])
  • Fixed an incorrect HTTP response from the Completion API. (Issue [#3801])
  • Fixed an issue where the vLLM backend was not correctly filtered for Ascend 310P NPU, causing a "failed to get vLLM image" error. (Issue [#3802])
  • Fixed an incorrect ready status report for the Nvidia container toolkit check in CDI-enabled Docker environments. (Issue [#3808])
  • Fixed a failure when upgrading from v0.7.1 to v2.0.1 caused by the error: "'OLLAMA_LIBRARY' is not among the defined enum values". (Issue [#3809])
  • Fixed a model deployment error ('libcuda.so.1: cannot open shared object file: No such file or directory') when the server and worker used the host network on the same host. (Issue [#3810])
  • Fixed malformed streaming output when using MindIE on version 2.0.0. (Issue [#3826])
  • Fixed an issue where the default value of mem_fraction_static in SGLang did not take effect, causing model deployment failures. (Issue [#3831])
  • Fixed an issue where --max-model-len appeared twice in the vLLM serve command. (Issue [#3835])
  • Fixed a Gateway timeout error when accessing models in the playground. (Issue [#3846])
  • Fixed a worker registration failure with the log: "ERROR - Failed to register worker: 1 validation error for WorkerStatusPublic". (Issue [#3849])
  • Fixed an issue where there was no default cluster when setting --token for the server. (Issue [#3855])
  • Fixed a failure to deploy models in a Kubernetes cluster when using Persistent Volumes (PV) instead of hostPath for storage. (Issue [#3876])
  • Fixed an error showing "API key not allowed to access model". (Issue [#3897])
  • Fixed an incorrect framework name in the inference backend version list. (Issue [#3906])
  • Fixed an issue where image generation parameters did not work. (Issue [#3911])
  • Fixed an issue where models could not be stopped in some cases. (Issue [#3936])
  • Fixed an issue where downloaded model logs in Kubernetes only contained GPUStack logs, missing instance logs. (Issue [#3938])
  • Fixed an issue where the message returned when testing the API with curl was missing the "data" tag. (Issue [#3950])
  • Fixed a startup crash when there was no internet connection. (Issue [#3972])
  • Fixed a failure to scale down model instances in some cases. (Issue [#3988])
  • Fixed an issue where the load balancer routed traffic to model instances that were not ready. (Issue [#4010])
  • Fixed an incorrect time display on the time axis of the system load chart on the monitoring panel. (Issue [#4014])
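The multi-GPU scheduling fix (#3777) relates to a common tensor-parallel constraint: the vocabulary (and attention heads) are sharded across GPUs, so the chosen tensor-parallel degree must divide the relevant dimension evenly. A minimal sketch of picking a valid degree under that divisibility assumption; `pick_tensor_parallel_size` is a hypothetical helper, not GPUStack's scheduler code:

```python
def pick_tensor_parallel_size(vocab_size: int, gpu_count: int) -> int:
    # Hypothetical helper: choose the largest tensor-parallel degree
    # that does not exceed the available GPU count and evenly divides
    # the vocabulary size, falling back to 1 (no sharding).
    for tp in range(gpu_count, 0, -1):
        if vocab_size % tp == 0:
            return tp
    return 1

# A vocab of 129280 on an 8-GPU node: 8 divides it evenly.
print(pick_tensor_parallel_size(129280, 8))  # 8
# A vocab of 151936 on 6 GPUs: 6 and 5 do not divide it, 4 does.
print(pick_tensor_parallel_size(151936, 6))  # 4
```

The bug described in #3777 is what happens when this check is skipped: the scheduler proposes a degree the backend then rejects at load time.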

Built-in Inference Backend Updates

New Additions

  • Added SGLang 0.5.6.post2 for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
  • Added vLLM 0.13.0 for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
  • Added vLLM 0.12.0 for CANN 8.3 (910B/A3), CUDA 12.9/12.8/12.6, and ROCm 7.0/6.4.
  • Added VoxBox 0.0.21 for CUDA 12.8/12.6.

Updates

Please force-pull to update existing runner images.
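Because the runner image tags above are reused across updates, Docker's local cache can keep a stale copy under the same tag; re-running `docker pull` makes Docker re-resolve the tag against the registry and fetch the new digest. A sketch with a hypothetical image name (substitute the runner images your workers actually reference):

```shell
# Hypothetical runner image name/tag; check your deployment for the
# exact images in use. Pulling an existing tag re-checks the registry
# and replaces any stale cached layers with the updated image.
docker pull gpustack/runner:cuda12.8-vllm0.11.2
```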

  • Updated MindIE 2.2.rc1/2.1.rc2 for CANN 8.2/8.3.
  • Updated vLLM 0.11.0 for CANN 8.3 (910B/A3).
  • Updated vLLM 0.11.2 for CUDA 12.9/12.8/12.6 and ROCm 7.0/6.4.
  • Updated SGLang 0.5.5.post3 for ROCm 6.4.

Removals

Source: README.md, updated 2025-12-31