Name | Modified | Size | Downloads / Week |
---|---|---|---|
0.19.31 source code.tar.gz | 2025-10-02 | 31.6 MB | |
0.19.31 source code.zip | 2025-10-02 | 32.4 MB | |
README.md | 2025-10-02 | 4.0 kB | |
Totals: 3 Items | 63.9 MB | 0 |
Kubernetes
The `kubernetes` backend introduces many significant improvements and has now graduated from alpha to beta. It is much more stable and can be reliably used on GPU clusters for all kinds of workloads, including distributed tasks.
Here's what changed:
- Resource allocation now fully respects the user’s `resources` specification. Previously, it ignored certain aspects, especially the proper selection of GPU labels according to the specified `gpu` spec.
- Distributed tasks now fully work on Kubernetes clusters with fast interconnect enabled. Previously, this caused many issues.
- Added support for `privileged`.
We’ve also published a dedicated guide on how to get started with `dstack` on Kubernetes, highlighting important nuances.
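As a rough illustration of the changes listed above, a distributed task that targets the `kubernetes` backend might look like the sketch below. The task name, GPU model, and command are placeholders chosen for illustration, not values from this release; check the dstack task reference for the exact set of supported properties.

```yaml
type: task
# Hypothetical name, for illustration only
name: k8s-distributed-example
# More than one node makes this a distributed task
nodes: 2
# Restrict provisioning to the Kubernetes backend
backends: [kubernetes]
# Run the container in privileged mode (newly supported on Kubernetes)
privileged: true
commands:
  # Placeholder command; replace with your actual workload
  - nvidia-smi
resources:
  # Placeholder GPU spec (model:count); nodes are selected by matching GPU labels
  gpu: H100:8
```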
[!WARNING] Be aware of breaking changes if you used the `kubernetes` backend before. The following properties in the Kubernetes backend configuration have been renamed:
- `networking` → `proxy_jump`
- `ssh_host` → `hostname`
- `ssh_port` → `port`

Additionally, the "proxy jump" pod and service names now include a `dstack-` prefix.
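For an existing server configuration, the rename could look like the following sketch. It assumes `ssh_host` and `ssh_port` were nested under `networking` in your `~/.dstack/server/config.yml`; the kubeconfig path, address, and port are placeholders, not recommended values.

```yaml
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config   # placeholder path
    # Before 0.19.31 (old property names, assumed nesting):
    # networking:
    #   ssh_host: 203.0.113.10
    #   ssh_port: 32000
    # From 0.19.31 (renamed properties):
    proxy_jump:
      hostname: 203.0.113.10     # placeholder address
      port: 32000                # placeholder port
```

Only the property names change in this sketch; the values carry over as they were.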
GCP
A4 spot instances with B200 GPUs
The `gcp` backend now supports A4 spot instances equipped with B200 GPUs. This includes provisioning both standalone A4 instances and A4 clusters with high-performance RoCE networking.
To use A4 clusters with high-performance networking, you must configure multiple VPCs in your backend settings (`~/.dstack/server/config.yml`):
    :::yaml
    projects:
    - name: main
      backends:
      - type: gcp
        project_id: my-project
        creds:
          type: default
        vpc_name: my-vpc-0   # regular, 1 subnet
        extra_vpcs:
        - my-vpc-1   # regular, 1 subnet
        roce_vpcs:
        - my-vpc-mrdma   # RoCE profile, 8 subnets

Then, provision a cluster using a fleet configuration:
    :::yaml
    type: fleet
    nodes: 2
    placement: cluster
    availability_zones: [us-west2-c]
    backends: [gcp]
    spot_policy: spot
    resources:
      gpu: B200:8

Each instance in the cluster will have 10 network interfaces: 1 regular interface in the main VPC, 1 regular interface in the extra VPC, and 8 RDMA interfaces in the RoCE VPC.
[!NOTE] Currently, the `gcp` backend only supports A4 spot instances. Support for other options, such as flex and calendar scheduling via Dynamic Workload Scheduler, is coming soon.
CLI
`dstack project` is now faster
The `USER` column in `dstack project list` is now shown only when the `--verbose` flag is used.
This significantly improves performance for users with many configured projects, reducing execution time from ~20 seconds to as little as 2 seconds in some cases.
What's changed
- [Kubernetes] Request resources according to `RequirementsSpec` by @un-def in https://github.com/dstackai/dstack/pull/3127
- [GCP] Support A4 spot instances with the B200 GPU by @jvstme in https://github.com/dstackai/dstack/pull/3100
- [CLI] Move `USER` to `dstack project list --verbose` by @jvstme in https://github.com/dstackai/dstack/pull/3134
- [Kubernetes] Configure `/dev/shm` if requested by @un-def in https://github.com/dstackai/dstack/pull/3135
- [Backward incompatible] Rename properties in Kubernetes backend config by @un-def in https://github.com/dstackai/dstack/pull/3137
- Support GCP A4 clusters by @jvstme in https://github.com/dstackai/dstack/pull/3142
- Kubernetes: add multi-node support by @un-def in https://github.com/dstackai/dstack/pull/3141
- Fix duplicate server log messages by @jvstme in https://github.com/dstackai/dstack/pull/3143
- [Docs] Improve Kubernetes documentation by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3138
Full changelog: https://github.com/dstackai/dstack/compare/0.19.30...0.19.31