PD disaggregation
This update simplifies running SGLang with Prefill-Decode (PD) disaggregation.
Previously, PD disaggregation required configuring the router on the gateway, which meant
the gateway had to run in the same cluster as the service in order to communicate with
service replicas.
With this update, the router is configured on a service replica group instead. This allows
using a standard gateway outside the service cluster.
Below is an example service configuration for running zai-org/GLM-4.5-Air-FP8 using replica groups:
:::yaml
type: service
name: prefill-decode
image: lmsysorg/sglang:latest
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    commands:
      - pip install sglang_router
      - |
        python -m sglang_router.launch_router \
          --host 0.0.0.0 \
          --port 8000 \
          --pd-disaggregation \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode prefill \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000 \
          --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200
  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode decode \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000
    resources:
      gpu: H200
port: 8000
model: zai-org/GLM-4.5-Air-FP8
# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s
Note: this setup requires the service fleet or cluster to provide a CPU node for the router replica.
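In the configuration above, each replica group scales on the `rps` metric with a per-replica target. As a rough mental model (a simplified sketch, not dstack's actual autoscaling implementation), the desired replica count can be thought of as "enough replicas so each one handles roughly the target rate", clamped to the group's `count` range:

```python
import math

def target_replicas(current_rps: float, target_rps: float,
                    min_replicas: int, max_replicas: int) -> int:
    """Simplified scaling rule: enough replicas so each sees ~target_rps,
    clamped to the replica group's count range."""
    if current_rps <= 0:
        desired = min_replicas
    else:
        desired = math.ceil(current_rps / target_rps)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 10 rps against a target of 3 rps per replica, bounded to 1..4 replicas:
print(target_replicas(10, 3, 1, 4))  # → 4
```

With this rule, the prefill group (`count: 1..4`, `target: 3`) would grow toward 4 replicas at 10 rps, while the decode group (`count: 1..8`, `target: 2`) scales more aggressively.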
Kubernetes
The kubernetes backend adds support for both network and instance volumes.
Network volumes
You can either create a new network volume or register an existing one. To create a new
network volume, specify size and optionally storage_class_name and/or
access_modes:
:::yaml
type: volume
backend: kubernetes
name: my-volume
size: 100GB
This automatically creates a PersistentVolumeClaim and associates it with the volume.
If you don't specify `storage_class_name`, the decision is delegated to the DefaultStorageClass admission controller, if enabled.
If you don't specify `access_modes`, it defaults to `[ReadWriteOnce]`. To attach a volume to multiple runs at the same time, set it to `[ReadWriteMany]` or `[ReadWriteMany, ReadOnlyMany]`.
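For example, a volume that pins a storage class and allows concurrent attachment might look like this (a sketch; `fast-ssd` is a hypothetical storage class name):

:::yaml
type: volume
backend: kubernetes
name: shared-volume
size: 100GB
storage_class_name: fast-ssd
access_modes: [ReadWriteMany]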
To reuse an existing PersistentVolumeClaim, specify its name in claim_name:
:::yaml
type: volume
backend: kubernetes
name: my-volume
claim_name: existing-pvc
Once a volume configuration is applied, you can attach it to your runs via volumes:
:::yaml
type: dev-environment
name: vscode-vol
ide: vscode
volumes:
  - name: my-volume
    path: /volume_data
Instance volumes
In addition to network volumes, the kubernetes backend now supports instance volumes:
:::yaml
type: dev-environment
name: vscode-vol
ide: vscode
volumes:
  - instance_path: /mnt/volume
    path: /volume_data
Unlike network volumes, which persist across instances, instance volumes persist data only within a particular instance. They are useful for storing caches or when you manually mount a shared filesystem into the instance path.
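As an example of the cache use case, a task could keep its pip cache on the instance so repeated runs on the same instance skip re-downloading packages (a sketch; the paths and commands are illustrative):

:::yaml
type: task
name: train
volumes:
  - instance_path: /dstack-cache/pip
    path: /root/.cache/pip
commands:
  - pip install -r requirements.txt
  - python train.py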
Note: using volumes with the `kubernetes` backend requires the corresponding permissions.
Performance
Fetching backend offers for the first time has been optimized and is now much faster. As
a result, dstack apply, dstack offer, and the offers UI are all more responsive.
Here are the improvements for some of the major backends:
- aws — 41.43s => 6.61s (6.3x)
- azure — 12.49s => 5.50s (2.3x)
- gcp — 13.51s => 5.20s (2.6x)
- nebius — 10.74s => 3.80s (2.8x)
- runpod — 9.36s => 0.09s (104x)
- verda — 9.49s => 2.33s (4.1x)
Fleets
In-place update
Backend fleets now have initial support for in-place updates. You can update nodes,
reservation, tags, resources, backends, regions, availability_zones,
instance_types, spot_policy, and max_price without re-creating the entire fleet.
If existing idle instances do not match the updated configuration, dstack replaces
them.
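For instance, given a fleet like the following (a sketch with hypothetical values), you could later edit `nodes`, `spot_policy`, or `max_price` and re-apply the configuration without the fleet being re-created:

:::yaml
type: fleet
name: my-fleet
nodes: 0..4
backends: [aws, gcp]
spot_policy: auto
max_price: 4.0
resources:
  gpu: 1..8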
Default resources
Fleets used to have default resources set to cpu=2.. mem=8GB.. disk=100GB.. when
left unspecified. This meant any offers with fewer resources were excluded from such
fleets. If you wanted to run on a mem=4GB VM, you had to specify resources in both
the run and fleet configurations.
Now fleets have no default resources, so all offers are available by default. If you
need to add extra constraints on which offers can be provisioned in a fleet, specify
resources explicitly.
Run configurations continue to have default minimum resources set to
cpu=2.. mem=8GB.. disk=100GB.. to avoid provisioning instances that are too small.
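For example, to run on a mem=4GB VM it is now enough to override the default minimum in the run configuration alone, with no matching `resources` needed in the fleet (a sketch; the task name and command are illustrative):

:::yaml
type: task
name: small-task
resources:
  mem: 4GB..
commands:
  - python app.py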
Offers
The dstack offer CLI command now supports the --fleet argument, which allows you to
see only offers from the specified fleets.
:::shell
dstack offer --fleet my-fleet --fleet another-project/other-fleet
The same is now supported in the UI on both the Offers and Launch pages.
Exports
Importers can now delete an import via
dstack import delete <export-project>/<export-name>. This is useful when an export
was created by the exporter, but the importer no longer needs it and does not want to
wait until the exporter deletes it.
AWS
RTX Pro 6000
The aws backend adds support for g7e.* instances offering RTXPRO6000 GPUs.
Docker
Default Docker registry
If you'd like to cache Docker images through your own Docker registry, you can now
configure it when starting the dstack server:
:::bash
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY=<registry base hostname>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_USERNAME=<registry username>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_PASSWORD=<registry password>
These settings should only be used for registries that act as a pull-through cache for Docker Hub. This helps you avoid Docker Hub rate limits when your workloads pull many images.
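For reference, a minimal pull-through cache for Docker Hub can be run with the official `registry` image, which supports proxy mode via `REGISTRY_PROXY_REMOTEURL`. This Compose file is a sketch (ports, auth, and persistence are up to your environment):

:::yaml
services:
  registry-mirror:
    image: registry:2
    ports:
      - "5000:5000"
    environment:
      # Proxy (pull-through cache) mode pointed at Docker Hub.
      REGISTRY_PROXY_REMOTEURL: https://registry-1.docker.io

You would then point `DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY` at the mirror's hostname, e.g. `localhost:5000` when running everything locally.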
What's changed
- Drop deprecated scheduled tasks by @r4victor in https://github.com/dstackai/dstack/pull/3749
- [Docs]: Rename REST API -> HTTP API by @jvstme in https://github.com/dstackai/dstack/pull/3748
- Rework runner job submission flow by @un-def in https://github.com/dstackai/dstack/pull/3743
- Default Docker registry and credentials by @jvstme in https://github.com/dstackai/dstack/pull/3747
- Detect Verda provisioning errors earlier by @jvstme in https://github.com/dstackai/dstack/pull/3753
- Optimize Python DB tests by @r4victor in https://github.com/dstackai/dstack/pull/3755
- Add case study on Graphsignal's use of dstack for inference benchmarking by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3751
- Allow combining on/off idle_duration between runs and fleets by @r4victor in https://github.com/dstackai/dstack/pull/3756
- Fix no offers retry for scheduled runs by @r4victor in https://github.com/dstackai/dstack/pull/3759
- Support dynamic run waiting CLI status with extra renderables by @r4victor in https://github.com/dstackai/dstack/pull/3760
- Kubernetes: add instance volumes support by @un-def in https://github.com/dstackai/dstack/pull/3758
- Init gateways in background by @r4victor in https://github.com/dstackai/dstack/pull/3762
- Store source backend config by @r4victor in https://github.com/dstackai/dstack/pull/3764
- Show offers in dstack apply for elastic container fleets by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3754
- Support cloud fleet in-place update by @r4victor in https://github.com/dstackai/dstack/pull/3766
- Set up HTTP ALB listener for ACM gateway by @r4victor in https://github.com/dstackai/dstack/pull/3767
- Evict jobs if instance is no longer imported by @jvstme in https://github.com/dstackai/dstack/pull/3772
- Implement cloud fleet in-place update for provisioning fields by @r4victor in https://github.com/dstackai/dstack/pull/3775
- Drop fleet default min resources by @r4victor in https://github.com/dstackai/dstack/pull/3776
- Support --fleet in dstack offer by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3774
- Support imported fleets in dstack fleet get by @jvstme in https://github.com/dstackai/dstack/pull/3773
- Limit fleet consolidation attempts by @r4victor in https://github.com/dstackai/dstack/pull/3777
- [Docs]: Examples cleanup and installation updates by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3765
- Support AWS G7e (RTXPRO6000) instances by @jvstme in https://github.com/dstackai/dstack/pull/3752
- Support imported fleets in dstack event by @jvstme in https://github.com/dstackai/dstack/pull/3779
- Drop autocreated fleets by @r4victor in https://github.com/dstackai/dstack/pull/3782
- Support fleet filters in the Offers and Launch UI by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3780
- Support router as replica with pipelines by @Bihan in https://github.com/dstackai/dstack/pull/3721
- Pre-load offers catalog by @r4victor in https://github.com/dstackai/dstack/pull/3785
- Parallelize get_project_backends_with_models by @r4victor in https://github.com/dstackai/dstack/pull/3787
- Kubernetes: add support for volumes by @un-def in https://github.com/dstackai/dstack/pull/3781
- Allow project admins to delete imports by @jvstme in https://github.com/dstackai/dstack/pull/3783
- Skip best fleet search for dstack offer by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3788
- Disable go-integration-tests for release by @r4victor in https://github.com/dstackai/dstack/pull/3791
Full changelog: https://github.com/dstackai/dstack/compare/0.20.16...0.20.17