
PD disaggregation

This update simplifies running SGLang with Prefill-Decode disaggregation.

Previously, PD disaggregation required configuring the router on the gateway, which meant the gateway had to run in the same cluster as the service in order to communicate with the service replicas.

With this update, the router is configured on a service replica group instead. This lets you use a standard gateway outside the service cluster.

Below is an example service configuration for running zai-org/GLM-4.5-Air-FP8 using replica groups:

:::yaml
type: service
name: prefill-decode
image: lmsysorg/sglang:latest

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    commands:
      - pip install sglang_router
      - |
        python -m sglang_router.launch_router \
          --host 0.0.0.0 \
          --port 8000 \
          --pd-disaggregation \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4


  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode prefill \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000 \
          --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200


  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode decode \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

Note: this setup requires the service fleet or cluster to provide a CPU node for the router replica.
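Once the service is up, requests go through the gateway to the router replica, which dispatches prefill and decode work to the GPU replica groups. A minimal smoke test against the OpenAI-compatible endpoint might look like this (the gateway hostname is a placeholder; substitute your own endpoint and dstack token):

:::shell
curl https://prefill-decode.example.com/v1/chat/completions \
  -H "Authorization: Bearer $DSTACK_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.5-Air-FP8",
    "messages": [{"role": "user", "content": "Hello"}]
  }'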

Kubernetes

The kubernetes backend adds support for both network and instance volumes.

Network volumes

You can either create a new network volume or register an existing one. To create a new network volume, specify size and optionally storage_class_name and/or access_modes:

:::yaml
type: volume
backend: kubernetes
name: my-volume

size: 100GB

This automatically creates a PersistentVolumeClaim and associates it with the volume.

If you don't specify storage_class_name, the decision is delegated to the DefaultStorageClass admission controller, if enabled.

If you don't specify access_modes, it defaults to [ReadWriteOnce]. To attach volumes to multiple runs at the same time, set it to [ReadWriteMany] or [ReadWriteMany, ReadOnlyMany].
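For example, a volume that can be attached to several runs simultaneously might be configured like this (a sketch; the storage class name is a placeholder — use one that exists in your cluster and supports ReadWriteMany):

:::yaml
type: volume
backend: kubernetes
name: shared-volume

size: 100GB
storage_class_name: nfs-client
access_modes: [ReadWriteMany]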

To reuse an existing PersistentVolumeClaim, specify its name in claim_name:

:::yaml
type: volume
backend: kubernetes
name: my-volume

claim_name: existing-pvc

Once a volume configuration is applied, you can attach it to your runs via volumes:

:::yaml
type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - name: my-volume
    path: /volume_data

Instance volumes

In addition to network volumes, the kubernetes backend now supports instance volumes:

:::yaml
type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - instance_path: /mnt/volume
    path: /volume_data

Unlike network volumes, which persist across instances, instance volumes persist data only within a particular instance. They are useful for storing caches or when you manually mount a shared filesystem into the instance path.

Note: using volumes with the kubernetes backend requires that dstack has the appropriate Kubernetes permissions (e.g., to manage PersistentVolumeClaims).

Performance

The initial fetch of backend offers has been optimized and is now much faster. As a result, dstack apply, dstack offer, and the offers UI are all more responsive. Here are the improvements for some of the major backends:

- aws — 41.43s => 6.61s (6.3x)
- azure — 12.49s => 5.50s (2.3x)
- gcp — 13.51s => 5.20s (2.6x)
- nebius — 10.74s => 3.80s (2.8x)
- runpod — 9.36s => 0.09s (104x)
- verda — 9.49s => 2.33s (4.1x)

Fleets

In-place update

Backend fleets now support in-place updates. You can update nodes, reservation, tags, resources, backends, regions, availability_zones, instance_types, spot_policy, and max_price without re-creating the entire fleet. If existing idle instances no longer match the updated configuration, dstack replaces them.
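For example, after changing spot_policy or max_price in a fleet configuration like the one below, re-running dstack apply updates the fleet in place instead of re-creating it (a sketch; the fleet name and values are illustrative):

:::yaml
type: fleet
name: my-fleet

nodes: 2
spot_policy: auto
max_price: 2.5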

Default resources

Fleets used to have default resources set to cpu=2.. mem=8GB.. disk=100GB.. when left unspecified. This meant any offers with fewer resources were excluded from such fleets. If you wanted to run on a mem=4GB VM, you had to specify resources in both the run and fleet configurations.

Now fleets have no default resources, so all offers are available by default. If you need to add extra constraints on which offers can be provisioned in a fleet, specify resources explicitly.

Run configurations continue to have default minimum resources set to cpu=2.. mem=8GB.. disk=100GB.. to avoid provisioning instances that are too small.
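If you do want to constrain which offers a fleet can provision, the previous defaults can be restored explicitly — a sketch, assuming the standard dstack resources syntax:

:::yaml
type: fleet
name: cpu-fleet

nodes: 1
resources:
  cpu: 2..
  memory: 8GB..
  disk: 100GB..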

Offers

The dstack offer CLI command now supports the --fleet argument, which allows you to see only offers from the specified fleets.

:::shell
dstack offer --fleet my-fleet --fleet another-project/other-fleet

The same is now supported in the UI on both the Offers and Launch pages.

Exports

Importers can now delete an import via dstack import delete <export-project>/<export-name>. This is useful when an export was created by the exporter, but the importer no longer needs it and does not want to wait until the exporter deletes it.
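For example, assuming a hypothetical export named weights in a project named main, the importer would run:

:::shell
dstack import delete main/weights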

AWS

RTX Pro 6000

The aws backend adds support for g7e.* instances offering RTXPRO6000 GPUs.

Docker

Default Docker registry

If you'd like to cache Docker images through your own Docker registry, you can now configure it when starting the dstack server:

:::bash
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY=<registry base hostname>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_USERNAME=<registry username>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_PASSWORD=<registry password>

These settings should only be used for registries that act as a pull-through cache for Docker Hub. This helps you avoid Docker Hub rate limits when image pulls are frequent.

What's changed

Full changelog: https://github.com/dstackai/dstack/compare/0.20.16...0.20.17

Source: README.md, updated 2026-04-16