Services
NVIDIA Dynamo
This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.
Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare `router: { type: dynamo }`, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.
:::yaml
type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s
dstack provisions the router replica, injects `DSTACK_ROUTER_INTERNAL_IP` into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
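The worker-side bootstrap in the configuration above (compose the etcd/NATS endpoints from `DSTACK_ROUTER_INTERNAL_IP`, then block until both ports accept connections) could also be written in Python, for example when a worker is started from a Python entrypoint instead of shell commands. The following is only an illustrative sketch: `compose_endpoints`, `port_is_open`, and `wait_for_router` are hypothetical helper names that are not part of dstack or Dynamo, and the `max_wait` timeout is an addition that the shell loop above does not have. Ports 2379 and 4222 are the ones used in the configuration.

```python
import socket
import time


def compose_endpoints(router_ip: str) -> dict:
    """Build the same env values the worker commands export (hypothetical helper)."""
    return {
        "ETCD_ENDPOINTS": f"http://{router_ip}:2379",
        "NATS_SERVER": f"nats://{router_ip}:4222",
    }


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_router(host: str, ports=(2379, 4222),
                    interval: float = 3.0, max_wait: float = 600.0) -> bool:
    """Poll until every port accepts connections; give up after max_wait seconds.

    The timeout is an assumption added for safety; the shell loop waits forever.
    """
    deadline = time.monotonic() + max_wait
    while True:
        if all(port_is_open(host, port) for port in ports):
            return True
        if time.monotonic() >= deadline:
            return False
        print(f"waiting for etcd/NATS on {host}...")
        time.sleep(interval)
```

A worker entrypoint could read `DSTACK_ROUTER_INTERNAL_IP` from the environment, call `wait_for_router` with it, and then export the values from `compose_endpoints` before launching the inference engine.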
Refer to the Dynamo example for full deployment instructions.
Replica groups
It's now possible to configure the `image`, `docker`, `python`, `nvcc`, and `privileged` properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.
Exports
Gateways
Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.
:::shell
$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS
 my-export  -       shared-gateway  team
Now, if you list gateways in the team project, you'll see the exported gateway:
:::shell
$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running
Additionally, gateway domains now support optional project name interpolation using `${{ run.project_name }}`, allowing different projects to use different domains on the same shared gateway.
:::yaml
type: gateway
name: shared-gateway
backend: aws
region: eu-west-1
domain: ${{ run.project_name }}.mycompany.example
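To make the effect of the interpolation concrete, the sketch below shows how a domain template like the one above would resolve for different projects. It is not dstack's implementation: `render_gateway_domain` is a hypothetical function, and tolerating arbitrary whitespace inside the braces is an assumption made for the example.

```python
import re

# Matches the ${{ run.project_name }} placeholder; whitespace tolerance is assumed.
_PROJECT_NAME_VAR = re.compile(r"\$\{\{\s*run\.project_name\s*\}\}")


def render_gateway_domain(template: str, project_name: str) -> str:
    """Substitute the project name into a gateway domain template (illustrative only)."""
    # A function replacement avoids treating backslashes in project_name as escapes.
    return _PROJECT_NAME_VAR.sub(lambda _match: project_name, template)


if __name__ == "__main__":
    template = "${{ run.project_name }}.mycompany.example"
    for project in ("main", "team"):
        print(project, "->", render_gateway_domain(template, project))
```

Because the interpolation is optional, a template without the placeholder, such as `gtw.mycompany.example`, is returned unchanged by this sketch.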
Global exports
Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.
:::bash
$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *
AWS
EFA clusters
Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required `public_ips: False`. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.
Kubernetes
Backend configuration
The `namespace` property of the `kubernetes` backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.
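The change in where the namespace comes from, and the cases listed in the migration guide that follows, can be summarized as a small decision function. This is a sketch under stated assumptions, not dstack code: the function names are hypothetical, and it assumes that an unset namespace in either place falls back to `default`. The case where the backend config is unset but the kubeconfig context names another namespace is not covered by the guide; the sketch simply reports it as a mismatch.

```python
from typing import Optional

DEFAULT_NAMESPACE = "default"


def namespace_in_this_version(backend_ns: Optional[str]) -> str:
    """This version: the backend config is the source of truth (unset -> default, assumed)."""
    return backend_ns or DEFAULT_NAMESPACE


def namespace_in_future_versions(kubeconfig_ns: Optional[str]) -> str:
    """Future versions: read from the current kubeconfig context (unset -> default, assumed)."""
    return kubeconfig_ns or DEFAULT_NAMESPACE


def kubeconfig_needs_update(backend_ns: Optional[str],
                            kubeconfig_ns: Optional[str]) -> bool:
    """True if the two sources disagree, i.e. behavior would change after the switch."""
    return namespace_in_this_version(backend_ns) != namespace_in_future_versions(kubeconfig_ns)


if __name__ == "__main__":
    cases = [(None, None), ("default", None), ("ns-a", "ns-a"), ("ns-a", "ns-b"), ("ns-a", None)]
    for backend_ns, kubeconfig_ns in cases:
        print(backend_ns, kubeconfig_ns, kubeconfig_needs_update(backend_ns, kubeconfig_ns))
```

When the function returns `True` for a backend namespace such as `ns-a`, the remedy described in the guide is to set that namespace in the kubeconfig context.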
Migration guide
- If `namespace` is unset or set to `default` in both the backend config and the kubeconfig, no action is required; `default` continues to be used.
- If `namespace` is set to the same value (e.g. `ns-a`) in both the backend config and the kubeconfig, no action is required.
- If `namespace` is set to `ns-a` in the backend config but the kubeconfig has a different value (or none), set the namespace to `ns-a` in your kubeconfig context to prepare for future versions.
- It is only safe to remove `namespace` from the backend config if its value is `default`.

What's changed
- [Services] Allow to specify `image`, `docker`, `python`, `nvcc`, `privileged` at replica group level by @Bihan in https://github.com/dstackai/dstack/pull/3832
- [Internal]: Delete some unused classes by @jvstme in https://github.com/dstackai/dstack/pull/3842
- [Internal] Fix `pyright` failing in CI by @jvstme in https://github.com/dstackai/dstack/pull/3846
- [Internal] Update `RunpodApiClient` by @un-def in https://github.com/dstackai/dstack/pull/3847
- [Internal] Fix `openai` SDK failing in tests by @jvstme in https://github.com/dstackai/dstack/pull/3849
- [RunPod] Handle deleting non-existent volume by @r4victor in https://github.com/dstackai/dstack/pull/3853
- [Runpod] Fix broken `registry_auth` support by @un-def in https://github.com/dstackai/dstack/pull/3844
- [UX] Raise `ImportError` on Python 3.14 or later by @r4victor in https://github.com/dstackai/dstack/pull/3855
- [Exports] Gateway support by @jvstme in https://github.com/dstackai/dstack/pull/3845
- [Internal] Rename `docs/` to `mkdocs/`, move examples under `/docs/`, inline source by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3859
- [Kubernetes] Deprecate `namespace` in backend config by @un-def in https://github.com/dstackai/dstack/pull/3858
- [Gateways] Allow setting imported gateway as project default by @jvstme in https://github.com/dstackai/dstack/pull/3860
- [Internal] Forbid exporting the built-in `dstack` Sky gateway by @jvstme in https://github.com/dstackai/dstack/pull/3864
- [AWS] Support multi-EFA instances with public IPs by @r4victor in https://github.com/dstackai/dstack/pull/3865
- [Internal] Add server-side validation for fleet configuration subtypes by @un-def in https://github.com/dstackai/dstack/pull/3848
- [Verda] Optimize terminating Verda instances by @jvstme in https://github.com/dstackai/dstack/pull/3811
- [Internal] Introduce `GatewayModel.forbid_new_services` by @jvstme in https://github.com/dstackai/dstack/pull/3863
- [Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3869
- [Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in https://github.com/dstackai/dstack/pull/3866
- Fix Pyright errors with `requests==2.34.0` by @jvstme in https://github.com/dstackai/dstack/pull/3873
- Add project name interpolation in gateway domains by @jvstme in https://github.com/dstackai/dstack/pull/3870
- [Bugfix] Fix duplicate headers with in-server proxy by @jvstme in https://github.com/dstackai/dstack/pull/3872
- [Docs]: Gateway Exports by @jvstme in https://github.com/dstackai/dstack/pull/3862
- [Kubernetes] Fail fast if job pod was not scheduled by @un-def in https://github.com/dstackai/dstack/pull/3874
- [Exports] Global exports support by @jvstme in https://github.com/dstackai/dstack/pull/3879
- [Services] Support PD with NVIDIA Dynamo by @Bihan in https://github.com/dstackai/dstack/pull/3868
- [Internal] Update text regarding billing based on the project type by @peterschmidt85 in https://github.com/dstackai/dstack/pull/3876
- [Docs] Add NVIDIA Dynamo docs by @Bihan in https://github.com/dstackai/dstack/pull/3877
- [Internal] Fix unreleased `global_exports` lock on Postgres by @jvstme in https://github.com/dstackai/dstack/pull/3882
Full changelog: https://github.com/dstackai/dstack/compare/0.20.19...0.20.20