Set up Kubernetes agents

Dagster+ Hybrid runs a lightweight agent in your Kubernetes cluster. The agent polls Dagster+ for work, launches run pods, and streams logs back — your code never leaves your infrastructure.

This guide installs one agent per environment (dev, staging, and prod). Each agent runs in its own namespace and is configured by its own Helm values file, so resource requirements can differ between environments.

Step 1: Add the Helm chart repository

helm repo add dagster-cloud https://dagster-io.github.io/helm-user-cloud
helm repo update

Step 2: Create agent tokens

Each agent authenticates with an API token scoped to its deployment. Create a token for each environment in the Dagster+ UI under Organization Settings → Tokens, then store each token as a Kubernetes secret:

kubectl create namespace dagster-dev
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<dev-token> \
  -n dagster-dev

kubectl create namespace dagster-staging
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<staging-token> \
  -n dagster-staging

kubectl create namespace dagster-prod
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<prod-token> \
  -n dagster-prod
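
The kubectl create commands above fail with an AlreadyExists error if they are run a second time, for example when rotating a token. One common workaround is to render the manifest client-side and pipe it to kubectl apply. The following is a minimal sketch of that idiom, not part of Dagster+ or the chart; the function name create_agent_token_secret and the way the token is passed in are assumptions made for illustration:

```shell
# Sketch: create (or update) the namespace and agent-token secret for one
# environment. The helper name and its arguments are hypothetical.
create_agent_token_secret() {
  local env="$1" token="$2"
  local ns="dagster-${env}"

  # --dry-run=client renders the manifest without contacting the cluster;
  # piping it to "kubectl apply" makes the step safe to re-run.
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  kubectl create secret generic dagster-cloud-agent-token \
    --from-literal=DAGSTER_CLOUD_AGENT_TOKEN="$token" \
    -n "$ns" --dry-run=client -o yaml | kubectl apply -f -
}

# Example (token held in a shell variable of your choosing):
#   create_agent_token_secret dev "$DEV_TOKEN"
```

Passing the token through a variable rather than typing it inline also keeps it out of shell history.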

Step 3: Create environment secrets for user code

The Helm values files reference a per-environment Kubernetes secret (dagster-dev-env, dagster-staging-env, dagster-prod-env) that injects connection strings into run pods. Create these secrets now, substituting your actual connection details:

kubectl create secret generic dagster-dev-env \
  -n dagster-dev \
  --from-literal=SOURCE_DATABASE_URL='postgres://dev-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://dev/warehouse'

kubectl create secret generic dagster-staging-env \
  -n dagster-staging \
  --from-literal=SOURCE_DATABASE_URL='postgres://staging-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://staging/warehouse'

kubectl create secret generic dagster-prod-env \
  -n dagster-prod \
  --from-literal=SOURCE_DATABASE_URL='postgres://prod-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://prod/warehouse'

Once an agent is installed with a values file that lists the secret under envSecrets (Step 4), every key in that secret is exposed as an environment variable in the environment's run pods. The assets read them with os.getenv("SOURCE_DATABASE_URL") and os.getenv("WAREHOUSE_URL"), so each environment targets its own data sources without any code changes.
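
A missing or misspelled key only surfaces later, when an asset reads an empty variable. To catch that earlier, the secret can be inspected without printing its values. The sketch below assumes the secret and namespace names used in this step; check_env_secret is a hypothetical helper name:

```shell
# Sketch: confirm an environment secret exists and holds the expected keys.
# Values are captured for the emptiness check only and are never printed.
check_env_secret() {
  local env="$1"
  local ns="dagster-${env}" secret="dagster-${env}-env"
  local key missing=0

  for key in SOURCE_DATABASE_URL WAREHOUSE_URL; do
    # jsonpath yields the base64-encoded value, or nothing if the key
    # (or the whole secret) is absent.
    if [ -z "$(kubectl get secret "$secret" -n "$ns" -o "jsonpath={.data.${key}}")" ]; then
      echo "missing key ${key} in ${ns}/${secret}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_env_secret dev && echo "dev secret looks complete"
```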

Step 4: Configure each environment

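The Helm values files capture the differences between environments. Common things to tune per environment: replica count, run resource limits, server TTL, branch deployment support, and which node pool to schedule on. The nodeSelector entries in these examples use the GKE node pool label (cloud.google.com/gke-nodepool); on other platforms, substitute a label that your nodes carry.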

Dev

The dev agent runs a single replica, reclaims idle code servers after 2 hours (the shortest TTL of the three environments), and serves branch deployments:

helm/dagster-agent/values-dev.yaml
dagsterCloud:
  organization: "your-org"
  deployment: "dev"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: true

workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 7200

dagsterCloudAgent:
  replicas: 1
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-dev-env
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "dev-pool"

serviceAccount:
  create: true
  name: dagster-agent-dev
  annotations: {}

Staging

Staging follows the prod configuration with lighter resources: a single agent replica, a 4-hour server TTL, and no Pod Disruption Budget. Branch deployments are disabled, so only code merged to staging is tested here:

helm/dagster-agent/values-staging.yaml
dagsterCloud:
  organization: "your-org"
  deployment: "staging"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: false

workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 14400

dagsterCloudAgent:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-staging-env
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "staging-pool"

serviceAccount:
  create: true
  name: dagster-agent-staging
  annotations: {}

Prod

Prod runs two agent replicas for high availability. The server TTL is long (24 h) to keep code servers warm, and a Pod Disruption Budget (PDB) with minAvailable: 1 keeps at least one agent replica running during voluntary disruptions such as node drains for maintenance:

helm/dagster-agent/values-prod.yaml
dagsterCloud:
  organization: "your-org"
  deployment: "prod"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: false

workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 86400

dagsterCloudAgent:
  replicas: 2
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"

runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-prod-env
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "prod-pool"
      labels:
        dagster/pdb-protected: "true"

serviceAccount:
  create: true
  name: dagster-agent-prod
  annotations: {}

podDisruptionBudget:
  enabled: true
  minAvailable: 1
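
YAML indentation decides which section a key belongs to, and a misspelled or mis-indented key is typically ignored by Helm rather than rejected. Rendering each values file locally before installing is one way to catch such mistakes. The sketch below assumes the chart repository from Step 1 and the file layout used in this step; render_agent is a hypothetical helper name:

```shell
# Sketch: render the agent manifests for one environment locally, without
# changing anything in the cluster, so the output can be reviewed first.
render_agent() {
  local env="$1"
  helm template "dagster-agent-${env}" dagster-cloud/dagster-cloud-agent \
    -n "dagster-${env}" \
    -f "helm/dagster-agent/values-${env}.yaml"
}

# Example: render_agent prod | less
#
# To list the keys the installed chart version reads, with their defaults:
#   helm show values dagster-cloud/dagster-cloud-agent
```

Comparing the rendered output against helm show values is a quick check that the keys in your values files match the chart version you pulled.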

Step 5: Install the agents

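Run helm upgrade --install for each environment. Using --install means the same command works for both the initial install and future upgrades. The namespaces already exist from Step 2, so --create-namespace has no effect here; it only matters if a namespace is missing: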

# Dev
helm upgrade --install dagster-agent-dev dagster-cloud/dagster-cloud-agent \
  -n dagster-dev --create-namespace \
  -f helm/dagster-agent/values-dev.yaml

# Staging
helm upgrade --install dagster-agent-staging dagster-cloud/dagster-cloud-agent \
  -n dagster-staging --create-namespace \
  -f helm/dagster-agent/values-staging.yaml

# Prod
helm upgrade --install dagster-agent-prod dagster-cloud/dagster-cloud-agent \
  -n dagster-prod --create-namespace \
  -f helm/dagster-agent/values-prod.yaml

Step 6: Verify the agents are running

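Check that the agent pods started and connected to Dagster+. The commands below check dev; repeat them with -n dagster-staging and -n dagster-prod for the other environments: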

kubectl get pods -n dagster-dev
kubectl logs -n dagster-dev -l app=dagster-cloud-agent --tail=50

In the Dagster+ UI, navigate to Deployment → Agents and confirm each agent shows as Active.
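
To run the same pod and log checks against all three environments in one pass, a small loop is enough. This sketch assumes the namespace names and the app=dagster-cloud-agent label selector used above; check_agents is a hypothetical helper name:

```shell
# Sketch: list agent pods and show recent agent logs for every environment.
check_agents() {
  local env
  for env in dev staging prod; do
    echo "== dagster-${env} =="
    kubectl get pods -n "dagster-${env}"
    kubectl logs -n "dagster-${env}" -l app=dagster-cloud-agent --tail=50
  done
}

# Example: check_agents
```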

Tip

If you're using Workload Identity (GKE) or IAM Roles for Service Accounts (EKS), add the appropriate annotation to the serviceAccount.annotations field in each values file. This lets run pods access cloud resources (e.g., BigQuery, S3) without storing credentials as secrets.
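
For reference, the annotation key is iam.gke.io/gcp-service-account for GKE Workload Identity and eks.amazonaws.com/role-arn for IAM Roles for Service Accounts on EKS. If you prefer to supply the annotation on the command line rather than in the values file, the dots in the key must be escaped so that Helm does not read them as path separators. The sketch below shows the GKE case for prod; the function name and the service account email passed to it are placeholders, not values from this guide:

```shell
# Sketch: install the prod agent with a GKE Workload Identity annotation
# set from the command line. The helper name is hypothetical.
install_prod_with_workload_identity() {
  local gsa_email="$1"  # Google service account to impersonate (placeholder)

  # "\." keeps each dot inside the annotation key; an unescaped dot would
  # be treated by --set as a nested key.
  helm upgrade --install dagster-agent-prod dagster-cloud/dagster-cloud-agent \
    -n dagster-prod \
    -f helm/dagster-agent/values-prod.yaml \
    --set "serviceAccount.annotations.iam\.gke\.io/gcp-service-account=${gsa_email}"
}

# Example:
#   install_prod_with_workload_identity "<name>@<project>.iam.gserviceaccount.com"
# For EKS, the escaped key would be eks\.amazonaws\.com/role-arn, with the
# IAM role ARN as the value.
```

Keeping the annotation in the values file remains the simpler option when the value is not sensitive and does not change between installs.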

Next steps

Continue this example with setting up CI/CD.