Set up Kubernetes agents
Dagster+ Hybrid runs a lightweight agent in your Kubernetes cluster. The agent polls Dagster+ for work, launches run pods, and streams logs back. Your code never leaves your infrastructure.
This guide sets up three environments: dev, staging, and prod. Each environment gets its own agent, installed in its own namespace and configured by its own Helm values file that reflects the environment's resource requirements.
Step 1: Add the Helm chart repository
helm repo add dagster-cloud https://dagster-io.github.io/helm-user-cloud
helm repo update
Step 2: Create agent tokens
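Each agent authenticates with an API token scoped to its deployment. Create a token for each environment in the Dagster+ UI under Organization Settings → Tokens, then store each token as a Kubernetes secret in that environment's namespace. Replace <dev-token>, <staging-token>, and <prod-token> with the token values: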
kubectl create namespace dagster-dev
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<dev-token> \
  -n dagster-dev
kubectl create namespace dagster-staging
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<staging-token> \
  -n dagster-staging
kubectl create namespace dagster-prod
kubectl create secret generic dagster-cloud-agent-token \
  --from-literal=DAGSTER_CLOUD_AGENT_TOKEN=<prod-token> \
  -n dagster-prod
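If you would rather manage the token secrets declaratively, the same object can be described as a manifest. The sketch below is one way to generate it; the function name agent_token_secret is hypothetical, and it assumes the namespace already exists. Only the secret name, key, and namespace pattern come from the commands above.

```python
import json


def agent_token_secret(environment: str, token: str) -> dict:
    """Build a Secret manifest equivalent to the kubectl create secret command."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": "dagster-cloud-agent-token",
            "namespace": f"dagster-{environment}",
        },
        # kubectl create secret generic produces an Opaque secret.
        "type": "Opaque",
        # stringData takes plain text; the API server stores it base64-encoded.
        "stringData": {"DAGSTER_CLOUD_AGENT_TOKEN": token},
    }


if __name__ == "__main__":
    # Placeholder token: read the real value from a secure source, not from code.
    print(json.dumps(agent_token_secret("dev", "<dev-token>"), indent=2))
```

The printed JSON can be piped to kubectl apply -f -, which accepts JSON as well as YAML.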
Step 3: Create environment secrets for user code
The Helm values files reference a per-environment Kubernetes secret (dagster-dev-env, dagster-staging-env, dagster-prod-env) that injects connection strings into run pods. Create these secrets now, substituting your actual connection details:
kubectl create secret generic dagster-dev-env \
  -n dagster-dev \
  --from-literal=SOURCE_DATABASE_URL='postgres://dev-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://dev/warehouse'
kubectl create secret generic dagster-staging-env \
  -n dagster-staging \
  --from-literal=SOURCE_DATABASE_URL='postgres://staging-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://staging/warehouse'
kubectl create secret generic dagster-prod-env \
  -n dagster-prod \
  --from-literal=SOURCE_DATABASE_URL='postgres://prod-host/mydb' \
  --from-literal=WAREHOUSE_URL='snowflake://prod/warehouse'
Because each values file lists its secret under envSecrets, the run launcher injects these values as environment variables into every run pod. The assets read them with os.getenv("SOURCE_DATABASE_URL") and os.getenv("WAREHOUSE_URL"), so each environment targets its own data sources without any code changes.
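As a minimal sketch of how user code might consume these variables, the helper below reads both connection strings and fails fast when one is missing. ConnectionSettings and load_connection_settings are hypothetical names, not part of Dagster; only the two variable names come from the secrets above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ConnectionSettings:
    """Connection strings injected into the run pod by the environment secret."""

    source_database_url: str
    warehouse_url: str


def load_connection_settings(
    env: Optional[Mapping[str, str]] = None,
) -> ConnectionSettings:
    """Read both connection strings, raising if either is unset or empty."""
    env = os.environ if env is None else env
    required = ("SOURCE_DATABASE_URL", "WAREHOUSE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return ConnectionSettings(
        source_database_url=env["SOURCE_DATABASE_URL"],
        warehouse_url=env["WAREHOUSE_URL"],
    )
```

Raising on a missing variable surfaces a misconfigured secret when the run starts, rather than later as a confusing connection error.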
Step 4: Configure each environment
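The Helm values files capture the differences between environments. Common things to tune per environment: replica count, run resource limits, server TTL, branch deployment support, and which node pool to schedule on. Save each file under helm/dagster-agent/ as values-dev.yaml, values-staging.yaml, or values-prod.yaml; Step 5 passes these paths to Helm.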
Dev
The dev agent runs a single replica, reclaims idle code servers after two hours (ttlSeconds: 7200), and serves branch deployments:
dagsterCloud:
  organization: "your-org"
  deployment: "dev"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: true
workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 7200
dagsterCloudAgent:
  replicas: 1
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-dev-env
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "dev-pool"
serviceAccount:
  create: true
  name: dagster-agent-dev
  annotations: {}
Staging
Staging mirrors prod behavior but with lighter resources and a four-hour server TTL. Branch deployments are disabled, so only code merged to staging is tested here:
dagsterCloud:
  organization: "your-org"
  deployment: "staging"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: false
workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 14400
dagsterCloudAgent:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-staging-env
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "staging-pool"
serviceAccount:
  create: true
  name: dagster-agent-staging
  annotations: {}
Prod
Prod runs two agent replicas for high availability. The server TTL is long (24 hours) to keep code servers warm, and a Pod Disruption Budget (PDB) with minAvailable: 1 keeps at least one agent replica running during voluntary disruptions such as node drains for maintenance:
dagsterCloud:
  organization: "your-org"
  deployment: "prod"
  apiToken:
    secretName: dagster-cloud-agent-token
    secretKey: DAGSTER_CLOUD_AGENT_TOKEN
  branchDeployments: false
workspace:
  serverTTL:
    enabled: true
    ttlSeconds: 86400
dagsterCloudAgent:
  replicas: 2
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-prod-env
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      nodeSelector:
        cloud.google.com/gke-nodepool: "prod-pool"
      labels:
        dagster/pdb-protected: "true"
serviceAccount:
  create: true
  name: dagster-agent-prod
  annotations: {}
podDisruptionBudget:
  enabled: true
  minAvailable: 1
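Kubernetes rejects a pod whose resource requests exceed its limits, so it can be worth checking the numbers in a values file before installing. The sketch below is a simplified check, not a full quantity parser: it handles only the suffixes used in this guide (m for CPU; Ki, Mi, and Gi for memory), and all function names are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a binary-suffixed memory quantity such as "512Mi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain byte count


def requests_within_limits(resources: dict) -> bool:
    """Return True when CPU and memory requests do not exceed their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (
        parse_cpu(requests["cpu"]) <= parse_cpu(limits["cpu"])
        and parse_memory(requests["memory"]) <= parse_memory(limits["memory"])
    )


# Run-pod resources copied from the prod values above.
prod_run_resources = {
    "requests": {"cpu": "2", "memory": "4Gi"},
    "limits": {"cpu": "4", "memory": "8Gi"},
}
print(requests_within_limits(prod_run_resources))  # True
```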
Step 5: Install the agents
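Run helm upgrade --install for each environment. With --install, Helm installs the release if it does not exist yet and upgrades it otherwise, so the same command covers both the initial install and later upgrades. Step 2 already created the namespaces, so --create-namespace changes nothing here: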
# Dev
helm upgrade --install dagster-agent-dev dagster-cloud/dagster-cloud-agent \
  -n dagster-dev --create-namespace \
  -f helm/dagster-agent/values-dev.yaml
# Staging
helm upgrade --install dagster-agent-staging dagster-cloud/dagster-cloud-agent \
  -n dagster-staging --create-namespace \
  -f helm/dagster-agent/values-staging.yaml
# Prod
helm upgrade --install dagster-agent-prod dagster-cloud/dagster-cloud-agent \
  -n dagster-prod --create-namespace \
  -f helm/dagster-agent/values-prod.yaml
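The three commands differ only in the environment name, so a deploy script can generate them from one template. The following is a sketch under that assumption; build_helm_command and install_all are hypothetical helpers, and the dry run only prints the commands instead of calling Helm.

```python
import subprocess
from typing import List

ENVIRONMENTS = ("dev", "staging", "prod")


def build_helm_command(environment: str) -> List[str]:
    """Return the helm upgrade --install command for one environment."""
    return [
        "helm", "upgrade", "--install",
        f"dagster-agent-{environment}",
        "dagster-cloud/dagster-cloud-agent",
        "-n", f"dagster-{environment}",
        "--create-namespace",
        "-f", f"helm/dagster-agent/values-{environment}.yaml",
    ]


def install_all(dry_run: bool = True) -> None:
    """Print each command, or run it and stop on the first failure."""
    for environment in ENVIRONMENTS:
        command = build_helm_command(environment)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if helm exits non-zero.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    install_all(dry_run=True)
```

Installing in the order dev, staging, prod means a broken chart version or values file fails in the least critical environment first.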
Step 6: Verify the agents are running
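Check that the agent pods started and connected to Dagster+. The commands below check dev; repeat them with -n dagster-staging and -n dagster-prod for the other environments: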
kubectl get pods -n dagster-dev
kubectl logs -n dagster-dev -l app=dagster-cloud-agent --tail=50
In the Dagster+ UI, navigate to Deployment → Agents and confirm each agent shows as Active.
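To automate the pod check, for example in a deploy pipeline, you could parse the output of kubectl get pods -n dagster-dev -o json. The sketch below assumes that JSON is passed in as a string; the function name unready_pods and the pod names in the sample are hypothetical. It does not confirm that the agent has connected to Dagster+, which still requires the logs or the UI.

```python
import json
from typing import List


def unready_pods(pod_list_json: str) -> List[str]:
    """Return names of pods that are not Running with all containers ready."""
    unready = []
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # A pod with no reported container statuses is treated as not ready.
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            unready.append(pod["metadata"]["name"])
    return unready


# Trimmed sample of the pod list structure, with made-up pod names.
sample = json.dumps({
    "items": [
        {
            "metadata": {"name": "agent-a"},
            "status": {"phase": "Running", "containerStatuses": [{"ready": True}]},
        },
        {"metadata": {"name": "agent-b"}, "status": {"phase": "Pending"}},
    ]
})
print(unready_pods(sample))  # ['agent-b']
```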
If you're using Workload Identity (GKE) or IAM Roles for Service Accounts (EKS), add the appropriate annotation to the serviceAccount.annotations field in each values file. This lets run pods access cloud resources (e.g., BigQuery, S3) without storing credentials as secrets.
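The annotation key depends on the cloud provider. The sketch below returns the mapping that would replace annotations: {} in a values file: iam.gke.io/gcp-service-account for GKE Workload Identity and eks.amazonaws.com/role-arn for EKS. The function name and the identities in the example are hypothetical placeholders.

```python
import json
from typing import Dict


def service_account_annotations(provider: str, identity: str) -> Dict[str, str]:
    """Return the service account annotation for the given cloud provider."""
    if provider == "gke":
        # identity is the email of the Google service account to impersonate.
        return {"iam.gke.io/gcp-service-account": identity}
    if provider == "eks":
        # identity is the ARN of the IAM role to assume.
        return {"eks.amazonaws.com/role-arn": identity}
    raise ValueError(f"Unsupported provider: {provider!r}")


if __name__ == "__main__":
    # Placeholder identities: substitute your own service account or role.
    print(json.dumps(service_account_annotations(
        "gke", "dagster-runs@my-project.iam.gserviceaccount.com"), indent=2))
    print(json.dumps(service_account_annotations(
        "eks", "arn:aws:iam::123456789012:role/dagster-runs"), indent=2))
```

The annotation alone is not sufficient: the cloud-side binding between the Kubernetes service account and the Google service account or IAM role must also be in place.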
Next steps
Continue this example by setting up CI/CD.