Deploying on Kubernetes Part 1
We publish a Dagster Helm chart that you can use to get up and running quickly on a Kubernetes cluster. Even if you don't use Helm, you may find the Dagster Helm chart useful as a reference for all the components you will probably want as part of a Kubernetes-based deployment of Dagster.
You can install a simple demo of Dagster on your Kubernetes cluster by running the following shell commands:

```shell
helm repo add dagster https://dagster-io.github.io/helm
helm install dagster dagster/dagster
```
All of the configurable parameters for this chart live in the chart's default values.yaml file.
Deploying and executing your pipelines on Kubernetes
The most important thing to provide is an image that contains your repository/pipelines, along with the required Dagster Python packages. It's worth looking at the example Dockerfile and example code for what's needed.
Once you've defined the image, publish it to a registry that is accessible from the Kubernetes cluster. Then, to deploy Dagster, specify the image within your values.yaml file; the following configuration will override the existing defaults to use your image for deploying Dagit and for pipeline execution:
```yaml
dagit:
  image:
    repository: "your_repo/your_image"
    tag: "latest"

# This image is used to launch the run coordinator job
pipeline_run:
  image:
    repository: "your_repo/your_image"
    tag: "latest"
```
To install the Helm chart with the image configured:
```shell
helm install dagster dagster/dagster -f /path/to/values.yaml
```
Note: In this example, Dagit runs with the user code image. In Deploying on Kubernetes Part 2, we relax this constraint.
The Helm chart installs several different components, including Dagit.
By default, we configure Dagster to deploy with Celery.
In this system, the CeleryK8sRunLauncher, together with its corresponding executor, handles physical execution of Dagster pipelines via Celery-on-K8s. This deployment aims to provide:
- User Code Separation: In this system, the Celery workers can be deployed via fixed images without user code, but the rest of the system still uses the user code image. In the next part, we walk through how user code can be deployed separately from all Dagster system components.
- Priorities & Queueing: We include Celery to take advantage of its support for task priorities and queue widths. When using this system, you can annotate your solids with priority tags to prioritize certain solid executions, and with queue tags to manage parallelism and limit the number of connections to resources. This works well but is not yet documented, so please reach out to us if we can help.
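As a rough illustration of the tags mentioned above, the snippet below sketches the shape of priority and queue tags; the tag keys ("dagster-celery/priority" and "dagster-celery/queue") are assumptions based on dagster-celery conventions, so verify them against the documentation for your installed version.

```python
# Hypothetical sketch: tags you might attach to a solid so that dagster-celery
# prioritizes and routes its step execution. The tag keys below are assumptions;
# verify them against the dagster-celery docs for your version.
priority_tags = {
    "dagster-celery/priority": "9",  # higher-priority steps run sooner
}
queue_tags = {
    "dagster-celery/queue": "redshift",  # route to a dedicated queue to cap parallelism
}

# In a pipeline definition, these would be passed to the solid decorator, e.g.:
#   @solid(tags={**priority_tags, **queue_tags})
#   def my_solid(context): ...
```

Routing resource-heavy solids to a narrow, dedicated queue is the usual way to limit the number of simultaneous connections to a constrained resource.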
The deployed system looks like:
When a pipeline is executed, Dagit (or the scheduler) instigates execution via the CeleryK8sRunLauncher. The run launcher launches a run coordinator Job with the user code image specified in the pipeline_run: section of the values YAML configuration (the run coordinator Job is named dagster-run-<run_id>, to make debugging in kubectl easier). The role of the run coordinator Job is to traverse the execution plan and enqueue execution steps as Celery tasks. Celery workers poll for new tasks, and for each step execution task they fetch, a step execution Job is launched to execute that step. These step execution Jobs are also launched with the pipeline_run: image, each under its own Job name for easier debugging in kubectl.
The published Helm chart also supports a Celery-less deployment. In this case, we configure the K8sRunLauncher on the Dagster instance, which launches each pipeline run and its associated step executions in a single Kubernetes Job. We can enable this option with the following command:
```shell
helm install dagster dagster/dagster -f /path/to/values.yaml \
  --set celery.enabled=false \
  --set k8sRunLauncher.enabled=true
```
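The --set flags above can equivalently live in your values file; this fragment mirrors those two flags exactly:

```yaml
# Equivalent to --set celery.enabled=false --set k8sRunLauncher.enabled=true
celery:
  enabled: false
k8sRunLauncher:
  enabled: true
```

Keeping the toggle in values.yaml makes the deployment mode explicit and versionable alongside your image configuration.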
Note: It is also possible to deploy with a custom Run Launcher. If you have questions, please reach out to us on Slack and we're happy to help!
As a full inventory, the Helm chart sets up the following components by default:
- Dagit running as a Deployment behind a Service. By default, we deploy cron running in the background on this container to instigate scheduled executions. In the next part, we will walk through our scheduler implementation that leverages Kubernetes CronJobs. By default, Dagit is launched running a simple example pipeline; see the Dockerfile and the code.
- PostgreSQL for Dagster's storage. In a real deployment, you'll likely want to configure the system to use a PostgreSQL database hosted elsewhere.
- RabbitMQ as a dagster-celery broker. In a real deployment, you'll likely want to configure the system to use a separately-hosted queueing service, like Redis.
- A set of dagster-celery workers running as a Deployment. By default, these deploy the image dagster/k8s-celery-worker:<<DAGSTER_VERSION>>; the Dockerfile for this image is available in the Dagster repository if you need to customize the Celery workers for some reason.
- A ServiceAccount (to launch the Jobs) bound to a properly scoped Role.
- A Secret, dagster-postgresql-secret, containing the PostgreSQL password used for connecting to the PostgreSQL database.
- Several ConfigMaps:
  - dagster-*-env: Environment variables for Dagit, the Celery workers, and pipeline execution.
  - dagster-celery: Configures the backend/broker which the Celery workers connect to.
  - dagster-instance: Defines the instance YAML for all nodes in the system. Configures Dagster storages to use PostgreSQL, schedules to use cron, and sets the run launcher to CeleryK8sRunLauncher to launch pipeline runs as Kubernetes Jobs.
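For the production swaps suggested above (an externally hosted PostgreSQL database, and Redis in place of the bundled RabbitMQ), the overrides might look roughly like the following. The exact key names (postgresql.*, rabbitmq.enabled, redis.*) are assumptions about the chart's values schema; confirm them against the chart's default values.yaml before using this.

```yaml
# Hypothetical sketch; key names are assumptions, check the chart's values.yaml.
postgresql:
  enabled: false  # don't run the bundled PostgreSQL chart
  postgresqlHost: "my-db.example.internal"
  postgresqlUsername: "dagster"
  postgresqlDatabase: "dagster"

rabbitmq:
  enabled: false  # don't run the bundled RabbitMQ chart

redis:
  enabled: true
  host: "my-redis.example.internal"
  port: 6379
```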
Other components are optional, including:
- Flower, a useful service for monitoring Celery tasks and debugging issues
- User Code Deployments, which we will cover in the next part.