Deploying on Kubernetes Part 2

In this section, we will discuss more sophisticated deployment options for the Dagster system in Kubernetes. We will focus primarily on:

  1. The K8sScheduler, a new scheduler implementation built on Kubernetes CronJobs.

  2. User Code Deployments, the ability to load repository information from user code images.

System Diagram for User Code Deployments with K8s-native Scheduler

[Diagram: k8s_deployment_part2.png]

K8s Scheduler

This section introduces our integration with Kubernetes CronJob to handle scheduling recurring pipeline runs.

Motivation:

Previously, the only scheduler option was the Dagster SystemCronScheduler, which is built on top of crontab. The scheduler ran inside the Dagit Deployment, which restricted users to a single Pod in that Deployment and meant that schedule ticks could be missed whenever Dagit was unavailable (for example, during deploys).

The motivation for adding the Dagster K8sScheduler implementation is to enable users to run multiple Dagit Pods in the Deployment without risking duplicate or missed pipeline runs. By leveraging native Kubernetes objects, we make the scheduler more robust.

How to enable:

To enable the Dagster K8sScheduler, set scheduler.k8sEnabled to true in the Helm values.yaml file and fill in the other fields in the scheduler section.
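As a concrete sketch, the scheduler section of values.yaml might look like the following. Only scheduler.k8sEnabled is named above; the remaining field is an assumption, so consult the chart's values.yaml for the authoritative schema.

```yaml
# values.yaml (sketch; field names other than k8sEnabled are illustrative)
scheduler:
  k8sEnabled: true
  # Namespace in which the scheduler creates its CronJob objects (assumed field name)
  schedulerNamespace: dagster
```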

How it works:

When a new schedule is turned on, we create a corresponding Kubernetes CronJob object. When a schedule is updated, we find and patch the existing CronJob so that there is no downtime. At execution time, the Kubernetes CronJob creates a Kubernetes Job whose sole purpose is to instantiate the Run Launcher, which in turn creates the Run Coordinator Job. Kubernetes CronJob names are generated by hashing the schedule's properties, so different repositories can define schedules with the same schedule name without causing conflicts in Kubernetes.
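As a rough sketch of the objects involved, the CronJob created for a schedule might look like the following. The hashed name, image, and launch command are illustrative assumptions, not the exact manifest Dagster emits.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  # Name derived from a hash of the schedule's properties (hash is made up)
  name: dagster-schedule-1d5e4b6f
spec:
  schedule: "0 * * * *"   # cron string taken from the schedule definition
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: dagster-schedule
              image: my-registry/dagster:0.9.0   # illustrative system image
              # Instantiates the Run Launcher, which in turn creates the
              # Run Coordinator Job (command shown is illustrative)
              args: ["dagster", "api", "launch_scheduled_execution"]
```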

K8s User Code Deployments (over gRPC)

In the previous section, we presented a deployment option where Dagit, Run Coordinator, and Step Job all use the same image. This section introduces the option to use a different Docker image for each repository of user code.

Motivation:

Previously, we deployed the same Docker image for every component, so updating user code required redeploying the entire system. It also meant that the dependencies of Dagit/Dagster and the dependencies of every user code repository were intermixed.

With the new system, changes to user code only require updating the corresponding user code Deployments (i.e. foo and fiz in the diagram above). If a pipeline within foo changes, only the foo Deployment needs to be updated. This enables Dagit to be a long-running process that communicates with the user code Deployments (foo and fiz) via the gRPC layer to fetch the latest repository information. It also separates user code from Dagit/Dagster system code, which increases robustness.

Users can push different Docker images per repository, which allows separate teams within an organization to manage their own images and reduce inter-dependencies. This also unlocks the ability to have conflicting dependencies between repositories.

How to enable:

To enable user code deployments, set userDeployments.enabled to true in the Helm values.yaml and specify a list of deployments under userDeployments.deployments.
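For illustration, a minimal userDeployments section might look like the following. The deployment name, image, and entrypoint arguments are made up, and field names beyond enabled and deployments are assumptions; check the chart's values.yaml for the authoritative schema.

```yaml
# values.yaml (sketch; names, paths, and per-deployment fields are illustrative)
userDeployments:
  enabled: true
  deployments:
    - name: "user-code-foo"
      image:
        repository: "my-registry/foo"
        tag: "abc123"
        pullPolicy: Always
      # Arguments for the gRPC server entrypoint (assumed field name)
      dagsterApiGrpcArgs:
        - "-f"
        - "/example/repo.py"
      port: 3030
```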

How it works:

This feature operates over gRPC. First, Dagit communicates with the user code Deployments over gRPC to populate its UI. Second, the Run Launcher communicates with the user code Deployments over gRPC to fetch the most recently deployed Docker image, which is then used to execute the pipeline run.
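On the Dagit side, a gRPC server is addressed through a workspace.yaml entry like the one below; in this setup the Helm chart generates the file, and the host, port, and location name shown are illustrative.

```yaml
# workspace.yaml (sketch; generated by the chart in practice)
load_from:
  - grpc_server:
      host: user-code-foo   # Service name of the user code Deployment (illustrative)
      port: 3030
      location_name: foo
```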

We store the most recently deployed Docker image in the DAGSTER_CURRENT_IMAGE environment variable on the user code Deployment; the value is updated at every deploy. It is a simple string of the form <image_repository>:<image_tag>. When using the CeleryK8sRunLauncher, all steps in a pipeline run use the same <image_repository>:<image_tag>. As such, we recommend using a unique Docker image tag per user code Deployment to guarantee that every step job in a given pipeline run uses a Docker image with the same hash.
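Concretely, the container spec of a user code Deployment might carry an env entry like the following; the registry and tag are made up.

```yaml
env:
  - name: DAGSTER_CURRENT_IMAGE
    # <image_repository>:<image_tag>, rewritten on every deploy
    value: "my-registry/foo:abc123"
```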