Running Dagit as a service

The core of any deployment of Dagster is a Dagit process that serves a user interface and responds to GraphQL queries.

Make sure you are running a recent version of Python. Dagit is typically run inside a virtualenv; once it is active, install Dagit along with any additional libraries your pipelines or instance configuration need (for example, dagster-postgres if you store runs in PostgreSQL).

pip install dagit
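If you prefer to script this setup, the following is a minimal sketch using only the standard library's venv and subprocess modules. The function name create_dagit_venv is hypothetical, and the bin/python path assumes a POSIX virtualenv layout (Windows uses Scripts\python.exe instead).

```python
import subprocess
import venv
from pathlib import Path


def create_dagit_venv(env_dir: str, install: bool = True) -> Path:
    """Create a virtualenv at env_dir and optionally pip-install dagit into it.

    Hypothetical helper; returns the path to the virtualenv's interpreter.
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment via ensurepip
    venv.create(env_path, with_pip=install)
    # POSIX layout; on Windows the interpreter lives under Scripts\python.exe
    python = env_path / "bin" / "python"
    if install:
        # Append any additional libraries your pipelines need to this command
        subprocess.run([str(python), "-m", "pip", "install", "dagit"], check=True)
    return python
```

For example, create_dagit_venv("/opt/dagster/venv") produces the virtualenv location assumed by the systemd unit later in this section.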

To run Dagit, use a command like the following:

DAGSTER_HOME=/opt/dagster/dagster_home dagit -h 0.0.0.0 -p 3000

In this configuration, Dagit writes execution logs to $DAGSTER_HOME/logs and listens on port 3000 on all network interfaces (0.0.0.0).
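If Dagit is launched from a Python supervisor script rather than a shell, the same command and environment can be assembled as below. This is a sketch that assumes the dagit executable is on PATH (for example, because the virtualenv is active); dagit_command, dagit_env, and run_dagit are hypothetical names, and only the -h and -p flags shown above are used.

```python
import os
import subprocess


def dagit_command(host: str = "0.0.0.0", port: int = 3000) -> list:
    """Build the argv equivalent to `dagit -h HOST -p PORT`."""
    return ["dagit", "-h", host, "-p", str(port)]


def dagit_env(dagster_home: str = "/opt/dagster/dagster_home") -> dict:
    """Copy the current environment and point DAGSTER_HOME at the instance directory."""
    env = dict(os.environ)
    env["DAGSTER_HOME"] = dagster_home
    return env


def run_dagit() -> int:
    """Run Dagit in the foreground and return its exit code once it stops."""
    return subprocess.run(dagit_command(), env=dagit_env()).returncode
```

Copying os.environ rather than building a fresh dict keeps PATH and any database variables the instance configuration relies on.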

Dagit in Docker

If you are running on AWS ECS, Kubernetes, or some other container-based orchestration system, you'll likely want to package Dagit as a Docker image.

A minimal skeleton Dockerfile and entrypoint shell script that will run Dagit and the cron scheduler are shown below:

Dockerfile
FROM python:3.7-slim

# Cron is required to use scheduling in Dagster
RUN apt-get update && apt-get install -yqq cron

RUN mkdir -p /opt/dagster/dagster_home /opt/dagster/app

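# dagster-postgres, dagster-cron, and dagster-aws provide the storage, scheduler, and
# compute log manager classes referenced in dagster.yaml
RUN pip install dagit dagster-postgres dagster-cron dagster-aws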

# Copy your pipeline code and entrypoint.sh to /opt/dagster/app
COPY pipelines.py entrypoint.sh /opt/dagster/app/

# Copy dagster instance YAML to $DAGSTER_HOME
COPY dagster.yaml /opt/dagster/dagster_home/

WORKDIR /opt/dagster/app

RUN chmod +x entrypoint.sh

EXPOSE 3000

ENTRYPOINT ["/opt/dagster/app/entrypoint.sh"]

In this setup, entrypoint.sh should look something like the following. It starts cron in the Docker container alongside Dagit:

entrypoint.sh
#!/bin/sh
export DAGSTER_HOME=/opt/dagster/dagster_home

# This block may be omitted if not packaging a repository with cron schedules:
####################################################################################################
# see: https://unix.stackexchange.com/a/453053 - fixes inflated hard link count
touch /etc/crontab /etc/cron.*/*

service cron start

# Add all schedules defined by the user
dagster schedule up
####################################################################################################

# Launch Dagit as a service. DAGSTER_HOME is already exported above, and exec
# replaces the shell so Dagit receives the container's stop signals directly
exec dagit -h 0.0.0.0 -p 3000

Finally, you should include a dagster.yaml file in $DAGSTER_HOME to configure the Dagster instance that Dagit will use:

dagster.yaml
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

scheduler:
  module: dagster_cron.cron_scheduler
  class: SystemCronScheduler

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "mycorp-dagster-compute-logs"
    prefix: "dagster-test-"

local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: "/opt/dagster/local/"
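The instance configuration above loads classes from the dagster_postgres, dagster_cron, and dagster_aws packages (published on PyPI as dagster-postgres, dagster-cron, and dagster-aws), so they must be importable wherever Dagit runs. A minimal pre-flight check using importlib might look like the sketch below; the names INSTANCE_MODULES and missing_modules are hypothetical, and the list should be adjusted to match the module: keys in your own dagster.yaml.

```python
import importlib.util

# Top-level packages named by the `module:` keys in the example dagster.yaml;
# edit this list to match your own instance configuration
INSTANCE_MODULES = ["dagster_postgres", "dagster_cron", "dagster_aws"]


def missing_modules(names):
    """Return the top-level package names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(INSTANCE_MODULES) at container start and exiting when the result is non-empty surfaces a missing library immediately, instead of as an import error when Dagit loads the instance.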

If you use environment variables to configure the instance database, as in the example above, make sure DAGSTER_PG_USERNAME, DAGSTER_PG_PASSWORD, DAGSTER_PG_HOST, and DAGSTER_PG_DB are set in the running Dagit container.
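A sketch of a check that could run before Dagit starts, for example from the container entrypoint, is shown below. The variable names come from the example dagster.yaml; the function names missing_env_vars and check_env are hypothetical. Empty values are treated as missing, on the assumption that an empty hostname or database name is never intended.

```python
import os

# Variables referenced via `env:` in the example dagster.yaml
REQUIRED_PG_VARS = (
    "DAGSTER_PG_USERNAME",
    "DAGSTER_PG_PASSWORD",
    "DAGSTER_PG_HOST",
    "DAGSTER_PG_DB",
)


def missing_env_vars(names=REQUIRED_PG_VARS, environ=None):
    """Return the variables from `names` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def check_env():
    """Exit with a readable message if any instance database setting is missing."""
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing instance database settings: " + ", ".join(missing))
```

The environ parameter exists mainly so the check can be exercised against a plain dict rather than the real process environment.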

In practice, you may prefer to mount your pipeline code into the container as a volume rather than copying it into the image. This enables deployment patterns such as git-sync sidecars, which update pipeline code without rebuilding images or redeploying containers.

Dagit servers expose a health check endpoint at /dagit_info, which returns a JSON response like:

{
  "dagit_version": "0.6.6",
  "dagster_graphql_version": "0.6.6",
  "dagster_version": "0.6.6"
}
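A command-based health check (for example, one invoked by a container orchestrator) could query this endpoint with the standard library, as in the sketch below. It assumes Dagit is reachable at http://localhost:3000; the names parse_dagit_info and check_dagit are hypothetical, and the expected keys are taken from the sample response above.

```python
import json
import urllib.request

EXPECTED_KEYS = {"dagit_version", "dagster_graphql_version", "dagster_version"}


def parse_dagit_info(body: bytes) -> dict:
    """Decode a /dagit_info response and verify it carries the version keys."""
    info = json.loads(body.decode("utf-8"))
    if not isinstance(info, dict):
        raise ValueError("unexpected /dagit_info payload: not a JSON object")
    missing = EXPECTED_KEYS - set(info)
    if missing:
        raise ValueError("unexpected /dagit_info payload, missing: " + ", ".join(sorted(missing)))
    return info


def check_dagit(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> dict:
    """GET {base_url}/dagit_info; raises on connection errors, non-2xx status, or a bad payload."""
    url = base_url.rstrip("/") + "/dagit_info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_dagit_info(resp.read())
```

Because urlopen raises for connection failures and non-2xx responses, a wrapper script can simply treat any exception from check_dagit as "unhealthy" and exit non-zero.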

Non-Docker deployment using systemd

To run Dagit as a long-lived service, you can install a systemd service such as the following:

dagit.service
[Unit]
Description=Run Dagit
After=network.target

[Service]
Type=simple
User=ubuntu
Environment=DAGSTER_HOME=/opt/dagster/dagster_home
Environment=PYTHONPATH=/opt/dagster/app
Environment=LC_ALL=C.UTF-8
Environment=LANG=C.UTF-8
# Invoking the virtualenv's dagit by absolute path runs it with that virtualenv's
# interpreter, so no bash wrapper or activation step is needed
ExecStart=/opt/dagster/venv/bin/dagit -h 0.0.0.0 -p 3000 -y /opt/dagster/app/workspace.yaml
Restart=always
WorkingDirectory=/opt/dagster/app/

[Install]
WantedBy=multi-user.target

Note that this assumes you have a virtualenv for Dagster at /opt/dagster/venv and that your pipeline code and workspace.yaml are located under /opt/dagster/app.
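Because the unit fails at startup if any of these paths is missing, it can help to verify the layout first. The sketch below checks the paths the unit refers to; missing_paths and ASSUMED_PATHS are hypothetical names, and the root parameter is an added convenience for checking a mounted image or chroot rather than the live filesystem.

```python
from pathlib import Path

# Paths referenced by the dagit.service unit above
ASSUMED_PATHS = (
    "/opt/dagster/venv/bin/dagit",
    "/opt/dagster/app/workspace.yaml",
    "/opt/dagster/dagster_home",
)


def missing_paths(paths=ASSUMED_PATHS, root="/"):
    """Return the assumed paths that do not exist, resolved relative to `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p.lstrip("/")).exists()]
```

Running missing_paths() on the host before enabling the unit lists anything that still needs to be created.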