API Docs

These docs aim to cover the entire public surface of the core dagster APIs and, eventually, the public APIs of all the associated libraries.

Dagster follows SemVer and is currently pre-1.0. We try to confine breaking changes to the public APIs to minor versions (released roughly every 8 weeks), and we announce deprecations in Slack and in the release notes for patch versions (released roughly weekly).
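One conservative way to read this policy when deciding whether an upgrade needs review: treat any change in the minor version as potentially breaking, and a patch-only change as non-breaking apart from announced deprecations. The following is a minimal sketch of that check in plain Python. The function names and the version strings are hypothetical and are not part of dagster.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break(current, candidate):
    """Return True if the upgrade could include breaking public API changes.

    Assumption: for a pre-1.0 project that confines breaking changes to
    minor releases, only a patch-only bump is treated as non-breaking.
    """
    cur = parse_version(current)
    new = parse_version(candidate)
    # Compare (major, minor); a difference means the release line changed.
    return cur[:2] != new[:2]


# Hypothetical version numbers, for illustration only.
print(may_break("0.7.3", "0.7.9"))  # patch-only bump
print(may_break("0.7.9", "0.8.0"))  # minor bump
```

In practice the same idea is usually expressed as a version pin in a requirements file that allows patch releases but not the next minor release.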

Core

APIs from the core dagster package are divided roughly by topic:

  • Solids: APIs to define or decorate functions as solids, declare their inputs and outputs, and compose solids with each other, as well as the datatypes that solid execution can return or yield.

  • Pipelines: APIs to define pipelines, dependencies (including fan-in dependencies) between solids, and aliased instances of solids, as well as pipeline modes, resources, loggers, presets, and repositories.

  • Execution: APIs to execute and test pipelines and individual solids, the execution context available to solids, pipeline configuration, the default system storages used for intermediates, and the default executors available for executing pipelines.

  • Types: Primitive types available for the input and output values of solids, and the APIs used to define and test new Dagster types.

  • Config: The types available to describe config schemas.

  • Schedules: APIs to define schedules on which pipelines are run, as well as a few built-in defaults.

  • Partitions: APIs to define partitions of the config space over which pipeline runs can be backfilled.

  • Errors: Errors raised by the Dagster framework.

  • Dagster CLI: Browse repositories and execute pipelines from the command line.

  • Utilities: Miscellaneous helpers used by Dagster that may be useful to users.

  • Internals: Core internal APIs that matter if you want to understand how Dagster works with an eye towards extending it: logging, executors, system storage, the Dagster instance and plugin machinery, storage, and schedulers.
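To make the idea behind the Partitions entry more concrete: a partition set splits the config space into discrete slices, and a backfill launches one pipeline run per slice. The sketch below illustrates only that concept in plain Python. It does not use the Dagster partition APIs, and every name in it (daily_partitions, config_for_partition, the "date" config key) is a hypothetical stand-in.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Yield one date per day in the half-open range [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(days=1)


def config_for_partition(partition):
    """Build a hypothetical run config dict for a single partition."""
    return {"date": partition.isoformat()}


# A backfill over three days produces three configs, one per run.
backfill_configs = [
    config_for_partition(partition)
    for partition in daily_partitions(date(2020, 1, 1), date(2020, 1, 4))
]

for config in backfill_configs:
    print(config)
```

Each dict stands in for the configuration of one run; the actual config shape depends on the pipeline being backfilled.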

Libraries

Dagster includes a number of non-core libraries that provide integrations and additional functionality:

  • Airflow (dagster_airflow): Tools for compiling Dagster pipelines to Airflow DAGs.

  • AWS (dagster_aws): Tools for working with AWS, including using S3 for intermediates storage.

  • Celery (dagster_celery): Provides an executor built on top of the popular Celery task queue.

  • Cron (dagster_cron): Provides a simple scheduler implementation built on system cron.

  • Dask (dagster_dask): Provides an executor built on top of dask.distributed.

  • GCP (dagster_gcp): Tools for working with GCP, including using GCS for intermediates storage.

  • Jupyter (dagstermill): Wraps Jupyter notebooks as solids for integrated execution within pipeline runs.

  • Kubernetes (dagster_k8s): Tools for deploying Dagster to Kubernetes.

  • Postgres (dagster_postgres): Includes implementations of run and event log storage built on Postgres.