These docs aim to cover the entire public surface of the core dagster APIs, as well as public APIs from all provided libraries.
Dagster follows SemVer. We attempt to isolate breaking changes to the public APIs to minor versions (released on a roughly 12-week cadence), and we announce deprecations in Slack and in the release notes for patch versions (released on a roughly weekly cadence).
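
Because breaking changes are meant to land only in minor versions, one way to consume patch releases safely is to pin to a single minor series. The sketch below builds such a pip-style version specifier; the helper name `compatible_pin` and the version `1.7.3` are hypothetical, chosen only for illustration.

```python
def compatible_pin(version: str) -> str:
    """Return a specifier that accepts patch releases of the given minor series only.

    Hypothetical helper: assumes a plain "MAJOR.MINOR.PATCH" version string.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    # Allow anything from the given patch release up to, but not including,
    # the next minor version.
    return f">={major}.{minor}.{patch},<{major}.{minor + 1}.0"


if __name__ == "__main__":
    # "1.7.3" is an illustrative version, not a recommendation.
    print("dagster" + compatible_pin("1.7.3"))
```

The resulting string can be placed in a requirements file; widening the upper bound is then a deliberate step taken after reading the release notes for the next minor version.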
APIs from the core dagster package, divided roughly by topic:
Topic | Description |
---|---|
Asset definitions | APIs to define data assets. |
Asset checks (Experimental) | APIs to define checks that can be run on assets. |
Schedules & Sensors | APIs to define schedules and sensors that initiate job execution, as well as some built-in helpers for common cases. |
Partitions | APIs to define partitions of the config space over which job runs can be backfilled. |
Definitions (Code locations) | APIs to collect definitions so that tools like the Dagster CLI or Dagster UI can load them as code locations. |
Resources | APIs to define resources, which are typically used to model external services, tools, and storage for use within jobs. |
Config | The types available to describe config schemas. |
Loggers | APIs to define how logs are stored. |
Ops | APIs to define or decorate functions as ops, declare their inputs and outputs, compose ops with each other, as well as the datatypes that op execution can return or yield. |
Hooks | APIs to define Dagster hooks, which can be triggered on specific Dagster events. |
Op graphs | APIs to define a set of interconnected ops. |
Dynamic mapping and collect | APIs that allow graph structures to be determined at run time. |
Jobs | APIs to define jobs that execute a set of ops with specific parameters. |
Execution | APIs to execute and test jobs and individual ops, the execution context available to ops, job configuration, and the default executors available for executing jobs. |
I/O managers | APIs to define how inputs and outputs are handled and loaded. |
Types | The types available for use with the Dagster Type system, which helps describe and verify at runtime the values that ops accept and produce. |
Pipes | APIs for working with the Dagster Pipes protocol from the orchestration side. |
Dagster CLI | Browse repositories and execute jobs from the command line. |
Errors | Classes for errors thrown by the Dagster framework. |
Utilities | Miscellaneous helpers used by Dagster. |
Internals | Core internal APIs, useful if you want to understand how Dagster works with an eye towards extending it: logging, executors, system storage, the Dagster instance and plugin machinery, storage, and schedulers. |
Repositories (Legacy) | APIs to define collections of jobs and other definitions that tools like the Dagster CLI or Dagster UI can load. Note: Definitions have replaced repositories and are now considered best practice. |
Dagster also provides a growing set of optional add-on libraries to integrate with infrastructure and other components of the data ecosystem:
Integration | Description |
---|---|
Dagster Pipes (dagster-pipes) | Library for inclusion in external processes when using Dagster Pipes protocol. |
Airbyte (dagster-airbyte) | Dagster integrations to run Airbyte jobs. |
AWS (dagster-aws) | Dagster integrations for working with AWS resources. |
Azure (dagster-azure) | Dagster integrations for working with Microsoft Azure resources. |
Celery (dagster-celery) | Provides an executor built on top of the popular Celery task queue, and an executor with support for using Celery on Kubernetes. |
Celery & Docker (dagster-celery-docker) | Provides an executor that lets Celery workers execute in Docker containers. |
Celery & Kubernetes (dagster-celery-k8s) | Provides an executor that lets Celery workers execute on Kubernetes. |
Dask (dagster-dask) | Provides an executor built on top of dask.distributed. |
dbt (dagster-dbt) | Provides ops and resources to run dbt projects. |
Databricks (dagster-databricks) | Provides ops and resources for integrating with Databricks. |
Datadog (dagster-datadog) | Provides an integration with Datadog, to support publishing metrics to Datadog from within Dagster ops. |
Datahub (dagster-datahub) | Provides an integration with Datahub, to support pushing metadata to Datahub from within Dagster ops. |
Docker (dagster-docker) | Provides components for deploying Dagster to Docker. |
DuckDB (dagster-duckdb) | Provides resources for querying DuckDB from Dagster. |
DuckDB & Pandas (dagster-duckdb-pandas) | Provides support for storing Pandas DataFrames in DuckDB. |
DuckDB & Polars (dagster-duckdb-polars) | Provides support for storing Polars DataFrames in DuckDB. |
DuckDB & PySpark (dagster-duckdb-pyspark) | Provides support for storing PySpark DataFrames in DuckDB. |
Embedded ELT (dagster-embedded-elt) | Provides support for running embedded ELT within Dagster. |
Fivetran (dagster-fivetran) | Provides ops and resources to run Fivetran syncs. |
Google Cloud Platform (GCP) (dagster-gcp) | Dagster integrations for working with Google Cloud Platform resources. |
GCP & Pandas (dagster-gcp-pandas) | Dagster integrations for working with Google Cloud Platform resources with Pandas DataFrames. Currently contains integrations for BigQuery. |
GCP & PySpark (dagster-gcp-pyspark) | Dagster integrations for working with Google Cloud Platform resources with PySpark DataFrames. Currently contains integrations for BigQuery. |
Great Expectations (GE) (dagster-ge) | Dagster integrations for working with Great Expectations data quality tests. |
GitHub (dagster-github) | Provides a resource for issuing GitHub GraphQL queries and filing GitHub issues from Dagster jobs. |
GraphQL (dagster-graphql) | Provides resources for interfacing with a Dagster deployment over GraphQL. |
Kubernetes (dagster-k8s) | Provides components for deploying Dagster to Kubernetes. |
Looker (dagster-looker) | Provides an integration to represent a Looker project as a graph of assets. |
Microsoft Teams (dagster-msteams) | Includes a simple integration with Microsoft Teams. |
MLflow (dagster-mlflow) | Provides resources and hooks for using MLflow functionalities with Dagster runs. |
MySQL (dagster-mysql) | Includes implementations of run and event log storage built on MySQL. |
PagerDuty (dagster-pagerduty) | Provides an integration for generating PagerDuty events from Dagster ops. |
Pandas (dagster-pandas) | Provides support for using Pandas DataFrames in Dagster and utilities for performing data validation. |
Pandera (dagster-pandera) | Provides support for validating Pandas DataFrames using Pandera. |
Papertrail (dagster-papertrail) | Provides support for sending Dagster logs to Papertrail. |
Polars (dagster-polars) | Provides support for saving and loading Polars DataFrames in Dagster. |
PostgreSQL (dagster-postgres) | Includes implementations of run and event log storage built on Postgres. |
Prometheus (dagster-prometheus) | Provides support for sending metrics to Prometheus. |
PySpark (dagster-pyspark) | Provides an integration with PySpark. |
Shell (dagster-shell) | Provides utilities for issuing shell commands from Dagster jobs. |
Slack (dagster-slack) | Provides a simple integration with Slack. |
Snowflake (dagster-snowflake) | Provides resources for querying Snowflake from Dagster. |
Snowflake & Pandas (dagster-snowflake-pandas) | Provides support for storing Pandas DataFrames in Snowflake. |
Snowflake & PySpark (dagster-snowflake-pyspark) | Provides support for storing PySpark DataFrames in Snowflake. |
Spark (dagster-spark) | Provides an integration for working with Spark in Dagster. |
SSH / SFTP (dagster-ssh) | Provides an integration for running commands over SSH and retrieving / posting files via SFTP. |
Twilio (dagster-twilio) | Provides a resource for posting SMS messages from ops via Twilio. |
Weights & Biases (dagster-wandb) | Provides an integration with Weights & Biases (W&B). |
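
Since these libraries are optional, it can be handy to check which of them are present in the current environment. The sketch below uses only the Python standard library; the helper name `installed_versions` is hypothetical, and the package names passed to it are taken from the table above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Illustrative selection of package names from the integrations table.
    for name, ver in installed_versions(["dagster", "dagster-aws", "dagster-dbt"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

This reports only what is installed; it does not verify that the versions of the core package and an integration library are compatible with each other.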