
Concepts

Dagster provides a variety of abstractions for building and orchestrating data pipelines. These concepts enable a modular, declarative approach to data engineering, making it easier to manage dependencies, monitor execution, and ensure data quality.

Asset

An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can have dependencies on other assets, forming the data lineage for your pipelines. As the core abstraction in Dagster, assets can interact with many other Dagster concepts to facilitate certain tasks.

| Concept | Relationship |
| --- | --- |
| asset check | asset may use an asset check |
| config | asset may use a config |
| io manager | asset may use an io manager |
| partition | asset may use a partition |
| resource | asset may use a resource |
| job | asset may be used in a job |
| schedule | asset may be used in a schedule |
| sensor | asset may be used in a sensor |
| definitions | asset must be set in a definitions to be deployed |

Asset Check

An asset_check is associated with an asset to ensure it meets certain expectations around data quality, freshness, or completeness. Asset checks run when the asset is executed and store metadata about the related run, including whether all the conditions of the check were met.

| Concept | Relationship |
| --- | --- |
| asset | asset check may be used by an asset |
| definitions | asset check must be set in a definitions to be deployed |

Code Location

A code location is a collection of Definitions deployed in a specific environment. A code location determines the Python environment (including the version of Dagster being used as well as any other Python dependencies). A Dagster project can have multiple code locations, helping isolate dependencies.

| Concept | Relationship |
| --- | --- |
| definitions | code location must contain at least one definitions |

Config

A RunConfig is a defined schema applied to a Dagster object, with values supplied at execution time. This allows pipelines to be parameterized and reused for multiple purposes.

| Concept | Relationship |
| --- | --- |
| asset | config may be used by an asset |
| job | config may be used by a job |
| schedule | config may be used by a schedule |
| sensor | config may be used by a sensor |

Definitions

Definitions is a top-level construct that contains references to all the objects of a Dagster project, such as assets, jobs and ScheduleDefinitions. Only objects included in the definitions will be deployed and visible within the Dagster UI.

| Concept | Relationship |
| --- | --- |
| asset | definitions may contain one or more assets |
| asset check | definitions may contain one or more asset checks |
| io manager | definitions may contain one or more io managers |
| job | definitions may contain one or more jobs |
| resource | definitions may contain one or more resources |
| schedule | definitions may contain one or more schedules |
| sensor | definitions may contain one or more sensors |
| code location | definitions must be deployed in a code location |

Graph

A GraphDefinition connects multiple ops together to form a DAG. If you are using assets, you will not need to use graphs directly.

| Concept | Relationship |
| --- | --- |
| config | graph may use a config |
| op | graph must include one or more ops |
| job | graph must be part of job to execute |

IO Manager

An IOManager defines how data is stored and retrieved between the execution of assets and ops. This lets you customize the storage location and format at any step in a pipeline.

| Concept | Relationship |
| --- | --- |
| asset | io manager may be used by an asset |
| definitions | io manager must be set in a definitions to be deployed |

Job

A job is a subset of assets or the GraphDefinition of ops. Jobs are the main form of execution in Dagster.

| Concept | Relationship |
| --- | --- |
| asset | job may contain a selection of assets |
| config | job may use a config |
| graph | job may contain a graph |
| schedule | job may be used by a schedule |
| sensor | job may be used by a sensor |
| definitions | job must be set in a definitions to be deployed |

Op

An op is a computational unit of work. Ops are arranged into a GraphDefinition to dictate their order. Ops have largely been replaced by assets.

| Concept | Relationship |
| --- | --- |
| type | op may use a type |
| graph | op must be contained in graph to execute |

Partition

A PartitionsDefinition represents a logical slice of a dataset or computation mapped to certain segments (such as increments of time). Partitions enable incremental processing, making workflows more efficient by running only on the relevant subsets of data.

| Concept | Relationship |
| --- | --- |
| asset | partition may be used by an asset |

Resource

A ConfigurableResource is a configurable external dependency. These can be databases, APIs, or anything outside of Dagster.

| Concept | Relationship |
| --- | --- |
| asset | resource may be used by an asset |
| schedule | resource may be used by a schedule |
| sensor | resource may be used by a sensor |
| definitions | resource must be set in a definitions to be deployed |

Type

A type is a way to define and validate the data passed between ops.

| Concept | Relationship |
| --- | --- |
| op | type may be used by an op |

Schedule

A ScheduleDefinition is a way to automate jobs or assets to run at a specified interval. When a job or asset is parameterized, the schedule can also be given a matching run configuration (RunConfig).

| Concept | Relationship |
| --- | --- |
| asset | schedule may include a job or selection of assets |
| config | schedule may include a config if the job or assets include a config |
| job | schedule may include a job or selection of assets |
| definitions | schedule must be set in a definitions to be deployed |

Sensor

A sensor is a way to trigger jobs or assets when an event occurs, such as a file being uploaded or a push notification. When a job or asset is parameterized, the sensor can also be given a matching run configuration (RunConfig).

| Concept | Relationship |
| --- | --- |
| asset | sensor may include a job or selection of assets |
| config | sensor may include a config if the job or assets include a config |
| job | sensor may include a job or selection of assets |
| definitions | sensor must be set in a definitions to be deployed |