Concepts
Dagster provides a variety of abstractions for building and orchestrating data pipelines. These concepts enable a modular, declarative approach to data engineering, making it easier to manage dependencies, monitor execution, and ensure data quality.
Asset
An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can have dependencies on other assets, forming the data lineage for your pipelines. As the core abstraction in Dagster, assets can interact with many other Dagster concepts to facilitate certain tasks.
Concept | Relationship |
---|---|
asset check | asset may use an asset check |
config | asset may use a config |
io manager | asset may use an io manager |
partition | asset may use a partition |
resource | asset may use a resource |
job | asset may be used in a job |
schedule | asset may be used in a schedule |
sensor | asset may be used in a sensor |
definitions | asset must be set in a definitions to be deployed |
Asset Check
An asset_check is associated with an asset to ensure it meets certain expectations around data quality, freshness, or completeness. Asset checks run when the asset is executed and store metadata about the related run and whether all the conditions of the check were met.
Concept | Relationship |
---|---|
asset | asset check may be used by an asset |
definitions | asset check must be set in a definitions to be deployed |
Code Location
A code location is a collection of Definitions deployed in a specific environment. A code location determines the Python environment (including the version of Dagster being used as well as any other Python dependencies). A Dagster project can have multiple code locations, helping isolate dependencies.
Concept | Relationship |
---|---|
definitions | code location must contain at least one definitions |
Config
A RunConfig is a schema applied to a Dagster object, with values supplied at the time of execution. This allows pipelines to be parameterized and reused for multiple purposes.
Concept | Relationship |
---|---|
asset | config may be used by an asset |
job | config may be used by a job |
schedule | config may be used by a schedule |
sensor | config may be used by a sensor |
Definitions
Definitions is a top-level construct that contains references to all the objects of a Dagster project, such as assets, jobs, and ScheduleDefinitions. Only objects included in the definitions will be deployed and visible within the Dagster UI.
Concept | Relationship |
---|---|
asset | definitions may contain one or more assets |
asset check | definitions may contain one or more asset checks |
io manager | definitions may contain one or more io managers |
job | definitions may contain one or more jobs |
resource | definitions may contain one or more resources |
schedule | definitions may contain one or more schedules |
sensor | definitions may contain one or more sensors |
code location | definitions must be deployed in a code location |
Graph
A GraphDefinition connects multiple ops together to form a DAG. If you are using assets, you will not need to use graphs directly.
Concept | Relationship |
---|---|
config | graph may use a config |
op | graph must include one or more ops |
job | graph must be part of job to execute |
IO Manager
An IOManager defines how data is stored and retrieved between the execution of assets and ops. This allows storage and format to be customized at any point in a pipeline.
Concept | Relationship |
---|---|
asset | io manager may be used by an asset |
definitions | io manager must be set in a definitions to be deployed |
Job
A job is a selection of assets or a GraphDefinition of ops. Jobs are the main form of execution in Dagster.
Concept | Relationship |
---|---|
asset | job may contain a selection of assets |
config | job may use a config |
graph | job may contain a graph |
schedule | job may be used by a schedule |
sensor | job may be used by a sensor |
definitions | job must be set in a definitions to be deployed |
Op
An op is a computational unit of work. Ops are arranged into a GraphDefinition to dictate their order. Ops have largely been replaced by assets.
Concept | Relationship |
---|---|
type | op may use a type |
graph | op must be contained in graph to execute |
Partition
A PartitionsDefinition represents a logical slice of a dataset or computation mapped to certain segments (such as increments of time). Partitions enable incremental processing, making workflows more efficient by only running on relevant subsets of data.
Concept | Relationship |
---|---|
asset | partition may be used by an asset |
Resource
A ConfigurableResource is a configurable external dependency. These can be databases, APIs, or anything outside of Dagster.
Concept | Relationship |
---|---|
asset | resource may be used by an asset |
schedule | resource may be used by a schedule |
sensor | resource may be used by a sensor |
definitions | resource must be set in a definitions to be deployed |
Type
A type is a way to define and validate the data passed between ops.
Concept | Relationship |
---|---|
op | type may be used by an op |
Schedule
A ScheduleDefinition is a way to automate jobs or assets to run on a specified interval. When a job or asset is parameterized, the schedule can also be set with a run configuration (RunConfig) to match.
Concept | Relationship |
---|---|
asset | schedule may include a job or selection of assets |
config | schedule may include a config if the job or assets include a config |
job | schedule may include a job or selection of assets |
definitions | schedule must be set in a definitions to be deployed |
Sensor
A sensor is a way to trigger jobs or assets when an event occurs, such as a file being uploaded or a push notification. When a job or asset is parameterized, the sensor can also be set with a run configuration (RunConfig) to match.
Concept | Relationship |
---|---|
asset | sensor may include a job or selection of assets |
config | sensor may include a config if the job or assets include a config |
job | sensor may include a job or selection of assets |
definitions | sensor must be set in a definitions to be deployed |