Column-level lineage
For assets that produce database tables, column-level lineage can be a powerful tool for improving collaboration and debugging issues. Column lineage enables data and analytics engineers alike to understand how a column is created and used in your data platform.
How it works
Emitted as materialization metadata, column lineage can be:
- Specified on assets defined in Dagster
- Enabled for assets loaded from integrations like dbt
Dagster uses this metadata to display the column's upstream and downstream dependencies, accessible via the asset's details page in the Dagster UI. Note: Viewing column-level lineage in the UI is a Dagster+ feature.
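To make the upstream/downstream distinction concrete, here is a minimal conceptual sketch in plain Python. It does not use Dagster's API and is not a description of how Dagster implements lineage; the dictionary layout and the invert_lineage helper are assumptions made purely for illustration. It shows that once each column records its upstream columns, the downstream view can be derived by inverting that mapping.

```python
from collections import defaultdict

# Hypothetical upstream lineage: each (asset, column) pair maps to the
# (asset, column) pairs it is derived from.
upstream = {
    ("my_asset", "new_column_foo"): [
        ("source_bar", "column_bar"),
        ("source_baz", "column_baz"),
    ],
    ("my_asset", "new_column_qux"): [
        ("source_bar", "column_quuz"),
    ],
}


def invert_lineage(upstream):
    """Derive the downstream view: which columns consume each upstream column."""
    downstream = defaultdict(list)
    for output_column, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(output_column)
    return dict(downstream)


downstream = invert_lineage(upstream)
for source, consumers in downstream.items():
    print(source, "->", consumers)
```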
Enabling column-level lineage
For assets defined in Dagster
To enable column-level lineage on Dagster assets that produce database tables, you'll need to:
- Return a MaterializeResult object containing a metadata parameter
- In metadata, use the dagster/column_lineage key to create a TableColumnLineage object
- In this object, use TableColumnLineage.deps_by_column to define the columns, keyed by column name
- For each column, use TableColumnDep to define its dependencies. This object accepts asset_key and column_name arguments, allowing you to specify the name of the asset and column that make up the dependency.
Let's take a look at an example:
from dagster import (
    AssetKey,
    MaterializeResult,
    TableColumnDep,
    TableColumnLineage,
    asset,
)


@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")])
def my_asset():
    yield MaterializeResult(
        metadata={
            "dagster/column_lineage": TableColumnLineage(
                deps_by_column={
                    "new_column_foo": [
                        TableColumnDep(
                            asset_key=AssetKey("source_bar"),
                            column_name="column_bar",
                        ),
                        TableColumnDep(
                            asset_key=AssetKey("source_baz"),
                            column_name="column_baz",
                        ),
                    ],
                    "new_column_qux": [
                        TableColumnDep(
                            asset_key=AssetKey("source_bar"),
                            column_name="column_quuz",
                        ),
                    ],
                }
            )
        }
    )
When materialized, the my_asset asset will create two columns: new_column_foo and new_column_qux.
The new_column_foo column is dependent on two other columns:
- column_bar from the source_bar asset
- column_baz from the source_baz asset
The second column, new_column_qux, is dependent on column_quuz from the source_bar asset.
If using Dagster+, you can view the column-level lineage in the Dagster UI.
For assets loaded from integrations
Column-level lineage is currently supported for the dbt integration. Refer to the dbt documentation for more information.
Viewing column-level lineage in the Dagster UI
Viewing column lineage in the UI is a Dagster+ feature.
- In the Dagster UI, open the Asset details page for an asset with column-level lineage enabled.
- Navigate to the Overview tab if it isn't already open.
- In the Columns section, click the branch icon in the row of the column you want to view. The icon is on the far right side of the row.
The graph will display the column's dependencies, grouped by asset.
To view another column's lineage, click the Column dropdown and select another column.