Assets & Materializations

In Dagster, an Asset is the abstract representation of a data artifact produced and persisted by a solid. Assets are generally persisted to storage external to Dagster (e.g. a database table or an analytics dashboard) as a side effect of the solid's computation. The act of persisting an asset is called a "materialization", and a solid expresses a materialization by yielding an asset materialization event.

Materializing an Asset

To generate an asset materialization, we can yield an AssetMaterialization event in our solid. This would involve changing the following solid:

materialization.py
from dagster import solid


@solid
def my_simple_solid(context, df):
    do_some_transform(df)
    persist_to_storage(df)
    # The return value is implicitly converted into an Output event.
    return df

into something like this:

materialization.py
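from dagster import AssetMaterialization, Output, solid


@solid
def my_materialization_solid(context, df):
    do_some_transform(df)
    persist_to_storage(df)
    # Record that the asset was persisted.
    yield AssetMaterialization(asset_key="my_dataset", description="Persisted result to storage")
    # The output must be yielded explicitly once the solid yields events.
    yield Output(df)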

Note: Our materialization solid must now explicitly yield an Output event instead of relying on the implicit conversion of the return value into an Output event.
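
The requirement comes from how Python generators work: once a function body contains `yield`, calling it returns a generator, and a value passed to `return` no longer appears among the yielded items. The sketch below illustrates this in plain Python, without importing Dagster. `FakeMaterialization` and `FakeOutput` are hypothetical stand-ins for the real event classes, not part of the Dagster API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeMaterialization:
    """Hypothetical stand-in for an asset materialization event."""
    asset_key: str


@dataclass
class FakeOutput:
    """Hypothetical stand-in for an output event."""
    value: Any


def returns_value(rows):
    # The body contains `yield`, so calling this function produces a generator.
    yield FakeMaterialization(asset_key="my_dataset")
    # The returned value is not part of the yielded event stream.
    return rows


def yields_output(rows):
    yield FakeMaterialization(asset_key="my_dataset")
    # The output is an explicit event, visible to whoever consumes the stream.
    yield FakeOutput(rows)


events_without_output = list(returns_value([1, 2, 3]))
events_with_output = list(yields_output([1, 2, 3]))

print([type(e).__name__ for e in events_without_output])
print([type(e).__name__ for e in events_with_output])
```

Only the second function hands its result to a consumer that iterates over the events, which mirrors why the solid above yields its Output explicitly.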

We should now see a materialization event in the event log when we execute this solid.

Attaching Metadata to the Asset Materialization

Several types of metadata can be attached to a materialization event, all through the EventMetadataEntry class. Each materialization event optionally takes a list of metadata entries, which are then displayed in the event log.

materialization.py
from dagster import AssetMaterialization, EventMetadataEntry, Output, solid


@solid
def my_metadata_materialization_solid(context, df):
    do_some_transform(df)
    persist_to_storage(df)
    yield AssetMaterialization(
        asset_key="my_dataset",
        description="Persisted result to storage",
        metadata_entries=[
            EventMetadataEntry.text("Text-based metadata for this event", label="text_metadata"),
            EventMetadataEntry.fspath("/path/to/data/on/filesystem"),
            EventMetadataEntry.url("http://mycoolsite.com/url_for_my_data", label="dashboard_url"),
            EventMetadataEntry.float(calculate_bytes(df), "size (bytes)"),
        ],
    )
    yield Output(df)

Check our API docs for EventMetadataEntry for more details on the types of event metadata available.

Indexing with Asset Keys

Asset materializations can be indexed by adding an AssetKey to the materialization event. The AssetKey is a normalized, structured identifier for an Asset.

materialization.py
from dagster import AssetKey, AssetMaterialization, EventMetadataEntry, Output, solid


@solid
def my_asset_key_materialization_solid(context, df):
    do_some_transform(df)
    persist_to_storage(df)
    yield AssetMaterialization(
        asset_key=AssetKey(["dashboard", "my_cool_site"]),
        description="Persisted result to storage",
        metadata_entries=[
            EventMetadataEntry.url("http://mycoolsite.com/dashboard", label="dashboard_url"),
            EventMetadataEntry.float(calculate_bytes(df), "size (bytes)"),
        ],
    )
    yield Output(df)

As soon as materialization events with asset keys are generated during pipeline execution, those assets should appear in the Assets dashboard in Dagit. With these indexed assets, we can explore the relationship between our units of computation (e.g. solids, pipelines) and the data they produce (e.g. assets). In particular, that exploration can start from the asset itself rather than from the pipeline.
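
To make the idea of indexing concrete, the following sketch groups materialization records by a structured key so that a lookup starts from the asset instead of the pipeline. It is plain Python and makes no claim about how Dagster stores or indexes events internally. `MaterializationRecord`, `build_asset_index`, the `warehouse` key, and the pipeline names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A structured key is modeled here as a tuple of path components.
KeyPath = Tuple[str, ...]


@dataclass(frozen=True)
class MaterializationRecord:
    """Hypothetical record of one materialization event."""
    key_path: KeyPath
    pipeline: str
    solid: str


def build_asset_index(
    records: List[MaterializationRecord],
) -> Dict[KeyPath, List[MaterializationRecord]]:
    """Group records by key path, preserving the order in which they occurred."""
    index = defaultdict(list)
    for record in records:
        index[record.key_path].append(record)
    return dict(index)


records = [
    MaterializationRecord(
        ("dashboard", "my_cool_site"), "nightly_pipeline", "my_asset_key_materialization_solid"
    ),
    MaterializationRecord(
        ("warehouse", "my_dataset"), "nightly_pipeline", "my_materialization_solid"
    ),
    MaterializationRecord(
        ("dashboard", "my_cool_site"), "backfill_pipeline", "my_asset_key_materialization_solid"
    ),
]

index = build_asset_index(records)

# Start from the asset and ask which pipelines have produced it.
producers = [r.pipeline for r in index[("dashboard", "my_cool_site")]]
print(producers)
```

Because the key in this sketch is a sequence of path components rather than a flat string, related assets could also be grouped by a shared prefix such as `dashboard`.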