
Post-processing components

It is often useful to make modifications to the definitions generated by a component without needing to modify the component logic. Dagster provides a generic mechanism for this called post-processing.

Post-processing is available on all components. To add post-processing to a component instance, add a post_processing field in defs.yaml.

note

Currently, post-processing is supported only for assets, not for other definition types.

Setup

Let's look at a simple example using the DefsFolderComponent. DefsFolderComponent simply loads all definitions from a specified folder.

Starting from a blank project, let's scaffold a DefsFolderComponent called my_assets:

dg scaffold defs DefsFolderComponent my_assets
Creating defs at /.../my-project/src/my_project/defs/my_assets.

We now have a directory my_project/defs/my_assets with a single file, defs.yaml:

src/my_project/defs/my_assets/defs.yaml
type: dagster.DefsFolderComponent

attributes: {}

Let's add some assets. We'll create two files my_project/defs/my_assets/foo.py and my_project/defs/my_assets/bar.py, each containing a single asset:

src/my_project/defs/my_assets/foo.py
import dagster as dg


@dg.asset
def foo():
    return None

Let's run dg list defs to see our assets:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ bar │ default │ │ │ │ │
│ │ ├─────┼─────────┼──────┼───────┼─────────────┤ │
│ │ │ foo │ default │ │ │ │ │
│ │ └─────┴─────────┴──────┴───────┴─────────────┘ │
└─────────┴────────────────────────────────────────────────┘

Example 1: Adding kind tags to assets

Now suppose we want to add a compute kind to every asset defined in this folder. We could do this by manually adding the kind on each asset declaration or by using a factory. However, component post-processing provides a simpler solution. We modify our defs.yaml to add a post_processing field that specifies the kind:

src/my_project/defs/my_assets/defs.yaml
type: dagster.DefsFolderComponent

attributes: {}
post_processing:
  assets:
    - attributes:
        kinds:
          - "some_kind"

Let's break down the structure of the value we set for post_processing. The top-level key, assets, is currently the only supported key. assets holds a list of asset post-processors. Each post-processor transforms a set of asset attributes and applies to a subset of all of the assets generated by the component. In this case, we have a single post-processor with no defined subset, which means the specified transformation is applied to all assets.
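To make the semantics concrete, here is a small plain-Python model of what a list of post-processors does. This is only an illustrative sketch, not Dagster's implementation: the Asset and PostProcessor classes, the matches predicate, and the apply_post_processors function are all hypothetical names, and the sketch assumes that later attribute values simply overwrite earlier ones, which the text above does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Asset:
    # Hypothetical stand-in for an asset definition.
    key: str
    attributes: dict = field(default_factory=dict)


@dataclass
class PostProcessor:
    # Attributes to set on every asset in the subset.
    attributes: dict
    # Hypothetical subset selector; None means "all assets".
    matches: Optional[Callable[[Asset], bool]] = None


def apply_post_processors(assets, processors):
    """Apply each post-processor, in order, to the assets it selects."""
    result = []
    for asset in assets:
        attrs = dict(asset.attributes)
        for processor in processors:
            if processor.matches is None or processor.matches(asset):
                # Assumption: new values overwrite existing ones.
                attrs.update(processor.attributes)
        result.append(Asset(asset.key, attrs))
    return result


# One post-processor with no subset, as in the defs.yaml above.
assets = [Asset("foo"), Asset("bar")]
processors = [PostProcessor(attributes={"kinds": ["some_kind"]})]
processed = apply_post_processors(assets, processors)
for asset in processed:
    print(asset.key, asset.attributes)
```

Because the single post-processor has no subset, both foo and bar end up with the same kinds value in this model.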

Let's run dg list defs again to see the result:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ bar │ default │ │ some_kind │ │ │
│ │ ├─────┼─────────┼──────┼───────────┼─────────────┤ │
│ │ │ foo │ default │ │ some_kind │ │ │
│ │ └─────┴─────────┴──────┴───────────┴─────────────┘ │
└─────────┴────────────────────────────────────────────────────┘

You can see that both assets now have the kind we defined in our post_processing field.

Example 2: Assigning assets to different groups

Adding a kind isn't the only thing we can do. The full schema for attributes contains many other fields:

JSON Schema for asset attributes
{
"$defs": {
"DailyPartitionsDefinitionModel": {
"additionalProperties": false,
"properties": {
"type": {
"const": "daily",
"default": "daily",
"enum": [
"daily"
],
"title": "Type",
"type": "string"
},
"start_date": {
"title": "Start Date",
"type": "string"
},
"end_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "End Date"
},
"timezone": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Timezone"
},
"minute_offset": {
"default": 0,
"title": "Minute Offset",
"type": "integer"
},
"hour_offset": {
"default": 0,
"title": "Hour Offset",
"type": "integer"
}
},
"required": [
"start_date"
],
"title": "DailyPartitionsDefinitionModel",
"type": "object"
},
"HourlyPartitionsDefinitionModel": {
"additionalProperties": false,
"properties": {
"type": {
"const": "hourly",
"default": "hourly",
"enum": [
"hourly"
],
"title": "Type",
"type": "string"
},
"start_date": {
"title": "Start Date",
"type": "string"
},
"end_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "End Date"
},
"timezone": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Timezone"
},
"minute_offset": {
"default": 0,
"title": "Minute Offset",
"type": "integer"
}
},
"required": [
"start_date"
],
"title": "HourlyPartitionsDefinitionModel",
"type": "object"
},
"StaticPartitionsDefinitionModel": {
"additionalProperties": false,
"properties": {
"type": {
"const": "static",
"default": "static",
"enum": [
"static"
],
"title": "Type",
"type": "string"
},
"partition_keys": {
"items": {
"type": "string"
},
"title": "Partition Keys",
"type": "array"
}
},
"required": [
"partition_keys"
],
"title": "StaticPartitionsDefinitionModel",
"type": "object"
},
"TimeWindowPartitionsDefinitionModel": {
"additionalProperties": false,
"properties": {
"type": {
"const": "time_window",
"default": "time_window",
"enum": [
"time_window"
],
"title": "Type",
"type": "string"
},
"start_date": {
"title": "Start Date",
"type": "string"
},
"end_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "End Date"
},
"timezone": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Timezone"
},
"fmt": {
"title": "Fmt",
"type": "string"
},
"cron_schedule": {
"title": "Cron Schedule",
"type": "string"
}
},
"required": [
"start_date",
"fmt",
"cron_schedule"
],
"title": "TimeWindowPartitionsDefinitionModel",
"type": "object"
},
"WeeklyPartitionsDefinitionModel": {
"additionalProperties": false,
"properties": {
"type": {
"const": "weekly",
"default": "weekly",
"enum": [
"weekly"
],
"title": "Type",
"type": "string"
},
"start_date": {
"title": "Start Date",
"type": "string"
},
"end_date": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "End Date"
},
"timezone": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Timezone"
},
"minute_offset": {
"default": 0,
"title": "Minute Offset",
"type": "integer"
},
"hour_offset": {
"default": 0,
"title": "Hour Offset",
"type": "integer"
},
"day_offset": {
"default": 0,
"title": "Day Offset",
"type": "integer"
}
},
"required": [
"start_date"
],
"title": "WeeklyPartitionsDefinitionModel",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"deps": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The asset keys for the upstream assets that this asset depends on.",
"examples": [
[
"my_database/my_schema/upstream_table"
]
],
"title": "Deps"
},
"description": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Human-readable description of the asset.",
"examples": [
"Refined sales data"
],
"title": "Description"
},
"metadata": {
"anyOf": [
{
"type": "object"
},
{
"type": "string"
}
],
"default": "__DAGSTER_UNSET_DEFAULT__",
"description": "Additional metadata for the asset.",
"title": "Metadata"
},
"group_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Used to organize assets into groups, defaults to 'default'.",
"examples": [
"staging"
],
"title": "Group Name"
},
"skippable": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
},
{
"type": "string"
}
],
"default": null,
"description": "Whether this asset can be omitted during materialization, causing downstream dependencies to skip.",
"title": "Skippable"
},
"code_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "A version representing the code that produced the asset. Increment this value when the code changes.",
"examples": [
"3"
],
"title": "Code Version"
},
"owners": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
},
{
"type": "string"
}
],
"default": null,
"description": "A list of strings representing owners of the asset. Each string can be a user's email address, or a team name prefixed with `team:`, e.g. `team:finops`.",
"examples": [
[
"team:analytics",
"nelson@hooli.com"
]
],
"title": "Owners"
},
"tags": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "string"
}
],
"default": "__DAGSTER_UNSET_DEFAULT__",
"description": "Tags for filtering and organizing.",
"examples": [
{
"team": "analytics",
"tier": "prod"
}
],
"title": "Tags"
},
"kinds": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "string"
}
],
"default": "__DAGSTER_UNSET_DEFAULT__",
"description": "A list of strings representing the kinds of the asset. These will be made visible in the Dagster UI.",
"examples": [
[
"snowflake"
]
],
"title": "Kinds"
},
"automation_condition": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The condition under which the asset will be automatically materialized.",
"title": "Automation Condition"
},
"partitions_def": {
"anyOf": [
{
"$ref": "#/$defs/HourlyPartitionsDefinitionModel"
},
{
"$ref": "#/$defs/DailyPartitionsDefinitionModel"
},
{
"$ref": "#/$defs/WeeklyPartitionsDefinitionModel"
},
{
"$ref": "#/$defs/TimeWindowPartitionsDefinitionModel"
},
{
"$ref": "#/$defs/StaticPartitionsDefinitionModel"
},
{
"type": "string"
}
],
"default": null,
"description": "The partitions definition for the asset.",
"title": "Partitions Def"
}
},
"title": "SharedAssetKwargsModel",
"type": "object"
}
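Note that the schema sets "additionalProperties": false at the top level, so an attributes block may only use the keys listed under "properties". As a rough illustration, the following plain-Python sketch checks a candidate attributes mapping against those key names. The helper unknown_attribute_keys is hypothetical and is not part of Dagster; it checks key names only, not value types.

```python
# Top-level property names copied from the JSON Schema above.
ALLOWED_ATTRIBUTE_KEYS = {
    "deps",
    "description",
    "metadata",
    "group_name",
    "skippable",
    "code_version",
    "owners",
    "tags",
    "kinds",
    "automation_condition",
    "partitions_def",
}


def unknown_attribute_keys(attributes):
    """Return the keys in `attributes` that the schema does not allow, sorted."""
    return sorted(set(attributes) - ALLOWED_ATTRIBUTE_KEYS)


# Valid: both keys appear in the schema.
print(unknown_attribute_keys({"kinds": ["some_kind"], "group_name": "staging"}))

# Invalid: "kind" (singular) is not a schema property.
print(unknown_attribute_keys({"kind": "some_kind"}))
```

The first call prints an empty list and the second prints ['kind'], which is the sort of typo the schema's additionalProperties setting is there to reject.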

Let's change our defs.yaml to assign foo and bar to different groups. To do this, we'll add two more post-processors, each with its own target field specifying which assets it applies to:

src/my_project/defs/my_assets/defs.yaml
type: dagster.DefsFolderComponent

post_processing:
  assets:
    - attributes:
        kinds:
          - "some_kind"
    - target: foo
      attributes:
        group_name: "foo_group"
    - target: bar
      attributes:
        group_name: "bar_group"

note

The target field supports the full Dagster asset selection syntax.

Now if we run dg list defs again, we can see that the assets are in different groups:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ bar │ bar_group │ │ some_kind │ │ │
│ │ ├─────┼───────────┼──────┼───────────┼─────────────┤ │
│ │ │ foo │ foo_group │ │ some_kind │ │ │
│ │ └─────┴───────────┴──────┴───────────┴─────────────┘ │
└─────────┴──────────────────────────────────────────────────────┘

There are many other possibilities. Post-processors offer a flexible, convenient way to set properties on large sets of assets without modifying the component logic.