Using partitions
Partitions allow you to divide your assets into subsets based on time, categories, or other dimensions. When working with Dagster components, you can add partitions to your assets using either YAML configuration or Python code.
Before adding partitions to a component, you must either create a components-ready Dagster project or migrate an existing project to `dg`.
Adding partitions in YAML
If you've defined your component using a `defs.yaml` file, you'll first want to define a new `template_var` that returns the `PartitionsDefinition` object. Create a `template_vars.py` file in the same directory as your component's `defs.yaml`:
```python
import dagster as dg


@dg.template_var
def the_daily_partitions_def() -> dg.DailyPartitionsDefinition:
    return dg.DailyPartitionsDefinition(start_date="2020-01-01")
```
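To make `start_date` concrete, here is a stdlib-only sketch of the kind of keys a daily partitions definition produces: one partition per day, keyed by the date in `YYYY-MM-DD` form. The `daily_partition_keys` helper is hypothetical and only illustrates the idea; in a real project Dagster derives the keys from the `DailyPartitionsDefinition` itself.

```python
from datetime import date, timedelta


def daily_partition_keys(start: date, end: date) -> list:
    """Hypothetical helper: list daily partition keys for every day in [start, end)."""
    keys = []
    current = start
    while current < end:
        # Each day becomes one partition, keyed by its ISO-style date string
        keys.append(current.strftime("%Y-%m-%d"))
        current += timedelta(days=1)
    return keys


keys = daily_partition_keys(date(2020, 1, 1), date(2020, 1, 4))
print(keys)  # ['2020-01-01', '2020-01-02', '2020-01-03']
```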
From there, the simplest way to add a `PartitionsDefinition` to assets in a component is with the `post_processing` configuration in your `defs.yaml` file:
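```yaml
type: ...
attributes: ...
# Load template variables from template_vars.py in this directory
template_vars_module: .template_vars
post_processing:
  assets:
    # "*" selects every asset produced by this component
    - target: "*"
      attributes:
        partitions_def: "{{ the_daily_partitions_def }}"
```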
In general, it is best to avoid applying different partition definitions to different assets within a single component, since each execution step must map to only a single partitions definition at a time. Any integration that maps multiple assets to the same step must take care to ensure that all of those assets have the same partitions definition, which is easiest if all assets in the entire component share a single partitions definition.
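One way to see why a single shared definition is easiest: model each execution step as the list of assets it materializes and flag any step whose assets disagree. This is a plain-Python sketch under assumed names (`find_mixed_steps`, and the step and asset names); it is not how Dagster validates definitions internally, and partitions definitions are represented by simple labels.

```python
from __future__ import annotations


def find_mixed_steps(
    step_assets: dict[str, list[str]],
    partitions_by_asset: dict[str, str | None],
) -> dict[str, set[str | None]]:
    """Return the steps whose assets do not all share one partitions definition.

    Hypothetical helper: "daily" stands for a partitions definition,
    and None means the asset is unpartitioned.
    """
    mixed = {}
    for step, assets in step_assets.items():
        # Collect the distinct partitions definitions used by this step's assets
        defs = {partitions_by_asset.get(asset) for asset in assets}
        if len(defs) > 1:
            mixed[step] = defs
    return mixed


# A step that builds two assets with different partitioning is flagged
steps = {"build_models": ["orders", "customers"], "sync_events": ["events"]}
partitions = {"orders": "daily", "customers": None, "events": "daily"}
print(sorted(find_mixed_steps(steps, partitions)))  # ['build_models']
```

Applying one definition to every asset, as a `target: "*"` post-processing rule does, makes this check pass trivially.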
Adding partitions in Python
If you are using the `@dg.component_instance` decorator to define your component, you can create a subclass of your component class that applies partitions to all assets:
```python
import dagster as dg


def add_partitions_def(spec: dg.AssetSpec) -> dg.AssetSpec:
    # Return a copy of the spec with a daily partitions definition attached
    return spec.replace_attributes(
        partitions_def=dg.DailyPartitionsDefinition(start_date="2020-01-01")
    )


# ExampleComponent stands in for your existing component class
class ExampleComponentWithPartitions(ExampleComponent):
    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Build the parent's definitions, then use map_asset_specs
        # to add the partitions definition to every asset
        return super().build_defs(context).map_asset_specs(add_partitions_def)
```
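The pattern above relies on asset specs being immutable: `replace_attributes` returns a modified copy, and `map_asset_specs` applies your function to every spec. The following stdlib-only model of that idea uses a hypothetical `SimpleSpec` dataclass and `map_specs` helper in place of the Dagster types, so it shows the shape of the transformation rather than Dagster's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class SimpleSpec:
    """Hypothetical stand-in for dg.AssetSpec, reduced to two fields."""

    key: str
    partitions_def: Optional[str] = None


def map_specs(
    specs: list[SimpleSpec], fn: Callable[[SimpleSpec], SimpleSpec]
) -> list[SimpleSpec]:
    # Apply fn to every spec and collect the returned copies
    return [fn(spec) for spec in specs]


def add_daily_partitions(spec: SimpleSpec) -> SimpleSpec:
    # Like add_partitions_def: return a copy with one attribute replaced
    return replace(spec, partitions_def="daily:2020-01-01")


original = [SimpleSpec("orders"), SimpleSpec("customers")]
updated = map_specs(original, add_daily_partitions)
print([spec.partitions_def for spec in updated])  # ['daily:2020-01-01', 'daily:2020-01-01']
print(original[0].partitions_def)  # None
```

Because the originals are never mutated, the same base component can be reused with or without partitions.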
Updating your execution logic
Once you've added partitions to your assets, you'll need to update your asset execution logic to work with the partition context. When Dagster executes a partitioned asset, it provides information about which partition is currently being processed.
In your asset functions, you can access partition information through the `AssetExecutionContext`. The key properties are:
- `partition_key`: The string key identifying the current partition.
- `partition_time_window`: For time-based partitions, the start and end times of the current partition.
Your execution logic should use this partition information to process only the relevant subset of data.
```python
import dagster as dg


def process_data_for_partition(partition_date):
    """Example function to process data for a specific partition."""
    # Your actual data processing logic would go here
    return f"Processed data for {partition_date}"


class MyPartitionedComponent(dg.Component):
    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Apply the partitions definition directly on the asset
        @dg.asset(partitions_def=dg.DailyPartitionsDefinition(start_date="2020-01-01"))
        def my_partitioned_asset(context: dg.AssetExecutionContext):
            # Access the current partition key, e.g. "2020-01-01"
            partition_key = context.partition_key
            context.log.info(f"Processing partition {partition_key}")

            # For time-based partitions, the window start is the first instant of the partition
            partition_date = context.partition_time_window.start

            # Use the partition information to process only this partition's data
            return process_data_for_partition(partition_date)

        return dg.Definitions(assets=[my_partitioned_asset])
```
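The `process_data_for_partition` placeholder is where partition-scoped filtering happens. As a stdlib-only sketch, assuming records carry a UTC `ts` timestamp and that a daily partition covers the half-open window from midnight to the next midnight, the filtering could look like this. Inside a real asset you would read the bounds from `context.partition_time_window` instead of deriving them from the key; `daily_window` and `records_in_partition` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def daily_window(partition_key: str) -> tuple[datetime, datetime]:
    """Hypothetical helper: the [start, end) window for a YYYY-MM-DD daily key."""
    start = datetime.strptime(partition_key, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


def records_in_partition(records: list[dict], partition_key: str) -> list[dict]:
    # Keep only records inside this partition's window; the end bound is
    # exclusive, so adjacent daily partitions never overlap
    start, end = daily_window(partition_key)
    return [r for r in records if start <= r["ts"] < end]


records = [
    {"id": 1, "ts": datetime(2020, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"id": 2, "ts": datetime(2020, 1, 2, 0, 0, tzinfo=timezone.utc)},
    {"id": 3, "ts": datetime(2020, 1, 2, 18, 30, tzinfo=timezone.utc)},
    {"id": 4, "ts": datetime(2020, 1, 3, 0, 0, tzinfo=timezone.utc)},
]
print([r["id"] for r in records_in_partition(records, "2020-01-02")])  # [2, 3]
```

Using an exclusive end bound means a record stamped exactly at midnight belongs to the new day only, so each record is processed by exactly one partition.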
Best practices
- Use consistent partitions: All assets within a single component should generally use the same partitions definition to avoid execution complexity.
- Handle partition context: Always use `AssetExecutionContext.partition_key` or related properties, such as `partition_time_window`, to access partition information in your execution logic.
- Test partitioned assets: Ensure your partitioned assets work correctly by testing with different partition keys during development.