Using template UDFs
While Dagster has no formal concept of user-defined functions (UDFs), you can achieve the same behavior with template variables whose value is a function.
This powerful feature lets you inject Python functions into your component YAML definitions. The functions are evaluated when definitions are loaded, and their return values are injected into the YAML document. Because this can happen anywhere in the document, you can "mix and match" Python and YAML seamlessly.
Why UDFs instead of template logic?
Many templating engines offer features like embedded conditionals (`{% if %}`) and loops (`{% for %}`) to handle complex use cases. UDFs are an alternative to this: instead of using `{% for %}`, you can invoke a Python function that contains a `for` loop.
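For instance, where a template might loop over a list of regions to emit one asset entry per region, the same logic can live in an ordinary Python function. This is a minimal sketch: the function name `regional_assets`, the region list, and the key format are hypothetical and only illustrate the shape of such a UDF.

```python
# Template-loop version this replaces (shown for comparison only):
#
#   assets:
#   {% for region in ["us", "eu"] %}
#     - key: sales_{{ region }}
#   {% endfor %}


def regional_assets(base_key: str, regions: list[str]) -> list[dict[str, str]]:
    """Build one asset entry per region with a plain Python for loop.

    Hypothetical helper: each returned dict mirrors the shape of an
    entry in the component's `assets` list.
    """
    assets = []
    for region in regions:
        assets.append({"key": f"{base_key}_{region}"})
    return assets


print(regional_assets("sales", ["us", "eu"]))
# [{'key': 'sales_us'}, {'key': 'sales_eu'}]
```

The loop is now ordinary Python, so it can be type checked, refactored, and tested like any other function.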
This approach provides several advantages:
- Clean YAML: Nested conditionals and for loops often make template documents difficult to parse and reason about.
- Testing: You can write unit tests for your configuration logic.
- Reusability: You can reuse your UDFs across components.
- Full IDE support: Get autocomplete, type checking, and refactoring tools.
- Maintainability: Complex business rules are easier to understand and modify in Python.
Example: Dynamically generating tags
Let's walk through a common scenario where you start with static YAML configuration but later need to generate it dynamically.
Starting with static tags
Initially, you might have a simple component with hardcoded compliance tags:
```yaml
type: dagster.PythonScriptComponent

attributes:
  execution:
    path: process_data.py
  assets:
    - key: processed_data
      tags:
        data_classification: internal
        retention_days: "90"
        pii_contains: "false"
```
This works well at first, but what happens when your compliance requirements become more complex? For example:
- Retention days need to be calculated based on data classification.
- Different rules apply when PII is present.
Evolving to dynamic tag generation
Instead of duplicating logic across multiple YAML files, you can create a template UDF that generates tags dynamically. Define it in a `template_vars.py` module next to your component's YAML file:
```python
from typing import Callable

import dagster as dg


@dg.template_var
def generate_compliance_tags() -> Callable[[str, bool], dict[str, str]]:
    """Returns a function that generates compliance tags with computed retention logic.

    This demonstrates how to push complex business logic into Python functions
    instead of embedding it in template syntax.
    """

    def _generate_compliance_tags(
        classification: str, has_pii: bool = False
    ) -> dict[str, str]:
        # Complex business logic with full Python tooling support
        retention_mapping = {
            "public": 30,
            "internal": 90,
            "confidential": 180,
            "restricted": 365,
        }

        base_retention = retention_mapping.get(classification, 90)

        # Increase retention if PII is present
        if has_pii:
            base_retention *= 2

        return {
            "data_classification": classification,
            "retention_days": str(base_retention),
            "pii_contains": str(has_pii).lower(),
        }

    return _generate_compliance_tags
```
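Because the function returned by the template variable is plain Python, its retention rules can be unit tested without rendering any YAML. One possible layout, sketched here as an assumption rather than a Dagster requirement, keeps the logic in a module-level function (the name `compute_compliance_tags` is hypothetical) and has the template variable simply return it. The Dagster wrapper appears only as a comment so the sketch runs without Dagster installed.

```python
# In template_vars.py, the template variable could simply return the plain function:
#
#   @dg.template_var
#   def generate_compliance_tags() -> Callable[[str, bool], dict[str, str]]:
#       return compute_compliance_tags

RETENTION_DAYS = {
    "public": 30,
    "internal": 90,
    "confidential": 180,
    "restricted": 365,
}


def compute_compliance_tags(classification: str, has_pii: bool = False) -> dict[str, str]:
    """Same rules as _generate_compliance_tags above, as a plain function."""
    retention = RETENTION_DAYS.get(classification, 90)  # unknown -> 90 days
    if has_pii:
        retention *= 2  # PII doubles the retention period
    return {
        "data_classification": classification,
        "retention_days": str(retention),
        "pii_contains": str(has_pii).lower(),
    }


def test_internal_without_pii() -> None:
    assert compute_compliance_tags("internal") == {
        "data_classification": "internal",
        "retention_days": "90",
        "pii_contains": "false",
    }


def test_pii_doubles_retention() -> None:
    assert compute_compliance_tags("restricted", has_pii=True)["retention_days"] == "730"


def test_unknown_classification_falls_back_to_90_days() -> None:
    assert compute_compliance_tags("unlabeled")["retention_days"] == "90"


if __name__ == "__main__":
    test_internal_without_pii()
    test_pii_doubles_retention()
    test_unknown_classification_falls_back_to_90_days()
    print("all tests passed")
```

If you run these with pytest instead, the `test_` functions are collected automatically when they live in a file named like a test module, for example `test_template_vars.py`.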
Now you can use this function in your component definition:
```yaml
type: dagster.PythonScriptComponent

template_vars_module: .template_vars

attributes:
  execution:
    path: process_data.py
  assets:
    - key: processed_data
      tags: "{{ generate_compliance_tags('internal', has_pii=false) }}"
```
Using UDFs to incrementally move to Python
The UDF approach lets you start with simple declarative YAML and then incrementally move logic into Python without altering the schema of the target components.
You could achieve the same result by implementing a custom schema and writing the equivalent code in the `build_defs` function of a custom component, but UDFs offer a much more incremental and less disruptive approach:
- You do not have to create a custom component.
- You can use UDFs in definitions where they are needed, but still use the simple YAML declarations where they are not, while keeping the component types the same.
- You do not have to teach your users about new schema formats.