Using template UDFs

Dagster has no formal concept of user-defined functions (UDFs), but you can get equivalent behavior by defining a template variable that returns a function.

This is a powerful feature that enables you to inject Python functions into your component YAML definitions. Those functions are evaluated at definition load time, and their output is injected into the YAML document. This can occur anywhere in the document, allowing you to "mix and match" Python and YAML seamlessly.

Why UDFs instead of template logic?

Many templating engines offer features like embedded conditionals ({% if %}) and loops ({% for %}) to handle complex use cases. UDFs are an alternative to this. Instead of using {% for %}, you can invoke a Python function that has a for loop.

This approach provides several advantages:

  • Clean YAML: Nested conditionals and for loops often make template documents difficult to parse and reason about.
  • Testing: You can write unit tests for your configuration logic.
  • Reusability: You can reuse your UDFs across components.
  • Full IDE support: Get autocomplete, type checking, and refactoring tools.
  • Maintainability: Complex business rules are easier to understand and modify in Python.
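
As a rough illustration of the testing point, the sketch below puts a loop in a plain Python function instead of a {% for %} block, then checks it with ordinary assertions. The function name build_table_tags and its inputs are hypothetical, and the @dg.template_var wrapper is omitted so the snippet runs without Dagster installed.

```python
def build_table_tags(tables: list[str], owner: str) -> dict[str, str]:
    """Hypothetical UDF body: build one tag per table with a plain for loop."""
    tags: dict[str, str] = {"owner": owner}
    for table in tables:
        # One tag per source table; in a template this would need {% for %}.
        tags[f"source_{table}"] = "true"
    return tags


def test_build_table_tags() -> None:
    # An ordinary unit test: no YAML loading or template rendering required.
    tags = build_table_tags(["orders", "customers"], owner="analytics")
    assert tags == {
        "owner": "analytics",
        "source_orders": "true",
        "source_customers": "true",
    }


test_build_table_tags()
```

Because the logic is plain Python, the same test can run under pytest or any other test runner alongside the rest of your code.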

Example: Dynamically generating tags

Let's walk through a common scenario where you might start with static YAML configuration, but need to evolve to dynamic generation.

Starting with static tags

Initially, you might have a simple component with hardcoded compliance tags:

static_defs.yaml
type: dagster.PythonScriptComponent
attributes:
  execution:
    path: process_data.py
  assets:
    - key: processed_data
      tags:
        data_classification: internal
        retention_days: "90"
        pii_contains: "false"

This works well at first, but what happens when your compliance requirements become more complex? For example:

  • Retention days need to be calculated based on data classification.
  • Different rules apply when PII is present.

Evolving to dynamic tag generation

Instead of duplicating logic across multiple YAML files, you can create a template UDF that generates tags dynamically:

template_vars.py
from typing import Callable

import dagster as dg


@dg.template_var
def generate_compliance_tags() -> Callable[[str, bool], dict[str, str]]:
    """Returns a function that generates compliance tags with computed retention logic.

    This demonstrates how to push complex business logic into Python functions
    instead of embedding it in template syntax.
    """

    def _generate_compliance_tags(
        classification: str, has_pii: bool = False
    ) -> dict[str, str]:
        # Complex business logic with full Python tooling support
        retention_mapping = {
            "public": 30,
            "internal": 90,
            "confidential": 180,
            "restricted": 365,
        }

        base_retention = retention_mapping.get(classification, 90)
        # Increase retention if PII is present
        if has_pii:
            base_retention *= 2

        return {
            "data_classification": classification,
            "retention_days": str(base_retention),
            "pii_contains": str(has_pii).lower(),
        }

    return _generate_compliance_tags

Now you can use this function in your component definition:

dynamic_defs.yaml
type: dagster.PythonScriptComponent
template_vars_module: .template_vars
attributes:
  execution:
    path: process_data.py
  assets:
    - key: processed_data
      tags: "{{ generate_compliance_tags('internal', has_pii=false) }}"
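
To check what the tags expression should resolve to, the inner function can be exercised directly in Python. The sketch below copies the logic from template_vars.py into a standalone function; the name compliance_tags is an assumption made for illustration, and the @dg.template_var wrapper is left out so the snippet runs without Dagster installed.

```python
def compliance_tags(classification: str, has_pii: bool = False) -> dict[str, str]:
    """Standalone copy of the tag logic (hypothetical name, no Dagster needed)."""
    retention_mapping = {
        "public": 30,
        "internal": 90,
        "confidential": 180,
        "restricted": 365,
    }
    # Unknown classifications fall back to 90 days.
    base_retention = retention_mapping.get(classification, 90)
    # Retention doubles when PII is present.
    if has_pii:
        base_retention *= 2
    return {
        "data_classification": classification,
        "retention_days": str(base_retention),
        "pii_contains": str(has_pii).lower(),
    }


print(compliance_tags("internal", has_pii=False))
# {'data_classification': 'internal', 'retention_days': '90', 'pii_contains': 'false'}
print(compliance_tags("restricted", has_pii=True))
# {'data_classification': 'restricted', 'retention_days': '730', 'pii_contains': 'true'}
```

With the inputs used in dynamic_defs.yaml, the result matches the hardcoded tags in static_defs.yaml, so switching to the UDF should leave the tags for this asset unchanged.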

Using UDFs to incrementally move to Python

The UDF approach allows you to start with simple declarative YAML, then incrementally move to Python without having to alter the schema of the target components.

While you could do this by implementing a custom schema and then writing the equivalent code within the build_defs function of a custom component, the UDF provides a much more incremental and less disruptive approach:

  • You do not have to create a custom component.
  • You can use UDFs in definitions where they are needed, but still use the simple YAML declarations where they are not, while keeping the component types the same.
  • You do not have to teach your users about new schema formats.