Adding a component to a project
Finding a component
You can view the available component types in your environment by running the following command:
dg component-type list
This will display a list of all the component types that are available in your project. If you'd like to see more information about a specific component, you can run:
dg component-type docs <component-name>
This will display a webpage containing documentation for the specified component type.
Scaffolding a component
Once you've selected the component type that you'd like to use, you can instantiate a new component by running:
dg component generate <component-type> <component-name>
This will create a new directory underneath your components/
folder that contains a component.yaml
file. Some components may also generate additional files as needed.
Basic configuration
The component.yaml
is the primary configuration file for a component. It contains two top-level fields:
type
: The type of the component defined in this directoryparams
: A dictionary of parameters that are specific to this component type. The schema for these params is defined by theget_schema
method on the component class.
To see a sample component.yaml
file for your specific component, you can run:
dg component-type docs <component-name>
Component templating
Each component.yaml
file supports a rich templating syntax, powered by jinja2
.
Templating environment variables
A common use case for templating is to avoid exposing environment variables (particularly secrets) in your yaml files. The Jinja scope for a component.yaml
file contains an env
function which can be used to insert environment variables into the template.
component_type: my_snowflake_component
params:
account: {{ env('SNOWFLAKE_ACCOUNT') }}
password: {{ env('SNOWFLAKE_PASSWORD') }}
Customizing a component
Sometimes, you may want to customize the behavior of a component beyond what is available in the component.yaml
file.
To do this, you can create a subclass of your desired component in the same directory as your component.yaml
file. By convention, this subclass should be created in a file named component.py
.
This subclass should be annotated with the @component_type
decorator, which will define a local name for this component:
from dagster_components import registered_component_type
from dagster_components.lib import SlingReplicationCollection
@registered_component_type(name="custom_subclass")
class CustomSubclass(SlingReplicationCollection): ...
You can then update the type:
field in your component.yaml
file to reference this new component type. The new type name will be .<component-name>
, where the leading .
indicates that this is a local component type.
type: .custom_subclass
params:
...
Customizing execution
By convention, most library components have an execute()
method that defines the core runtime behavior of the component. This can be overridden by subclasses of the component to customize this behavior.
For example, we can create a subclass of the SlingReplicationCollectionComponent
that adds a debug log message during execution:
from collections.abc import Iterator
from dagster_components import registered_component_type
from dagster_components.lib import SlingReplicationCollection
from dagster_sling import SlingResource
import dagster as dg
@registered_component_type(name="debug_sling_replication")
class DebugSlingReplicationComponent(SlingReplicationCollection):
def execute(
self, context: dg.AssetExecutionContext, sling: SlingResource
) -> Iterator:
context.log.info("*******************CUSTOM*************************")
return sling.replicate(context=context, debug=True)
Adding component-level templating scope
By default, the scopes available for use in the template are:
env
: A function that allows you to access environment variables.automation_condition
: A scope allowing you to access all static constructors of theAutomationCondition
class.
However, it can be useful to add additional scope options to your component type. For example, you may have a custom automation condition that you'd like to use in your component.
To do so, you can define a function that returns an AutomationCondition
and define a get_additional_scope
method on your subclass:
from collections.abc import Mapping
from typing import Any
from dagster_components import registered_component_type
from dagster_components.lib import SlingReplicationCollection
import dagster as dg
@registered_component_type(name="custom_subclass")
class SubclassWithScope(SlingReplicationCollection):
def get_additional_scope(self) -> Mapping[str, Any]:
def _custom_cron(cron_schedule: str) -> dg.AutomationCondition:
return (
dg.AutomationCondition.on_cron(cron_schedule)
& ~dg.AutomationCondition.in_progress()
)
return {"custom_cron": _custom_cron}
This can then be used in your component.yaml
file:
component_type: .custom_subclass
params:
...
transforms:
- attributes:
automation_condition: "{{ custom_cron('@daily') }}"