Creating and registering a component

The components system makes it easy to create new components that you and your teammates can reuse across your Dagster project.

In most cases, components map to a specific technology. For example, you might create a DockerScriptComponent that executes a script in a Docker container, or a SnowflakeQueryComponent that runs a query on Snowflake.

Prerequisites

Before creating and registering custom components, you will need to create a components-ready project.

Creating a new component

For this example, we'll create a ShellCommand component that executes a shell command.

1. Scaffold the new component file

First, scaffold the ShellCommand component. You can scaffold a component with either a YAML or Pythonic interface.

To scaffold a component with a YAML interface, use the dg scaffold component command:

dg scaffold component ShellCommand

Creating module at: /.../my-project/src/my_project/components
Scaffolded Dagster component at /.../my-project/src/my_project/components/shell_command.py.

The command above adds a new file to the components directory of your Dagster project, containing the basic structure for the new component.

The ShellCommand class inherits from Component, Model, and Resolvable. Model is used to implement a YAML interface for the component, and makes any class that inherits from it (in this case, ShellCommand) a Pydantic model:

src/my_project/components/shell_command.py
import dagster as dg

class ShellCommand(dg.Component, dg.Model, dg.Resolvable):
    """COMPONENT SUMMARY HERE.

    COMPONENT DESCRIPTION HERE.
    """

    # added fields here will define params when instantiated in Python, and yaml schema via Resolvable

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Add definition construction logic here.
        return dg.Definitions()

To scaffold a component with a Pythonic interface, use the dg scaffold component command with the --no-model flag:

dg scaffold component ShellCommand --no-model

Creating module at: /.../my-project/src/my_project/components
Scaffolded Dagster component at /.../my-project/src/my_project/components/shell_command.py.

The command above adds a new file to the components directory of your Dagster project, containing the basic structure for the new component.

Since this component only needs a Python interface, the ShellCommand class does not inherit from Model, and an empty __init__ method is included:

src/my_project/components/shell_command.py
import dagster as dg


class ShellCommand(dg.Component, dg.Resolvable):
    """COMPONENT SUMMARY HERE.

    COMPONENT DESCRIPTION HERE.
    """

    def __init__(
        self,
        # added arguments here will define yaml schema via Resolvable
    ):
        pass

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Add definition construction logic here.
        return dg.Definitions()

info

You can also use @dataclasses.dataclass to implement the __init__ method:

src/my_project/components/shell_command.py
from dataclasses import dataclass

import dagster as dg


@dataclass
class ShellCommand(dg.Component, dg.Resolvable):
    """COMPONENT SUMMARY HERE.

    COMPONENT DESCRIPTION HERE.
    """

    # Add schema fields here

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Add definition construction logic here.
        return dg.Definitions()
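
As a rough illustration of why the decorator can stand in for a hand-written __init__, the sketch below uses only the standard library. ShellCommandFields and its asset_keys field are hypothetical stand-ins, not part of Dagster; the point is only that @dataclass builds the constructor from the annotated fields.

```python
import inspect
from collections.abc import Sequence
from dataclasses import dataclass, fields


@dataclass
class ShellCommandFields:
    """Hypothetical stand-in holding fields a component might declare."""

    script_path: str
    asset_keys: Sequence[str] = ()


# @dataclass generates __init__ from the annotated fields, in declaration order.
init_params = [
    name
    for name in inspect.signature(ShellCommandFields.__init__).parameters
    if name != "self"
]
field_names = [f.name for f in fields(ShellCommandFields)]

# The generated constructor accepts the fields as keyword arguments.
instance = ShellCommandFields(script_path="script.sh", asset_keys=["my_asset"])
```

Because the constructor parameters and the annotated fields are the same list, adding a field to the class is enough to extend the constructor.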

2. Define the component schema

The next step is to define the information the component will need when it is used. The ShellCommand component will need the following information:

The path to the shell script to be run (script_path)
The assets the shell script is expected to produce (asset_specs)

In this example, we annotate the ShellCommand class with script_path and asset_specs.

YAML interface:

src/my_project/components/shell_command.py
from collections.abc import Sequence

import dagster as dg


class ShellCommand(dg.Component, dg.Model, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    script_path: str
    asset_specs: Sequence[dg.ResolvedAssetSpec]

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...

Pythonic interface:

src/my_project/components/shell_command.py
from collections.abc import Sequence

import dagster as dg


class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    def __init__(self, script_path: str, asset_specs: Sequence[dg.ResolvedAssetSpec]):
        self.script_path = script_path
        self.asset_specs = asset_specs

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...

Resolvable handles deriving a YAML schema for the class that inherits from it (in this case, ShellCommand) based on what the class is annotated with.
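
To make the idea of "schema from annotations" concrete, here is a minimal standard-library sketch. It is not how Resolvable is implemented; ShellCommandSketch and field_schema are hypothetical names, and the plain dict item type stands in for dg.ResolvedAssetSpec. It only shows that class annotations carry enough information to describe each field.

```python
from collections.abc import Sequence
from typing import get_args, get_origin, get_type_hints


class ShellCommandSketch:
    """Hypothetical stand-in: only the annotations matter here."""

    script_path: str
    asset_specs: Sequence[dict]


def field_schema(cls: type) -> dict[str, dict]:
    """Describe each annotated attribute as a scalar or a list of items."""
    schema = {}
    for name, annotation in get_type_hints(cls).items():
        if get_origin(annotation) is Sequence:
            # Sequence[X] becomes a list whose items have type X.
            (item_type,) = get_args(annotation)
            schema[name] = {"kind": "list", "items": item_type.__name__}
        else:
            schema[name] = {"kind": annotation.__name__}
    return schema


schema = field_schema(ShellCommandSketch)
```

Here schema describes script_path as a string and asset_specs as a list, which mirrors how the two annotations on ShellCommand end up as a scalar attribute and a list attribute in YAML.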

info

In the example above, we use the annotation asset_specs: Sequence[dg.ResolvedAssetSpec] because the ShellCommand component can produce more than one asset.

If the component only produced one asset, the annotation would be asset_spec: dg.ResolvedAssetSpec, and the Sequence import would be unnecessary.

Using Dagster models for common schema annotations

To simplify common use cases, Dagster provides models for common annotations, such as ResolvedAssetSpec, which handles exposing a schema for defining AssetSpecs from YAML and resolving them before instantiating the component.

For the full list of models, see the Components Core Models API documentation.

3. (Optional) Add metadata to your component

You can optionally include metadata for your component by overriding the get_spec method. This allows you to set fields like owners and tags that will be visible in the generated documentation:

YAML interface:

src/my_project/components/shell_command.py
from collections.abc import Sequence

import dagster as dg


class ShellCommand(dg.Component, dg.Model, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    script_path: str
    asset_specs: Sequence[dg.ResolvedAssetSpec]

    @classmethod
    def get_spec(cls) -> dg.ComponentTypeSpec:
        return dg.ComponentTypeSpec(
            owners=["john@dagster.io"],
            tags=["shell", "script"],
        )

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...

Pythonic interface:

src/my_project/components/shell_command.py
from collections.abc import Sequence

import dagster as dg


class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    def __init__(
        self,
        script_path: str,
        asset_specs: Sequence[dg.ResolvedAssetSpec],
    ):
        self.script_path = script_path
        self.asset_specs = asset_specs

    @classmethod
    def get_spec(cls) -> dg.ComponentTypeSpec:
        return dg.ComponentTypeSpec(
            owners=["john@dagster.io"],
            tags=["shell", "script"],
        )

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...

4. Update the build_defs method

Finally, you'll need to define how to turn the component parameters into a Definitions object.

To do so, you will need to update the build_defs method, which is responsible for returning a Definitions object containing all definitions related to the component.

In this example, the build_defs method creates a @multi_asset that executes the provided shell script. By convention, the code that executes this asset lives in a method called execute, which makes it easier for future developers to create subclasses of this component:

note

The @multi_asset decorator lets a single shell script execution produce several assets, one for each entry in asset_specs, since the script may produce more than one object.

YAML interface:

src/my_project/components/shell_command.py
import subprocess
from collections.abc import Sequence
from pathlib import Path

import dagster as dg


class ShellCommand(dg.Component, dg.Model, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    script_path: str
    asset_specs: Sequence[dg.ResolvedAssetSpec]

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Resolve the script path relative to context.path.
        resolved_script_path = Path(context.path, self.script_path).absolute()

        # Name the multi-asset after the script file, without its extension.
        @dg.multi_asset(name=Path(self.script_path).stem, specs=self.asset_specs)
        def _asset(context: dg.AssetExecutionContext):
            self.execute(resolved_script_path, context)

        return dg.Definitions(assets=[_asset])

    def execute(self, resolved_script_path: Path, context: dg.AssetExecutionContext):
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        return subprocess.run(["sh", str(resolved_script_path)], check=True)

Pythonic interface:

src/my_project/components/shell_command.py
import subprocess
from collections.abc import Sequence
from pathlib import Path

import dagster as dg


class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    def __init__(self, script_path: str, asset_specs: Sequence[dg.ResolvedAssetSpec]):
        self.script_path = script_path
        self.asset_specs = asset_specs

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Resolve the script path relative to context.path.
        resolved_script_path = Path(context.path, self.script_path).absolute()

        # Name the multi-asset after the script file, without its extension.
        @dg.multi_asset(name=Path(self.script_path).stem, specs=self.asset_specs)
        def _asset(context: dg.AssetExecutionContext):
            self.execute(resolved_script_path, context)

        return dg.Definitions(assets=[_asset])

    def execute(self, resolved_script_path: Path, context: dg.AssetExecutionContext):
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        return subprocess.run(["sh", str(resolved_script_path)], check=True)
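
The script-handling logic can be tried outside Dagster. The sketch below is a standard-library stand-in, not the component itself: ScriptRunner, CapturingScriptRunner, run, and asset_name are hypothetical names, and base_dir plays the role of context.path. It shows how the path is resolved, how the asset name is derived from the file stem, and how a subclass can override only execute.

```python
import subprocess
import tempfile
from pathlib import Path


class ScriptRunner:
    """Hypothetical stand-in that keeps only the script-handling logic."""

    def __init__(self, script_path: str):
        self.script_path = script_path

    def asset_name(self) -> str:
        # Path.stem drops the directory and the final suffix.
        return Path(self.script_path).stem

    def run(self, base_dir: Path) -> subprocess.CompletedProcess:
        # Mirrors Path(context.path, self.script_path).absolute() in build_defs.
        resolved = Path(base_dir, self.script_path).absolute()
        return self.execute(resolved)

    def execute(self, resolved_script_path: Path) -> subprocess.CompletedProcess:
        # check=True raises subprocess.CalledProcessError on a non-zero exit status.
        return subprocess.run(["sh", str(resolved_script_path)], check=True)


class CapturingScriptRunner(ScriptRunner):
    """Subclass that overrides only execute, here to capture stdout as text."""

    def execute(self, resolved_script_path: Path) -> subprocess.CompletedProcess:
        return subprocess.run(
            ["sh", str(resolved_script_path)],
            check=True,
            capture_output=True,
            text=True,
        )


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "export.sh").write_text('echo "exported"\n')
    Path(tmp, "broken.sh").write_text("exit 3\n")

    runner = CapturingScriptRunner("export.sh")
    captured = runner.run(Path(tmp)).stdout.strip()
    name = runner.asset_name()

    # A failing script surfaces as an exception rather than a silent success.
    try:
        ScriptRunner("broken.sh").run(Path(tmp))
        failure_code = None
    except subprocess.CalledProcessError as err:
        failure_code = err.returncode
```

Keeping the subprocess call in its own method is what lets the subclass change how the script runs without touching path resolution or naming.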

Registering a new component in your environment

Following the steps above will automatically register your component in your environment. To see your new component in the list of available components, run dg list components:

dg list components

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Key                                              ┃ Summary                                                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ dagster.DefinitionsComponent                     │ An arbitrary set of Dagster definitions.                          │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.DefsFolderComponent                      │ A component that represents a directory containing multiple       │
│                                                  │ Dagster definition modules.                                       │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.FunctionComponent                        │ Represents a Python function, alongside the set of assets or      │
│                                                  │ asset checks that it is responsible for executing.                │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.PythonScriptComponent                    │ Represents a Python script, alongside the set of assets and asset │
│                                                  │ checks that it is responsible for executing.                      │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.TemplatedSqlComponent                    │ A component which executes templated SQL from a string or file.   │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.UvRunComponent                           │ Represents a Python script, alongside the set of assets or asset  │
│                                                  │ checks that it is responsible for executing.                      │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ my_project.components.shell_command.ShellCommand │ Models a shell script as a Dagster asset.                         │
└──────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────┘

You can also view automatically generated documentation describing your new component by running dg dev to start the webserver and navigating to the Docs tab for your project's code location:

dg dev

Docs tab in Dagster webserver

Adding component definitions to your project

After you create and register your new component, you can use it to add component definitions to your Dagster project with the dg scaffold defs command:

dg scaffold defs 'my_project.components.shell_command.ShellCommand' my_shell_command

Creating defs at /.../my-project/src/my_project/defs/my_shell_command.
