
Creating and registering a component

info

dg and Dagster Components are under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please join the #dg-components channel in the Dagster Community Slack.

The components system makes it easy to create new components that you and your teammates can reuse across your Dagster project.

In most cases, components map to a specific technology. For example, you might create a DockerScriptComponent that executes a script in a Docker container, or a SnowflakeQueryComponent that runs a query on Snowflake.

Prerequisites

Before creating and registering custom components, you will need to create a components-ready project.

Creating a new component

For this example, we'll create a ShellCommand component that executes a shell command.

1. Create the new component file

First, use the dg scaffold component command to scaffold the ShellCommand component:

dg scaffold component ShellCommand
Creating module at: /.../my-project/src/my_project/components
Scaffolded Dagster component at /.../my-project/src/my_project/components/shell_command.py.

This will add a new file to the components directory of your Dagster project that contains the basic structure for the new component:

components/shell_command.py
import dagster as dg

class ShellCommand(dg.Component, dg.Model, dg.Resolvable):
    """COMPONENT SUMMARY HERE.

    COMPONENT DESCRIPTION HERE.
    """

    # added fields here will define yaml schema via Model

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        # Add definition construction logic here.
        return dg.Definitions()

tip

Model is used to implement a YAML interface for a component. If your component only needs a Pythonic interface, you can use the --no-model flag when creating it:

dg scaffold component ShellCommand --no-model

This will allow you to implement an __init__ method for your class, either manually or by using @dataclasses.dataclass.
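
To make the two options concrete, the sketch below compares a hand-written `__init__` with the one that `@dataclasses.dataclass` generates. It uses plain Python classes with hypothetical names (`ManualShellCommand`, `DataclassShellCommand`) rather than real Dagster components, so it only illustrates the constructor behavior, not the components API.

```python
from dataclasses import dataclass


class ManualShellCommand:
    """Hypothetical stand-in with a hand-written constructor."""

    def __init__(self, script_path: str):
        self.script_path = script_path


@dataclass
class DataclassShellCommand:
    """Hypothetical stand-in; @dataclass generates __init__ from the field."""

    script_path: str


manual = ManualShellCommand(script_path="script.sh")
generated = DataclassShellCommand(script_path="script.sh")

# Both constructors accept the same keyword argument and set the same attribute.
print(manual.script_path == generated.script_path)
```

The dataclass form is shorter and also generates `__repr__` and `__eq__`, while the manual form leaves room for custom validation in the constructor.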

2. Update the component Python class

The next step is to define the information the component needs when it is instantiated.

The ShellCommand component will need the following to be defined:

  • The path to the shell script to be run
  • The assets the shell script is expected to produce

The ShellCommand class inherits from Resolvable, in addition to Component. Resolvable handles deriving a YAML schema for the ShellCommand class based on what the class is annotated with. To simplify common use cases, Dagster provides annotations for common bits of configuration, such as ResolvedAssetSpec, which will handle exposing a schema for defining AssetSpecs from YAML and resolving them before instantiating the component.

You can define the schema for the ShellCommand component and add it to the ShellCommand class as follows:

components/shell_command.py
from collections.abc import Sequence
from dataclasses import dataclass

import dagster as dg


@dataclass
class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    script_path: str
    asset_specs: Sequence[dg.ResolvedAssetSpec]

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...
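
The general idea of deriving a schema from class annotations can be illustrated with the standard library alone. The following is a toy sketch, not how Resolvable is implemented: `ShellCommandFields` and `describe_fields` are hypothetical names, and `asset_keys` is a simplified stand-in for `asset_specs`.

```python
from collections.abc import Sequence
from dataclasses import dataclass, fields


@dataclass
class ShellCommandFields:
    """Hypothetical stand-in holding only the annotated fields."""

    script_path: str
    asset_keys: Sequence[str]


def describe_fields(cls) -> dict[str, str]:
    """Toy helper: map each dataclass field name to its annotated type."""
    return {f.name: str(f.type) for f in fields(cls)}


# The field names and types come entirely from the class annotations.
schema = describe_fields(ShellCommandFields)
print(schema)
```

Because the field list is read from the annotations, adding a new annotated field to the class extends the derived description without any other change.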

Additionally, you can include metadata for your component by overriding the get_spec method. This allows you to set fields like owners and tags that will be visible in the generated documentation:

components/shell_command.py
from collections.abc import Sequence

import dagster as dg


class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    @classmethod
    def get_spec(cls) -> dg.ComponentTypeSpec:
        return dg.ComponentTypeSpec(
            owners=["John Dagster"],
            tags=["shell", "script"],
        )

    def __init__(
        self,
        script_path: str,
        asset_specs: Sequence[dg.ResolvedAssetSpec],
    ):
        self.script_path = script_path
        self.asset_specs = asset_specs

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions: ...

tip

When defining a field on a component that isn't on the schema, or is of a different type, the components system allows you to provide custom resolution logic for that field. For more information, see "Providing resolution logic for non-standard types".

3. Update the build_defs method

Next, you'll need to define how to turn the component parameters into a Definitions object.

To do so, you will need to update the build_defs method, which is responsible for returning a Definitions object containing all definitions related to the component.

In this example, the build_defs method creates a @multi_asset that executes the provided shell script. By convention, the code that executes the asset lives in a method called execute, which makes it easier for future developers to create subclasses of this component:

note

The @multi_asset decorator is used because a single shell script may produce more than one asset. It lets you assign every spec in asset_specs to one execution of the script.

components/shell_command.py
import subprocess
from collections.abc import Sequence
from dataclasses import dataclass
from pathlib import Path

import dagster as dg


@dataclass
class ShellCommand(dg.Component, dg.Resolvable):
    """Models a shell script as a Dagster asset."""

    script_path: str
    asset_specs: Sequence[dg.ResolvedAssetSpec]

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        resolved_script_path = Path(context.path, self.script_path).absolute()

        @dg.multi_asset(name=Path(self.script_path).stem, specs=self.asset_specs)
        def _asset(context: dg.AssetExecutionContext):
            self.execute(resolved_script_path, context)

        return dg.Definitions(assets=[_asset])

    def execute(self, resolved_script_path: Path, context: dg.AssetExecutionContext):
        return subprocess.run(["sh", str(resolved_script_path)], check=True)
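
The path handling and the `check=True` behavior used above can be tried outside Dagster. This is a minimal sketch that assumes a POSIX system with `sh` on the PATH. The standalone `execute` function mirrors the method above without the Dagster context, and the script names `hello.sh` and `fail.sh` are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def execute(resolved_script_path: Path) -> subprocess.CompletedProcess:
    """Mirror of the component's execute method, without the Dagster context."""
    return subprocess.run(["sh", str(resolved_script_path)], check=True)


with tempfile.TemporaryDirectory() as tmp:
    # Join a base directory and a relative script path, as build_defs does.
    script = Path(tmp, "hello.sh").absolute()
    script.write_text("exit 0\n")

    # The asset name is the file name without its extension.
    asset_name = Path("hello.sh").stem

    ok = execute(script)

    # With check=True, a non-zero exit status raises CalledProcessError.
    failing = Path(tmp, "fail.sh").absolute()
    failing.write_text("exit 3\n")
    try:
        execute(failing)
        failed = False
    except subprocess.CalledProcessError as err:
        failed = err.returncode == 3
```

Because a failing script raises an exception rather than returning silently, an error in the script surfaces as a failure of the asset function that calls `execute`.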

Registering a new component in your environment

Following the steps above will automatically register your component in your environment. To see your new component in the list of available components, run dg list components:

dg list components
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Key                                              ┃ Summary                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ dagster.DefinitionsComponent                     │ An arbitrary set of Dagster definitions.             │
├──────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ dagster.DefsFolderComponent                      │ A folder which may contain multiple submodules, each │
│                                                  │ which define components.                             │
├──────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ my_project.components.shell_command.ShellCommand │ Models a shell script as a Dagster asset.            │
└──────────────────────────────────────────────────┴──────────────────────────────────────────────────────┘

You can also view automatically generated documentation describing your new component by running dg dev to start the webserver and navigating to the Docs tab for your project's code location:

dg dev

[Image: Docs tab in Dagster webserver]

Adding a component definition to your project

After you create and register your new component, you can use it to add component definitions to your Dagster project with the dg scaffold defs command:

dg scaffold defs 'my_project.components.shell_command.ShellCommand' my_shell_command
Creating a component at /.../my-project/src/my_project/defs/my_shell_command.