Modes and Resources

Modes and resources provide a way to control the behavior of multiple solids at pipeline execution time. For those familiar with dependency injection, modes and resources offer a similar capability, using Pythonic idioms and Dagster’s configuration system. Each pipeline run uses exactly one mode, and each mode exposes a set of resources to the solids in that pipeline run.

A typical usage for modes is to vary pipeline behavior between different deployment environments. For example, you might define a “local_dev” mode for running pipelines on a laptop against synthetic data and a “prod” mode for running pipelines against production data in the cloud.

Modes affect solid behavior by providing resources to the solids. For example, the “local_dev” mode might reference a “SQLiteDatabase” resource that solids can execute queries against. The “prod” mode might instead reference a “PostgresDatabase” resource.

Defining Pipelines with Modes

Here’s what it looks like to define a pipeline with modes:

pipeline_with_modes.py
from dagster import ModeDefinition, pipeline

from .database_resources import postgres_database, sqlite_database
from .solids_with_resources import generate_table_1, generate_table_2


@pipeline(
    mode_defs=[
        ModeDefinition("local_dev", resource_defs={"database": sqlite_database}),
        ModeDefinition("prod", resource_defs={"database": postgres_database}),
    ],
)
def generate_tables_pipeline():
    generate_table_1()
    generate_table_2()

"database", the key in the ModeDefinition's resource_defs dict, is a "resource key". The same resource key can refer to a different ResourceDefinition in each mode - e.g. “database” can be a SQLiteDatabase in the local_dev mode and a PostgresDatabase in the prod mode.
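
The idea can be pictured in plain Python. The sketch below is not Dagster's implementation; it only illustrates how a single resource key can resolve to a different object depending on the selected mode. Every name in it (FakeSQLite, FakePostgres, MODES, resources_for_mode, generate_table) is a hypothetical stand-in.

```python
# Plain-Python illustration only; Dagster's real machinery is more involved.
# All names here are hypothetical stand-ins, not Dagster APIs.


class FakeSQLite:
    def execute_query(self, query):
        return f"sqlite: {query}"


class FakePostgres:
    def execute_query(self, query):
        return f"postgres: {query}"


# Each mode maps resource keys to a factory that builds the resource.
MODES = {
    "local_dev": {"database": FakeSQLite},
    "prod": {"database": FakePostgres},
}


def resources_for_mode(mode_name):
    """Instantiate every resource the given mode defines, keyed by resource key."""
    return {key: factory() for key, factory in MODES[mode_name].items()}


def generate_table(resources):
    # The caller never names a concrete database class, only the key.
    return resources["database"].execute_query("select 1")
```

Calling generate_table(resources_for_mode("local_dev")) routes the query to the SQLite stand-in, while passing "prod" routes the same call to the Postgres stand-in, without any change to generate_table itself.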

Accessing Resources in Solids

Solids use resource keys to access resources, like so:

solids_with_resources.py
from dagster import solid

CREATE_TABLE_1_QUERY = "create table table_1 as select * from table_0"
CREATE_TABLE_2_QUERY = "create table table_2 as select * from table_1"


@solid(required_resource_keys={"database"})
def generate_table_1(context):
    context.resources.database.execute_query(CREATE_TABLE_1_QUERY)


@solid(required_resource_keys={"database"})
def generate_table_2(context):
    context.resources.database.execute_query(CREATE_TABLE_2_QUERY)

Once a solid includes a resource key in its set of required_resource_keys, the body of the solid can access the corresponding resource via the “resources” attribute of its context object.
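
Because the solid body only touches context.resources.database, its logic can be exercised against a stand-in object. The following is a minimal sketch under stated assumptions: the body is written as a plain function without the @solid decorator, and RecordingDatabase, make_context, generate_table_body, and EXAMPLE_QUERY are hypothetical names introduced for illustration, not part of Dagster.

```python
from types import SimpleNamespace

# Hypothetical placeholder query used only for this illustration.
EXAMPLE_QUERY = "select 1"


class RecordingDatabase:
    """Hypothetical stand-in resource that records queries instead of running them."""

    def __init__(self):
        self.queries = []

    def execute_query(self, query):
        self.queries.append(query)


def generate_table_body(context):
    # Same shape as the solid bodies above, minus the @solid decorator.
    context.resources.database.execute_query(EXAMPLE_QUERY)


def make_context(database):
    # Mimics only the one attribute the body uses: context.resources.database.
    return SimpleNamespace(resources=SimpleNamespace(database=database))


fake_db = RecordingDatabase()
generate_table_body(make_context(fake_db))
```

After the call, fake_db.queries holds the single query the body issued, which makes it easy to check the logic without a real database.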

Defining Resources

sqlite_database and postgres_database, in the ModeDefinition above, are ResourceDefinitions. Each describes how to instantiate a resource, given run config. Here's what it looks like to define these resources:

database_resources.py
from dagster import IntSource, StringSource, resource
from sqlalchemy import create_engine


@resource
def sqlite_database(_):
    class SQLiteDatabase:
        def execute_query(self, query):
            engine = create_engine("sqlite:///tmp.db")
            with engine.connect() as conn:
                conn.execute(query)

    return SQLiteDatabase()


@resource(
    config_schema={
        "hostname": StringSource,
        "port": IntSource,
        "username": StringSource,
        "password": StringSource,
        "db_name": StringSource,
    }
)
def postgres_database(init_context):
    class PostgresDatabase:
        def __init__(self, resource_config):
            self.hostname = resource_config["hostname"]
            self.port = resource_config["port"]
            self.username = resource_config["username"]
            self.password = resource_config["password"]
            self.db_name = resource_config["db_name"]

        def execute_query(self, query):
            engine = create_engine(
                f"postgresql://{self.username}:{self.password}@{self.hostname}:{self.port}/{self.db_name}"
            )
            with engine.connect() as conn:
                conn.execute(query)

    return PostgresDatabase(init_context.resource_config)

The @resource decorator decorates a function that returns a resource. The resource itself is a plain old Python object - it doesn't need to inherit from any particular base class or implement any particular interface. ResourceDefinitions may have a config schema, which enables customizing them at runtime through run config.
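
As a sketch of what supplying that config might look like, the run config for the prod mode could carry a resources section keyed by the resource key. The host, port, database name, and the environment variable names PG_USER and PG_PASSWORD below are made-up illustrative values. The example relies on StringSource and IntSource fields accepting either a literal value or an {"env": "VAR_NAME"} selector that reads the value from an environment variable.

```python
# Hypothetical run config for the "prod" mode. Host, port, database name, and
# the environment variable names are illustrative values, not from the docs.
prod_run_config = {
    "resources": {
        "database": {  # the resource key from the ModeDefinition
            "config": {
                "hostname": "db.example.com",
                "port": 5432,
                "username": {"env": "PG_USER"},      # read from the environment
                "password": {"env": "PG_PASSWORD"},  # keeps the secret out of the config
                "db_name": "analytics",
            }
        }
    }
}

# The keys under "config" should match the resource's config_schema.
expected_keys = {"hostname", "port", "username", "password", "db_name"}
assert set(prod_run_config["resources"]["database"]["config"]) == expected_keys
```

The sqlite_database resource declares no config schema, so a run in the local_dev mode would not need a corresponding entry.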

Selecting a Mode

When launching a pipeline via the Dagit Playground, you can select the mode from a dropdown. When launching a pipeline via the CLI, you can select the mode using the "-d" option. For example:

dagster pipeline execute -d local_dev generate_tables_pipeline

Other Mode-Level Definitions

In addition to holding resource definitions, mode definitions accept logger_defs, system_storage_defs, and executor_defs arguments. These determine which LoggerDefinitions, SystemStorageDefinitions, and ExecutorDefinitions are available to the pipeline when it executes in that mode. The pipeline's run config then selects, from the options the ModeDefinition makes available, the particular loggers, system storage, and executor used for a given run.
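
A run config that makes these selections might look roughly like the sketch below. It is an assumption-laden illustration: it presumes the mode exposes a logger named "console", a system storage named "filesystem", and an executor named "multiprocess", and that the top-level sections are called loggers, storage, and execution. Check the names against what your ModeDefinition actually provides.

```python
# Hypothetical run config selecting mode-level components by name. It assumes
# the mode exposes a "console" logger, "filesystem" system storage, and a
# "multiprocess" executor; substitute whatever names your mode defines.
run_config = {
    "loggers": {"console": {"config": {"log_level": "DEBUG"}}},
    "storage": {"filesystem": {}},
    "execution": {"multiprocess": {}},
}

# Exactly one system storage and one executor are chosen per run.
assert len(run_config["storage"]) == 1
assert len(run_config["execution"]) == 1
```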

Glossary

  • ResourceDefinition - defines how to instantiate a resource, given run config.
  • resource - a plain old Python object instantiated from a ResourceDefinition and made available to solids at execution time.
  • @resource - a decorator that makes it easy to construct ResourceDefinitions. It’s used to decorate a function with a single init_context argument or a class whose constructor has a single init_context argument.
  • resource key - a string that’s used as a handle for a ResourceDefinition. A ModeDefinition maps resource keys to ResourceDefinitions, and solids access resources instantiated from those ResourceDefinitions using the same resource keys.