Execution

Executing Jobs

class dagster.JobDefinition(mode_def, graph_def, name=None, description=None, preset_defs=None, tags=None, hook_defs=None, op_retry_policy=None, version_strategy=None, _op_selection_data=None)[source]
execute_in_process(run_config=None, instance=None, partition_key=None, raise_on_error=True, op_selection=None)[source]

Execute the Job in-process, gathering results in-memory.

The executor_def on the Job will be ignored, and replaced with the in-process executor. If using the default io_manager, it will switch from filesystem to in-memory.

Parameters
  • (Optional[Dict[str (run_config) – The configuration for the run

  • Any]] – The configuration for the run

  • instance (Optional[DagsterInstance]) – The instance to execute against, an ephemeral one will be used if none provided.

  • partition_key – (Optional[str]) The string partition key that specifies the run config to execute. Can only be used to select run config for jobs with partitioned config.

  • raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True.

  • op_selection (Optional[List[str]]) – A list of op selection queries (including single op names) to execute. For example: * ['some_op']: selects some_op itself. * ['*some_op']: select some_op and all its ancestors (upstream dependencies). * ['*some_op+++']: select some_op, all its ancestors, and its descendants (downstream dependencies) within 3 levels down. * ['*some_op', 'other_op_a', 'other_op_b+']: select some_op and all its ancestors, other_op_a itself, and other_op_b and its direct child ops.

Returns

ExecuteInProcessResult

Executing Graphs

class dagster.GraphDefinition(name, description=None, node_defs=None, dependencies=None, input_mappings=None, output_mappings=None, config=None, tags=None, **kwargs)[source]
execute_in_process(run_config=None, instance=None, resources=None, raise_on_error=True)[source]

Execute this graph in-process, collecting results in-memory.

Parameters
  • run_config (Optional[Dict[str, Any]]) – Run config to provide to execution. The configuration for the underlying graph should exist under the “ops” key.

  • instance (Optional[DagsterInstance]) – The instance to execute against, an ephemeral one will be used if none provided.

  • resources (Optional[Dict[str, Any]]) – The resources needed if any are required. Can provide resource instances directly, or resource definitions.

  • raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True.

Returns

ExecuteInProcessResult

Execution results

class dagster.ExecuteInProcessResult(node_def, all_events, run_id, output_capture)[source]
property all_node_events

All dagster events from the in-process execution.

Type

List[DagsterEvent]

events_for_node(node_name)[source]

Retrieves all dagster events for a specific node.

Parameters

node_name (str) – The name of the node for which outputs should be retrieved.

Returns

A list of all dagster events associated with provided node name.

Return type

List[DagsterEvent]

output_for_node(node_str, output_name='result')[source]

Retrieves output value with a particular name from the in-process run of the job.

Parameters
  • node_str (str) – Name of the op/graph whose output should be retrieved. If the intended graph/op is nested within another graph, the syntax is outer_graph.inner_node.

  • output_name (Optional[str]) – Name of the output on the op/graph to retrieve. Defaults to result, the default output name in dagster.

Returns

The value of the retrieved output.

Return type

Any

output_value(output_name='result')[source]

Retrieves output of top-level job, if an output is returned.

If the top-level job has no output, calling this method will result in a DagsterInvariantViolationError.

Parameters

output_name (Optional[str]) – The name of the output to retrieve. Defaults to result, the default output name in dagster.

Returns

The value of the retrieved output.

Return type

Any

property run_id

The run id for the executed run

Type

str

property success

Whether execution was successful.

Type

bool

class dagster.DagsterEvent(event_type_value, pipeline_name, step_handle=None, solid_handle=None, step_kind_value=None, logging_tags=None, event_specific_data=None, message=None, pid=None, step_key=None)[source]

Events yielded by solid and pipeline execution.

Users should not instantiate this class.

event_type_value

Value for a DagsterEventType.

Type

str

pipeline_name
Type

str

solid_handle
Type

NodeHandle

step_kind_value

Value for a StepKind.

Type

str

logging_tags
Type

Dict[str, str]

event_specific_data

Type must correspond to event_type_value.

Type

Any

message
Type

str

pid
Type

int

step_key

DEPRECATED

Type

Optional[str]

property event_type

The type of this event.

Type

DagsterEventType

class dagster.DagsterEventType(value)[source]

The types of events that may be yielded by solid and pipeline execution.

ALERT_START = 'ALERT_START'
ALERT_SUCCESS = 'ALERT_SUCCESS'
ASSET_MATERIALIZATION = 'ASSET_MATERIALIZATION'
ASSET_STORE_OPERATION = 'ASSET_STORE_OPERATION'
ENGINE_EVENT = 'ENGINE_EVENT'
HANDLED_OUTPUT = 'HANDLED_OUTPUT'
HOOK_COMPLETED = 'HOOK_COMPLETED'
HOOK_ERRORED = 'HOOK_ERRORED'
HOOK_SKIPPED = 'HOOK_SKIPPED'
LOADED_INPUT = 'LOADED_INPUT'
LOGS_CAPTURED = 'LOGS_CAPTURED'
OBJECT_STORE_OPERATION = 'OBJECT_STORE_OPERATION'
PIPELINE_CANCELED = 'PIPELINE_CANCELED'
PIPELINE_CANCELING = 'PIPELINE_CANCELING'
PIPELINE_DEQUEUED = 'PIPELINE_DEQUEUED'
PIPELINE_ENQUEUED = 'PIPELINE_ENQUEUED'
PIPELINE_FAILURE = 'PIPELINE_FAILURE'
PIPELINE_START = 'PIPELINE_START'
PIPELINE_STARTING = 'PIPELINE_STARTING'
PIPELINE_SUCCESS = 'PIPELINE_SUCCESS'
RUN_CANCELED = 'PIPELINE_CANCELED'
RUN_CANCELING = 'PIPELINE_CANCELING'
RUN_DEQUEUED = 'PIPELINE_DEQUEUED'
RUN_ENQUEUED = 'PIPELINE_ENQUEUED'
RUN_FAILURE = 'PIPELINE_FAILURE'
RUN_START = 'PIPELINE_START'
RUN_STARTING = 'PIPELINE_STARTING'
RUN_SUCCESS = 'PIPELINE_SUCCESS'
STEP_EXPECTATION_RESULT = 'STEP_EXPECTATION_RESULT'
STEP_FAILURE = 'STEP_FAILURE'
STEP_INPUT = 'STEP_INPUT'
STEP_OUTPUT = 'STEP_OUTPUT'
STEP_RESTARTED = 'STEP_RESTARTED'
STEP_SKIPPED = 'STEP_SKIPPED'
STEP_START = 'STEP_START'
STEP_SUCCESS = 'STEP_SUCCESS'
STEP_UP_FOR_RETRY = 'STEP_UP_FOR_RETRY'

Reconstructable jobs

class dagster.reconstructable(target)[source]

Create a ReconstructablePipeline from a function that returns a PipelineDefinition/JobDefinition, or a function decorated with @pipeline/@job.

When your pipeline/job must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline/job on the other side of the process boundary.

Passing a job created with ~dagster.GraphDefinition.to_job to reconstructable(), requires you to wrap that job’s definition in a module-scoped function, and pass that function instead:

from dagster import graph, reconstructable

@graph
def my_graph():
    ...

def define_my_job():
    return my_graph.to_job()

reconstructable(define_my_job)

This function implements a very conservative strategy for reconstruction, so that its behavior is easy to predict, but as a consequence it is not able to reconstruct certain kinds of pipelines or jobs, such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.

If you need to reconstruct objects constructed in these ways, you should use build_reconstructable_pipeline() instead, which allows you to specify your own reconstruction strategy.

Examples:

from dagster import job, reconstructable

@job
def foo_job():
    ...

reconstructable_foo_job = reconstructable(foo_job)


@graph
def foo():
    ...

def make_bar_job():
    return foo.to_job()

reconstructable_bar_job = reconstructable(make_bar_job)

Executors

dagster.in_process_executor ExecutorDefinition[source]

The in-process executor executes all steps in a single process.

For legacy pipelines, this will be the default executor. To select it explicitly, include the following top-level fragment in config:

execution:
  in_process:

Execution priority can be configured using the dagster/priority tag via solid/op metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.

dagster.multiprocess_executor ExecutorDefinition[source]

The multiprocess executor executes each step in an individual process.

Any job that does not specify custom executors will use the multiprocess_executor by default. For jobs or legacy pipelines, to configure the multiprocess executor, include a fragment such as the following in your config:

execution:
  multiprocess:
    config:
      max_concurrent: 4

The max_concurrent arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set max_concurrent to be 0, this is the return value of multiprocessing.cpu_count().

Execution priority can be configured using the dagster/priority tag via solid/op metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.

Contexts

class dagster.OpExecutionContext(step_execution_context)[source]
get_mapping_key()

Which mapping_key this execution is for if downstream of a DynamicOutput, otherwise None.

get_tag(key)

Get a logging tag.

Parameters

key (tag) – The tag to get.

Returns

The value of the tag, if present.

Return type

Optional[str]

property has_partition_key

Whether the current run is a partitioned run

has_tag(key)

Check if a logging tag is set.

Parameters

key (str) – The tag to check.

Returns

Whether the tag is set.

Return type

bool

property instance

The current Dagster instance

Type

DagsterInstance

property log

The log manager available in the execution context.

Type

DagsterLogManager

property mode_def

The mode of the current execution.

Type

ModeDefinition

property partition_key

The partition key for the current run.

Raises an error if the current run is not a partitioned run.

property pdb

Gives access to pdb debugging from within the op.

Example:

@op
def debug(context):
    context.pdb.set_trace()
Type

dagster.utils.forked_pdb.ForkedPdb

property pipeline_def

The currently executing pipeline.

Type

PipelineDefinition

property pipeline_name

The name of the currently executing pipeline.

Type

str

property pipeline_run

The current pipeline run

Type

PipelineRun

property resources

The currently available resources.

Type

Resources

property retry_number

Which retry attempt is currently executing i.e. 0 for initial attempt, 1 for first retry, etc.

property run_config

The run config for the current execution.

Type

dict

property run_id

The id of the current execution’s run.

Type

str

property solid_config

The parsed config specific to this solid.

property solid_def

The current solid definition.

Type

SolidDefinition

property step_launcher

The current step launcher, if any.

Type

Optional[StepLauncher]

dagster.build_op_context(resources=None, op_config=None, resources_config=None, instance=None, config=None, partition_key=None)[source]

Builds op execution context from provided parameters.

op is currently built on top of solid, and thus this function creates a SolidExecutionContext. build_op_context can be used as either a function or context manager. If there is a provided resource that is a context manager, then build_op_context must be used as a context manager. This function can be used to provide the context argument when directly invoking a op.

Parameters
  • resources (Optional[Dict[str, Any]]) – The resources to provide to the context. These can be either values or resource definitions.

  • config (Optional[Any]) – The op config to provide to the context.

  • instance (Optional[DagsterInstance]) – The dagster instance configured for the context. Defaults to DagsterInstance.ephemeral().

Examples

context = build_op_context()
op_to_invoke(context)

with build_op_context(resources={"foo": context_manager_resource}) as context:
    op_to_invoke(context)
class dagster.TypeCheckContext(run_id, log_manager, scoped_resources_builder, dagster_type)[source]

The context object available to a type check function on a DagsterType.

log

Centralized log dispatch from user code.

Type

DagsterLogManager

resources

An object whose attributes contain the resources available to this op.

Type

Any

run_id

The id of this job run.

Type

str

Job configuration

Run Config Schema

The run_config used for jobs has the following schema:

{
  # configuration for execution, required if executors require config
  execution: {
    # the name of one, and only one available executor, typically 'in_process' or 'multiprocess'
    __executor_name__: {
      # executor-specific config, if required or permitted
      config: {
        ...
      }
    }
  },

  # configuration for loggers, required if loggers require config
  loggers: {
    # the name of an available logger
    __logger_name__: {
      # logger-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for resources, required if resources require config
  resources: {
    # the name of a resource
    __resource_name__: {
      # resource-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for underlying ops, required if ops require config
  ops: {

    # these keys align with the names of the ops, or their alias in this job
    __op_name__: {

      # pass any data that was defined via config_field
      config: ...,

      # configurably specify input values, keyed by input name
      inputs: {
        __input_name__: {
          # if an dagster_type_loader is specified, that schema must be satisfied here;
          # scalar, built-in types will generally allow their values to be specified directly:
          value: ...
        }
      },

    }
  },

}