Solids

The core abstraction of Dagster is the solid. A solid is a functional unit of computation with defined inputs and outputs. Multiple solids can be wired together to form a pipeline by defining dependencies between their inputs and outputs.

A solid has a number of properties:

  • Coarse-grained and for use in batch computations.
  • Defines inputs and outputs, optionally typed within the Dagster type system.
  • Embeddable in a dependency graph (pipeline) that is constructed by connecting the inputs and outputs of multiple solids.
  • Emits a stream of typed, structured events such as expectations and materializations corresponding to the semantics of its computation.
  • Exposes self-describing, strongly typed configuration.
  • Testable and reusable.

Defining a solid

There are two ways to define a solid:

  1. Decorate a Python function with the @solid decorator (preferred)
  2. Construct a SolidDefinition object

Method 1: Using the decorator

To use the @solid decorator, wrap a function that takes a context argument as its first parameter. The context provides access to system information such as resources and solid configuration. See Solid Context for more information.

solids.py
from dagster import solid


@solid
def my_solid(context):
    return 1

Method 2: Constructing the SolidDefinition object

To construct a SolidDefinition object, pass the constructor a solid name, input definitions, output definitions, and a compute_fn. The compute function plays the same role as the function you would decorate with @solid. In the example below it takes the context and an inputs argument, and yields its result as an Output.

solids.py
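from dagster import Int, Output, OutputDefinition, SolidDefinition


def _return_one(_context, inputs):
    yield Output(1)


# Named my_solid rather than solid so it does not shadow the @solid decorator.
my_solid = SolidDefinition(
    name="my_solid", input_defs=[], output_defs=[OutputDefinition(Int)], compute_fn=_return_one,
)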

Solid inputs and outputs

Dependencies between solids in Dagster are defined using InputDefinitions and OutputDefinitions. Input and output definitions are:

  • Named
  • Optionally typed
  • Optionally annotated with a human-readable description

Inputs:

Inputs are arguments to a solid's compute_fn, and are specified using InputDefinitions. They can be passed from the outputs of other solids, or stubbed using config.

A solid only executes once all of its inputs have been resolved, which means that all of the outputs that the solid depends on have been successfully yielded.
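The readiness rule above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not how Dagster schedules work internally. The `deps` mapping, the solid names, and the `ready_solids` helper are hypothetical and introduced purely for illustration.

```python
def ready_solids(deps, completed):
    """Return the names of solids whose upstream solids have all completed.

    deps maps a solid name to the set of solid names it depends on.
    completed is the set of solids that have already yielded their outputs.
    """
    return sorted(
        name
        for name, upstream in deps.items()
        if name not in completed and upstream <= completed
    )


# Hypothetical three-solid graph: "add" needs both "load_a" and "load_b".
deps = {"load_a": set(), "load_b": set(), "add": {"load_a", "load_b"}}

first_wave = ready_solids(deps, completed=set())
after_one = ready_solids(deps, completed={"load_a"})
after_both = ready_solids(deps, completed={"load_a", "load_b"})
```

In this model, `add` stays out of the ready list until both of its upstream solids are in `completed`, which mirrors the statement that a solid waits for every output it depends on.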

The argument names of the compute_fn must match the names of the InputDefinitions, and must appear in the same order, after the context argument.

For example, if we wanted a solid with an input of type str and an input of type int:

solids.py
from dagster import InputDefinition, solid


@solid(input_defs=[InputDefinition("a", str), InputDefinition("b", int)])
def my_input_example_solid(context, a, b):
    pass
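The naming and ordering rule for compute function arguments can be expressed as a small check in plain Python. The helper below is hypothetical and is not Dagster's own validation code. It only mirrors the rule as stated: the first parameter is the context, and the remaining parameters must match the input names in order.

```python
import inspect


def check_compute_signature(fn, input_names):
    """Raise ValueError unless fn's parameters are (context, *input_names) in order."""
    params = list(inspect.signature(fn).parameters)
    if not params:
        raise ValueError("compute function must accept a context argument")
    actual = params[1:]  # everything after the leading context argument
    if actual != list(input_names):
        raise ValueError(
            f"expected arguments {list(input_names)} after context, got {actual}"
        )


def good(context, a, b):
    pass


def swapped(context, b, a):
    pass


check_compute_signature(good, ["a", "b"])  # matches, so nothing is raised

try:
    check_compute_signature(swapped, ["a", "b"])
    swapped_ok = True
except ValueError:
    swapped_ok = False  # same names, wrong order
```

The `swapped` function uses the right names in the wrong order, so the check rejects it.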

Outputs:

Outputs are yielded from a solid's compute_fn. A solid can yield multiple outputs.

solids.py
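from dagster import InputDefinition, Output, OutputDefinition, solid


@solid(
    input_defs=[InputDefinition("a", int), InputDefinition("b", int)],
    # Keyword arguments keep each output's name and type unambiguous.
    output_defs=[
        OutputDefinition(name="sum", dagster_type=int),
        OutputDefinition(name="difference", dagster_type=int),
    ],
)
def my_input_output_example_solid(context, a, b):
    yield Output(a + b, output_name="sum")
    yield Output(a - b, output_name="difference")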
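To make the idea of a generator that yields several named outputs concrete, here is a minimal sketch in plain Python. `NamedOutput` is a stand-in defined only for this illustration and is not `dagster.Output`. Likewise, `collect_outputs` is a hypothetical helper and does not describe how Dagster consumes outputs internally.

```python
from collections import namedtuple

# Stand-in for illustration only; this is not dagster.Output.
NamedOutput = namedtuple("NamedOutput", ["value", "output_name"])


def sum_and_difference(a, b):
    # Each yield produces one named output, as in the solid above.
    yield NamedOutput(a + b, output_name="sum")
    yield NamedOutput(a - b, output_name="difference")


def collect_outputs(generator):
    """Drain a compute generator into a dict keyed by output name."""
    results = {}
    for out in generator:
        if out.output_name in results:
            raise ValueError(f"output {out.output_name!r} yielded twice")
        results[out.output_name] = out.value
    return results


results = collect_outputs(sum_and_difference(5, 3))
```

Because each output carries its own name, a downstream consumer can pick the one it needs by name instead of relying on the order in which outputs were yielded.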

Solid context

A context object is passed as the first parameter to a solid's compute_fn. The context is an instance of SystemComputeExecutionContext, and provides access to:

  • solid configuration (context.solid_config)
  • loggers (context.log)
  • resources (context.resources)
  • run ID (context.run_id)

For example, to access the logger:

solids.py
from dagster import solid


@solid
def my_logging_solid(context):
    context.log.info("Hello world")