Ops
The foundational unit of computation in Dagster.
Defining ops
- @dagster.op
Create an op with the specified parameters from the decorated function.
Ins and outs will be inferred from the type signature of the decorated function if not explicitly provided.
The decorated function will be used as the op’s compute function. The signature of the decorated function is more flexible than that of the
compute_fn
in the core API; it may:- Return a value. This value will be wrapped in an
Output
and yielded by the compute function. - Return an
Output
. This output will be yielded by the compute function. - Yield
Output
or other event objectsevent objects
. Same as default compute behavior. Note that options 1) and 2) are incompatible with yielding other events – if you would like to decorate a function that yields events, it must also wrap its eventual output in anOutput
and yield it.
@op supports
async def
functions as well, including async generators when yielding multiple events or outputs. Note that async ops will generally be run on their own unless using a customExecutor
implementation that supports running them together.Parameters:
- name (Optional[str]) – Name of op. Must be unique within any
GraphDefinition
using the op. - description (Optional[str]) – Human-readable description of this op. If not provided, and the decorated function has docstring, that docstring will be used as the description.
- ins (Optional[Dict[str, In]]) – Information about the inputs to the op. Information provided here will be combined with what can be inferred from the function signature.
- out (Optional[Union[Out, Dict[str, Out]]]) – Information about the op outputs. Information provided here will be combined with what can be inferred from the return type signature if the function does not use yield.
- config_schema (Optional[ConfigSchema) – The schema for the config. If set, Dagster will check that config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.
- required_resource_keys (Optional[Set[str]]) – Set of resource handles required by this op.
- tags (Optional[Dict[str, Any]]) – Arbitrary metadata for the op. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
- code_version (Optional[str]) – Version of the logic encapsulated by the op. If set, this is used as a default version for all outputs.
- retry_policy (Optional[RetryPolicy]) – The retry policy for this op.
Examples:
@op
def hello_world():
print('hello')
@op
def echo(msg: str) -> str:
return msg
@op(
ins={'msg': In(str)},
out=Out(str)
)
def echo_2(msg): # same as above
return msg
@op(
out={'word': Out(), 'num': Out()}
)
def multi_out() -> Tuple[str, int]:
return 'cool', 4- Return a value. This value will be wrapped in an
class
dagster.OpDefinitionDefines an op, the functional unit of user-defined computation.
End users should prefer the
@op
decorator. OpDefinition is generally intended to be used by framework authors or for programatically generated ops.Parameters:
-
name (str) – Name of the op. Must be unique within any
GraphDefinition
orJobDefinition
that contains the op. -
input_defs (List[InputDefinition]) – Inputs of the op.
-
compute_fn (Callable) –
The core of the op, the function that performs the actual computation. The signature of this function is determined by
input_defs
, and optionally, an injected first argument,context
, a collection of information provided by the system. -
output_defs (List[OutputDefinition]) – Outputs of the op.
-
config_schema (Optional[ConfigSchema) – The schema for the config. If set, Dagster will check that the config provided for the op matches this schema and will fail if it does not. If not set, Dagster will accept any config provided for the op.
-
description (Optional[str]) – Human-readable description of the op.
-
tags (Optional[Dict[str, Any]]) – Arbitrary metadata for the op. Frameworks may expect and require certain metadata to be attached to a op. Users should generally not set metadata directly. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
-
required_resource_keys (Optional[Set[str]]) – Set of resources handles required by this op.
-
code_version (Optional[str]) – Version of the code encapsulated by the op. If set, this is used as a default code version for all outputs.
-
retry_policy (Optional[RetryPolicy]) – The retry policy for this op.
-
pool (Optional[str]) – A string that identifies the pool that governs this op’s execution.
Examples:
def _add_one(_context, inputs):
yield Output(inputs["num"] + 1)
OpDefinition(
name="add_one",
ins={"num": In(int)},
outs={"result": Out(int)},
compute_fn=_add_one,
)- alias
Creates a copy of this op with the given name.
- tag
Creates a copy of this op with the given tags.
- with_hooks
Creates a copy of this op with the given hook definitions.
- with_retry_policy
Creates a copy of this op with the given retry policy.
property
config_schemaThe config schema for this op.
Type: IDefinitionConfigSchema
property
insA mapping from input name to the In object that represents that input.
Type: Mapping[str, In]
property
nameThe name of this op.
Type: str
property
outsA mapping from output name to the Out object that represents that output.
Type: Mapping[str, Out]
property
required_resource_keysA set of keys for resources that must be provided to this OpDefinition.
Type: AbstractSet[str]
property
retry_policyThe RetryPolicy for this op.
Type: Optional[RetryPolicy]
property
tagsThe tags for this op.
Type: Mapping[str, str]
property
version- deprecated
This API will be removed in version 2.0. Use
code_version
instead..Version of the code encapsulated by the op. If set, this is used as a default code version for all outputs.
Type: str
-
Ins & outs
class
dagster.InDefines an argument to an op’s compute function.
Inputs may flow from previous op’s outputs, or be stubbed using config. They may optionally be typed using the Dagster type system.
Parameters:
- dagster_type (Optional[Union[Type, DagsterType]]]) – The type of this input. Should only be set if the correct type can not be inferred directly from the type signature of the decorated function.
- description (Optional[str]) – Human-readable description of the input.
- default_value (Optional[Any]) – The default value to use if no input is provided.
- metadata (Optional[Dict[str, RawMetadataValue]]) – A dict of metadata for the input.
- asset_key (Optional[Union[AssetKey, InputContext -> AssetKey]]) – An AssetKey (or function that produces an AssetKey from the InputContext) which should be associated with this In. Used for tracking lineage information through Dagster.
- asset_partitions (Optional[Union[Set[str], InputContext -> Set[str]]]) – A set of partitions of the given asset_key (or a function that produces this list of partitions from the InputContext) which should be associated with this In.
- input_manager_key (Optional[str]) – The resource key for the
InputManager
used for loading this input when it is not connected to an upstream output.
class
dagster.OutDefines an output from an op’s compute function.
Ops can have multiple outputs, in which case outputs cannot be anonymous.
Many ops have only one output, in which case the user can provide a single output definition that will be given the default name, “result”.
Outs may be typed using the Dagster type system.
Parameters:
- dagster_type (Optional[Union[Type, DagsterType]]]) – The type of this output. Should only be set if the correct type can not be inferred directly from the type signature of the decorated function.
- description (Optional[str]) – Human-readable description of the output.
- is_required (bool) – Whether the presence of this field is required. (default: True)
- io_manager_key (Optional[str]) – The resource key of the output manager used for this output. (default: “io_manager”).
- metadata (Optional[Dict[str, Any]]) – A dict of the metadata for the output. For example, users can provide a file path if the data object will be stored in a filesystem, or provide information of a database table when it is going to load the data into the table.
- code_version (Optional[str]) – Version of the code that generates this output. In general, versions should be set only for code that deterministically produces the same output when given the same inputs.
Execution
class
dagster.RetryPolicyA declarative policy for when to request retries when an exception occurs during op execution.
Parameters:
- max_retries (int) – The maximum number of retries to attempt. Defaults to 1.
- delay (Optional[Union[int,float]]) – The time in seconds to wait between the retry being requested and the next attempt being started. This unit of time can be modulated as a function of attempt number with backoff and randomly with jitter.
- backoff (Optional[Backoff]) – A modifier for delay as a function of retry attempt number.
- jitter (Optional[Jitter]) – A randomizing modifier for delay, applied after backoff calculation.
class
dagster.BackoffA modifier for delay as a function of attempt number.
LINEAR: attempt_num * delay EXPONENTIAL: ((2 ^ attempt_num) - 1) * delay
class
dagster.JitterA randomizing modifier for delay, applied after backoff calculation.
FULL: between 0 and the calculated delay based on backoff: random() * backoff_delay PLUS_MINUS: +/- the delay: backoff_delay + ((2 * (random() * delay)) - delay)
Events
The objects that can be yielded by the body of ops’ compute functions to communicate with the Dagster framework.
(Note that Failure
and RetryRequested
are intended to be raised from ops rather than yielded.)
Event types
class
dagster.OutputEvent corresponding to one of an op’s outputs.
Op compute functions must explicitly yield events of this type when they have more than one output, or when they also yield events of other types, or when defining a op using the
OpDefinition
API directly.Outputs are values produced by ops that will be consumed by downstream ops in a job. They are type-checked at op boundaries when their corresponding
Out
or the downstreamIn
is typed.Parameters:
- value (Any) – The value returned by the compute function.
- output_name (str) – Name of the corresponding Out. (default: “result”)
- metadata (Optional[Dict[str, Union[str, float, int, MetadataValue]]]) – Arbitrary metadata about the output. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
- data_version (Optional[DataVersion]) – beta (This parameter is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.) (Beta) A data version to manually set for the asset.
- tags (Optional[Mapping[str, str]]) – Tags that will be attached to the asset materialization event corresponding to this output, if there is one.
property
data_versionA data version that was manually set on the Output.
Type: Optional[DataVersion]
property
output_nameName of the corresponding
Out
.Type: str
property
valueThe value returned by the compute function.
Type: Any
class
dagster.AssetMaterializationEvent indicating that an op has materialized an asset.
Op compute functions may yield events of this type whenever they wish to indicate to the Dagster framework (and the end user) that they have produced a materialized value as a side effect of computation. Unlike outputs, asset materializations can not be passed to other ops, and their persistence is controlled by op logic, rather than by the Dagster framework.
Op authors should use these events to organize metadata about the side effects of their computations, enabling tooling like the Assets dashboard in the Dagster UI.
Parameters:
- asset_key (Union[str, List[str], AssetKey]) – A key to identify the materialized asset across job runs
- description (Optional[str]) – A longer human-readable description of the materialized value.
- partition (Optional[str]) – The name of the partition that was materialized.
- tags (Optional[Mapping[str, str]]) – A mapping containing tags for the materialization.
- metadata (Optional[Dict[str, RawMetadataValue]]) – Arbitrary metadata about the asset. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
static
fileStatic constructor for standard materializations corresponding to files on disk.
Parameters:
- path (str) – The path to the file.
- description (Optional[str]) – A human-readable description of the materialization.
class
dagster.ExpectationResult- deprecated
This API will be removed in version 2.0. If using assets, use AssetCheckResult and @asset_check instead..
Event corresponding to a data quality test.
Op compute functions may yield events of this type whenever they wish to indicate to the Dagster framework (and the end user) that a data quality test has produced a (positive or negative) result.
Parameters:
- success (bool) – Whether the expectation passed or not.
- label (Optional[str]) – Short display name for expectation. Defaults to “result”.
- description (Optional[str]) – A longer human-readable description of the expectation.
- metadata (Optional[Dict[str, RawMetadataValue]]) – Arbitrary metadata about the failure. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
class
dagster.TypeCheckEvent corresponding to a successful typecheck.
Events of this type should be returned by user-defined type checks when they need to encapsulate additional metadata about a type check’s success or failure. (i.e., when using
as_dagster_type()
,@usable_as_dagster_type
, or the underlyingPythonObjectDagsterType()
API.)Op compute functions should generally avoid yielding events of this type to avoid confusion.
Parameters:
- success (bool) –
True
if the type check succeeded,False
otherwise. - description (Optional[str]) – A human-readable description of the type check.
- metadata (Optional[Dict[str, RawMetadataValue]]) – Arbitrary metadata about the failure. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
- success (bool) –
class
dagster.FailureEvent indicating op failure.
Raise events of this type from within op compute functions or custom type checks in order to indicate an unrecoverable failure in user code to the Dagster machinery and return structured metadata about the failure.
Parameters:
- description (Optional[str]) – A human-readable description of the failure.
- metadata (Optional[Dict[str, RawMetadataValue]]) – Arbitrary metadata about the failure. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
- allow_retries (Optional[bool]) – Whether this Failure should respect the retry policy or bypass it and immediately fail. Defaults to True, respecting the retry policy and allowing retries.
class
dagster.RetryRequestedAn exception to raise from an op to indicate that it should be retried.
Parameters:
- max_retries (Optional[int]) – The max number of retries this step should attempt before failing
- seconds_to_wait (Optional[Union[float,int]]) – Seconds to wait before restarting the step after putting the step in to the up_for_retry state
Example:
@op
def flakes():
try:
flakey_operation()
except Exception as e:
raise RetryRequested(max_retries=3) from e