Pandas (dagster-pandas)
The dagster_pandas library provides utilities for using pandas with Dagster and for implementing validation on pandas DataFrames. A good place to start with dagster_pandas is the validation guide.
- dagster_pandas.create_dagster_pandas_dataframe_type
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
Constructs a custom pandas dataframe dagster type.
Parameters:
- name (str) – Name of the dagster pandas type.
- description (Optional[str]) – A markdown-formatted string, displayed in tooling.
- columns (Optional[List[PandasColumn]]) – A list of PandasColumn objects which express dataframe column schemas and constraints.
- metadata_fn (Optional[Callable[[], Union[Dict[str, Union[str, float, int, Dict, MetadataValue]]) – A callable which takes your dataframe and returns a dict with string label keys and MetadataValue values.
- dataframe_constraints (Optional[List[DataFrameConstraint]]) – A list of objects that inherit from DataFrameConstraint. This allows you to express dataframe-level constraints.
- loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader. If None, we will default to using dataframe_loader.
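As a rough illustration of the kind of callable metadata_fn expects, the sketch below returns a dict with string keys and plain str/int values. The function name summarize_dataframe and the stand-in class are hypothetical. The stand-in exists only so the block runs without pandas installed, and the function relies only on len(df) and df.columns.

```python
def summarize_dataframe(df):
    """Return metadata labels for a dataframe-like object."""
    return {
        "row_count": len(df),
        "column_count": len(df.columns),
        "column_names": ", ".join(str(c) for c in df.columns),
    }


class _StandInFrame:
    """Hypothetical stand-in exposing only what summarize_dataframe uses."""

    def __init__(self, columns, n_rows):
        self.columns = columns
        self._n_rows = n_rows

    def __len__(self):
        return self._n_rows


summary = summarize_dataframe(_StandInFrame(["id", "price"], 3))
```

A function of this shape would be passed as metadata_fn=summarize_dataframe when constructing the type.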
- class dagster_pandas.RowCountConstraint
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
A dataframe constraint that validates the expected count of rows.
Parameters:
- num_allowed_rows (int) – The number of allowed rows in your dataframe.
- error_tolerance (Optional[int]) – The acceptable deviation from num_allowed_rows, for cases where the exact row count is not known in advance. Defaults to 0.
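To make the tolerance concrete, here is a plain-Python sketch of the check these two parameters describe. It is not the library's implementation: row_count_within_tolerance is a hypothetical helper, and treating the tolerance as a symmetric band around num_allowed_rows is an assumption.

```python
def row_count_within_tolerance(num_rows, num_allowed_rows, error_tolerance=0):
    """Hypothetical helper: True if num_rows lies inside the tolerance band."""
    lower = num_allowed_rows - error_tolerance
    upper = num_allowed_rows + error_tolerance
    return lower <= num_rows <= upper


# With the default tolerance of 0, only an exact match is accepted.
exact_ok = row_count_within_tolerance(100, num_allowed_rows=100)
exact_off_by_one = row_count_within_tolerance(101, num_allowed_rows=100)

# With a tolerance of 5, counts from 95 through 105 are accepted.
tolerant_ok = row_count_within_tolerance(104, num_allowed_rows=100, error_tolerance=5)
tolerant_too_many = row_count_within_tolerance(106, num_allowed_rows=100, error_tolerance=5)
```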
- class dagster_pandas.StrictColumnsConstraint
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
A dataframe constraint that validates column existence and ordering.
Parameters:
- strict_column_list (List[str]) – The exact list of columns that your dataframe must have.
- enforce_ordering (Optional[bool]) – If True, the column names must also appear in the same order as in strict_column_list. Defaults to False.
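One way to read the two modes, written as a plain-Python sketch rather than the library's code. columns_allowed is a hypothetical helper. The unordered branch assumes the check rejects any column outside the list; whether a missing column is also rejected is not stated above, so verify that against your installed version.

```python
def columns_allowed(received, strict_column_list, enforce_ordering=False):
    """Hypothetical helper illustrating ordered vs. unordered column checks."""
    if enforce_ordering:
        # Ordered mode: names and positions must match exactly.
        return list(received) == list(strict_column_list)
    # Unordered mode (assumption): no column outside the list is allowed.
    return all(col in strict_column_list for col in received)


same_order = columns_allowed(["a", "b"], ["a", "b"], enforce_ordering=True)
swapped_strict = columns_allowed(["b", "a"], ["a", "b"], enforce_ordering=True)
swapped_relaxed = columns_allowed(["b", "a"], ["a", "b"])
unexpected_column = columns_allowed(["a", "c"], ["a", "b"])
```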
- class dagster_pandas.PandasColumn
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
The main API for expressing column level schemas and constraints for your custom dataframe types.
Parameters:
- name (str) – Name of the column. This must match up with the column name in the dataframe you expect to receive.
- is_required (Optional[bool]) – Flag indicating whether the column must be present in the dataframe. If the column exists, the validate function will validate the column. Defaults to True.
- constraints (Optional[List[Constraint]]) – List of constraint objects that indicate the validation rules for the pandas column.
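The interaction between is_required and constraints can be sketched in plain Python. This is an illustration of the rule described above, not the library's code: validate_column and is_non_negative are hypothetical names, a dict of lists stands in for a dataframe, and raising ValueError is an assumption made for the sketch.

```python
def validate_column(table, name, checks, is_required=True):
    """Hypothetical helper: validate one column of a dict-of-lists table."""
    if name not in table:
        if is_required:
            raise ValueError(f"required column {name!r} is missing")
        return  # optional and absent: nothing to validate
    # The column exists, so every check is applied, required or not.
    for check in checks:
        for value in table[name]:
            if not check(value):
                raise ValueError(f"column {name!r} has invalid value {value!r}")


def is_non_negative(value):
    return value >= 0


table = {"price": [1.5, 2.0]}
# Present and valid: passes.
validate_column(table, "price", [is_non_negative])
# Absent but optional: passes without running any checks.
validate_column(table, "discount", [is_non_negative], is_required=False)
```

Note that an optional column is still validated whenever it is present; is_required only controls what happens when it is absent.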
- dagster_pandas.DataFrame = <dagster._core.types.dagster_type.DagsterType object>
Define a type in dagster. These can be used in the inputs and outputs of ops.
Parameters:
- type_check_fn (Callable[[TypeCheckContext, Any], [Union[bool, TypeCheck]]]) – The function that defines the type check. It takes the value flowing through the input or output of the op. If it passes, return either True or a TypeCheck with success set to True. If it fails, return either False or a TypeCheck with success set to False. The first argument must be named context (or, if unused, _, _context, or context_). Use required_resource_keys for access to resources.
- key (Optional[str]) – The unique key to identify types programmatically. The key property always has a value. If you omit the key argument to the init function, it instead receives the value of name. If neither key nor name is provided, a CheckError is thrown. In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.
- name (Optional[str]) – A unique name given by a user. If key is None, key becomes this value. Name is not given in a case where the user does not specify a unique name for this type, such as a generic class.
- description (Optional[str]) – A markdown-formatted string, displayed in tooling.
- loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type. Specify this argument if you will need to shim values of this type using the config machinery. As a rule, you should use the @dagster_type_loader decorator to construct these arguments.
- required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.
- is_builtin (bool) – Defaults to False. This is used by tools to display or filter built-in types (such as String, Int) to visually distinguish them from user-defined types. Meant for internal use.
- kind (DagsterTypeKind) – Defaults to None. This is used to determine the kind of runtime type for InputDefinition and OutputDefinition type checking.
- typing_type – Defaults to None. A valid python typing type (e.g. Optional[List[int]]) for the value contained within the DagsterType. Meant for internal use.
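A minimal sketch of a type_check_fn that follows the contract described above: the first argument is named _context because it is unused, and the function returns a plain bool. The name is_non_empty_list is hypothetical, and the wiring into DagsterType is shown only as a comment so the block runs without dagster installed.

```python
def is_non_empty_list(_context, value):
    """Type check: passes only for a list with at least one element."""
    return isinstance(value, list) and len(value) > 0


# Assumed wiring, based on the name and type_check_fn parameters listed above:
# NonEmptyList = DagsterType(name="NonEmptyList", type_check_fn=is_non_empty_list)

passes = is_non_empty_list(None, [1, 2, 3])
fails_empty = is_non_empty_list(None, [])
fails_wrong_type = is_non_empty_list(None, "abc")
```

Returning a TypeCheck instead of a bool is the alternative the parameter description mentions; the bool form is used here because it keeps the sketch free of library imports.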