Great Expectations (dagster-ge)
- dagster_ge.ge_validation_op_factory
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
Generates ops for interacting with Great Expectations.
Parameters:
- name (str) – the name of the op
- datasource_name (str) – the name of your DataSource, see your great_expectations.yml
- data_connector_name (str) – the name of the data connector for this datasource. This should point to a RuntimeDataConnector. For information on how to set this up, see: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe
- data_asset_name (str) – the name of the data asset that this op will be validating.
- suite_name (str) – the name of your expectation suite, see your great_expectations.yml
- batch_identifier_fn (dict) – A dictionary of batch identifiers that uniquely identifies this batch of data. To learn more about batch identifiers, see: https://docs.greatexpectations.io/docs/reference/datasources#batches.
- input_dagster_type (DagsterType) – the Dagster type used to type check the input to the op. Defaults to dagster_pandas.DataFrame.
- runtime_method_type (str) – how GE should interpret the op input. One of ("batch_data", "path", "query"). Defaults to "batch_data", which will interpret the input as an in-memory object.
- extra_kwargs (Optional[dict]) – adds extra kwargs to the invocation of ge_data_context's get_validator method. If not set, input will be:
    {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        "runtime_parameters": { "<runtime_method_type>": <op input> },
        "batch_identifiers": batch_identifiers,
        "expectation_suite_name": suite_name,
    }
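The default arguments described above can be sketched in plain Python. This is only an illustration of the documented dictionary shape, not dagster-ge's actual implementation: the helper name `default_validator_kwargs`, the `op_input` argument, the up-front check on `runtime_method_type`, and all example values are assumptions made for this sketch.

```python
from typing import Any, Dict

# The three runtime method types listed in the parameter description.
ALLOWED_RUNTIME_METHOD_TYPES = ("batch_data", "path", "query")


def default_validator_kwargs(
    datasource_name: str,
    data_connector_name: str,
    data_asset_name: str,
    suite_name: str,
    batch_identifiers: Dict[str, Any],
    op_input: Any,
    runtime_method_type: str = "batch_data",
) -> Dict[str, Any]:
    """Hypothetical helper: build the documented default kwargs dictionary."""
    # Assumption: reject unknown method types early. The real library may
    # handle this differently.
    if runtime_method_type not in ALLOWED_RUNTIME_METHOD_TYPES:
        raise ValueError(
            f"runtime_method_type must be one of {ALLOWED_RUNTIME_METHOD_TYPES}"
        )
    return {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        # The op input is keyed by the chosen runtime method type.
        "runtime_parameters": {runtime_method_type: op_input},
        "batch_identifiers": batch_identifiers,
        "expectation_suite_name": suite_name,
    }


# Example values are placeholders; a list of dicts stands in for a DataFrame.
kwargs = default_validator_kwargs(
    datasource_name="my_datasource",
    data_connector_name="my_runtime_connector",
    data_asset_name="payments",
    suite_name="payments_suite",
    batch_identifiers={"run": "manual"},
    op_input=[{"amount": 10}],
)
```

With `runtime_method_type="path"` or `"query"`, the same helper would place a file path or a query string under that key instead of an in-memory object.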
Returns: An op that takes in a set of data and yields both an expectation with relevant metadata and an output with all the metadata (for user processing)