Great Expectations (dagster-ge)

dagster_ge.ge_validation_op_factory
beta

This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.

Generates ops for interacting with Great Expectations.

Parameters:

    • name (str) – the name of the op

    • datasource_name (str) – the name of your DataSource; see your great_expectations.yml

    • data_connector_name (str) – the name of the data connector for this datasource. This should point to a RuntimeDataConnector. For information on how to set this up, see: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe

    • data_asset_name (str) – the name of the data asset that this op will be validating.

    • suite_name (str) – the name of your expectation suite; see your great_expectations.yml

    • batch_identifiers (dict) – A dictionary of batch identifiers to uniquely identify this batch of data. To learn more about batch identifiers, see: https://docs.greatexpectations.io/docs/reference/datasources#batches.

    • input_dagster_type (DagsterType) – the Dagster type used to type check the input to the op. Defaults to dagster_pandas.DataFrame.

    • runtime_method_type (str) – how GE should interpret the op input. One of (“batch_data”, “path”, “query”). Defaults to “batch_data”, which will interpret the input as an in-memory object.

    • extra_kwargs (Optional[dict]) – adds extra kwargs to the invocation of ge_data_context’s get_validator method. If not set, input will be:

      {
          "datasource_name": datasource_name,
          "data_connector_name": data_connector_name,
          "data_asset_name": data_asset_name,
          "runtime_parameters": {
              "<runtime_method_type>": <op input>
          },
          "batch_identifiers": batch_identifiers,
          "expectation_suite_name": suite_name,
      }

Returns: An op that takes in a set of data and yields both an expectation result with relevant metadata and an output with all the metadata (for user processing).
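
The default get_validator arguments listed under extra_kwargs can be sketched in plain Python to show how the factory parameters map onto them. This is a minimal illustration, not the library's implementation: build_validator_kwargs is a hypothetical helper that does not exist in dagster_ge, the example names (my_datasource, orders_suite, and so on) are placeholders, and the assumption that extra_kwargs entries are merged over the defaults is ours rather than something stated above.

```python
from typing import Any, Optional

# Values documented for runtime_method_type.
ALLOWED_RUNTIME_METHOD_TYPES = ("batch_data", "path", "query")


def build_validator_kwargs(
    datasource_name: str,
    data_connector_name: str,
    data_asset_name: str,
    suite_name: str,
    batch_identifiers: dict,
    op_input: Any,
    runtime_method_type: str = "batch_data",
    extra_kwargs: Optional[dict] = None,
) -> dict:
    """Hypothetical helper (not part of dagster_ge) mirroring the documented defaults."""
    if runtime_method_type not in ALLOWED_RUNTIME_METHOD_TYPES:
        raise ValueError(f"unsupported runtime_method_type: {runtime_method_type!r}")

    defaults = {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        # The op input is keyed by the chosen runtime method type.
        "runtime_parameters": {runtime_method_type: op_input},
        "batch_identifiers": batch_identifiers,
        "expectation_suite_name": suite_name,
    }
    # Assumption: entries in extra_kwargs are layered on top of the defaults.
    return {**defaults, **(extra_kwargs or {})}


# Placeholder names; substitute the ones from your own great_expectations.yml.
kwargs = build_validator_kwargs(
    datasource_name="my_datasource",
    data_connector_name="my_runtime_connector",
    data_asset_name="orders",
    suite_name="orders_suite",
    batch_identifiers={"run_id": "2024-01-01"},
    op_input="data/orders.csv",
    runtime_method_type="path",
)
print(kwargs["runtime_parameters"])
```

With runtime_method_type set to “path”, the op input ends up under the "path" key of runtime_parameters; with the default “batch_data”, an in-memory object such as a DataFrame would appear under "batch_data" instead.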