Great Expectations (dagster-ge)

dagster_ge.ge_validation_op_factory(name, datasource_name, data_connector_name, data_asset_name, suite_name, batch_identifiers, input_dagster_type=<dagster._core.types.dagster_type.DagsterType object>, runtime_method_type='batch_data', extra_kwargs=None)

Generates ops for interacting with Great Expectations.

Parameters:
  • name (str) – the name of the op

  • datasource_name (str) – the name of your DataSource, see your great_expectations.yml

  • data_connector_name (str) – the name of the data connector for this datasource. This should point to a RuntimeDataConnector. For information on how to set this up, see: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/how_to_create_a_batch_of_data_from_an_in_memory_spark_or_pandas_dataframe

  • data_asset_name (str) – the name of the data asset that this op will be validating.

  • suite_name (str) – the name of your expectation suite, see your great_expectations.yml

  • batch_identifiers (dict) – a dictionary of batch identifiers that uniquely identifies this batch of data. To learn more about batch identifiers, see: https://docs.greatexpectations.io/docs/reference/datasources#batches.

  • input_dagster_type (DagsterType) – the Dagster type used to type check the input to the op. Defaults to dagster_pandas.DataFrame.

  • runtime_method_type (str) – how GE should interpret the op input. One of (“batch_data”, “path”, “query”). Defaults to “batch_data”, which will interpret the input as an in-memory object.

  • extra_kwargs (Optional[dict]) –

    Adds extra kwargs to the invocation of ge_data_context’s get_validator method. If not set, the arguments passed to get_validator will be:

    {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        "runtime_parameters": {
            "<runtime_method_type>": <op input>
        },
        "batch_identifiers": batch_identifiers,
        "expectation_suite_name": suite_name,
    }
    

Returns:

An op that takes in a set of data and yields both an expectation with relevant metadata and an output with all the metadata (for user processing).
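
The default get_validator arguments listed under extra_kwargs can be sketched in plain Python. This is a minimal illustration, not the dagster_ge implementation: build_validator_kwargs is a hypothetical helper, the ValueError check on runtime_method_type is a choice made for this sketch, and the way extra_kwargs is layered on top of the defaults is an assumption.

```python
from typing import Any, Dict, Optional

# The three values the reference lists for runtime_method_type.
ALLOWED_RUNTIME_METHOD_TYPES = ("batch_data", "path", "query")


def build_validator_kwargs(
    datasource_name: str,
    data_connector_name: str,
    data_asset_name: str,
    suite_name: str,
    batch_identifiers: Dict[str, Any],
    op_input: Any,
    runtime_method_type: str = "batch_data",
    extra_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of dagster_ge) that mirrors the documented defaults."""
    # Sketch-only guard: reject values outside the documented set.
    if runtime_method_type not in ALLOWED_RUNTIME_METHOD_TYPES:
        raise ValueError(
            f"runtime_method_type must be one of {ALLOWED_RUNTIME_METHOD_TYPES}, "
            f"got {runtime_method_type!r}"
        )

    kwargs: Dict[str, Any] = {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        # The op input is keyed by the chosen runtime method type.
        "runtime_parameters": {runtime_method_type: op_input},
        "batch_identifiers": batch_identifiers,
        "expectation_suite_name": suite_name,
    }
    # ASSUMPTION: extra_kwargs entries are added on top of the defaults.
    kwargs.update(extra_kwargs or {})
    return kwargs
```

With runtime_method_type="path" and an op input of "data/file.csv", the sketch produces "runtime_parameters": {"path": "data/file.csv"}; with the default "batch_data", the in-memory object itself appears under that key.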