GCP (dagster_gcp)

BigQuery

class dagster_gcp.BigQueryError[source]
dagster_gcp.bigquery_resource ResourceDefinition[source]
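
This resource provides a BigQuery client to solids that require it. A minimal sketch of attaching and using it; the client is assumed here to be a google.cloud.bigquery.Client, so verify against your installed version:

from dagster import ModeDefinition, pipeline, solid
from dagster_gcp import bigquery_resource

@solid(required_resource_keys={"bigquery"})
def row_count(context):
    # Assumption: the resource provides a google.cloud.bigquery.Client.
    result = context.resources.bigquery.query("SELECT 1").result()
    context.log.info(f"rows: {result.total_rows}")

@pipeline(mode_defs=[ModeDefinition(resource_defs={"bigquery": bigquery_resource})])
def bq_pipeline():
    row_count()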
dagster_gcp.bq_create_dataset(*args, **kwargs)[source]

BigQuery Create Dataset.

This solid encapsulates creating a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.
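
A minimal usage sketch; the "dataset" and "exists_ok" config fields are assumptions about this solid's config schema, so verify them against your installed version. bq_delete_dataset (below) follows the same pattern with its own config:

from dagster import ModeDefinition, execute_pipeline, pipeline
from dagster_gcp import bigquery_resource, bq_create_dataset

@pipeline(mode_defs=[ModeDefinition(resource_defs={"bigquery": bigquery_resource})])
def create_dataset_pipeline():
    bq_create_dataset()

execute_pipeline(
    create_dataset_pipeline,
    run_config={
        "solids": {
            # "dataset" and "exists_ok" are assumed config fields.
            "bq_create_dataset": {"config": {"dataset": "my_dataset", "exists_ok": True}}
        }
    },
)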

dagster_gcp.bq_delete_dataset(*args, **kwargs)[source]

BigQuery Delete Dataset.

This solid encapsulates deleting a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.bq_solid_for_queries(sql_queries)[source]

Executes BigQuery SQL queries.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.
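
This is a factory: it builds and returns a solid from a list of SQL strings, so it is invoked at definition time. A minimal sketch, using .alias() to give the generated solid a readable name:

from dagster import ModeDefinition, pipeline
from dagster_gcp import bigquery_resource, bq_solid_for_queries

# Build a solid that executes the given queries in order.
run_queries = bq_solid_for_queries(["SELECT 1", "SELECT 2"]).alias("run_queries")

@pipeline(mode_defs=[ModeDefinition(resource_defs={"bigquery": bigquery_resource})])
def query_pipeline():
    run_queries()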

dagster_gcp.import_df_to_bq(*args, **kwargs)[source]
dagster_gcp.import_file_to_bq(*args, **kwargs)[source]
dagster_gcp.import_gcs_paths_to_bq(*args, **kwargs)[source]
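
These three load solids share one pattern: each takes a source as input (a pandas DataFrame, a local file path, or a list of GCS paths, respectively) and loads it into a BigQuery table named in config, and each expects the BQ client at context.resources.bigquery. A minimal sketch for import_df_to_bq; the "destination" config field and the input wiring are assumptions, so check the solid's config schema:

import pandas as pd
from dagster import ModeDefinition, execute_pipeline, pipeline, solid
from dagster_gcp import bigquery_resource, import_df_to_bq

@solid
def make_df(_):
    return pd.DataFrame({"num": [1, 2, 3]})

@pipeline(mode_defs=[ModeDefinition(resource_defs={"bigquery": bigquery_resource})])
def load_pipeline():
    import_df_to_bq(make_df())

execute_pipeline(
    load_pipeline,
    run_config={
        # "destination" (a dataset.table string) is an assumed config field.
        "solids": {"import_df_to_bq": {"config": {"destination": "my_dataset.my_table"}}}
    },
)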

Dataproc

dagster_gcp.dataproc_solid(*args, **kwargs)[source]
dagster_gcp.dataproc_resource ResourceDefinition[source]
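
dataproc_solid submits a job to a Dataproc cluster managed via dataproc_resource; both are configured through run config (cluster settings on the resource, job settings on the solid), with schemas that follow the Dataproc API. A minimal wiring sketch, with the run config omitted since the exact field names are version-dependent:

from dagster import ModeDefinition, pipeline
from dagster_gcp import dataproc_resource, dataproc_solid

@pipeline(mode_defs=[ModeDefinition(resource_defs={"dataproc": dataproc_resource})])
def dataproc_pipeline():
    dataproc_solid()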

GCS

dagster_gcp.gcs.gcs_intermediate_storage IntermediateStorageDefinition[source]
dagster_gcp.gcs_resource ResourceDefinition[source]
dagster_gcp.gcs.gcs_pickle_io_manager IOManagerDefinition[source]

Persistent IO manager using GCS for storage.

Serializes objects via pickling. Suitable for object storage for distributed executors, so long as each execution node has network connectivity and credentials for GCS and the backing bucket.

Attach this resource definition to a ModeDefinition to make it available to your pipeline:

pipeline_def = PipelineDefinition(
    mode_defs=[
        ModeDefinition(
            resource_defs={'io_manager': gcs_pickle_io_manager, 'gcs': gcs_resource, ...},
        ), ...
    ], ...
)

You may configure this storage as follows:

resources:
    io_manager:
        config:
            gcs_bucket: my-cool-bucket
            gcs_prefix: good/prefix-for-files-
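
Putting the pieces together, a minimal end-to-end sketch using the bucket and prefix from the config above (the solid body is illustrative):

from dagster import ModeDefinition, execute_pipeline, pipeline, solid
from dagster_gcp import gcs_resource
from dagster_gcp.gcs import gcs_pickle_io_manager

@solid
def produce_number(_):
    # The return value is pickled and written to GCS by the IO manager.
    return 42

@pipeline(
    mode_defs=[
        ModeDefinition(
            resource_defs={"io_manager": gcs_pickle_io_manager, "gcs": gcs_resource}
        )
    ]
)
def my_pipeline():
    produce_number()

execute_pipeline(
    my_pipeline,
    run_config={
        "resources": {
            "io_manager": {
                "config": {
                    "gcs_bucket": "my-cool-bucket",
                    "gcs_prefix": "good/prefix-for-files-",
                }
            }
        }
    },
)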