Using Dagster with Fivetran

Note: If you are just getting started with the Fivetran integration, we recommend using the new Fivetran component.

This guide explains how to use Dagster with Fivetran through the dagster-fivetran library. Your Fivetran connector tables can be represented as assets in the Dagster asset graph, which lets you track lineage and dependencies between Fivetran assets and the data assets you already model in Dagster. You can also use Dagster to orchestrate Fivetran connectors, triggering their syncs on a cadence or in response to upstream data changes.

Note: Your Fivetran connectors must have been synced at least once to be represented in Dagster.

What you'll learn

How to represent Fivetran assets in the Dagster asset graph, including lineage to other Dagster assets.
How to customize asset definition metadata for these Fivetran assets.
How to materialize Fivetran connector tables from Dagster.
How to customize how Fivetran connector tables are materialized.

Prerequisites

The dagster and dagster-fivetran libraries installed in your environment
Familiarity with asset definitions and the Dagster asset graph
Familiarity with Dagster resources
Familiarity with Fivetran concepts, like connectors and connector tables
A Fivetran workspace
A Fivetran API key and API secret. For more information, see Getting Started in the Fivetran REST API documentation.

Set up your environment

To get started, install the dagster and dagster-fivetran Python packages with either uv or pip:

uv add dagster-fivetran

pip install dagster-fivetran
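After installing, you may want to confirm that both packages are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of Dagster.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [
        name for name in module_names if importlib.util.find_spec(name) is None
    ]


# dagster-fivetran is imported as dagster_fivetran (underscore, not hyphen)
missing = missing_packages(["dagster", "dagster_fivetran"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("dagster and dagster_fivetran are importable")
```

If either name is reported as missing, the install command was probably run against a different environment than the one executing the script.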

Represent Fivetran assets in the asset graph

To load Fivetran assets into the Dagster asset graph, you must first construct a FivetranWorkspace resource, which allows Dagster to communicate with your Fivetran workspace. You'll need to supply your account ID, API key and API secret. See Getting Started in the Fivetran REST API documentation for more information on how to create your API key and API secret.

Dagster can automatically load all connector tables from your Fivetran workspace as asset specs. Call the load_fivetran_asset_specs function, which returns a list of AssetSpecs representing your Fivetran assets. You can then include these asset specs in your Definitions object:

from dagster_fivetran import FivetranWorkspace, load_fivetran_asset_specs

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)

fivetran_specs = load_fivetran_asset_specs(fivetran_workspace)
defs = dg.Definitions(assets=fivetran_specs, resources={"fivetran": fivetran_workspace})
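The example above reads its credentials from environment variables. A missing variable may only surface once the resource is actually used, so a small pre-flight check can make the failure easier to diagnose. This is a sketch under that assumption; the helper missing_fivetran_env_vars is hypothetical, and the variable names are simply the ones used in the example.

```python
import os

# Variable names taken from the example above
REQUIRED_FIVETRAN_ENV_VARS = (
    "FIVETRAN_ACCOUNT_ID",
    "FIVETRAN_API_KEY",
    "FIVETRAN_API_SECRET",
)


def missing_fivetran_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_FIVETRAN_ENV_VARS if not env.get(name)]


missing = missing_fivetran_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Fivetran environment variables are set")
```

Passing a plain dictionary as environ makes the check easy to exercise in tests without touching the real environment.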

Sync and materialize Fivetran assets

You can use Dagster to sync Fivetran connectors and materialize Fivetran connector tables. The build_fivetran_assets_definitions factory creates the asset definitions for every connector in your Fivetran workspace.

Note: When syncing a Fivetran connector via Dagster, all Fivetran assets for this connector are materialized in Dagster.

from dagster_fivetran import FivetranWorkspace, build_fivetran_assets_definitions

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)

all_fivetran_assets = build_fivetran_assets_definitions(workspace=fivetran_workspace)

defs = dg.Definitions(
    assets=all_fivetran_assets,
    resources={"fivetran": fivetran_workspace},
)

Customize the materialization of Fivetran assets

To customize how your connectors are synced, use the fivetran_assets decorator. It lets you execute custom code before and after the call to the Fivetran sync.

from dagster_fivetran import FivetranWorkspace, fivetran_assets

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)


@fivetran_assets(
    connector_id="fivetran_connector_id",  # Replace with your connector ID
    name="fivetran_connector_name",  # Replace with your connector name
    group_name="fivetran_connector_name",
    workspace=fivetran_workspace,
)
def fivetran_connector_assets(
    context: dg.AssetExecutionContext, fivetran: FivetranWorkspace
):
    # Do something before the materialization...
    yield from fivetran.sync_and_poll(context=context)
    # Do something after the materialization...


defs = dg.Definitions(
    assets=[fivetran_connector_assets],
    resources={"fivetran": fivetran_workspace},
)
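Because the decorated function in the example above is a generator, it helps to be precise about when the "before" and "after" code runs. The stand-alone sketch below illustrates the ordering with a plain Python generator; fake_sync_and_poll is a hypothetical stand-in for fivetran.sync_and_poll and makes no Fivetran calls.

```python
def fake_sync_and_poll():
    """Hypothetical stand-in for fivetran.sync_and_poll(); yields fake events."""
    yield "materialization: table_a"
    yield "materialization: table_b"


def connector_assets(log):
    # Runs when the first event is requested, not when the function is called
    log.append("before sync")
    yield from fake_sync_and_poll()
    # Runs only after every event from the sync has been consumed
    log.append("after sync")


log = []
events = connector_assets(log)  # nothing has run yet, so log is still empty
collected = list(events)  # drives the generator to completion
print(log)  # ['before sync', 'after sync']
print(collected)
```

If the "after" step must run even when the sync raises, wrapping the yield from in try/finally is one option.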

Customize asset definition metadata for Fivetran assets

By default, Dagster generates an asset spec for each Fivetran asset and populates default metadata. You can further customize asset properties by passing an instance of a custom DagsterFivetranTranslator subclass to the load_fivetran_asset_specs function.

from dagster_fivetran import (
    DagsterFivetranTranslator,
    FivetranConnectorTableProps,
    FivetranWorkspace,
    load_fivetran_asset_specs,
)

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)


# A translator class lets us customize properties of the built
# Fivetran assets, such as the owners or asset key
class MyCustomFivetranTranslator(DagsterFivetranTranslator):
    def get_asset_spec(self, props: FivetranConnectorTableProps) -> dg.AssetSpec:
        # We create the default asset spec using super()
        default_spec = super().get_asset_spec(props)
        # We customize the metadata and asset key prefix for all assets
        return default_spec.replace_attributes(
            key=default_spec.key.with_prefix("prefix"),
        ).merge_attributes(metadata={"custom": "metadata"})


fivetran_specs = load_fivetran_asset_specs(
    fivetran_workspace, dagster_fivetran_translator=MyCustomFivetranTranslator()
)

defs = dg.Definitions(assets=fivetran_specs, resources={"fivetran": fivetran_workspace})

Note that super() is called in the overridden method to generate the default asset spec. It is best practice to generate the default asset spec before customizing it.

You can also pass an instance of the custom DagsterFivetranTranslator to the fivetran_assets decorator or the build_fivetran_assets_definitions factory.
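The translator example in this section uses replace_attributes for the asset key and merge_attributes for the metadata. The sketch below mimics the difference with plain dictionaries, assuming that merging keeps the existing entries and adds the new ones while replacing discards the existing entries. The helper functions and sample values are illustrative and are not Dagster APIs.

```python
def replace_metadata(current, new):
    """Mimic replace semantics: the new mapping is used as-is."""
    return dict(new)


def merge_metadata(current, new):
    """Mimic merge semantics: existing entries are kept, new ones win on conflict."""
    return {**current, **new}


# Illustrative values; real default metadata is populated by the translator
default_metadata = {"dagster/table_name": "my_database.my_schema.my_fivetran_table"}
custom_metadata = {"custom": "metadata"}

replaced = replace_metadata(default_metadata, custom_metadata)
merged = merge_metadata(default_metadata, custom_metadata)
print(replaced)  # only the custom entry remains
print(merged)  # default and custom entries are both present
```

Under that assumption, merging is the safer choice for metadata, because default entries such as dagster/table_name stay available to code that looks them up later.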

Fetching column-level metadata for Fivetran assets

Dagster allows you to emit column-level metadata, like column schema and column lineage, as materialization metadata.

With this metadata, you can view documentation in Dagster for all columns in your Fivetran connector tables.

To enable this feature, call fetch_column_metadata() on the FivetranEventIterator returned by the sync_and_poll() call on the FivetranWorkspace resource.

from dagster_fivetran import FivetranWorkspace, fivetran_assets

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)


@fivetran_assets(
    # Replace with your connector ID
    connector_id="fivetran_connector_id",
    workspace=fivetran_workspace,
)
def fivetran_connector_assets(
    context: dg.AssetExecutionContext, fivetran: FivetranWorkspace
):
    yield from fivetran.sync_and_poll(context=context).fetch_column_metadata()


defs = dg.Definitions(
    assets=[fivetran_connector_assets],
    resources={"fivetran": fivetran_workspace},
)

Load Fivetran assets for selected connectors

To load Fivetran assets for only a subset of your connectors, pass a ConnectorSelectorFn callback as the connector_selector_fn argument. The callback receives a connector and returns True if its assets should be loaded.

from dagster_fivetran import FivetranWorkspace, build_fivetran_assets_definitions

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
)

all_fivetran_assets = build_fivetran_assets_definitions(
    workspace=fivetran_workspace,
    connector_selector_fn=(
        lambda connector: connector.id in {"some_connector_id", "another_connector_id"}
    ),
)

defs = dg.Definitions(
    assets=all_fivetran_assets,
    resources={"fivetran": fivetran_workspace},
)
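When the allow-list grows or needs to vary per deployment, a named selector can be easier to maintain and test than an inline lambda. The sketch below builds such a predicate from a comma-separated string. The helper names and the FIVETRAN_CONNECTOR_IDS variable are hypothetical, and the only assumption about the connector object is that it exposes an id attribute, as in the example above.

```python
import os
from types import SimpleNamespace


def parse_connector_ids(raw):
    """Split a comma-separated string into a set of non-empty connector IDs."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def make_connector_selector(allowed_ids):
    """Build a predicate that accepts connectors whose .id is in allowed_ids."""
    allowed = frozenset(allowed_ids)

    def selector(connector):
        return connector.id in allowed

    return selector


# FIVETRAN_CONNECTOR_IDS is a hypothetical variable name used for illustration
selector = make_connector_selector(
    parse_connector_ids(os.environ.get("FIVETRAN_CONNECTOR_IDS", ""))
)

# Stand-in objects with an .id attribute, used only to exercise the predicate
demo = make_connector_selector({"some_connector_id"})
print(demo(SimpleNamespace(id="some_connector_id")))  # True
print(demo(SimpleNamespace(id="other_connector_id")))  # False
```

The resulting function could then be passed as connector_selector_fn=selector in place of the lambda.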

Load Fivetran assets using a snapshot

Fivetran assets can be loaded using the snapshot of a Fivetran workspace, which allows organizations with large amounts of Fivetran data to speed up their deployment process.

from dagster_fivetran import FivetranWorkspace, load_fivetran_asset_specs

import dagster as dg

fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_API_SECRET"),
    snapshot_path=dg.EnvVar("FIVETRAN_SNAPSHOT_PATH"),
)

fivetran_specs = load_fivetran_asset_specs(workspace=fivetran_workspace)

defs = dg.Definitions(assets=fivetran_specs)

To capture the snapshot, use the dagster-fivetran snapshot CLI:

dagster-fivetran snapshot --python-module my_dagster_package --output-path snapshot.snap

Load Fivetran assets from multiple workspaces

Definitions from multiple Fivetran workspaces can be combined by instantiating multiple FivetranWorkspace resources and merging their specs. This lets you view all your Fivetran assets in a single asset graph:

from dagster_fivetran import FivetranWorkspace, load_fivetran_asset_specs

import dagster as dg

sales_fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_SALES_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_SALES_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_SALES_API_SECRET"),
)
marketing_fivetran_workspace = FivetranWorkspace(
    account_id=dg.EnvVar("FIVETRAN_MARKETING_ACCOUNT_ID"),
    api_key=dg.EnvVar("FIVETRAN_MARKETING_API_KEY"),
    api_secret=dg.EnvVar("FIVETRAN_MARKETING_API_SECRET"),
)

sales_fivetran_specs = load_fivetran_asset_specs(sales_fivetran_workspace)
marketing_fivetran_specs = load_fivetran_asset_specs(marketing_fivetran_workspace)

# Merge the specs into a single set of definitions
defs = dg.Definitions(
    assets=[*sales_fivetran_specs, *marketing_fivetran_specs],
    resources={
        "marketing_fivetran": marketing_fivetran_workspace,
        "sales_fivetran": sales_fivetran_workspace,
    },
)
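Asset keys must be unique within a Definitions object, so specs merged from several workspaces could collide if two workspaces produce the same key. The sketch below checks for such collisions before merging. The helper duplicate_keys is hypothetical, and the stand-in objects only assume that each spec exposes a hashable key attribute.

```python
from collections import Counter
from types import SimpleNamespace


def duplicate_keys(*spec_lists):
    """Return the keys that appear more than once across all spec lists."""
    counts = Counter(spec.key for specs in spec_lists for spec in specs)
    return [key for key, count in counts.items() if count > 1]


# Stand-in specs with tuple keys, used only to exercise the check
sales = [
    SimpleNamespace(key=("salesforce", "account")),
    SimpleNamespace(key=("shared", "users")),
]
marketing = [
    SimpleNamespace(key=("hubspot", "contact")),
    SimpleNamespace(key=("shared", "users")),
]

print(duplicate_keys(sales, marketing))  # [('shared', 'users')]
```

If collisions do show up, one way to resolve them is to give each workspace its own key prefix through a custom translator, as shown in the section on customizing asset definition metadata.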

Define upstream dependencies

By default, Dagster does not set upstream dependencies when generating asset specs for your Fivetran assets. You can set upstream dependencies on your Fivetran assets by passing an instance of a custom DagsterFivetranTranslator subclass to the load_fivetran_asset_specs function.

from dagster_fivetran import (
    DagsterFivetranTranslator,
    FivetranConnectorTableProps,
    load_fivetran_asset_specs,
)

import dagster as dg

# fivetran_workspace is the FivetranWorkspace resource defined in the earlier examples


class MyCustomFivetranTranslator(DagsterFivetranTranslator):
    def get_asset_spec(self, props: FivetranConnectorTableProps) -> dg.AssetSpec:
        # We create the default asset spec using super()
        default_spec = super().get_asset_spec(props)
        # We set an upstream dependency for our assets
        return default_spec.replace_attributes(deps=["my_upstream_asset_key"])


fivetran_specs = load_fivetran_asset_specs(
    fivetran_workspace, dagster_fivetran_translator=MyCustomFivetranTranslator()
)

Note that super() is called in the overridden method to generate the default asset spec. It is best practice to generate the default asset spec before customizing it.

You can also pass an instance of the custom DagsterFivetranTranslator to the fivetran_assets decorator or the build_fivetran_assets_definitions factory.

Define downstream dependencies

Dagster allows you to define assets that are downstream of specific Fivetran tables using their asset keys. The asset key for a Fivetran table can be retrieved from the asset definitions created with the fivetran_assets decorator. The example below defines my_downstream_asset as a downstream dependency of my_fivetran_table:

@fivetran_assets(
    # Replace with your connector ID
    connector_id="fivetran_connector_id",
    workspace=fivetran_workspace,
)
def fivetran_connector_assets(
    context: dg.AssetExecutionContext, fivetran: FivetranWorkspace
): ...


my_fivetran_table_asset_key = next(
    spec.key
    for spec in fivetran_connector_assets.specs
    if spec.metadata.get("dagster/table_name")
    == "my_database.my_schema.my_fivetran_table"
)


@dg.asset(deps=[my_fivetran_table_asset_key])
def my_downstream_asset(): ...
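The lookup above relies on next(), which raises a bare StopIteration when no spec matches the table name. A small helper with a descriptive error can make a typo in the table name easier to spot. This is a sketch: find_asset_key is a hypothetical helper, and the stand-in objects only assume that each spec exposes key and metadata attributes, as used in the example above.

```python
from types import SimpleNamespace


def find_asset_key(specs, table_name):
    """Return the key of the spec whose dagster/table_name metadata matches."""
    specs = list(specs)
    for spec in specs:
        if spec.metadata.get("dagster/table_name") == table_name:
            return spec.key
    available = sorted(
        str(spec.metadata.get("dagster/table_name")) for spec in specs
    )
    raise LookupError(f"No spec for table {table_name!r}; available: {available}")


# Stand-in specs, used only to exercise the helper
demo_specs = [
    SimpleNamespace(key="key_a", metadata={"dagster/table_name": "db.schema.a"}),
    SimpleNamespace(key="key_b", metadata={"dagster/table_name": "db.schema.b"}),
]

print(find_asset_key(demo_specs, "db.schema.b"))  # key_b
```

With real definitions, the call would look like find_asset_key(fivetran_connector_assets.specs, "my_database.my_schema.my_fivetran_table").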

In the downstream asset, you may want direct access to the contents of the Fivetran table. To do so, you can customize the code within your @asset-decorated function to load upstream data.

About Fivetran

Fivetran ingests data from SaaS applications, databases, and servers. The data is stored and typically used for analytics.
