Using Dagster with Sigma

This feature is currently experimental.

This guide explains how to use Dagster with Sigma through the dagster-sigma library. Your Sigma assets, including datasets and workbooks, can be represented in the Dagster asset graph, allowing you to track lineage and dependencies between Sigma assets and the upstream data assets you already model in Dagster.

What you'll learn

  • How to represent Sigma assets in the Dagster asset graph, including lineage to other Dagster assets.
  • How to customize asset definition metadata for these Sigma assets.

Prerequisites

  • The dagster-sigma library installed in your environment
  • Familiarity with asset definitions and the Dagster asset graph
  • Familiarity with Dagster resources
  • Familiarity with Sigma concepts, like datasets and workbooks
  • A Sigma organization
  • A Sigma client ID and client secret. For more information, see Generate API client credentials in the Sigma documentation.

Set up your environment

To get started, you'll need to install the dagster and dagster-sigma Python packages:

pip install dagster dagster-sigma

Represent Sigma assets in the asset graph

To load Sigma assets into the Dagster asset graph, you must first construct a SigmaOrganization resource, which allows Dagster to communicate with your Sigma organization. You'll need to supply your client ID and client secret alongside the base URL. See Identify your API request URL in the Sigma documentation for more information on how to find your base URL.

Dagster can automatically load all datasets and workbooks from your Sigma workspace as asset specs. Call the load_sigma_asset_specs function, which returns a list of AssetSpecs representing your Sigma assets. You can then include these asset specs in your Definitions object:

from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs

import dagster as dg

sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)

sigma_specs = load_sigma_asset_specs(sigma_organization)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
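
Because the resource above reads its credentials from environment variables, it can help to confirm they are set before Dagster loads your definitions. The following is a minimal sketch using only the Python standard library. The helper name `missing_env_vars` is hypothetical and is not part of dagster-sigma; the variable names are the ones used in the example above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the example above; adjust if yours differ.
REQUIRED_SIGMA_ENV_VARS = ("SIGMA_CLIENT_ID", "SIGMA_CLIENT_SECRET")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_SIGMA_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping, so the result does not depend on your shell.
print(missing_env_vars(environ={"SIGMA_CLIENT_ID": "abc"}))
# ['SIGMA_CLIENT_SECRET']
```

Calling `missing_env_vars()` with no arguments checks the current process environment; a non-empty result means at least one credential still needs to be exported.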

Load Sigma assets from filtered workbooks

You can load a subset of your Sigma assets by passing a SigmaFilter to the load_sigma_asset_specs function. The SigmaFilter object lets you specify the folders from which to load Sigma workbooks, and also lets you configure which datasets are represented as assets.

Note that the content and size of your Sigma organization may affect the performance of your Dagster deployments. Restricting the set of workbooks from which Sigma assets are loaded is particularly useful for improving loading times.

from dagster_sigma import (
    SigmaBaseUrl,
    SigmaFilter,
    SigmaOrganization,
    load_sigma_asset_specs,
)

import dagster as dg

sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)

sigma_specs = load_sigma_asset_specs(
    organization=sigma_organization,
    sigma_filter=SigmaFilter(
        # Filter down to only the workbooks in these folders
        workbook_folders=[
            ("my_folder", "my_subfolder"),
            ("my_folder", "my_other_subfolder"),
        ],
        # Specify whether to include datasets that are not used in any workbooks
        # default is True
        include_unused_datasets=False,
    ),
)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
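
In the example above, each entry in workbook_folders is a sequence of folder names, one element per nesting level. If your folder paths are kept as slash-separated strings (for instance in a config file), a small helper can convert them into that shape. This is only a sketch: `folder_path_to_tuple` and `folder_paths_to_tuples` are hypothetical helpers, not part of dagster-sigma, and they assume folder names do not themselves contain `/`.

```python
from typing import List, Sequence, Tuple


def folder_path_to_tuple(path: str) -> Tuple[str, ...]:
    """Split "a/b" into ("a", "b"), ignoring empty segments and padding."""
    return tuple(part.strip() for part in path.split("/") if part.strip())


def folder_paths_to_tuples(paths: Sequence[str]) -> List[Tuple[str, ...]]:
    """Convert several paths, dropping any that are empty after splitting."""
    converted = (folder_path_to_tuple(p) for p in paths)
    return [folders for folders in converted if folders]


workbook_folders = folder_paths_to_tuples(
    ["my_folder/my_subfolder", "/my_folder/my_other_subfolder/"]
)
print(workbook_folders)
# [('my_folder', 'my_subfolder'), ('my_folder', 'my_other_subfolder')]
```

The resulting list can then be passed as the workbook_folders argument of SigmaFilter, in place of the hand-written tuples.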

Customize asset definition metadata for Sigma assets

By default, Dagster generates an asset spec for each Sigma asset based on its type and populates default metadata. You can further customize asset properties by passing a custom DagsterSigmaTranslator subclass to the load_sigma_asset_specs function. This subclass can implement methods to customize the asset specs for each Sigma asset type.

from dagster_sigma import (
    DagsterSigmaTranslator,
    SigmaBaseUrl,
    SigmaOrganization,
    SigmaWorkbook,
    load_sigma_asset_specs,
)

import dagster as dg

sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)


# A translator class lets us customize properties of the built Sigma assets, such as the owners or asset key
class MyCustomSigmaTranslator(DagsterSigmaTranslator):
    def get_asset_spec(self, data: SigmaWorkbook) -> dg.AssetSpec:
        # We create the default asset spec using super()
        default_spec = super().get_asset_spec(data)
        # we customize the team owner tag for all Sigma assets
        return default_spec.replace_attributes(owners=["team:my_team"])


sigma_specs = load_sigma_asset_specs(
    sigma_organization, dagster_sigma_translator=MyCustomSigmaTranslator
)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})

Note that super() is called in the overridden get_asset_spec method to generate the default asset spec. It is best practice to generate the default asset spec first and then customize it.

Load Sigma assets from multiple organizations

Definitions from multiple Sigma organizations can be combined by instantiating multiple SigmaOrganization resources and merging their specs. This lets you view all your Sigma assets in a single asset graph:

from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs

import dagster as dg

sales_team_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SALES_SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SALES_SIGMA_CLIENT_SECRET"),
)

marketing_team_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("MARKETING_SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("MARKETING_SIGMA_CLIENT_SECRET"),
)

sales_team_specs = load_sigma_asset_specs(sales_team_organization)
marketing_team_specs = load_sigma_asset_specs(marketing_team_organization)

# Merge the specs into a single set of definitions
defs = dg.Definitions(
    assets=[*sales_team_specs, *marketing_team_specs],
    resources={
        "marketing_sigma": marketing_team_organization,
        "sales_sigma": sales_team_organization,
    },
)