This guide explains how to use Dagster with Sigma through the dagster-sigma library. Your Sigma assets, including datasets and workbooks, can be represented in the Dagster asset graph, letting you track lineage and dependencies between Sigma assets and the upstream data assets you already model in Dagster.
To load Sigma assets into the Dagster asset graph, you must first construct a SigmaOrganization resource, which allows Dagster to communicate with your Sigma organization. You'll need to supply your client ID and client secret along with the base URL. See "Identify your API request URL" in the Sigma documentation for more information on how to find your base URL.
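The examples in this guide read the client ID and client secret from environment variables, so it can help to check that they are set before Dagster loads your code. The following is a minimal sketch using only the standard library. The helper name `missing_sigma_env_vars` is hypothetical and not part of dagster-sigma; the variable names are the ones used in this guide's examples.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names used by the examples in this guide.
REQUIRED_SIGMA_ENV_VARS = ("SIGMA_CLIENT_ID", "SIGMA_CLIENT_SECRET")


def missing_sigma_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Iterable[str] = REQUIRED_SIGMA_ENV_VARS,
) -> List[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not a dagster-sigma API.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_sigma_env_vars()
    if missing:
        print("Missing Sigma credentials: " + ", ".join(missing))
    else:
        print("Sigma credentials found in the environment.")
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.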
Dagster can automatically load all datasets and workbooks from your Sigma workspace as asset specs. Call the load_sigma_asset_specs function, which returns a list of AssetSpecs representing your Sigma assets. You can then include these asset specs in your Definitions object.
To load only a subset of your Sigma assets, pass a SigmaFilter to the load_sigma_asset_specs function. The SigmaFilter object lets you specify the folders from which to load Sigma workbooks and configure which datasets are represented as assets.
Note that the content and size of your Sigma organization may affect the performance of your Dagster deployments. Restricting the set of workbooks from which your Sigma assets are loaded is particularly useful for improving loading times.
    from dagster_sigma import (
        SigmaBaseUrl,
        SigmaFilter,
        SigmaOrganization,
        load_sigma_asset_specs,
    )

    import dagster as dg

    sigma_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
    )

    sigma_specs = load_sigma_asset_specs(
        organization=sigma_organization,
        sigma_filter=SigmaFilter(
            # Filter down to only the workbooks in these folders
            workbook_folders=[
                ("my_folder", "my_subfolder"),
                ("my_folder", "my_other_subfolder"),
            ],
            # Specify whether to include datasets that are not used in any workbooks
            # default is True
            include_unused_datasets=False,
        ),
    )
    defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
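To build intuition for how a folder-based filter narrows the set of workbooks, here is a small standalone sketch. It is not the dagster-sigma implementation: it assumes that each folder is given as a path of folder names starting from the root, and that a workbook matches when its path begins with one of those folder paths (so workbooks in nested subfolders also match). The function name `in_selected_folders` is hypothetical.

```python
from typing import Sequence


def in_selected_folders(
    workbook_path: Sequence[str],
    workbook_folders: Sequence[Sequence[str]],
) -> bool:
    """Return True if workbook_path lies in, or below, any selected folder.

    Illustrative only: assumes prefix matching on folder names, which may
    differ from how dagster-sigma evaluates SigmaFilter.
    """
    path = tuple(workbook_path)
    return any(path[: len(folder)] == tuple(folder) for folder in workbook_folders)


# Same folder selection as in the SigmaFilter example in this guide.
selected = [("my_folder", "my_subfolder"), ("my_folder", "my_other_subfolder")]

for candidate in [
    ("my_folder", "my_subfolder"),
    ("my_folder", "my_subfolder", "nested"),
    ("my_folder",),
    ("other_folder", "my_subfolder"),
]:
    print(candidate, in_selected_folders(candidate, selected))
```

Under these assumptions, a workbook directly in `my_folder` is excluded because only the two subfolders are selected.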
Customize asset definition metadata for Sigma assets
By default, Dagster generates an asset spec for each Sigma asset based on its type and populates default metadata. You can further customize asset properties by passing a custom DagsterSigmaTranslator subclass to the load_sigma_asset_specs function. This subclass can implement methods to customize the asset specs for each Sigma asset type.
    from dagster_sigma import (
        DagsterSigmaTranslator,
        SigmaBaseUrl,
        SigmaOrganization,
        SigmaWorkbook,
        load_sigma_asset_specs,
    )

    import dagster as dg

    sigma_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
    )


    # A translator class lets us customize properties of the built Sigma assets, such as the owners or asset key
    class MyCustomSigmaTranslator(DagsterSigmaTranslator):
        def get_asset_spec(self, data: SigmaWorkbook) -> dg.AssetSpec:
            # We create the default asset spec using super()
            default_spec = super().get_asset_spec(data)
            # We customize the team owner tag for all Sigma assets
            return default_spec.replace_attributes(owners=["team:my_team"])


    sigma_specs = load_sigma_asset_specs(
        sigma_organization, dagster_sigma_translator=MyCustomSigmaTranslator
    )
    defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
Note that super() is called in each of the overridden methods to generate the default asset spec. It is best practice to generate the default asset spec before customizing it.
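The "default first, then customize" pattern can be tried out without a Sigma connection. The sketch below uses hypothetical stand-in classes (`StandInSpec`, `StandInTranslator`, `TeamOwnedTranslator`), none of which are Dagster or dagster-sigma classes. It only illustrates why calling super() first keeps the inherited defaults, here the key, intact while a single attribute is overridden.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class StandInSpec:
    """Hypothetical stand-in for an asset spec; not a Dagster class."""

    key: str
    owners: Tuple[str, ...] = ()


class StandInTranslator:
    """Hypothetical base translator that builds a default spec."""

    def get_asset_spec(self, name: str) -> StandInSpec:
        # Default behavior: derive a key from the asset name.
        return StandInSpec(key=name.lower().replace(" ", "_"))


class TeamOwnedTranslator(StandInTranslator):
    def get_asset_spec(self, name: str) -> StandInSpec:
        # Build the default first so the key logic is inherited unchanged.
        default_spec = super().get_asset_spec(name)
        # Then override only the attribute we care about.
        return replace(default_spec, owners=("team:my_team",))


spec = TeamOwnedTranslator().get_asset_spec("Sales Overview")
print(spec)
```

If the base class later changes how it derives keys, the subclass picks up that change automatically because it never rebuilds the spec from scratch.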
Definitions from multiple Sigma organizations can be combined by instantiating multiple SigmaOrganization resources and merging their specs. This lets you view all your Sigma assets in a single asset graph:
    from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs

    import dagster as dg

    sales_team_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SALES_SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SALES_SIGMA_CLIENT_SECRET"),
    )
    marketing_team_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("MARKETING_SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("MARKETING_SIGMA_CLIENT_SECRET"),
    )

    sales_team_specs = load_sigma_asset_specs(sales_team_organization)
    marketing_team_specs = load_sigma_asset_specs(marketing_team_organization)

    # Merge the specs into a single set of definitions
    defs = dg.Definitions(
        assets=[*sales_team_specs, *marketing_team_specs],
        resources={
            "marketing_sigma": marketing_team_organization,
            "sales_sigma": sales_team_organization,
        },
    )
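When specs from several organizations are merged, two organizations may contain assets whose generated keys coincide, and asset keys generally need to be unique within one Definitions object. The sketch below shows one way to look for such collisions ahead of time. It works on plain strings as stand-ins for asset keys; the function name `find_key_collisions` and the sample keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def find_key_collisions(keys_by_org: Dict[str, Sequence[str]]) -> List[str]:
    """Return keys that appear in more than one organization, sorted.

    Keys are plain strings here, standing in for real asset keys.
    """
    # Deduplicate within each organization so only cross-org repeats count.
    counts = Counter(key for keys in keys_by_org.values() for key in set(keys))
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical keys for two organizations.
example = {
    "sales": ["revenue_workbook", "pipeline_workbook"],
    "marketing": ["campaign_workbook", "revenue_workbook"],
}
collisions = find_key_collisions(example)
print(collisions)
```

If collisions do occur, one possible remedy is a custom translator per organization that changes the asset key, for example by adding an organization-specific prefix.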