Ask AI

Using dbt with Dagster, part two: Load dbt models as Dagster assets#

This is part two of the Using dbt with Dagster software-defined assets tutorial.

At this point, you should have a fully-configured dbt project that's ready to work with Dagster.

In this section, you'll finally begin integrating dbt with Dagster! To do this, you'll:


Step 1: Create a Dagster project that wraps your dbt project#

You can create a Dagster project that wraps your dbt project by using the dagster-dbt command line interface. Make sure you're in the directory where your dbt_project.yml is. If you're continuing from the previous section, then you'll already be in this directory. Then, run:

dagster-dbt project scaffold --project-name jaffle_dagster

This creates a directory called jaffle_dagster/ inside the current directory. The jaffle_dagster/ directory contains a set of files that define a Dagster project.

In general, it's up to you where to put your Dagster project. It's most common to put your Dagster project at the root of your git repository. Therefore, in this case, because the dbt_project.yml was at the root of the jaffle_shop git repository, we created our Dagster project there.

Note: The dagster-dbt project scaffold command creates the Dagster project in whatever directory you run it from. If that's a different directory from where your dbt_project.yml lives, then you'll need to provide a value for the --dbt-project-dir option so that Dagster knows where to look for your dbt project.


Step 2: Inspect your Dagster project in Dagster's UI#

Now that you have a Dagster project, you can run Dagster's UI to take a look at it.

  1. Change directories to the Dagster project directory:

    cd jaffle_dagster/
    
  2. To start Dagster's UI, run the following:

    DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev
    

    Which will result in output similar to:

    Serving dagster-webserver on http://127.0.0.1:3000 in process 70635
    

    Note: DAGSTER_DBT_PARSE_PROJECT_ON_LOAD is an environment variable. If using Microsoft PowerShell, set it before running dagster dev using this syntax:

    $env:DAGSTER_DBT_PARSE_PROJECT_ON_LOAD = "1"; dagster dev
    
  3. In your browser, navigate to http://127.0.0.1:3000. The page will display the assets:

Asset graph in Dagster's UI, containing dbt models loaded as Dagster assets

Step 3: Build your dbt models in Dagster#

You can do more than view your dbt models in Dagster – you can also run them. In Dagster, running a dbt model corresponds to materializing an asset. Materializing an asset means running some computation to update its contents in persistent storage. In this tutorial, that persistent storage is our local DuckDB database.

To build your dbt project, i.e. materialize your assets, click the Materialize all button near the top right corner of the page. This will launch a run to materialize the assets. When finished, the Materialized and Latest Run attributes in the asset will be populated:

Asset graph in Dagster's UI, showing materialized assets

After the run completes, you can:

  • Click the asset to open a sidebar containing info about the asset, including its last materialization stats and a link to view the Asset details page
  • Click the ID of the Latest Run in an asset to view the Run details page. This page contains detailed info about the run, including timing information, errors, and logs.

Step 4: Understand the Python code in your Dagster project#

You saw how you can create a Dagster project that loads a dbt project. How does this work? Understanding how Dagster loads a dbt project will give you a foundation for customizing how Dagster runs your dbt project, as well as for connecting it to other data assets outside of dbt.

The most important file is the Python file that contains the set of definitions for Dagster to load: jaffle_shop/definitions.py. Dagster executes the code in this file to find out what assets it should be aware of, as well as details about those assets. For example, when you ran dagster dev in the previous step, Dagster executed the code in this file to determine what assets to display in the UI.

In our definitions.py Python file, we import from assets.py, which contains the code to model our dbt models as Dagster assets. To return a Dagster asset for each dbt model, the code in this assets.py file needs to know what dbt models you have. It finds out what models you have by reading a file called a manifest.json, which is a file that dbt can generate for any dbt project and contains information about every model, seed, snapshot, test, etc. in the project.

To retrieve the manifest.json, assets.py imports from constants.py, which defines the path to the manifest.json file:

import os
from pathlib import Path

from dagster_dbt import DbtCliResource

dbt_project_dir = Path(__file__).joinpath("..", "..", "..").resolve()
dbt = DbtCliResource(project_dir=os.fspath(dbt_project_dir))

# If DAGSTER_DBT_PARSE_PROJECT_ON_LOAD is set, a manifest will be created at run time.
# Otherwise, we expect a manifest to be present in the project's target directory.
if os.getenv("DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"):
    dbt_manifest_path = (
        dbt.cli(
            ["--quiet", "parse"],
            target_path=Path("target"),
        )
        .wait()
        .target_path.joinpath("manifest.json")
    )
else:
    dbt_manifest_path = dbt_project_dir.joinpath("target", "manifest.json")

As the comment explains, the code gives you a choice about how to create this manifest.json file. Based on the DAGSTER_DBT_PARSE_PROJECT_ON_LOAD environment variable, either:

  1. This code generates the manifest.json for you. This is the easiest option during development, because you never need to worry about the file being out-of-date with your dbt project. But generating the file can be time-consuming when you have a large dbt project, so it's generally recommended to go with the second option when you're deploying your dbt project to production.
  2. This code leaves it up to you to generate the manifest.json file on your own, and this code just reads it. This option allows you to only generate it once every time you deploy, rather than every time the code in this module gets executed.

In this tutorial, we're going with the first option: When we ran dagster dev, we set the DAGSTER_DBT_PARSE_PROJECT_ON_LOAD environment variable.

Once you've got a path to a manifest.json, it's time to define your Dagster assets using it. The following code, in your project's assets.py, does this:

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

from .constants import dbt_manifest_path


@dbt_assets(manifest=dbt_manifest_path)
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

This code might look a bit fancy, because it uses a decorator. Here's a breakdown of what's going on:

  • It creates a variable named jaffle_shop_dbt_assets that holds an object that represents a set of Dagster assets.
  • These Dagster assets reflect the dbt models described in the manifest file. The manifest file is passed in using the manifest argument.
  • The decorated function defines what should happen when you materialize one of these Dagster assets, e.g. by clicking the Materialize button in the UI or materializing it automatically by putting it on a schedule. In this case, it will invoke the dbt build command on the selected assets. The context parameter that's provided along with dbt build carries the selection.

If you later want to customize how your dbt models are translated into Dagster assets, you'll do so by editing its definition in assets.py.


What's next?#

At this point, you've loaded your dbt models into Dagster as assets, viewed them in Dagster's asset graph UI, and materialized them. Next, you'll learn how to add upstream Dagster assets.