dagster-dbt integration reference
Using dbt Cloud? Check out the dbt Cloud with Dagster guide.
This reference provides a high-level look at working with dbt models through Dagster's software-defined assets framework using the dagster-dbt integration library.
For a step-by-step implementation walkthrough, refer to the Using dbt with Dagster asset definitions tutorial.
Relevant APIs
Name | Description |
---|---|
dagster-dbt project scaffold | A CLI command to initialize a new Dagster project for an existing dbt project. |
@dagster_dbt.dbt_assets | A decorator used to define Dagster assets for dbt models defined in a dbt manifest. |
DbtCliResource | A class that defines a Dagster resource used to execute dbt CLI commands. |
DbtCliInvocation | A class that defines the representation of an invoked dbt command. |
DbtProject | A class that defines the representation of a dbt project and related settings that assist with managing dependencies and manifest.json preparation. |
DagsterDbtTranslator | A class that can be overridden to customize how Dagster asset metadata is derived from a dbt manifest. |
DagsterDbtTranslatorSettings | A class with settings to enable Dagster features for a dbt project. |
DbtManifestAssetSelection | A class that defines a selection of assets from a dbt manifest and a dbt selection string. |
build_dbt_asset_selection | A helper method that builds a DbtManifestAssetSelection from a dbt manifest and dbt selection string. |
build_schedule_from_dbt_selection | A helper method that builds a ScheduleDefinition from a dbt manifest, dbt selection string, and cron string. |
get_asset_key_for_model | A helper method that retrieves the AssetKey for a dbt model. |
get_asset_key_for_source | A helper method that retrieves the AssetKey for a dbt source with a singular table. |
get_asset_keys_by_output_name_for_source | A helper method that retrieves the AssetKeys for a dbt source with multiple tables. |
dbt models and Dagster asset definitions
Dagster’s asset definitions bear several similarities to dbt models. An asset definition contains an asset key, a set of upstream asset keys, and an operation that is responsible for computing the asset from its upstream dependencies. Models defined in a dbt project can be interpreted as Dagster asset definitions:
- The asset key for a dbt model is (by default) the name of the model.
- The upstream dependencies of a dbt model are defined with ref or source calls within the model's definition.
- The computation required to compute the asset from its upstream dependencies is the SQL within the model's definition.
These similarities make it natural to interact with dbt models as asset definitions. Let’s take a look at a dbt model and an asset definition, in code:
Here's what's happening in this example:
- The first code block is a dbt model
  - As dbt models are named using file names, this model is named orders
  - The data for this model comes from a dependency named raw_orders
- The second code block is a Dagster asset
  - The asset key corresponds to the name of the dbt model, orders
  - raw_orders is provided as an argument to the asset, defining it as a dependency
Scaffolding a Dagster project from a dbt project
Check out part two of the dbt & Dagster tutorial to see this concept in context.
You can create a Dagster project that wraps your dbt project by using the dagster-dbt project scaffold command line interface.
dagster-dbt project scaffold --project-name project_dagster --dbt-project-dir path/to/dbt/project
This creates a directory called project_dagster/ inside the current directory. The project_dagster/ directory contains a set of files that define a Dagster project that loads the dbt project at the path defined by --dbt-project-dir. The path to the dbt project must contain a dbt_project.yml file.
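Because the command only works when the target directory contains a dbt_project.yml, it can help to check for that file before invoking it. The following is a rough sketch using only the standard library. The helper name scaffold_command and the temporary directory layout are hypothetical, and the helper only builds the argument list rather than running anything.

```python
import shlex
import tempfile
from pathlib import Path


def scaffold_command(project_name: str, dbt_project_dir: Path) -> list:
    """Return the scaffold command as an argv list after a basic sanity check."""
    if not (dbt_project_dir / "dbt_project.yml").is_file():
        raise FileNotFoundError(f"No dbt_project.yml found in {dbt_project_dir}")
    return [
        "dagster-dbt", "project", "scaffold",
        "--project-name", project_name,
        "--dbt-project-dir", str(dbt_project_dir),
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical dbt project containing only the required marker file.
    dbt_dir = Path(tmp) / "my_dbt_project"
    dbt_dir.mkdir()
    (dbt_dir / "dbt_project.yml").write_text("name: my_dbt_project\n")

    command = scaffold_command("project_dagster", dbt_dir)
    print(shlex.join(command))

    # A directory without dbt_project.yml is rejected before any command runs.
    try:
        scaffold_command("project_dagster", Path(tmp) / "not_a_dbt_project")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

The returned list could be passed to subprocess.run(command, check=True) in an environment where dagster-dbt is installed.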
Loading dbt models from a dbt project
Check out part two of the dbt & Dagster tutorial to see this concept in context.
The dagster-dbt library offers @dagster_dbt.dbt_assets to define Dagster assets for dbt models. It requires a dbt manifest (manifest.json) generated from your dbt project, which Dagster parses to obtain the project's representation.
The manifest can be created in two ways:
- At run time: A dbt manifest is generated when your Dagster definitions are loaded, or
- At build time: A dbt manifest is generated before loading your Dagster definitions and is included as part of your Python package.
When deploying your Dagster project to production, we recommend generating the manifest at build time to avoid the overhead of recompiling your dbt project every time your Dagster code is executed. A manifest.json should be precompiled and included in the Python package for your Dagster code.
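To make the role of the manifest more concrete, here is a small sketch that reads model names and their upstream dependencies out of a manifest-shaped document. The sample below is a heavily trimmed, hypothetical manifest (real manifest.json files carry many more fields), the helper name models_from_manifest is made up for this example, and this is not how dagster-dbt itself parses the file.

```python
import json


def models_from_manifest(manifest: dict) -> dict:
    """Map each dbt model name to the names of the nodes it depends on."""
    nodes = manifest.get("nodes", {})
    sources = manifest.get("sources", {})
    result = {}
    for node in nodes.values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        upstream = []
        for unique_id in node.get("depends_on", {}).get("nodes", []):
            parent = nodes.get(unique_id) or sources.get(unique_id)
            # Fall back to the raw unique ID if the parent is not in the manifest.
            upstream.append(parent["name"] if parent else unique_id)
        result[node["name"]] = upstream
    return result


# Hypothetical, trimmed manifest with two models.
sample_manifest = json.loads("""
{
  "nodes": {
    "model.my_project.raw_orders": {
      "resource_type": "model",
      "name": "raw_orders",
      "depends_on": {"nodes": []}
    },
    "model.my_project.orders": {
      "resource_type": "model",
      "name": "orders",
      "depends_on": {"nodes": ["model.my_project.raw_orders"]}
    }
  },
  "sources": {}
}
""")

dependencies = models_from_manifest(sample_manifest)
print(dependencies)
```

With a real project, the same function could be applied to json.loads(Path("target/manifest.json").read_text()), assuming the manifest was written to dbt's default target directory.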
The easiest way to handle the creation of your manifest file is to use DbtProject.
In the Dagster project created by the dagster-dbt project scaffold command, the creation of your manifest is handled at run time during development:
from pathlib import Path

from dagster_dbt import DbtProject

my_dbt_project = DbtProject(
    project_dir=Path(__file__).joinpath("..", "..", "..").resolve(),
    packaged_project_dir=Path(__file__)
    .joinpath("..", "..", "dbt-project")
    .resolve(),
)
# Generate the manifest at run time during local development.
my_dbt_project.prepare_if_dev()
The manifest path can then be accessed with my_dbt_project.manifest_path.
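The Path(__file__).joinpath("..", ...) expressions in the snippet above can be hard to read at a glance, because the first ".." only steps from the file to its containing directory. The sketch below resolves the same two expressions against a hypothetical layout created in a temporary directory. The directory names are assumptions, not necessarily what the scaffold command generates.

```python
import tempfile
from pathlib import Path


def resolve_relative_to_file(file_path: Path, *parts: str) -> Path:
    """Mirror Path(__file__).joinpath(*parts).resolve() for an arbitrary file."""
    return file_path.joinpath(*parts).resolve()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    # Hypothetical layout:
    #   <root>/my_dbt_project/project_dagster/project_dagster/project.py
    module_file = (
        root / "my_dbt_project" / "project_dagster" / "project_dagster" / "project.py"
    )
    module_file.parent.mkdir(parents=True)
    module_file.touch()

    # Three ".." segments: file -> its directory -> parent -> grandparent.
    project_dir = resolve_relative_to_file(module_file, "..", "..", "..")
    # Two ".." segments, then a sibling directory named dbt-project.
    packaged_dir = resolve_relative_to_file(module_file, "..", "..", "dbt-project")

    print(project_dir.relative_to(root))
    print(packaged_dir.relative_to(root))
```

Under this layout, project_dir points at the dbt project that contains the Dagster project, while packaged_dir points at a dbt-project directory inside the outer Dagster project directory.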
When developing locally, you can run the following command to generate the manifest at run time for your dbt and Dagster project:
dagster dev
In production, a precompiled manifest should be used. Using DbtProject, the manifest can be created at build time by running the dagster-dbt project prepare-and-package command in your CI/CD workflow. For more information, see the Deploying a Dagster project with a dbt project section.
Selecting a profiles directory, profile and target for your dbt project
You can specify which connection information dbt should use when parsing and executing your models. This can be done by passing the profiles directory, profile, and target when creating your DbtProject object. These fields are optional. The default values defined in your dbt project will be used for each parameter that is not passed.
from pathlib import Path

from dagster_dbt import DbtProject

my_dbt_project = DbtProject(
    project_dir=Path(__file__).joinpath("..", "..", "..").resolve(),
    profiles_dir=Path(__file__)
    .joinpath("..", "..", "..", "my_profiles_dir")
    .resolve(),
    profile="my_profile",
    target="my_target",
)
For more information, see dbt's guide about connection profiles.
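As a rough illustration of how the optional profile and target values interact with the defaults, the sketch below models the fallback logic on plain dictionaries. It assumes the usual dbt convention that dbt_project.yml names a default profile and that each profile names a default target. The function resolve_connection and the sample data are hypothetical, and this is not dbt's or dagster-dbt's actual implementation.

```python
from typing import Optional, Tuple


def resolve_connection(
    profiles: dict,
    project_config: dict,
    profile: Optional[str] = None,
    target: Optional[str] = None,
) -> Tuple[str, str, dict]:
    """Pick the profile, target, and connection settings, falling back to defaults."""
    # An explicit profile wins; otherwise use the one named in the project config.
    profile_name = profile or project_config["profile"]
    profile_block = profiles[profile_name]
    # An explicit target wins; otherwise use the profile's default target.
    target_name = target or profile_block["target"]
    return profile_name, target_name, profile_block["outputs"][target_name]


# Hypothetical contents of profiles.yml and dbt_project.yml, already parsed.
profiles = {
    "my_profile": {
        "target": "dev",
        "outputs": {
            "dev": {"type": "duckdb", "path": "dev.duckdb"},
            "my_target": {"type": "duckdb", "path": "prod.duckdb"},
        },
    }
}
project_config = {"name": "my_dbt_project", "profile": "my_profile"}

default_choice = resolve_connection(profiles, project_config)
explicit_choice = resolve_connection(
    profiles, project_config, profile="my_profile", target="my_target"
)
print(default_choice)
print(explicit_choice)
```

Leaving both arguments unset selects the project's default profile and that profile's default target, which corresponds to omitting profile and target when constructing DbtProject.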