Review dbt project structure
With our base tables populated, we can turn to the dbt project. The analytics/models directory, located alongside our Dagster code in the src/project_dbt directory, contains a small but representative dbt project:
analytics
├── marts
│   ├── daily_metrics.sql
│   └── location_metrics.sql
├── sources
│   └── raw_taxis.yml
└── staging
    ├── staging.yml
    ├── stg_trips.sql
    └── stg_zones.sql
It’s usually simpler to keep your dbt project in the same repository as your Dagster code. This makes it easier for Dagster to parse the metadata files generated by dbt commands, and enables tighter integration between the two tools.
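As a rough illustration of what "parsing the metadata files" can look like, the sketch below reads a dbt manifest and lists each model with its materialization. It assumes the manifest follows dbt's usual layout (a top-level `nodes` mapping whose entries carry `resource_type`, `name`, and `config.materialized`). The `sample_manifest` data, the `example_project` prefix, the test node, and the helper names are hypothetical stand-ins, not taken from this project.

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Read a manifest.json file written by a dbt command (typically under target/)."""
    return json.loads(path.read_text())


def list_models(manifest: dict) -> dict[str, str]:
    """Map each dbt model name to its configured materialization."""
    return {
        node["name"]: node.get("config", {}).get("materialized", "unknown")
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"  # skip tests, seeds, snapshots
    }


# Hand-written, heavily trimmed stand-in for a real manifest (illustrative only).
sample_manifest = {
    "nodes": {
        "model.example_project.stg_zones": {
            "resource_type": "model",
            "name": "stg_zones",
            "config": {"materialized": "table"},
        },
        "model.example_project.daily_metrics": {
            "resource_type": "model",
            "name": "daily_metrics",
            "config": {"materialized": "incremental"},
        },
        "test.example_project.hypothetical_test": {
            "resource_type": "test",
            "name": "hypothetical_test",
            "config": {},
        },
    }
}

models = list_models(sample_manifest)
print(models)
```

With a real project, `load_manifest(Path("target/manifest.json"))` would supply the input instead of the sample dictionary, assuming dbt's default target directory.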
dbt sources
This dbt project includes four models and two source tables, both declared under a single source named raw_taxis. The source tables correspond to the taxi_zones and taxi_trips tables created in the previous step:
version: 2
sources:
  - name: raw_taxis
    schema: main
    tables:
      - name: zones
      - name: trips
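To make the role of this declaration concrete, here is a simplified sketch of how a source reference such as `source('raw_taxis', 'zones')` maps to a schema-qualified relation. It mirrors the YAML as a plain dictionary and ignores database names, quoting, and identifier overrides, all of which dbt handles in practice. The `resolve_source` helper is hypothetical and is not part of dbt's API.

```python
# Mirrors the source declaration above as a plain dict (no YAML parser needed).
SOURCES = {
    "raw_taxis": {"schema": "main", "tables": ["zones", "trips"]},
}


def resolve_source(source_name: str, table_name: str) -> str:
    """Return the schema-qualified name a source reference would point at.

    Simplified: real dbt also applies database names, quoting, and overrides.
    """
    source = SOURCES[source_name]
    if table_name not in source["tables"]:
        raise KeyError(f"{table_name!r} is not declared under source {source_name!r}")
    return f"{source['schema']}.{table_name}"


zones_relation = resolve_source("raw_taxis", "zones")
trips_relation = resolve_source("raw_taxis", "trips")
print(zones_relation, trips_relation)
```

Referencing an undeclared table raises a `KeyError` here, loosely analogous to dbt rejecting a reference to a source it cannot find.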
The connection for the dbt project is defined in profiles.yml. Here, we point it to the same DuckDB storage layer used by our Dagster DuckDB resource, /var/tmp/duckdb.db:
dbt_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: '{{ env_var("DUCKDB_DATABASE", "/var/tmp/duckdb.db") }}'
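The `env_var` call in the profile uses the `DUCKDB_DATABASE` environment variable when it is set and otherwise falls back to the default path. A minimal Python sketch of the same lookup is shown below; it could be used on the Dagster side so both tools resolve the same file. The `resolve_duckdb_path` helper is a hypothetical name introduced for illustration.

```python
import os
from typing import Mapping, Optional

# Default taken from the profile above.
DEFAULT_DUCKDB_PATH = "/var/tmp/duckdb.db"


def resolve_duckdb_path(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return DUCKDB_DATABASE if set, otherwise the default database path."""
    env = os.environ if environ is None else environ
    return env.get("DUCKDB_DATABASE", DEFAULT_DUCKDB_PATH)


# With no override, the default path is used.
default_path = resolve_duckdb_path({})
# With an override, the environment variable wins.
override_path = resolve_duckdb_path({"DUCKDB_DATABASE": "/tmp/other.db"})
print(default_path, override_path)
```

Passing the environment as an argument keeps the function easy to test without mutating `os.environ`.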
dbt models
In addition to the sources, the project defines four models that capture the business logic. Three are materialized as tables, one (daily_metrics) is incremental, and one (stg_zones) includes lightweight tests configured in staging.yml.
Model | Materialization | Tests |
---|---|---|
daily_metrics | Incremental | No |
location_metrics | Table | No |
stg_trips | Table | No |
stg_zones | Table | Yes |
The SQL inside each model isn’t the focus here. Instead, this compact project highlights many of the core dbt patterns you’ll encounter: staging layers, source declarations, different materializations, and basic testing.
Next, we’ll turn these dbt models into Dagster assets so they can be orchestrated, tracked, and monitored as part of your data pipeline.
Next steps
- Continue this example with dbt assets