dbt patterns and best practices
This guide covers advanced patterns and best practices for integrating dbt with Dagster, helping you build more maintainable data pipelines.
Preventing concurrent dbt snapshots
dbt snapshots record changes to mutable data over time by comparing the current source data with the rows already captured in the snapshot table. Running snapshots concurrently can corrupt these tables, so make sure that only one snapshot operation runs at a time.
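The failure mode is easier to see in a deliberately simplified model. The Python sketch below treats a snapshot run as a read phase followed by a write phase. It is not dbt's snapshot SQL, and real behavior under concurrency depends on the snapshot strategy and the warehouse. The `SnapshotRow` table layout (a `valid_to` column loosely modeled on dbt's `dbt_valid_to`), the `plan_changes`/`apply_changes` helpers, and the sample data are all hypothetical. When two runs both read the table before either one writes, both conclude that the record changed, and both insert a new "current" version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotRow:
    """One version of a record in a simplified SCD Type 2 snapshot table (hypothetical layout)."""
    id: int
    status: str
    valid_to: Optional[str]  # None marks the current version


def plan_changes(table, source):
    """Read phase: decide what to write based on the table as it looks right now."""
    current = {row.id: row for row in table if row.valid_to is None}
    to_close, to_insert = [], []
    for rec_id, status in source.items():
        existing = current.get(rec_id)
        if existing is not None and existing.status == status:
            continue  # unchanged, nothing to record
        if existing is not None:
            to_close.append(existing)  # the version this run saw as current
        to_insert.append(SnapshotRow(rec_id, status, None))
    return to_close, to_insert


def apply_changes(table, plan, run_time):
    """Write phase: close the versions seen at read time, then append the new ones."""
    to_close, to_insert = plan
    for row in to_close:
        if row.valid_to is None:
            row.valid_to = run_time
    table.extend(to_insert)


def current_versions(table, rec_id):
    return [row for row in table if row.id == rec_id and row.valid_to is None]


source = {1: "shipped"}  # the record changed from "pending" to "shipped"

# Two runs one after the other: the second run sees the first run's writes.
serial = [SnapshotRow(1, "pending", None)]
apply_changes(serial, plan_changes(serial, source), "t1")
apply_changes(serial, plan_changes(serial, source), "t2")

# Two overlapping runs: both read before either writes.
concurrent = [SnapshotRow(1, "pending", None)]
plan_a = plan_changes(concurrent, source)
plan_b = plan_changes(concurrent, source)
apply_changes(concurrent, plan_a, "t1")
apply_changes(concurrent, plan_b, "t1")

print(len(current_versions(serial, 1)))      # 1
print(len(current_versions(concurrent, 1)))  # 2: duplicate "current" rows
```

Serializing the runs removes the interleaving that produces the duplicate, which is what the pool setup in the following steps is meant to guarantee.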
1. Separate snapshots from other models
Create separate dbt component definitions to isolate snapshots from your regular dbt models. First, scaffold two dbt components:
```shell
# Create component for regular models
dg scaffold defs dagster_dbt.DbtProjectComponent dbt_models

# Create component for snapshots
dg scaffold defs dagster_dbt.DbtProjectComponent dbt_snapshots
```
Configure the regular models component to exclude snapshots:
```yaml
type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'
  exclude: "resource_type:snapshot"
```
Configure the snapshots component to select only snapshots and to assign every asset it produces to the dbt-snapshots pool:

```yaml
type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'
  select: "resource_type:snapshot"

post_processing:
  assets:
    - target: "*"
      attributes:
        pool: "dbt-snapshots"
```
2. Configure concurrency pools
Configure your Dagster instance so that pools default to a maximum concurrency of 1. Add this configuration to your dagster.yaml (for Dagster Open Source) or to your deployment settings (for Dagster+):

```yaml
concurrency:
  pools:
    default_limit: 1
    granularity: 'op'
```
Then set the limit for the dbt-snapshots pool explicitly:

```shell
# Set the pool limit using the CLI
dagster instance concurrency set dbt-snapshots 1
```
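Conceptually, a pool with a limit of 1 behaves like a semaphore with a single permit: every step assigned to the pool must claim the one slot before it executes, and the others wait. The Python sketch below models that effect with `threading.BoundedSemaphore`. It is an illustration only, since Dagster enforces pool limits through concurrency slots tracked by the Dagster instance rather than an in-process lock. `run_snapshot` is a hypothetical stand-in for a snapshot step.

```python
import threading
import time

POOL_LIMIT = 1  # the limit set for the dbt-snapshots pool
pool_slots = threading.BoundedSemaphore(POOL_LIMIT)
counter_lock = threading.Lock()
running = 0
peak = 0


def run_snapshot():
    """Hypothetical stand-in for one snapshot step assigned to the pool."""
    global running, peak
    with pool_slots:  # wait here until the pool has a free slot
        with counter_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # simulated snapshot work
        with counter_lock:
            running -= 1


threads = [threading.Thread(target=run_snapshot) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(peak)  # 1: the five steps never overlapped
```

Raising `POOL_LIMIT` above 1 would let steps overlap again, which is why the snapshot pool is kept at a limit of 1.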
3. Manage multiple snapshot groups with Dagster components
For large projects with many snapshots, you can split snapshots into multiple groups and still prevent concurrent snapshot runs within each group. Create a separate Dagster component for each business domain:
```shell
# Create component for sales snapshots
dg scaffold defs dagster_dbt.DbtProjectComponent dbt_snapshots_sales

# Create component for inventory snapshots
dg scaffold defs dagster_dbt.DbtProjectComponent dbt_snapshots_inventory
```
Sales snapshots component:
```yaml
type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'
  select: "resource_type:snapshot,path:snapshots/sales/*"

post_processing:
  assets:
    - target: "*"
      attributes:
        pool: "sales-snapshots"
```
Inventory snapshots component:
```yaml
type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'
  select: "resource_type:snapshot,path:snapshots/inventory/*"

post_processing:
  assets:
    - target: "*"
      attributes:
        pool: "inventory-snapshots"
```
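In dbt selection syntax, the comma in each select string is an intersection, so a node must be a snapshot and also live under the given path. It is worth checking that the components' selections neither overlap nor leave a snapshot out. The Python sketch below is a toy evaluator, not dbt's selector engine. It supports only the `resource_type` and `path` methods and approximates path matching with `fnmatch`. The project nodes are hypothetical. It also assumes the domain components replace the single dbt_snapshots component from step 1.

```python
from fnmatch import fnmatch

# Hypothetical project nodes: name -> (resource type, file path).
NODES = {
    "stg_orders": ("model", "models/staging/stg_orders.sql"),
    "orders_snapshot": ("snapshot", "snapshots/sales/orders_snapshot.sql"),
    "stock_snapshot": ("snapshot", "snapshots/inventory/stock_snapshot.sql"),
    "ledger_snapshot": ("snapshot", "snapshots/finance/ledger_snapshot.sql"),
}


def matches(node, selector):
    """Toy evaluator: every comma-separated criterion must match (intersection)."""
    resource_type, path = NODES[node]
    for criterion in selector.split(","):
        method, value = criterion.split(":", 1)
        if method == "resource_type":
            ok = resource_type == value
        elif method == "path":
            ok = fnmatch(path, value)  # approximation of dbt's path matching
        else:
            raise ValueError(f"unsupported method in toy evaluator: {method}")
        if not ok:
            return False
    return True


def resolve(select=None, exclude=None):
    """Nodes chosen by a component's select/exclude pair (select=None means all)."""
    return {
        node
        for node in NODES
        if (select is None or matches(node, select))
        and not (exclude and matches(node, exclude))
    }


components = {
    "dbt_models": resolve(exclude="resource_type:snapshot"),
    "dbt_snapshots_sales": resolve(select="resource_type:snapshot,path:snapshots/sales/*"),
    "dbt_snapshots_inventory": resolve(select="resource_type:snapshot,path:snapshots/inventory/*"),
}

assigned = [node for nodes in components.values() for node in nodes]
overlaps = {node for node in assigned if assigned.count(node) > 1}
unassigned = set(NODES) - set(assigned)
print(overlaps)    # set()
print(unassigned)  # {'ledger_snapshot'}
```

The hypothetical snapshot under snapshots/finance/ illustrates a pitfall of this layout. The models component excludes it because it is a snapshot, and neither domain component selects its path, so it ends up in no component at all.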
Set a limit of 1 for each domain's pool, for example with `dagster instance concurrency set sales-snapshots 1` and `dagster instance concurrency set inventory-snapshots 1`. Snapshots from different business domains can then run in parallel, while snapshots within the same domain run one at a time. This reduces the risk of corruption while keeping overall snapshot run times reasonable.
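Conceptually, each pool with a limit of 1 acts like its own one-permit semaphore, and separate pools do not constrain each other. A sales snapshot and an inventory snapshot can therefore hold their slots at the same time, but two sales snapshots cannot. The Python sketch below models that effect with one `threading.BoundedSemaphore` per pool. It uses the pool names from the components above, and the snapshot names are hypothetical. It illustrates the outcome, not how Dagster actually schedules steps.

```python
import threading
import time

# Hypothetical snapshot steps and the pool each domain component assigns them to.
SNAPSHOT_POOLS = {
    "orders_snapshot": "sales-snapshots",
    "refunds_snapshot": "sales-snapshots",
    "stock_snapshot": "inventory-snapshots",
    "bins_snapshot": "inventory-snapshots",
}
POOL_LIMITS = {"sales-snapshots": 1, "inventory-snapshots": 1}

pool_slots = {name: threading.BoundedSemaphore(limit) for name, limit in POOL_LIMITS.items()}
state_lock = threading.Lock()
running = {name: 0 for name in POOL_LIMITS}
peak_per_pool = {name: 0 for name in POOL_LIMITS}
peak_total = 0


def run_snapshot(pool):
    """Hold a slot in the step's own pool for the duration of the simulated work."""
    global peak_total
    with pool_slots[pool]:
        with state_lock:
            running[pool] += 1
            peak_per_pool[pool] = max(peak_per_pool[pool], running[pool])
            peak_total = max(peak_total, sum(running.values()))
        time.sleep(0.05)  # simulated snapshot work
        with state_lock:
            running[pool] -= 1


threads = [
    threading.Thread(target=run_snapshot, args=(pool,))
    for pool in SNAPSHOT_POOLS.values()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(peak_per_pool)  # {'sales-snapshots': 1, 'inventory-snapshots': 1}
print(peak_total)     # at most 2: one step per domain at a time
```

Adding a domain means adding another pool. Each new pool raises the maximum number of snapshots that can run in parallel by one, without relaxing the one-at-a-time guarantee inside any domain.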