ClickHouse SQL component
Dagster’s TemplatedSqlComponent runs SQL through a connection object.
The ClickhouseQueryComponent
(in dagster-clickhouse) supplies a ClickHouse connection compatible with that workflow so you can
scaffold a reusable connection and reference it from templated SQL definitions.
ClickhouseQueryComponentis currently preview. Behavior and configuration may change in minor releases.
Step 1: Prepare a Dagster project
Use an existing components-ready project or create one:
create-dagster project my-project && cd my-project/src
Activate the project virtual environment:
source ../.venv/bin/activate
Add the ClickHouse library:
uv add dagster-clickhouse
Step 2: Scaffold a templated SQL component
dg scaffold defs dagster.TemplatedSqlComponent species_counts
Creating defs at /.../my-project/src/my_project/defs/species_counts.
tree my_project/defs
my_project/defs
├── __init__.py
└── species_counts
└── defs.yaml
2 directories, 2 files
Step 3: Scaffold a ClickHouse connection component
dg scaffold defs dagster_clickhouse.ClickhouseQueryComponent clickhouse_connection
Creating defs at /.../my-project/src/my_project/defs/clickhouse_connection.
type: dagster_clickhouse.ClickhouseQueryComponent
attributes:
host: localhost
port: 9000
user: default
password: ""
tree my_project/defs
my_project/defs
├── __init__.py
├── clickhouse_connection
│ └── defs.yaml
└── species_counts
└── defs.yaml
3 directories, 3 files
You typically keep one connection component per ClickHouse cluster and reuse it from multiple SQL components.
Step 4: Point templated SQL at the connection
Update the templated SQL defs.yaml to load the connection component and provide ClickHouse-specific SQL. This example aggregates the Iris dataset by species:
type: dagster.TemplatedSqlComponent
attributes:
sql_template: |
SELECT
species,
count() AS row_count
FROM {{ table_name }}
GROUP BY species
ORDER BY species
sql_template_vars:
table_name: iris.iris_dataset
connection: "{{ context.load_component('clickhouse_connection') }}"
assets:
- key: analytics/species_counts
group_name: analytics
kinds: [clickhouse]
You can list the resulting defs with:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ analytics/species_counts │ analytics │ │ clickhouse │ │ │
│ │ └─────────────────────────┴───────────┴──────┴───────────┴─────────────┘ │
└─────────┴──────────────────────────────────────────────────────────────────────────┘
Step 5: Launch your assets
dg launch --assets '*'
Related
- Using ClickHouse with Dagster
- Snowflake SQL component (same
TemplatedSqlComponentpattern)