Skip to main content

ClickHouse SQL component

Dagster’s TemplatedSqlComponent runs SQL through a connection object. The ClickhouseQueryComponent (in dagster-clickhouse) supplies a ClickHouse connection compatible with that workflow so you can scaffold a reusable connection and reference it from templated SQL definitions.

info
ClickhouseQueryComponent

is currently preview. Behavior and configuration may change in minor releases.

Step 1: Prepare a Dagster project

Use an existing components-ready project or create one:

create-dagster project my-project && cd my-project/src

Activate the project virtual environment:

source ../.venv/bin/activate

Add the ClickHouse library:

uv add dagster-clickhouse

Step 2: Scaffold a templated SQL component

dg scaffold defs dagster.TemplatedSqlComponent species_counts
Creating defs at /.../my-project/src/my_project/defs/species_counts.
tree my_project/defs
my_project/defs
├── __init__.py
└── species_counts
└── defs.yaml

2 directories, 2 files

Step 3: Scaffold a ClickHouse connection component

dg scaffold defs dagster_clickhouse.ClickhouseQueryComponent clickhouse_connection
Creating defs at /.../my-project/src/my_project/defs/clickhouse_connection.
my_project/defs/clickhouse_connection/defs.yaml
type: dagster_clickhouse.ClickhouseQueryComponent

attributes:
host: localhost
port: 9000
user: default
password: ""
tree my_project/defs
my_project/defs
├── __init__.py
├── clickhouse_connection
│ └── defs.yaml
└── species_counts
└── defs.yaml

3 directories, 3 files

You typically keep one connection component per ClickHouse cluster and reuse it from multiple SQL components.

Step 4: Point templated SQL at the connection

Update the templated SQL defs.yaml to load the connection component and provide ClickHouse-specific SQL. This example aggregates the Iris dataset by species:

my_project/defs/species_counts/defs.yaml
type: dagster.TemplatedSqlComponent

attributes:
sql_template: |
SELECT
species,
count() AS row_count
FROM {{ table_name }}
GROUP BY species
ORDER BY species

sql_template_vars:
table_name: iris.iris_dataset

connection: "{{ context.load_component('clickhouse_connection') }}"

assets:
- key: analytics/species_counts
group_name: analytics
kinds: [clickhouse]

You can list the resulting defs with:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ analytics/species_counts │ analytics │ │ clickhouse │ │ │
│ │ └─────────────────────────┴───────────┴──────┴───────────┴─────────────┘ │
└─────────┴──────────────────────────────────────────────────────────────────────────┘

Step 5: Launch your assets

dg launch --assets '*'