dagster-clickhouse integration reference
This page covers ClickHouse integration details beyond the Using ClickHouse with Dagster guide.
ClickHouse resource
The ClickhouseResource exposes a clickhouse_driver.Client from get_connection(). Use it to run DDL/DML and queries with full control over SQL.
Executing custom SQL
from dagster_clickhouse import ClickhouseResource
from dagster import asset
# This example executes a query against the `iris.iris_dataset` table created in the
# Using ClickHouse with Dagster guide.
iris_dataset = asset(name="iris_dataset")(lambda: None)
@asset(deps=[iris_dataset])
def small_petals(clickhouse: ClickhouseResource) -> None:
with clickhouse.get_connection() as client:
client.execute("DROP TABLE IF EXISTS iris.small_petals")
client.execute(
"CREATE TABLE iris.small_petals ENGINE = MergeTree() ORDER BY tuple() AS"
" SELECT * FROM iris.iris_dataset WHERE petal_length_cm < 2 AND petal_width_cm < 0.5"
)
This builds on the iris.iris_dataset table created in the resource tutorial.
ClickHouse I/O managers
The Pandas and Polars I/O managers create MergeTree tables when needed and use ALTER TABLE … DELETE (with mutations_sync = 1) or TRUNCATE when replacing partitions or full tables, consistent with ClickhouseDbClient.
Selecting specific columns in a downstream asset
You can restrict loaded columns by attaching columns metadata on the input asset, similar to other database I/O managers:
import pandas as pd
from dagster import AssetIn, asset
@asset(
ins={
"iris_sepal": AssetIn(
key="iris_dataset",
metadata={"columns": ["sepal_length_cm", "sepal_width_cm"]},
)
}
)
def sepal_data(iris_sepal: pd.DataFrame) -> pd.DataFrame:
iris_sepal["sepal_area_cm2"] = (
iris_sepal["sepal_length_cm"] * iris_sepal["sepal_width_cm"]
)
return iris_sepal
Dagster schema vs connection database
schemaon the I/O manager / asset metadata: the ClickHouse database that contains the table (for example`analytics`.`revenue_daily`).databaseon the I/O manager / resource: the default database on the client session (ClickHouseUSEsemantics). Table locations for assets still use the Dagsterschema/ asset metadata path.
Legacy @io_manager factories
The packages expose clickhouse_pandas_io_manager and clickhouse_polars_io_manager built with build_clickhouse_io_manager for setups that prefer the older factory style.