Dagster & ClickHouse
This library provides an integration with ClickHouse using the native protocol (clickhouse-driver). Use a resource for ad hoc SQL, I/O managers for Pandas or Polars DataFrames, and components to wire templated SQL into Dagster projects with dg.
Installation
Install the packages that match how you plan to use ClickHouse:
With uv:

```shell
uv add dagster-clickhouse dagster-clickhouse-pandas dagster-clickhouse-polars
```

With pip:

```shell
pip install dagster-clickhouse dagster-clickhouse-pandas dagster-clickhouse-polars
```
- `dagster-clickhouse`: `ClickhouseResource`, `ClickhouseIOManager`, and `ClickhouseQueryComponent` for `dg` projects.
- `dagster-clickhouse-pandas`: `ClickhousePandasIOManager` for Pandas DataFrames.
- `dagster-clickhouse-polars`: `ClickhousePolarsIOManager` for Polars DataFrames.
You also need a running ClickHouse server (for example, ClickHouse Cloud or a self-hosted instance) reachable at the host and port you configure.
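Before configuring Dagster, it can help to confirm that the server is reachable from the machine that runs your code. The sketch below uses only Python's standard library; `is_reachable` is a hypothetical helper, not part of dagster-clickhouse, and it defaults to port 9000 because that is the native-protocol port this integration uses by default. It only checks that a TCP connection opens; it does not authenticate or speak the ClickHouse protocol.

```python
import socket


def is_reachable(host: str, port: int = 9000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical pre-flight check: it proves only that something is listening,
    not that it is ClickHouse or that your credentials are valid.
    """
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, unreachable network, ...
        return False
```

For example, `is_reachable("localhost")` returns `False` when nothing is listening on port 9000 of the local machine.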
Example
This example stores a Pandas DataFrame in ClickHouse using the Pandas I/O manager. Dagster uses the asset name as the table name, and uses the I/O manager's `schema` setting (and the optional asset `key_prefix`) to choose the ClickHouse database that holds that table.
```python
import pandas as pd
from dagster_clickhouse_pandas import ClickhousePandasIOManager

import dagster as dg


# The asset name is the table name; the I/O manager's `schema` is the ClickHouse database.
@dg.asset
def iris_dataset() -> pd.DataFrame:
    return pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )


defs = dg.Definitions(
    assets=[iris_dataset],
    resources={
        "io_manager": ClickhousePandasIOManager(
            host="localhost",
            port=9000,
            database="default",
            schema="iris",
        )
    },
)
```
About ClickHouse
ClickHouse is a column-oriented OLAP database designed for analytical queries over large datasets. The Dagster integration connects over the native protocol (default port 9000) using clickhouse-driver.
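For ad hoc SQL outside the I/O managers, a minimal sketch is to call clickhouse-driver directly. This page does not show how `ClickhouseResource` is configured, so the example below does not use it; `row_count`, `quote_identifier`, and `SupportsExecute` are hypothetical names. The only clickhouse-driver behavior it relies on is that `Client.execute` returns the rows of a `SELECT` as a list of tuples.

```python
import re
from typing import Any, Protocol


class SupportsExecute(Protocol):
    """Hypothetical structural type: anything with an execute(query) method,
    such as clickhouse_driver.Client."""

    def execute(self, query: str, params: Any = None) -> Any: ...


_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Backquote a plain identifier; reject anything that would need escaping."""
    if not _PLAIN_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def row_count(client: SupportsExecute, database: str, table: str) -> int:
    """Return the number of rows in `database`.`table`."""
    query = f"SELECT count() FROM {quote_identifier(database)}.{quote_identifier(table)}"
    rows = client.execute(query)  # a list of row tuples, e.g. [(150,)]
    return int(rows[0][0])


# Usage against a real server (requires `pip install clickhouse-driver`):
#   from clickhouse_driver import Client
#   client = Client(host="localhost", port=9000)
#   row_count(client, "iris", "iris_dataset")
```

Accepting any object with an `execute` method keeps the helper testable without a server, and rejecting identifiers that are not plain names avoids building unsafe SQL through string formatting.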
Dagster schema and ClickHouse databases
Unlike databases such as PostgreSQL, ClickHouse has no separate “schema” namespace between a database and its tables. In this integration, Dagster’s `schema` metadata and the I/O manager’s `schema` config therefore name a ClickHouse database: for example, a schema of `analytics` stores the `my_table` asset as `analytics`.`my_table`.
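The database/table mapping can be sketched as a pure function. `resolve_table` and `qualified_name` are hypothetical helpers, not the integration's code, and the precedence they apply (asset `schema` metadata, then the I/O manager `schema`, then the last `key_prefix` component, then a fallback database named `default`) is an assumption modeled on Dagster's other database I/O managers; check the integration's reference before relying on it.

```python
from typing import Mapping, Optional, Sequence, Tuple


def resolve_table(
    asset_key_path: Sequence[str],
    metadata: Optional[Mapping[str, str]] = None,
    io_manager_schema: Optional[str] = None,
    fallback_database: str = "default",
) -> Tuple[str, str]:
    """Hypothetical: map a Dagster asset key path to a ClickHouse (database, table).

    ASSUMED precedence: asset `schema` metadata, then the I/O manager `schema`,
    then the last key_prefix component, then `fallback_database`.
    """
    table = asset_key_path[-1]  # the asset name is the table name
    if metadata and metadata.get("schema"):
        database = metadata["schema"]
    elif io_manager_schema:
        database = io_manager_schema
    elif len(asset_key_path) > 1:
        database = asset_key_path[-2]  # last key_prefix component
    else:
        database = fallback_database
    return database, table


def qualified_name(database: str, table: str) -> str:
    """ClickHouse has no schema level, so a table is addressed as `database`.`table`."""
    return f"`{database}`.`{table}`"
```

With the Example section's configuration (`schema="iris"`), this sketch resolves the `iris_dataset` asset to the table `iris`.`iris_dataset`.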
See the Using ClickHouse with Dagster guide and the API reference for more usage patterns, including templated SQL with dg.