
Dagster & ClickHouse

This library provides an integration with ClickHouse using the native protocol (clickhouse-driver). Use a resource for ad hoc SQL, I/O managers for Pandas or Polars DataFrames, and components to wire templated SQL into Dagster projects with dg.

Installation

Install the packages that match how you plan to use ClickHouse:

uv add dagster-clickhouse dagster-clickhouse-pandas dagster-clickhouse-polars

You will also need a running ClickHouse server (for example ClickHouse Cloud or a self-hosted instance) reachable at the host and port you configure.
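Before wiring the integration into a project, it can help to confirm that the configured host and port accept TCP connections. The following is a minimal sketch using only the Python standard library. The helper name `clickhouse_port_reachable` is hypothetical and not part of dagster-clickhouse, and the default of 9000 assumes the native protocol port. A successful TCP connection does not prove that credentials or the protocol handshake will succeed.

```python
import socket


def clickhouse_port_reachable(host: str, port: int = 9000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only checks that the port
    accepts connections, not that a ClickHouse server answers on it.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `clickhouse_port_reachable("localhost", 9000)` against a local server gives a quick signal before running any assets.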

Example

This example stores a Pandas DataFrame in ClickHouse using the Pandas I/O manager. Dagster maps the asset name to a table and uses the I/O manager schema (and optional asset key_prefix) to choose the ClickHouse database for that table.

import pandas as pd
from dagster_clickhouse_pandas import ClickhousePandasIOManager

import dagster as dg


@dg.asset
def iris_dataset() -> pd.DataFrame:
    # Asset name is the table name; I/O manager `schema` is the ClickHouse database.
    return pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )


defs = dg.Definitions(
    assets=[iris_dataset],
    resources={
        "io_manager": ClickhousePandasIOManager(
            host="localhost",
            port=9000,
            database="default",
            schema="iris",
        )
    },
)
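The mapping from asset key to table can be pictured with a small pure-Python sketch. This is not the library's implementation: the function name `resolve_database_and_table` is hypothetical, and the precedence shown (I/O manager schema first, then the last component of the key prefix, then a fallback database) is an assumption made for illustration. Check the integration reference for the actual rules.

```python
from typing import Optional, Sequence, Tuple


def resolve_database_and_table(
    asset_key_path: Sequence[str],
    io_manager_schema: Optional[str] = None,
    fallback_database: str = "default",
) -> Tuple[str, str]:
    """Map an asset key path to a (database, table) pair.

    Assumed precedence (illustrative only): the I/O manager schema, then
    the last component of the key prefix, then a fallback database.
    """
    if not asset_key_path:
        raise ValueError("asset key path must not be empty")

    # The final component of the asset key is treated as the table name.
    table = asset_key_path[-1]

    if io_manager_schema:
        return io_manager_schema, table
    if len(asset_key_path) > 1:
        # With a key prefix, use the component directly before the name.
        return asset_key_path[-2], table
    return fallback_database, table
```

Under these assumptions, `resolve_database_and_table(["iris_dataset"], "iris")` yields `("iris", "iris_dataset")`, matching the example above where the asset lands in the `iris` database.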

About ClickHouse

ClickHouse is a column-oriented OLAP database designed for analytical queries over large datasets. The Dagster integration connects with the native protocol (default port 9000) using clickhouse-driver.

Dagster schema and ClickHouse databases

Unlike some databases, ClickHouse does not use a separate “schema” layer above the database. In this integration, Dagster’s schema metadata and the I/O manager schema config both refer to a ClickHouse database name (for example, analytics in `analytics`.`my_table`).
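To make the database-qualified form concrete, here is a minimal sketch that builds a backtick-quoted name from a database and a table. The helper `qualified_table_name` is hypothetical and not part of the integration. It deliberately accepts only simple alphanumeric and underscore identifiers rather than attempting to escape special characters.

```python
import re

# Conservative pattern: letters, digits, and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table_name(database: str, table: str) -> str:
    """Return a backtick-quoted `database`.`table` name.

    Hypothetical helper; rejects anything outside the conservative
    pattern instead of trying to escape it.
    """
    for name in (database, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{database}`.`{table}`"
```

Here the Dagster schema value fills the `database` position, so a schema of `analytics` and an asset named `my_table` produce the qualified name shown in the paragraph above.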

See the Using ClickHouse with Dagster guide and the reference for usage patterns, including templated SQL with dg.