Dagster & ClickHouse

This library provides an integration with ClickHouse using the native protocol (clickhouse-driver). Use a resource for ad hoc SQL, I/O managers for Pandas or Polars DataFrames, and components to wire templated SQL into Dagster projects with dg.

Installation

Install the packages that match how you plan to use ClickHouse:

With uv:

uv add dagster-clickhouse dagster-clickhouse-pandas dagster-clickhouse-polars

With pip:

pip install dagster-clickhouse dagster-clickhouse-pandas dagster-clickhouse-polars

dagster-clickhouse: ClickhouseResource, ClickhouseIOManager, and ClickhouseQueryComponent for dg projects.
dagster-clickhouse-pandas: ClickhousePandasIOManager for Pandas DataFrames.
dagster-clickhouse-polars: ClickhousePolarsIOManager for Polars DataFrames.

You will also need a running ClickHouse server (for example ClickHouse Cloud or a self-hosted instance) reachable at the host and port you configure.
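
Before wiring the integration into a project, it can help to confirm that the configured host and port accept connections. The following is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical and not part of dagster-clickhouse, and a successful TCP connection only shows that something is listening on that port, not that it is ClickHouse or that your credentials are valid.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; not part of dagster-clickhouse.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (9000 is the default native-protocol port):
# is_port_open("localhost", 9000)
```

If the check fails for a server you expect to be running, the usual suspects are a firewall rule, a non-default port, or a server that exposes only the HTTP interface rather than the native protocol.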

Example

This example stores a Pandas DataFrame in ClickHouse using the Pandas I/O manager. Dagster maps the asset name to a table and uses the I/O manager schema (and optional asset key_prefix) to choose the ClickHouse database for that table.

import pandas as pd
from dagster_clickhouse_pandas import ClickhousePandasIOManager

import dagster as dg


@dg.asset
def iris_dataset() -> pd.DataFrame:
    # The asset name becomes the table name; the I/O manager `schema`
    # is the ClickHouse database that holds the table.
    return pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )


defs = dg.Definitions(
    assets=[iris_dataset],
    resources={
        "io_manager": ClickhousePandasIOManager(
            host="localhost",
            port=9000,
            database="default",
            schema="iris",
        )
    },
)

About ClickHouse

ClickHouse is a column-oriented OLAP database designed for analytical queries over large datasets. The Dagster integration connects with the native protocol (default port 9000) using clickhouse-driver.

Dagster `schema` and ClickHouse databases

Unlike some databases, ClickHouse does not use a separate “schema” layer above the database. In this integration, Dagster’s schema metadata and the I/O manager schema config both refer to a ClickHouse database name. For example, a schema of analytics places the my_table asset at `analytics`.`my_table`.
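
To make the mapping concrete, here is a small illustrative sketch of how a schema value and an asset name could combine into a fully qualified ClickHouse table name. The function `qualified_table_name` is hypothetical and is not the library's implementation. The fallback to the `default` database when no schema is given, and the rejection of names containing backticks, are assumptions made for the example.

```python
from typing import Optional


def qualified_table_name(asset_name: str, schema: Optional[str] = None) -> str:
    """Build a backtick-quoted `database`.`table` name.

    Hypothetical helper: `schema` plays the role of the ClickHouse database,
    and `asset_name` plays the role of the table.
    """
    # Assumption for this sketch: fall back to ClickHouse's "default" database.
    database = schema or "default"
    for part in (database, asset_name):
        # Keep the sketch simple by rejecting names that would need escaping.
        if not part or "`" in part:
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"`{database}`.`{asset_name}`"


print(qualified_table_name("my_table", schema="analytics"))
print(qualified_table_name("iris_dataset", schema="iris"))
```

Under these assumptions, the first call yields `analytics`.`my_table`, and the second shows where the `iris_dataset` asset from the example would land when the I/O manager is configured with a schema of iris.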

See the Using ClickHouse with Dagster guide and the reference for usage patterns, including templated SQL with dg.
