Build your first ETL pipeline

In this tutorial, you'll build a full ETL pipeline with Dagster that:

  • Ingests data into DuckDB
  • Transforms data into reports with dbt
  • Runs scheduled reports automatically
  • Generates one-time reports on demand
  • Visualizes the data with Evidence

You will learn to:

  • Set up a Dagster project with the recommended project structure
  • Integrate with other tools
  • Create and materialize assets and dependencies
  • Ensure data quality with asset checks
  • Create and materialize partitioned assets
  • Automate the pipeline
  • Create and materialize assets with sensors
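
To preview the core building block you'll use throughout, here is a minimal sketch of a Dagster asset that loads a CSV into DuckDB. It is only a sketch: the file names data.duckdb and customers.csv and the asset name raw_customers are hypothetical, and the tutorial builds the real versions step by step.

    import dagster as dg
    import duckdb

    @dg.asset
    def raw_customers() -> None:
        # Hypothetical example: ingest a local CSV into a DuckDB table.
        con = duckdb.connect("data.duckdb")  # assumed database file
        con.execute(
            "CREATE OR REPLACE TABLE raw_customers AS "
            "SELECT * FROM read_csv_auto('customers.csv')"
        )
        con.close()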

Prerequisites

To follow the steps in this tutorial, you'll need:

  • Python 3.9+ and uv installed. For more information, see the Installation guide.
  • Familiarity with Python and SQL.
  • A basic understanding of data pipelines and the extract, transform, and load (ETL) process.
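
To confirm the first prerequisite, both tools report their versions from the command line:

    python --version   # should report 3.9 or newer
    uv --version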

1: Scaffold a new Dagster project

  1. Open your terminal and scaffold a new Dagster project:

    uvx -U create-dagster project etl-tutorial
  2. Respond y to the prompt to run uv sync after scaffolding.

  3. Change to the etl-tutorial directory:

    cd etl-tutorial
  4. Activate the virtual environment:

    source .venv/bin/activate
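
The exact files vary between create-dagster versions, but the scaffolded layout looks roughly like this:

    etl-tutorial/
    ├── pyproject.toml
    ├── src/
    │   └── etl_tutorial/
    │       ├── __init__.py
    │       ├── definitions.py
    │       └── defs/
    │           └── __init__.py
    ├── tests/
    │   └── __init__.py
    └── uv.lock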

2: Start Dagster webserver

Make sure Dagster and its dependencies were installed correctly by starting the Dagster webserver:

dg dev

In your browser, navigate to http://127.0.0.1:3000

At this point the project is empty, but you'll add to it throughout the tutorial.
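
If you'd like to see something in the UI right away, a trivial asset works as a sanity check. This sketch assumes the scaffolded definitions.py automatically discovers Python files placed under src/etl_tutorial/defs; the file name hello.py and the asset itself are arbitrary:

    # src/etl_tutorial/defs/hello.py (hypothetical file)
    import dagster as dg

    @dg.asset
    def hello() -> str:
        # Returns a constant; enough to confirm Dagster loads your code.
        return "hello, dagster"

After saving the file, reload definitions in the UI and the asset should appear in the asset graph.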

Next steps