If you're new to Dagster, we recommend working through this tutorial to become familiar with Dagster's feature set and tooling. It uses small examples that are intended to illustrate real data problems.

Before We Start

The tutorial is divided into several sections:

  • Setup for the Tutorial will give you a starting point for following along.
  • Overview will teach you the fundamental concepts of Dagster: solids and pipelines.
  • ETL with Dagster will teach you ways to construct and execute a simple data pipeline using the basics of Dagster.
  • Advanced Tutorials will showcase Dagster's advanced features like scheduling and materializations.

What Are We Building

We'll build examples around a simple but scary .csv dataset, cereal.csv, which contains nutritional facts about 80 breakfast cereals. You can find this dataset on GitHub. Or, if you've cloned the dagster git repository, you'll find this dataset at dagster/examples/dagster_examples/intro_tutorial/cereal.csv.

To get the flavor of this dataset, let's look at a few of its first rows:

100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
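
As a rough sketch of what working with these rows looks like before any Dagster machinery is involved, the snippet below parses the four sample rows shown above with Python's standard csv module. The column names in ASSUMED_COLUMNS are an assumption (no header row is reproduced above), and load_cereals is a hypothetical helper for illustration, not part of the tutorial code.

```python
import csv
import io

# The four sample rows shown above, embedded so the sketch runs without the file.
SAMPLE = """\
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
"""

# Assumed column names: one per field in each row (16 in total).
ASSUMED_COLUMNS = [
    "name", "mfr", "type", "calories", "protein", "fat", "sodium", "fiber",
    "carbo", "sugars", "potass", "vitamins", "shelf", "weight", "cups", "rating",
]


def load_cereals(text):
    """Hypothetical helper: parse CSV text into a list of dicts keyed by column."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=ASSUMED_COLUMNS)
    return list(reader)


cereals = load_cereals(SAMPLE)

# Every value is read as a string, so convert before comparing numerically.
lowest = min(cereals, key=lambda row: int(row["calories"]))
print(len(cereals), lowest["name"])
```

Note that the csv module returns every field as a string, and that the sample contains a -1 in what is assumed here to be the potass column, which looks like a placeholder rather than a real measurement. Details like these are part of what makes a small dataset a useful stand-in for real data problems.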

You can find all of the tutorial code checked into the dagster repository.