If you're new to Dagster, we recommend working through this tutorial to become familiar with Dagster's feature set and tooling. It uses small examples that illustrate real data problems.
Before We Start
The tutorial is divided into several sections:
- Setup for the Tutorial will give you a starting point for following the tutorial.
- Overview will teach you the fundamental concepts of Dagster: solids and pipelines.
- ETL with Dagster will teach you ways to construct and execute a simple data pipeline using the basics of Dagster.
- Advanced Tutorials will showcase Dagster's advanced features like scheduling and materializations.
What Are We Building
We'll build examples around a simple but scary .csv dataset,
cereal.csv, which contains nutritional
facts about 80 breakfast cereals. You can find this dataset on
Or, if you've cloned the dagster git repository, you'll find this dataset at
To get the flavor of this dataset, let's look at the header and the first five rows:
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
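To make the shape of this data concrete before any Dagster code is involved, here is a minimal sketch that parses the header and five rows shown above using only Python's standard library. The names `SAMPLE` and `load_cereals` are illustrative and not part of the tutorial code, and the sample is embedded as a string so that the snippet runs without the file on disk.

```python
import csv
import io

# Header plus the first five rows of cereal.csv, copied from the sample above.
# Embedding them as a string is an assumption made to keep this runnable
# without the real file.
SAMPLE = """\
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
"""


def load_cereals(csv_text):
    """Parse CSV text into a list of dicts keyed by column name.

    Hypothetical helper for illustration only.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


cereals = load_cereals(SAMPLE)

# Every value is read as a string; numeric columns need explicit conversion.
for row in cereals:
    print(row["name"], int(row["calories"]), float(row["rating"]))
```

Note that `csv.DictReader` returns every field as a string, so columns such as `calories` and `rating` have to be converted explicitly before any arithmetic.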
You can find all of the tutorial code checked into the dagster repository.