If you're new to Dagster, we recommend working through this tutorial to become familiar with Dagster's feature set and tooling. It uses small examples that illustrate real data problems.
The tutorial is divided into several sections:
- Setup for the Tutorial will give you a starting point to follow the tutorial.
- Overview will teach you the fundamental concepts of Dagster: solids and pipelines.
- Building Pipelines with Dagster will teach you ways to construct and execute a simple data pipeline using the basics of Dagster.
- Basics of Solids will teach you the basics of using solids.
- Basics of Pipelines will teach you the basics of using pipelines.
- Making Your Pipelines Testable and Maintainable will show you how to test your pipelines.
- Dagster Types & Expectations covers Dagster's type system and defining expectations.
The remaining sections introduce more advanced features and give you deeper insight into Dagster. They are worth reading if you need any of the following:
- Advanced: Solids demonstrates more ways you can use solids, e.g. by creating reusable solids.
- Advanced: Pipelines demonstrates how to configure pipeline-wide facilities so that you avoid repeating code or config while leaving business logic unchanged.
- Advanced: Materializations demonstrates a way to make Dagster aware of your persistent artifacts outside the system.
- Advanced: Intermediates demonstrates the use of intermediates storage, for persisting the serialized "intermediate" values passed between solids.
- Advanced: Organizing Pipelines in Repositories & Workspaces introduces repositories and workspaces, constructs that are useful when you have many pipelines to organize.
- Advanced: Scheduling Pipeline Runs will show you how to schedule pipelines to run at regular intervals using cron.
What Are We Building?
We'll build examples around a simple but scary CSV dataset,
cereal.csv, which contains nutritional
facts about 80 breakfast cereals. You can find this dataset on
Or, if you've cloned the dagster git repository, you'll find this dataset at
To get the flavor of this dataset, let's look at the header and the first five rows:
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
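If you want to poke at these rows before starting the tutorial, the following is a minimal sketch that parses the sample above using only Python's standard library. It does not use Dagster at all, and the names `SAMPLE` and `load_cereals` are hypothetical helpers introduced here for illustration, not part of the tutorial code. Note that every value comes back as a string, and that the `potass` column of the last row holds `-1`, which we assume (the source does not say) is a placeholder rather than a real measurement.

```python
import csv
import io

# Header and first five rows, copied verbatim from the sample above.
SAMPLE = """\
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
"""


def load_cereals(text):
    """Parse CSV text into a list of dicts keyed by column name.

    Hypothetical helper for illustration; values are left as strings.
    """
    return list(csv.DictReader(io.StringIO(text)))


cereals = load_cereals(SAMPLE)

# Print a few columns per row; numeric fields must be converted explicitly.
for row in cereals:
    print(row["name"], int(row["calories"]), row["potass"])
```

To run the same sketch against the full file, you could pass the contents of your local copy of cereal.csv to `load_cereals` instead of `SAMPLE`.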
You can find all of the tutorial code checked into the dagster repository.