Before We Start
If you're new to Dagster, we recommend working through this tutorial to become familiar with Dagster's feature set and tooling, using small examples that are intended to be illustrative of real data problems.
Tutorials
The tutorial is divided into several sections:
- Setup for the Tutorial will give you a starting point to follow the tutorial.
- Overview will teach you the fundamental concepts of Dagster: solids and pipelines.
- Building Pipelines with Dagster will teach you ways to construct and execute a simple data pipeline.
- Basics of Solids will teach you the basics of using solids.
- Basics of Pipelines will teach you the basics of using pipelines.
- Making Your Pipelines Testable and Maintainable will show you how to test your pipelines.
- Dagster Types & Expectations covers Dagster's type system and defining expectations.
Advanced Tutorials
These sections will introduce some advanced features and give you deeper insight into Dagster.
- Advanced: Solids demonstrates more ways you can use solids, e.g. by creating reusable solids.
- Advanced: Pipelines demonstrates how to configure pipeline-wide facilities so that you avoid repeating code or config when the business logic is unchanged.
- Advanced: Materializations demonstrates a way to make Dagster aware of the persistent artifacts your pipelines produce outside the system.
- Advanced: Organizing Pipelines in Repositories & Workspaces demonstrates constructs that are useful when you have many pipelines that you need to organize.
- Advanced: Scheduling Pipeline Runs will show you how to schedule pipelines to run at regular intervals using cron.
What Are We Building?
We'll build examples around a simple but scary CSV dataset, cereal.csv, which contains nutritional facts about 80 breakfast cereals. You can find this dataset on GitHub.
Or, if you've cloned the Dagster git repository, you'll find this dataset at
dagster/examples/docs_snippets/docs_snippets/intro_tutorial/cereal.csv
To get the flavor of this dataset, let's look at the header and the first five rows:
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
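As a quick way to get a feel for the data before any Dagster code is involved, the rows above can be parsed with Python's standard csv module. This is only a sketch: the sample is pasted inline so it runs without the file, and the names CEREAL_SAMPLE and load_rows are made up for illustration and are not part of the tutorial code. Reading the -1 under potass in the Almond Delight row as a placeholder for a missing value is an assumption; the listing itself does not say so.

```python
import csv
import io

# Header and first five rows, copied from the listing above.
CEREAL_SAMPLE = """\
name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header names.

    Hypothetical helper for illustration only.
    """
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(CEREAL_SAMPLE)

# csv.DictReader returns every value as a string, so convert before
# comparing numerically.
lowest_calorie = min(rows, key=lambda row: int(row["calories"]))

# Assumption: a negative potassium value marks missing data.
suspect_potass = [row["name"] for row in rows if int(row["potass"]) < 0]

print(len(rows), "rows,", len(rows[0]), "columns")
print("Fewest calories:", lowest_calorie["name"])
print("Possibly missing potass:", suspect_potass)
```

To run the same sketch against the full dataset, replace io.StringIO(text) with an open handle on cereal.csv. The string-to-number conversion is done by hand here; the Dagster Types & Expectations section covers the type system Dagster offers for this kind of data.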
You can find all of the tutorial code checked into the Dagster repository.