Our assets will represent and transform a simple but scary CSV dataset,
cereal.csv, which contains nutritional facts about 80 breakfast cereals.
Let's write our first Dagster asset and save it as
A software-defined asset specifies an asset that you want to exist and how to compute its contents. Typically, you'll define assets by annotating ordinary Python functions with the
Our first asset represents a dataset of cereal data, downloaded from the internet.
import csv import requests from dagster import asset @asset def cereals(): response = requests.get("https://docs.dagster.io/assets/cereal.csv") lines = response.text.split("\n") cereal_rows = [row for row in csv.DictReader(lines)] return cereal_rows
In this simple case, our asset doesn't depend on any other assets.
Materializing an asset means computing its contents and then writing them to persistent storage. By default, Dagster will pickle the value returned by the function and store it in the local filesystem, using the name of the asset as the name of the file. Where and how the contents are stored is fully customizable - e.g. you might store them in a database or a cloud object store like S3. We'll look at how that works later.
Assuming you’ve saved this code as
cereal.py, you can execute it via two different mechanisms:
To visualize your assets in Dagit, run the following command. Make sure you're in the directory that contains the file with your code:
dagit -f cereal.py
You'll see output similar to:
Serving dagit on http://127.0.0.1:3000 in process 70635
You should be able to navigate to http://127.0.0.1:3000 in your web browser and view your asset.
Click the Materialize button to launch a run that materializes the asset. After the run completes, the Activity tab on the Assets page will contain information about the run.
Click the run ID in the Activity tab to view details about the run. A page including a structured stream of logs and events that occurred during the run will display:
In this view, you can filter and search through the logs corresponding to the run that's materializing your asset.
Click the cereals link in the upper left corner of the page - next to the Success indicator in the image below - to navigate back to the Asset details page:
If you'd rather materialize your asset as a script, you can do that without spinning up Dagit. Just add a few lines to
cereal.py. This executes a run within the Python process.
from dagster import materialize if __name__ == "__main__": materialize([cereals])
Now you can just run: