In all of the examples we've seen so far in this tutorial, we've specified a file (
-f) in order to tell the CLI tools how to load a single job, e.g.:
dagit -f hello_cereal.py dagster job execute -f hello_cereal.py
But most of the time, especially when working on long-running projects with other people, we will want to be able to target many jobs at once with our tools.
Dagster provides the concept of a repository, a collection of jobs (and other definitions) that the Dagster tools can target as a whole. Repositories are declared using the
repository decorator as follows:
@repository def hello_cereal_repository(): return [hello_cereal_job, complex_job]
This method returns a list of items, each one of which can be a job, a schedule, or a sensor.
If you save this file as
repos.py, you can then run the command line tools on it. Try running:
dagit -f repos.py
Now you can see the list of all jobs in the repo on the left:
Loading repositories via the
-f option is actually just a convenience function. The underlying abstraction is the "workspace", which determines the set of repositories available to Dagit. Dagster allows you to configure your workspace in a
workspace.yaml file, which keeps you from having to type the same flags repeatedly to dagit, and allows you to load repositories from multiple locations.
The following config will load repositories from a single file, just like the
-f CLI option. The file path is specified relative to the folder containing the
load_from: - python_file: repos.py
Dagit will look for
workspace.yaml in the current directory by default, so now you can launch Dagit from that directory with no arguments.
You can use the
python_package config option to load jobs from local or installed Python packages. For example, you can
pip install our tutorial code snippets as a Python package:
pip install -e docs_snippets # Run this in the `examples/` directory.
Then, you can configure your
workspace.yaml to load jobs from this package.
load_from: - python_package: docs_snippets.intro_tutorial.advanced.repositories.repos
You can also use
workspace.yaml to load multiple repositories. This can be useful for organization purposes, in order to group jobs and other artifacts by team.
load_from: - python_package: docs_snippets.intro_tutorial.advanced.repositories.repos - python_file: repos.py
dagit -w multi_repo_workspace.yaml
And now you can switch between repositories in Dagit.
Sometimes teams desire different Python versions or virtual environments. To support this, Dagster repositories each live in completely separate processes from each other, and tools like Dagit communicate with those repositories using a cross-process RPC protocol.
Via the workspace, you can configure a different executable_path for each of your repositories. For example:
load_from: - python_file: relative_path: "/path/to/team/pipelines.py" executable_path: "/path/to/team/virtualenv/bin/python" - python_package: package_name: "other_team_package" executable_path: "/path/to/other_/virtualenv/bin/python"
For this example to work, you need to change the executable paths to point a virtual environment available in your system, and point to a Python file or package that is available and loadable by that executable.