Ask AI

Repositories#

In 1.1.6, we introduced Definitions, which replaces repositories. While repositories will continue to work, we recommend migrating to Definitions. Refer to the Code locations documentation for more info.

A repository is a collection of software-defined assets, jobs, schedules, and sensors. Repositories are loaded as a unit by the Dagster CLI, Dagster webserver, and the Dagster daemon.

A convenient way to organize your job and other definitions, each repository:

  • Includes various definitions: Software-defined assets, Jobs, Schedules, and Sensors.
  • Is loaded in a different process than Dagster system processes like the webserver. Any communication between the Dagster system and repository code occurs over an RPC mechanism, ensuring that problems in repository code can't affect Dagster or other repositories.
  • Can be loaded in its own Python environment, so you can manage your dependencies (or even your own Python versions) separately.

You can set up multiple repositories and load them all at once by creating a workspace.yaml file. This can be useful for grouping jobs and other artifacts by team for organizational purposes. Refer to the Workspace documentation to learn more about setting up multiple repositories.


Relevant APIs#

NameDescription
@repositoryThe decorator used to define repositories. The decorator returns a RepositoryDefinition
RepositoryDefinitionBase class for repositories. You almost never want to use initialize this class directly. Instead, you should use the @repository which returns a RepositoryDefinition

Defining a repository#

Repositories are typically declared using the @repository decorator. For example:

from dagster import RunRequest, ScheduleDefinition, asset, job, op, repository, sensor


@asset
def asset1():
    pass


@asset
def asset2():
    pass


@asset(group_name="mygroup")
def asset3():
    pass


@op
def hello():
    pass


@job
def job1():
    hello()


@job
def job2():
    hello()


@job
def job3():
    hello()


job1_schedule = ScheduleDefinition(job=job1, cron_schedule="0 0 * * *")


@sensor(job=job2)
def job2_sensor():
    should_run = True
    if should_run:
        yield RunRequest(run_key=None, run_config={})


@repository
def my_repository():
    return [
        asset1,
        asset2,
        asset3,
        job1_schedule,
        job2_sensor,
        job3,
    ]

The repository specifies a list of items, each of which can be a AssetsDefinition, JobDefinition, ScheduleDefinition, or SensorDefinition. If you include a schedule or sensor, the job it targets will be automatically also included on the repository.


Using a repository#

If you save the code above as repo.py, you can then run the Dagster command line tools on it. Try running:

dagster dev -f repo.py

Now you can see that all the assets and jobs in this repository are listed in the left sidebar. Assets are organized in groups. In our example, asset1 and asset2 are placed in the default group because they were not explicitly assigned a group. asset3 is in mygroup.

repo-ui

You can also use -m to specify a module where the repository lives (See dagster dev for the full list of ways to locate a repository).

Loading repositories via the -f or -m options is actually just a convenience function. The underlying abstraction is the Workspace, which determines all of the available repositories available to the webserver/UI. See Workspace for more details.