Managing multiple projects & teams in Dagster+#

In this guide, we'll cover some strategies for managing multiple projects/code bases and teams in a Dagster+ account.


Separating code bases#

In this section, repository refers to a repository in a version control system (VCS), such as Git or Mercurial.

If you want to manage complexity or divide your work into areas of responsibility, consider isolating your code bases into multiple projects with:

  • Multiple directories in a single repository, or
  • Multiple repositories

Refer to the following table for more information, including the pros and cons of each approach.

Approach: Multiple directories in a single repository

How it works: You can use a single repository to manage multiple projects by placing each project in a separate directory. Depending on your VCS, you may be able to set code owners to restrict who can modify each project.

Pros:
  • Simple to implement
  • Facilitates code sharing between projects

Cons:
  • All projects share the same CI/CD pipeline and cannot be deployed independently
  • Shared dependencies between projects may cause conflicts and require coordination between teams

Approach: Multiple repositories

How it works: For stronger isolation, you can use multiple repositories to manage multiple projects.

Pros:
  • Stronger isolation between projects and teams
  • Each project has its own CI/CD pipeline and can be deployed independently
  • Dependencies between projects can be managed independently

Cons:
  • Code sharing between projects requires additional coordination to publish and reuse packages

Deployment configuration#

Whether you use a single repository or multiple, you can use a dagster_cloud.yaml file to define the code locations to deploy. For each repository, follow the steps appropriate to your CI/CD provider and include only the code locations that are relevant to the repository in your CI/CD workflow.
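
As a rough sketch of the multiple-repository case, each repository could carry its own dagster_cloud.yaml that lists only the code locations it owns. The repository names in the comments are hypothetical, and the location and package names are the same placeholders used elsewhere in this guide:

```yaml
# Repository "team-a-pipelines" (hypothetical name): dagster_cloud.yaml
locations:
  - location_name: project_a
    code_source:
      package_name: project_a
```

```yaml
# Repository "team-b-pipelines" (hypothetical name): dagster_cloud.yaml
locations:
  - location_name: project_b
    code_source:
      package_name: project_b
```

With this split, each repository's CI/CD workflow only needs to build and deploy the locations listed in its own file.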

Example with GitHub CI/CD on Hybrid deployment#

  1. For each repository, use the CI/CD workflow provided in the Dagster+ Hybrid quickstart repository.

  2. For each project in the repository, configure a code location in the dagster_cloud.yaml file:

    # dagster_cloud.yaml
    
    locations:
      - location_name: project_a
        code_source:
          package_name: project_a
        build:
          # ...
      - location_name: project_b
        code_source:
          package_name: project_b
        build:
          # ...
    
  3. In the repository's dagster-cloud-deploy.yml file, modify the CI/CD workflow to deploy all code locations for the repository:

    # .github/workflows/dagster-cloud-deploy.yml
    
    jobs:
      dagster-cloud-deploy:
        # ...
        steps:
          - name: Update build session with image tag for "project_a" code location
            id: ci-set-build-output-project-a
            if: steps.prerun.outputs.result != 'skip'
            uses: dagster-io/dagster-cloud-action/actions/utils/dagster-cloud-cli@v0.1
            with:
              command: "ci set-build-output --location-name=project_a --image-tag=$IMAGE_TAG"
    
          - name: Update build session with image tag for "project_b" code location
            id: ci-set-build-output-project-b
            if: steps.prerun.outputs.result != 'skip'
            uses: dagster-io/dagster-cloud-action/actions/utils/dagster-cloud-cli@v0.1
            with:
              command: "ci set-build-output --location-name=project_b --image-tag=$IMAGE_TAG"
          # ...
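
For the single-repository layout, the build section of each code location can point at that project's own directory. The sketch below assumes each project lives in a top-level directory named after its package and uses a placeholder registry URL. Both are illustrative assumptions, not values from the quickstart repository:

```yaml
# dagster_cloud.yaml (illustrative values only)
locations:
  - location_name: project_a
    code_source:
      package_name: project_a
    build:
      directory: ./project_a                      # assumed directory layout
      registry: registry.example.com/project_a    # placeholder registry
  - location_name: project_b
    code_source:
      package_name: project_b
    build:
      directory: ./project_b                      # assumed directory layout
      registry: registry.example.com/project_b    # placeholder registry
```

Keeping one build directory per project lets each code location be built from its own Dockerfile and dependencies, even though both are deployed by the same workflow.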
    

Isolating execution context between projects#

Separating execution context between projects can have several motivations:

  • Facilitating separation of duty between teams to prevent access to sensitive data
  • Differing compute environments and requirements, such as different architecture, cloud provider, etc.
  • Reducing impact on other projects. For example, a project with a large number of runs can impact the performance of other projects.

There are three levels of isolation, described below in order from least to most isolated: code location, agent, and deployment.

Code location isolation#

If you have no specific requirements for isolation beyond the ability to deploy and run multiple projects, you can use a single agent and deployment to manage all your projects as individual code locations.

Diagram of isolation at the code location level

Pros:
  • Simplest and most cost-effective solution
  • User access control can be set at the code location level
  • Single pane of glass to view all assets

Cons:
  • No isolation between execution environments

Agent isolation#

Agent queues are a Dagster+ Pro feature available on Hybrid deployments.

Using the agent routing feature, you can effectively isolate execution environments between projects by using a separate agent for each project.

Motivations for utilizing this approach could include:

  • Different compute requirements, such as different cloud providers or architectures
  • Optimizing for locality or access, such as running data processing closer to the storage locations or in an environment that has access to them

Diagram of isolation at the agent level

Pros:
  • Isolation between execution environments
  • User access control can be set at the code location level
  • Single pane of glass to view all assets

Cons:
  • Extra work to set up additional agents and agent queues
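
As a minimal sketch of agent routing, a code location can name the queue it should be served from, and a dedicated agent can be configured to serve that queue. The queue name project-b-queue is hypothetical, and the agent_queue and agent_queues keys are written here as we understand them from the Dagster+ agent routing documentation, so verify them against your Dagster+ version:

```yaml
# dagster_cloud.yaml: route project_b to its own queue (hypothetical queue name)
locations:
  - location_name: project_a
    code_source:
      package_name: project_a
    # No agent_queue set: served from the default queue
  - location_name: project_b
    code_source:
      package_name: project_b
    agent_queue: project-b-queue
```

```yaml
# dagster.yaml for the agent dedicated to project_b (assumed settings)
agent_queues:
  include_default_queue: false   # this agent ignores the default queue
  additional_queues:
    - project-b-queue
```

In this arrangement a second agent would keep serving the default queue, so project_a and project_b run in separate execution environments while remaining in the same deployment.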

Deployment isolation#

Multiple deployments are only available in Dagster+ Pro.

Of the approaches outlined in this guide, multiple deployments are the most isolated solution. The typical motivation for this isolation level is to separate production and non-production environments. It may also be used to satisfy other organization-specific requirements.

Diagram of isolation at the Dagster+ deployment level

Pros:
  • Isolation between assets and execution environments
  • User access control can be set at the code location and deployment level

Cons:
  • No single pane of glass to view all assets (requires switching between multiple deployments in the UI)