Dagster+ Serverless is a fully managed version of Dagster+, and is the easiest way to get started with Dagster. With Serverless, you can run your Dagster jobs without spinning up any infrastructure.
Serverless works best with workloads that primarily orchestrate other services or perform light computation. Most workloads fit into this category, especially those that orchestrate third-party SaaS products like cloud data warehouses and ETL tools.
If any of the following are applicable, you should select Hybrid deployment:
You require substantial computational resources. For example, training a large machine learning (ML) model in-process.
Your dataset is too large to fit in memory. For example, training a large machine learning (ML) model in-process on a terabyte of data.
You need to distribute computation across many nodes for a single run. Dagster+ runs currently execute on a single node with 4 CPUs.
You don't want to add Dagster Labs as a data processor.
If you are a GitHub user, our GitHub integration is the fastest way to get started. It uses a GitHub app and GitHub Actions to set up a repo containing skeleton code and configuration consistent with Dagster+'s best practices with a single click.
When you create a new Dagster+ organization, you'll be prompted to choose Serverless or Hybrid deployment. Once activated, our GitHub integration will scaffold a new git repo for you with Serverless and Branch Deployments already configured. Pushing to the main branch will deploy to your prod Serverless deployment. Pull requests will spin up ephemeral branch deployments using the Serverless agent.
If you are a GitLab user, our GitLab integration is the fastest way to get started. It uses a GitLab app to set up a repo containing skeleton code and CI/CD configuration consistent with Dagster+'s best practices with a single click.
When you create a new Dagster+ organization, you'll be prompted to choose Serverless or Hybrid deployment. Once activated, our GitLab integration will scaffold a new git repo for you with Serverless and Branch Deployments already configured. Pushing to the main branch will deploy to your prod Serverless deployment. Pull requests will spin up ephemeral branch deployments using the Serverless agent.
Dagster+ Serverless packages your code as PEX files and deploys them on Docker images. Using PEX files significantly reduces the time to deploy since it does not require building a new Docker image and provisioning a new container for every code change. Many apps will work fine with the default Dagster+ Serverless setup. However, some apps may need changes to the runtime environment, such as including data files, using a different base image or Python version, or installing native dependencies. You can customize the runtime environment using the methods described below.
To add data files to your deployment, use the Data Files Support built into Python's setup.py. This requires adding a package_data or include_package_data keyword in the call to setup() in setup.py. For example, given this directory structure:
If you want to include the data folder, modify your setup.py to add the package_data line:
# setup.py
from setuptools import find_packages, setup

if __name__ == "__main__":
    setup(
        name="my_dagster_project",
        packages=find_packages(exclude=["my_dagster_project_tests"]),
        # Add the following line. Here "data/*" is relative to the my_dagster_project subdirectory.
        package_data={"my_dagster_project": ["data/*"]},
        install_requires=["dagster", ...],
    )
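Once the data files are packaged, project code still needs a way to find them at runtime that does not depend on the current working directory. The following is a minimal sketch using the standard library's pkgutil.get_data. The helper name load_packaged_text and the file data/lookup.csv are hypothetical, and the sketch assumes the package is importable in the deployed environment.

```python
import pkgutil


def load_packaged_text(package: str, relative_path: str, encoding: str = "utf-8") -> str:
    """Return the decoded contents of a data file shipped inside a package.

    relative_path uses "/" separators and is relative to the package
    directory, mirroring the "data/*" pattern passed to package_data.
    """
    raw = pkgutil.get_data(package, relative_path)
    if raw is None:
        # get_data returns None when the package cannot be located or its
        # loader does not support reading data files.
        raise FileNotFoundError(
            f"could not load {relative_path!r} from package {package!r}"
        )
    return raw.decode(encoding)


# Hypothetical usage inside an asset or op:
#   text = load_packaged_text("my_dagster_project", "data/lookup.csv")
```

Resolving the file through the package rather than through a path such as ./data/lookup.csv keeps the lookup independent of where the process happens to be started.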
The default version of Python for Serverless deployments is Python 3.8. Versions 3.9 through 3.12 are also supported. You can specify the version you want by updating your GitHub workflow or using the --python-version command line argument:
With GitHub: Change the python_version parameter for the build_deploy_python_executable job in your .github/workflows files. For example:
- name: Build and deploy Python executable
  if: env.ENABLE_FAST_DEPLOYS == 'true'
  uses: dagster-io/dagster-cloud-action/actions/build_deploy_python_executable@pex-v0.1
  with:
    dagster_cloud_file: "$GITHUB_WORKSPACE/project-repo/dagster_cloud.yaml"
    build_output_dir: "$GITHUB_WORKSPACE/build"
    python_version: "3.9" # Change this value to the desired Python version
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
With the CLI: Add the --python-version CLI argument to the deploy command to specify the desired Python version.
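If a deploy script builds the CLI invocation programmatically, it can help to validate the requested version before calling dagster-cloud. The sketch below is illustrative only: the helper python_version_args is hypothetical, the list of versions is copied from the supported range stated above (3.8 through 3.12), and passing the value in --python-version=VALUE form is an assumption about how the flag is written.

```python
from typing import List

# Versions listed as supported in this document; update if the list changes.
SUPPORTED_PYTHON_VERSIONS = ("3.8", "3.9", "3.10", "3.11", "3.12")


def python_version_args(version: str) -> List[str]:
    """Return the CLI argument selecting a Python version, or raise if unsupported."""
    if version not in SUPPORTED_PYTHON_VERSIONS:
        supported = ", ".join(SUPPORTED_PYTHON_VERSIONS)
        raise ValueError(
            f"Python {version} is not in the supported list: {supported}"
        )
    return [f"--python-version={version}"]
```

The returned list can be appended to the rest of the deploy command's arguments, so an unsupported version fails in the script rather than partway through a deploy.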
Using a different base image or using native dependencies
Dagster+ runs your code on a Docker image that we build as follows:
The standard Python "slim" Docker image, such as python:3.8-slim is used as the base.
The dagster-cloud[serverless] module is installed in the image.
As far as possible, add all dependencies by including the corresponding native Python bindings in your setup.py. When that is not possible, you can build and upload a custom base image that will be used to run your Python code.
To build and upload the image, use the command line:
Build your Docker image using docker build or your usual Docker toolchain. Ensure the dagster-cloud[serverless] dependency is included. You can do this by adding the following to your Dockerfile:
RUN pip install "dagster-cloud[serverless]"
Upload your Docker image to Dagster+ using the upload-base-image command. Note that this command prints out the tag used in Dagster+ to identify your image:
$ dagster-cloud serverless upload-base-image local-image:tag
...
To use the uploaded image run: dagster-cloud deploy-python-executable ... --base-image-tag=sha256_518ad2f92b078c63c60e89f0310f13f19d3a1c7ea9e1976d67d59fcb7040d0d6
To use a Docker image you have published to Dagster+, use the --base-image-tag tag printed out by the above command.
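In a CI script, the tag usually has to be captured from the command output before it can be passed along. The following is a minimal sketch that assumes the output contains a --base-image-tag=... token, as in the sample above. The function name extract_base_image_tag is hypothetical, and the output format may differ between dagster-cloud versions.

```python
import re
from typing import Optional

# Matches the value that follows "--base-image-tag=" up to the next whitespace.
_TAG_PATTERN = re.compile(r"--base-image-tag=(\S+)")


def extract_base_image_tag(cli_output: str) -> Optional[str]:
    """Return the base image tag found in upload-base-image output, or None."""
    match = _TAG_PATTERN.search(cli_output)
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller decide whether a missing tag should fail the pipeline or fall back to the default base image.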
With GitHub: Set the SERVERLESS_BASE_IMAGE_TAG environment variable in your GitHub Actions configuration (usually at .github/workflows/dagster-cloud-deploy.yml):
With the CLI: Use the deploy command instead of the deploy-python-executable command:
dagster-cloud serverless deploy \
  --location-name example \
  --package-name assets_modern_data_stack
The deployed Docker image can be customized using either lifecycle hooks or a custom base image.
This method is the easiest to set up, and does not require setting up any additional infrastructure.
In the root of your repo, you can provide two optional shell scripts: dagster_cloud_pre_install.sh and dagster_cloud_post_install.sh. These will run before and after Python dependencies are installed. They are useful for installing any non-Python dependencies or otherwise configuring your environment.
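When a hook installs a non-Python tool, a missing or failed install tends to surface only later, when a run first invokes it. One way to catch this earlier is to check for the expected executables when the code location loads. This is a minimal sketch using the standard library's shutil.which. The helper name missing_executables and the tool names in the usage comment are hypothetical and not part of Dagster+.

```python
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the subset of names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Hypothetical usage at import time of your definitions module:
#   missing = missing_executables(["ffmpeg", "pg_dump"])
#   if missing:
#       raise RuntimeError(f"Expected tools not installed: {missing}")
```

Failing at load time makes the problem visible as a code location error instead of as a failure in the middle of a run.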
This method is the most flexible, but requires setting up a pipeline outside of Dagster to build a custom base image.
The default base image is debian:bullseye-slim, but it can be changed.
With GitHub: Provide a base_image input parameter to the Build and deploy step in your GitHub Actions configuration (usually at .github/workflows/dagster-cloud-deploy.yml):
- name: Build and deploy to Dagster+ serverless
  uses: dagster-io/dagster-cloud-action/actions/serverless_prod_deploy@v0.1
  with:
    dagster_cloud_api_token: ${{ secrets.DAGSTER_CLOUD_API_TOKEN }}
    location: ${{ toJson(matrix.location) }}
    # Use a custom base image
    base_image: "my_base_image:latest"
    organization_id: ${{ secrets.ORGANIZATION_ID }}
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
With the CLI: Add the --base-image CLI argument to the deploy command to specify the registry path to the desired base image:
If your organization begins to hit the limitations of Serverless, you should transition to a Hybrid deployment. Hybrid deployments allow you to run an agent in your own infrastructure and give you substantially more flexibility and control over the Dagster environment.
To switch to Hybrid, navigate to Status > Agents in your Dagster+ account. On this page, an organization administrator can disable the Serverless agent and view instructions for enabling Hybrid.
After changing the deployment type, you will need to update your code locations' images and configuration to be compatible with the type of Hybrid agent that you chose. Complete the following steps to finalize the transition:
Update your code locations' configuration in dagster_cloud.yaml to work with your agent. Refer to the reference for your agent type for more information:
Update your build process to publish a new container image and configuration for each code location. To use Dagster's CI/CD process, refer to Step 4 of the Dagster+ getting started guide.
Replace Serverless-only features with their Hybrid equivalents:
Lifecycle hooks - To customize a code location's runtime environment, customize the code location's Dockerfile to build its image
The default I/O manager cannot be used if you are a Serverless user who:
Works with personally identifiable information (PII)
Works with protected health information (PHI)
Has signed a business associate agreement (BAA), or
Is otherwise working with data subject to GDPR or other such regulations
In Serverless, code that uses the default I/O manager is automatically adjusted to save data in Dagster+ managed storage. This automatic change is useful because the default file system in Serverless is ephemeral, which means the default I/O manager wouldn't work as expected. However, this automatic change means potentially sensitive data is being stored, not just processed or orchestrated, by Dagster+.
To avoid this behavior, you can:
Use an I/O manager that stores data in your infrastructure
Dagster+ Serverless offers two settings for run isolation: isolated and non-isolated. Non-isolated runs are meant for quick iteration and trade isolation for speed. Isolated runs are meant for production and for compute-heavy assets and jobs.
Isolated runs each take place in their own container with their own compute resources: 4 CPU cores and 16GB of RAM.
These runs may take up to 3 minutes to start while these resources are provisioned.
When launching runs manually, select Isolate run environment in the Launchpad to launch an isolated run. Scheduled, sensor, and backfill runs are always isolated.
Note: if non-isolated runs aren't enabled (see the section below), the toggle won't appear and all runs will be isolated.
Non-isolated runs can be enabled or disabled in deployment settings with:
non_isolated_runs:
  enabled: True
Non-isolated runs provide a faster start time by using a standing, shared container for each code location.
They have fewer compute resources: 0.25 vCPU cores and 1GB of RAM. These resources are shared with other processes for a code location like sensors. As a result, it's recommended to use isolated runs for compute intensive jobs and asset materializations.
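The resource figures above can be turned into a rough rule of thumb for choosing a run mode. The sketch below is illustrative only and is not a Dagster+ feature. The function recommend_run_mode and the SHARED_HEADROOM factor are assumptions. The factor leaves room for the other processes that share the non-isolated container, and the "hybrid" result reflects the earlier guidance that workloads exceeding a single Serverless node belong on a Hybrid deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLimits:
    cpus: float
    memory_gb: float


# Figures taken from this document.
NON_ISOLATED = RunLimits(cpus=0.25, memory_gb=1.0)
ISOLATED = RunLimits(cpus=4.0, memory_gb=16.0)

# Assumption: budget only half of the shared container for a single run,
# since sensors and other processes use the same resources.
SHARED_HEADROOM = 0.5


def recommend_run_mode(estimated_cpus: float, estimated_memory_gb: float) -> str:
    """Suggest "non-isolated", "isolated", or "hybrid" for a workload estimate."""
    fits_shared = (
        estimated_cpus <= NON_ISOLATED.cpus * SHARED_HEADROOM
        and estimated_memory_gb <= NON_ISOLATED.memory_gb * SHARED_HEADROOM
    )
    if fits_shared:
        return "non-isolated"
    if estimated_cpus <= ISOLATED.cpus and estimated_memory_gb <= ISOLATED.memory_gb:
        return "isolated"
    # Larger than a single isolated Serverless run can provide.
    return "hybrid"
```

For example, a job estimated at 2 CPUs and 8 GB of memory would be steered to an isolated run, while one estimated at 64 GB would exceed what an isolated Serverless run offers.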
While launching runs from the Launchpad, uncheck Isolate run environment. When materializing an asset, shift-click Materialize all and uncheck it in the modal.
By default only one non-isolated run will execute at once. While a run is in progress, the Launchpad will swap to only launching isolated runs.
This limit can be configured in deployment settings. Use caution: the limit is in place to help avoid crashes caused by out-of-memory (OOM) errors.