Code location snapshot size limits
When a code location is deployed, Dagster+ stores serialized snapshots of its definitions: a repository snapshot describing all of your assets, jobs, schedules, sensors, and resources, plus a separate snapshot for each job. The per-job snapshots include the built-in job that Dagster uses to materialize your assets when a run doesn't target a specific named job. These snapshots are what the Dagster+ UI and agents read to display and run your definitions.
Each individual snapshot can be at most 130 MB uncompressed.
In practice you'll usually hit a different limit first. Dagster loads snapshots from
your code server over gRPC, and gRPC messages are limited to 100 MB by default.
A snapshot larger than that fails to load with a gRPC RESOURCE_EXHAUSTED error
before it ever reaches the 130 MB storage limit. You can raise the gRPC limit with
the DAGSTER_GRPC_MAX_RX_BYTES and DAGSTER_GRPC_MAX_SEND_BYTES environment
variables, which must be set on both the agent and the code server processes
(the code server sends the snapshot and the agent receives it, so both ends enforce
the limit). That said, a snapshot this large is usually better reduced than worked
around; see Reducing snapshot size.
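If you do need to raise the gRPC limit, the two variables should normally carry the same value. The following is a minimal sketch that prints matching settings. It assumes the variables take an integer number of bytes, as their names suggest, and `grpc_limit_env` is a hypothetical helper written for this example, not part of Dagster.

```python
MIB = 1024 * 1024


def grpc_limit_env(limit_mib: int) -> dict[str, str]:
    """Build both gRPC size variables with the same limit (hypothetical helper).

    Assumption: each variable expects an integer byte count.
    """
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    limit_bytes = str(limit_mib * MIB)
    return {
        "DAGSTER_GRPC_MAX_RX_BYTES": limit_bytes,
        "DAGSTER_GRPC_MAX_SEND_BYTES": limit_bytes,
    }


# Print shell-style exports to apply on both the agent and the code server.
for name, value in grpc_limit_env(120).items():
    print(f"export {name}={value}")
```

Raising the gRPC limit beyond the 130 MB storage limit gains nothing, because the snapshot would then be rejected at storage time instead.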
Estimating your snapshot size locally
You can estimate the serialized size of your repository snapshot and each job
snapshot before deploying, directly from your Definitions:
from dagster import Definitions, asset, serialize_value
from dagster._core.remote_representation.external_data import RepositorySnap
from dagster._core.snap.job_snapshot import JobSnap
# Replace these example assets with your own Definitions object.
@asset
def upstream() -> None: ...
@asset(deps=[upstream])
def downstream() -> None: ...
defs = Definitions(assets=[upstream, downstream])
def _mb(serialized: str) -> float:
    return len(serialized.encode("utf-8")) / 1024 / 1024
repo_def = defs.get_repository_def()
# Repository snapshot. defer_snapshots=True matches how Dagster+ stores it: job
# snapshots are persisted separately, so the repository snapshot holds job
# references rather than full job data.
repo_snap = RepositorySnap.from_def(repo_def, defer_snapshots=True)
print(f"Repository snapshot: {_mb(serialize_value(repo_snap)):.2f} MB")
# Each job snapshot. get_all_jobs() includes the built-in asset job
# ("__ASSET_JOB") that Dagster runs when no named job is specified.
for job_def in sorted(repo_def.get_all_jobs(), key=lambda j: j.name):
    job_snap = JobSnap.from_job_def(job_def)
    print(f"Job '{job_def.name}' snapshot: {_mb(serialize_value(job_snap)):.2f} MB")
This prints the size of the repository snapshot and of every job snapshot, so you can see which is approaching the limit.
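To turn those numbers into a quick pass/fail signal, you could compare each size against the two limits described earlier. This is a rough sketch rather than an official check: `classify`, the 80% warning threshold, and the example sizes are all assumptions made for illustration, and it treats 1 MB as 1024 * 1024 bytes to match the estimate above.

```python
MIB = 1024 * 1024
GRPC_DEFAULT_LIMIT = 100 * MIB  # default gRPC message limit described earlier
STORAGE_LIMIT = 130 * MIB       # per-snapshot storage limit described earlier


def classify(size_bytes: int, warn_ratio: float = 0.8) -> str:
    """Label a snapshot size relative to the two limits (hypothetical helper)."""
    if size_bytes > STORAGE_LIMIT:
        return "over storage limit"
    if size_bytes > GRPC_DEFAULT_LIMIT:
        return "over default gRPC limit"
    if size_bytes > warn_ratio * GRPC_DEFAULT_LIMIT:
        return "approaching default gRPC limit"
    return "ok"


# Hypothetical sizes in bytes, standing in for the values measured above.
example_sizes = {
    "repository": 12 * MIB,
    "__ASSET_JOB": 85 * MIB,
    "nightly_job": 104 * MIB,
}
for name, size in example_sizes.items():
    print(f"{name}: {size / MIB:.2f} MB -> {classify(size)}")
```

A check like this could run in CI so that a growing snapshot is noticed before a deploy fails.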
Reducing snapshot size
If a snapshot is large or approaching the limit, the most effective levers are usually, in order:
- Upgrade Dagster. Newer versions serialize snapshots more compactly (for example, by no longer persisting redundant fields), so upgrading the version your code location runs can shrink the snapshot with no changes to your definitions.
- Reduce repeated per-asset metadata. Asset metadata is stored on every asset node. If the same metadata block is attached to many assets, consider attaching it once at a higher level or trimming redundant entries. Large metadata values (long descriptions, embedded tables, big tag sets) on individual assets also add up.
- Split into multiple code locations. If a single code location genuinely defines a large number of assets, splitting it divides the snapshots and lets them load in parallel. For more information, see creating workspaces.
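To judge whether repeated metadata is worth trimming, it can help to see which entries are duplicated across many assets and roughly how many bytes they account for. The sketch below works on a plain dictionary of asset name to metadata and uses `json.dumps` as a stand-in for Dagster's own serializer, so the byte counts are only indicative. `repeated_entries` and the example data are hypothetical.

```python
import json
from collections import Counter


def repeated_entries(metadata_by_asset: dict[str, dict]) -> list[tuple[str, int, int]]:
    """Find metadata entries shared by more than one asset (hypothetical helper).

    Returns (entry_json, asset_count, total_bytes) tuples, largest total first.
    Sizes come from json.dumps, which only approximates Dagster's serializer.
    """
    counts: Counter = Counter()
    for metadata in metadata_by_asset.values():
        for key, value in metadata.items():
            counts[json.dumps({key: value}, sort_keys=True)] += 1
    rows = [
        (entry, n, n * len(entry.encode("utf-8")))
        for entry, n in counts.items()
        if n > 1
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical metadata: the same long note is attached to every asset.
example = {
    "orders": {"owner_notes": "x" * 1000, "table": "orders"},
    "customers": {"owner_notes": "x" * 1000, "table": "customers"},
    "payments": {"owner_notes": "x" * 1000, "table": "payments"},
}
for entry, asset_count, total_bytes in repeated_entries(example):
    print(f"{asset_count} assets, ~{total_bytes} bytes: {entry[:40]}...")
```

Entries near the top of this list are the ones where attaching the value once, or dropping it, would save the most.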