
Changelog#

1.9.6 (core) / 0.25.6 (libraries)#

New#

  • Updated croniter pin to allow versions >= 5.0.1 to enable use of DayOfWeek as 7. Croniter 4.0.0 is still disallowed. (Thanks, @joshuataylor!)
  • Added flag checkDbReadyInitContainer to optionally disable db check initContainer.
  • [ui] Added Google Drive icon for kind tags. (Thanks, @dragos-pop!)
  • [ui] Renamed the run lineage sidebar on the Run details page to Re-executions.
  • [ui] Sensors and schedules that appear in the Runs page are now clickable.
  • [ui] Runs targeting assets now show more of the assets in the Runs page.
  • [dagster-airbyte] The destination type for an Airbyte asset is now added as a kind tag for display in the UI.
  • [dagster-gcp] DataprocResource now receives an optional parameter labels to be attached to Dataproc clusters. (Thanks, @thiagoazcampos!)
  • [dagster-k8s] Added a checkDbReadyInitContainer flag to the Dagster Helm chart to allow disabling the default init container behavior. (Thanks, @easontm!)
  • [dagster-k8s] K8s pod logs are now logged when a pod fails. (Thanks, @apetryla!)
  • [dagster-sigma] Introduced build_materialize_workbook_assets_definition which can be used to build assets that run materialize schedules for a Sigma workbook.
  • [dagster-snowflake] SnowflakeResource and SnowflakeIOManager both accept additional_snowflake_connection_args config. This dictionary of arguments will be passed to the snowflake.connector.connect method. This config will be ignored if you are using the sqlalchemy connector. (See the sketch after this list.)
  • [helm] Added the ability to set user-deployments labels on k8s deployments as well as pods.
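
For illustration, here is a minimal sketch of supplying additional_snowflake_connection_args on a SnowflakeResource. The credential environment variables and the network_timeout value are assumptions for the example; any keyword accepted by snowflake.connector.connect could be passed.

    from dagster import Definitions, EnvVar
    from dagster_snowflake import SnowflakeResource

    defs = Definitions(
        resources={
            "snowflake": SnowflakeResource(
                account=EnvVar("SNOWFLAKE_ACCOUNT"),    # illustrative credentials
                user=EnvVar("SNOWFLAKE_USER"),
                password=EnvVar("SNOWFLAKE_PASSWORD"),
                # New: extra keyword arguments forwarded to snowflake.connector.connect.
                # Ignored when using the sqlalchemy connector.
                additional_snowflake_connection_args={"network_timeout": 60},
            ),
        },
    )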

Bugfixes#

  • Assets with self dependencies and BackfillPolicy are now evaluated correctly during backfills. Self dependent assets no longer result in serial partition submissions or disregarded upstream dependencies.
  • Previously, the freshness check sensor would not re-evaluate freshness checks if an in-flight run was planning on evaluating that check. Now, the freshness check sensor will kick off an independent run of the check, even if there's already an in-flight run, as long as the freshness check can potentially fail.
  • Previously, if the freshness check was in a failing state, the sensor would wait for a run to update the freshness check before re-evaluating. Now, if there's a materialization later than the last evaluation of the freshness check and no planned evaluation, we will re-evaluate the freshness check automatically.
  • [ui] Fixed run log streaming for runs with a large volume of logs.
  • [ui] Fixed a bug in the Backfill Preview where a loading spinner would spin forever if an asset had no valid partitions targeted by the backfill.
  • [dagster-aws] PipesCloudWatchMessageReader correctly identifies streams which are not ready yet and doesn't fail on ThrottlingException. (Thanks, @jenkoian!)
  • [dagster-fivetran] Column metadata can now be fetched for Fivetran assets using FivetranWorkspace.sync_and_poll(...).fetch_column_metadata(). (See the sketch after this list.)
  • [dagster-k8s] The k8s client now waits for the main container to be ready instead of only waiting for sidecar init containers. (Thanks, @OrenLederman!)
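
As a rough sketch (the connector id, credentials, and the resource key fivetran below are assumptions for the example), fetch_column_metadata() can be chained onto the sync_and_poll() call inside a @fivetran_assets-decorated function:

    from dagster import AssetExecutionContext, Definitions, EnvVar
    from dagster_fivetran import FivetranWorkspace, fivetran_assets

    workspace = FivetranWorkspace(
        account_id=EnvVar("FIVETRAN_ACCOUNT_ID"),
        api_key=EnvVar("FIVETRAN_API_KEY"),
        api_secret=EnvVar("FIVETRAN_API_SECRET"),
    )

    @fivetran_assets(connector_id="my_connector_id", workspace=workspace)  # hypothetical connector id
    def my_fivetran_assets(context: AssetExecutionContext, fivetran: FivetranWorkspace):
        # Chaining fetch_column_metadata() attaches column metadata to the
        # materializations produced by the sync.
        yield from fivetran.sync_and_poll(context=context).fetch_column_metadata()

    defs = Definitions(assets=[my_fivetran_assets], resources={"fivetran": workspace})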

Documentation#

  • Fixed a typo in the dlt_assets API docs. (Thanks, @zilto!)

1.9.5 (core) / 0.25.5 (libraries)#

New#

  • The automatic run retry daemon has been updated so that there is a single source of truth for whether a run will be retried and whether the retry has been launched. Tags are now added to runs at failure time indicating whether the run will be retried by the automatic retry system. Once the automatic retry has been launched, the run ID of the retry is added to the original run.
  • When canceling a backfill of a job, the backfill daemon will now cancel all runs launched by that backfill before marking the backfill as canceled.
  • Dagster execution info (tags such as dagster/run-id, dagster/code-location, dagster/user and Dagster Cloud environment variables) typically attached to external resources are now available under DagsterRun.dagster_execution_info.
  • SensorReturnTypesUnion is now exported for typing the output of sensor functions. (See the sketch after this list.)
  • [dagster-dbt] dbt seeds now get a valid code version. (Thanks, @marijncv!)
  • Manual and automatic retries of runs launched by backfills that occur while the backfill is still in progress are now incorporated into the backfill's status.
  • Manual retries of runs launched by backfills are no longer considered part of the backfill if the backfill is complete when the retry is launched.
  • [dagster-fivetran] Fivetran assets can now be materialized using the FivetranWorkspace.sync_and_poll(…) method in the definition of a @fivetran_assets decorator.
  • [dagster-fivetran] load_fivetran_asset_specs has been updated to accept an instance of DagsterFivetranTranslator or custom subclass.
  • [dagster-fivetran] The fivetran_assets decorator was added. It can be used with the FivetranWorkspace resource and DagsterFivetranTranslator translator to load Fivetran tables for a given connector as assets in Dagster. The build_fivetran_assets_definitions factory can be used to create assets for all the connectors in your Fivetran workspace.
  • [dagster-aws] ECSPipesClient.run now waits up to 70 days for task completion (waiter parameters are configurable). (Thanks, @jenkoian!)
  • [dagster-dbt] Updated the dagster-dbt scaffold template to be compatible with uv. (Thanks, @wingyplus!)
  • [dagster-airbyte] A load_airbyte_cloud_asset_specs function has been added. It can be used with the AirbyteCloudWorkspace resource and DagsterAirbyteTranslator translator to load your Airbyte Cloud connection streams as external assets in Dagster. (See the sketch after this list.)
  • [ui] Added an icon for the icechunk kind.
  • [ui] Improved the UI for manual sensor/schedule evaluation.
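
A minimal sketch of using SensorReturnTypesUnion as a return annotation; the job name and polling logic are placeholders:

    from dagster import RunRequest, SensorEvaluationContext, SensorReturnTypesUnion, SkipReason, sensor

    @sensor(job_name="my_job")  # placeholder job name
    def my_sensor(context: SensorEvaluationContext) -> SensorReturnTypesUnion:
        new_keys = []  # stand-in for whatever the sensor actually polls
        if not new_keys:
            return SkipReason("Nothing new to process.")
        return [RunRequest(run_key=key) for key in new_keys]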
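And a sketch of loading Airbyte Cloud connection streams as external asset specs; the AirbyteCloudWorkspace credential parameters shown are assumptions for the example:

    from dagster import Definitions, EnvVar
    from dagster_airbyte import AirbyteCloudWorkspace, load_airbyte_cloud_asset_specs

    workspace = AirbyteCloudWorkspace(
        workspace_id=EnvVar("AIRBYTE_CLOUD_WORKSPACE_ID"),
        client_id=EnvVar("AIRBYTE_CLOUD_CLIENT_ID"),
        client_secret=EnvVar("AIRBYTE_CLOUD_CLIENT_SECRET"),
    )

    # Each Airbyte Cloud connection stream becomes an external asset spec.
    airbyte_cloud_specs = load_airbyte_cloud_asset_specs(workspace)

    defs = Definitions(assets=airbyte_cloud_specs)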

Bugfixes#

  • Fixed database locking bug for the ConsolidatedSqliteEventLogStorage, which is mostly used for tests.
  • [dagster-aws] Fixed a bug in the ECSRunLauncher that prevented it from accepting a user-provided task definition when DAGSTER_CURRENT_IMAGE was not set in the code location.
  • [ui] Fixed an issue that would sometimes cause the asset graph to fail to render on initial load.
  • [ui] Fixed the global auto-materialize tick timeline when paginating.

1.9.4 (core) / 0.25.4 (libraries)#

New#

  • Global op concurrency is now enabled on the default SQLite storage. Deployments that have not been migrated since 1.6.0 may need to run dagster instance migrate to enable.
  • Introduced map_asset_specs to enable modifying AssetSpecs and AssetsDefinitions in bulk.
  • Introduced AssetSpec.replace_attributes and AssetSpec.merge_attributes to easily alter properties of an asset spec. (Both are shown in the sketch after this list.)
  • [ui] Add a "View logs" button to open tick logs in the sensor tick history table.
  • [ui] Add Spanner kind icon.
  • [ui] The asset catalog now supports filtering using the asset selection syntax.
  • [dagster-pipes, dagster-aws] PipesS3MessageReader now has a new parameter include_stdio_in_messages which enables log forwarding to Dagster via Pipes messages.
  • [dagster-pipes] Experimental: A new Dagster Pipes message type log_external_stream has been added. It can be used to forward external logs to Dagster via Pipes messages.
  • [dagster-powerbi] Admin scan APIs are now used to pull data from a Power BI instance. This can be disabled by passing load_powerbi_asset_specs(..., use_workspace_scan=False).
  • [dagster-sigma] Introduced an experimental dagster-sigma snapshot command, allowing Sigma workspaces to be captured to a file for faster subsequent loading.
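
A minimal sketch of the new bulk-modification APIs, assuming map_asset_specs takes the mapping function followed by the collection of specs; the spec names, group, and owner values are illustrative:

    import dagster as dg

    specs = [
        dg.AssetSpec("orders"),
        dg.AssetSpec("customers", metadata={"source": "postgres"}),
    ]

    # replace_attributes overwrites a property on a spec; merge_attributes merges
    # dict-like properties (e.g. metadata, tags) into the existing values.
    relabeled = specs[0].replace_attributes(group_name="analytics")
    enriched = specs[1].merge_attributes(metadata={"team": "data-platform"})

    # map_asset_specs applies a transformation across a collection of asset specs
    # (or assets definitions) in bulk.
    specs_with_owners = dg.map_asset_specs(
        lambda spec: spec.replace_attributes(owners=["team:data-platform"]),
        specs,
    )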

Bugfixes#

  • Fixed a bug that caused DagsterExecutionStepNotFoundError errors when trying to execute an asset check step of a run launched by a backfill.
  • Fixed an issue where invalid cron strings like "0 0 30 2 *" that represented invalid dates in February were still allowed as Dagster cron strings, but then failed during schedule execution. Now, these invalid cron strings will raise an exception when they are first loaded.
  • Fixed a bug where owners added to AssetOuts when defining a @graph_multi_asset were not added to the underlying AssetsDefinition.
  • Fixed a bug where using the & or | operators on AutomationConditions with labels would cause that label to be erased.
  • [ui] Launching partitioned asset jobs from the launchpad now warns if no partition is selected.
  • [ui] Fixed unnecessary middle truncation occurring in dialogs.
  • [ui] Fixed timestamp labels and "Now" line rendering bugs on the sensor tick timeline.
  • [ui] Opening Dagster's UI with a single job defined now takes you to the Overview page rather than the Job page.
  • [ui] Fixed stretched tags in the backfill table view for non-partitioned assets.
  • [ui] Automation sensor evaluation details now open in a dialog instead of navigating away.
  • [ui] Fixed scrollbars in dark mode.
  • [dagster-sigma] Workbooks filtered using a SigmaFilter no longer fetch lineage information.
  • [dagster-powerbi] Fixed an issue where reports without an upstream dataset dependency would fail to translate to an asset spec.

Deprecations#

  • [dagster-powerbi] DagsterPowerBITranslator.get_asset_key is deprecated in favor of DagsterPowerBITranslator.get_asset_spec().key. (See the sketch after this list.)
  • [dagster-looker] DagsterLookerApiTranslator.get_asset_key is deprecated in favor of DagsterLookerApiTranslator.get_asset_spec().key.
  • [dagster-sigma] DagsterSigmaTranslator.get_asset_key is deprecated in favor of DagsterSigmaTranslator.get_asset_spec().key.
  • [dagster-tableau] DagsterTableauTranslator.get_asset_key is deprecated in favor of DagsterTableauTranslator.get_asset_spec().key.
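
For example, with the Power BI translator (the same pattern applies to the Looker, Sigma, and Tableau translators), a rough sketch of migrating a key customization is to override get_asset_spec instead of the deprecated get_asset_key; the "powerbi" key prefix below is an illustrative choice:

    from dagster import AssetKey, AssetSpec
    from dagster_powerbi import DagsterPowerBITranslator

    class MyPowerBITranslator(DagsterPowerBITranslator):
        def get_asset_spec(self, data) -> AssetSpec:
            spec = super().get_asset_spec(data)
            # Derive the key from the default spec rather than overriding
            # the deprecated get_asset_key method.
            return spec.replace_attributes(key=AssetKey(["powerbi", *spec.key.path]))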

1.9.3 (core) / 0.25.3 (libraries)#

New#

  • Added run_id to the run_tags index to improve database performance. Run dagster instance migrate to update the index. (Thanks, @HynekBlaha!)

  • Added icons for kind tags: Cassandra, ClickHouse, CockroachDB, Doris, Druid, Elasticsearch, Flink, Hadoop, Impala, Kafka, MariaDB, MinIO, Pinot, Presto, Pulsar, RabbitMQ, Redis, Redpanda, ScyllaDB, Starrocks, and Superset. (Thanks, @swrookie!)

  • Added a new icon for the Denodo kind tag. (Thanks, @tintamarre!)

  • Errors raised from defining more than one Definitions object at module scope now include the object names so that the source of the error is easier to determine.

  • [ui] Asset metadata entries like dagster/row_count now appear on the events page and are properly hidden on the overview page when they appear in the sidebar.

  • [dagster-aws] PipesGlueClient now attaches AWS Glue metadata to Dagster results produced during Pipes invocation.

  • [dagster-aws] PipesEMRServerlessClient now attaches AWS EMR Serverless metadata to Dagster results produced during Pipes invocation and adds Dagster tags to the job run.

  • [dagster-aws] PipesECSClient now attaches AWS ECS metadata to Dagster results produced during Pipes invocation and adds Dagster tags to the ECS task.

  • [dagster-aws] PipesEMRClient now attaches AWS EMR metadata to Dagster results produced during Pipes invocation.

  • [dagster-databricks] PipesDatabricksClient now attaches Databricks metadata to Dagster results produced during Pipes invocation and adds Dagster tags to the Databricks job.

  • [dagster-fivetran] Added load_fivetran_asset_specs function. It can be used with the FivetranWorkspace resource and DagsterFivetranTranslator translator to load your Fivetran connector tables as external assets in Dagster.

  • [dagster-looker] Errors are now handled more gracefully when parsing derived tables.

  • [dagster-sigma] Sigma assets now contain extra metadata and kind tags.

  • [dagster-sigma] Added support for direct workbook to warehouse table dependencies.

  • [dagster-sigma] Added include_unused_datasets field to SigmaFilter to disable pulling datasets that aren't used by a downstream workbook.

  • [dagster-sigma] Added skip_fetch_column_data option to skip loading Sigma column lineage. This can speed up loading large instances. (See the sketch after this list.)

  • [dagster-sigma] Introduced an experimental dagster-sigma snapshot command, allowing Sigma workspaces to be captured to a file for faster subsequent loading.
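
A rough sketch of filtering a Sigma workspace load; the credential parameters, the folder path, and the exact placement of the skip_fetch_column_data option are assumptions for the example:

    from dagster import Definitions, EnvVar
    from dagster_sigma import SigmaBaseUrl, SigmaFilter, SigmaOrganization, load_sigma_asset_specs

    organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=EnvVar("SIGMA_CLIENT_ID"),
        client_secret=EnvVar("SIGMA_CLIENT_SECRET"),
    )

    sigma_specs = load_sigma_asset_specs(
        organization,
        sigma_filter=SigmaFilter(
            workbook_folders=[("my folder",)],  # hypothetical folder path
            include_unused_datasets=False,      # skip datasets not used by a workbook
        ),
        skip_fetch_column_data=True,            # skip column lineage for faster loads
    )

    defs = Definitions(assets=sigma_specs)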

    Introducing: dagster-airlift (experimental)#

    dagster-airlift is coming out of stealth. See the initial Airlift RFC and the accompanying documentation to learn more.

    More Airflow-related content is coming soon! We'd love for you to check it out, and post any comments / questions in the #airflow-migration channel in the Dagster Slack.

Bugfixes#

  • Fixed a bug in run status sensors where setting incompatible arguments monitor_all_code_locations and monitored_jobs did not raise the expected error. (Thanks, @apetryla!)
  • Fixed an issue that would cause the label for AutomationCondition.any_deps_match() and AutomationCondition.all_deps_match() to render incorrectly when allow_selection or ignore_selection were set.
  • Fixed a bug which could cause code location load errors when using CacheableAssetsDefinitions in code locations that contained AutomationConditions.
  • Fixed an issue where the default multiprocess executor kept holding onto subprocesses after their step completed, potentially causing Too many open files errors for jobs with many steps.
  • [ui] Fixed an issue introduced in 1.9.2 where the backfill overview page would sometimes display extra assets that were targeted by the backfill.
  • [ui] Fixed "Open in Launchpad" button when testing a schedule or sensor by ensuring that it opens to the correct deployment.
  • [ui] Fixed an issue where switching a user setting was immediately saved, rather than waiting for the change to be confirmed.
  • [dagster-looker] Unions without unique/distinct criteria are now properly handled.
  • [dagster-powerbi] Fixed an issue where reports without an upstream dataset dependency would fail to translate to an asset spec.
  • [dagster-sigma] Fixed an issue where API fetches did not paginate properly.

Documentation#

Dagster Plus#

  • [ui] Fixed an issue with filtering and catalog search in branch deployments.
  • [ui] Fixed an issue where the asset graph would reload unexpectedly.

1.9.2 (core) / 0.25.2 (libraries)#

New#

  • Introduced a new constructor, AssetOut.from_spec, that will construct an AssetOut from an AssetSpec. (See the sketch after this list.)
  • [ui] Column tags are now displayed in the Column name section of the asset overview page.
  • [ui] Introduced an icon for the gcs (Google Cloud Storage) kind tag.
  • [ui] Introduced icons for report and semanticmodel kind tags.
  • [ui] The tooltip for a tag containing a cron expression now shows a human-readable, timezone-aware cron string.
  • [ui] Asset check descriptions are now sourced from docstrings and rendered in the UI. (Thanks, @marijncv!)
  • [dagster-aws] Added option to propagate tags to ECS tasks when using the EcsRunLauncher. (Thanks, @zyd14!)
  • [dagster-dbt] You can now implement DagsterDbtTranslator.get_code_version to customize the code version for your dbt assets. (Thanks, @Grzyblon!)
  • [dagster-pipes] Added the ability to pass arbitrary metadata to PipesClientCompletedInvocation. This metadata will be attached to all materializations and asset checks stored during the pipes invocation.
  • [dagster-powerbi] During a full workspace scan, owner and column metadata is now automatically attached to assets.
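
One possible use of AssetOut.from_spec is reusing existing specs in an outs-based multi-asset definition; this is a minimal sketch, and the asset names and group are illustrative:

    from dagster import AssetOut, AssetSpec, Output, multi_asset

    orders_spec = AssetSpec("orders", group_name="ingestion")
    users_spec = AssetSpec("users", group_name="ingestion")

    @multi_asset(
        outs={
            # Construct AssetOuts from existing specs instead of re-declaring
            # keys, groups, and other attributes.
            "orders": AssetOut.from_spec(orders_spec),
            "users": AssetOut.from_spec(users_spec),
        }
    )
    def ingest():
        yield Output(1, output_name="orders")
        yield Output(2, output_name="users")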

Bugfixes#

  • Fixed an issue with AutomationCondition.execution_in_progress which would cause it to evaluate to True for unpartitioned assets that were part of a run that was in progress, even if the asset itself had already been materialized.
  • Fixed an issue with AutomationCondition.run_in_progress that would cause it to ignore queued runs.
  • Fixed an issue that would cause a default_automation_condition_sensor to be constructed for user code servers running on dagster version < 1.9.0 even if the legacy auto_materialize: use_sensors configuration setting was set to False.
  • [ui] Fixed an issue when executing asset checks where the wrong job name was used in some situations. The correct job name is now used.
  • [ui] Selecting assets with 100k+ partitions no longer causes the asset graph to temporarily freeze.
  • [ui] Fixed an issue that could cause a GraphQL error on certain pages after removing an asset.
  • [ui] The asset events page no longer truncates event history in cases where both materialization and observation events are present.
  • [ui] The backfill coordinator logs tab no longer sits in a loading state when no logs are available to display.
  • [ui] Fixed issue which would cause the "Partitions evaluated" label on an asset's automation history page to incorrectly display 0 in cases where all partitions were evaluated.
  • [ui] Fix "Open in Playground" link when testing a schedule or sensor by ensuring that it opens to the correct deployment.
  • [ui] Fixed an issue where the asset graph would reload unexpectedly.
  • [dagster-dbt] Fixed an issue where the SQL filepath for a dbt model was incorrectly resolved when the dbt manifest file was built on a Windows machine, but executed on a Unix machine.
  • [dagster-pipes] Asset keys containing embedded / characters now work correctly with Dagster Pipes.

Documentation#

Deprecations#

  • The types-sqlalchemy package is no longer included in the dagster[pyright] extra package.

Dagster Plus#

  • [ui] The Environment Variables table can now be sorted by name and update time.
  • [ui] The code location configuration dialog now contains more metadata about the code location.
  • [ui] Fixed an issue where the incorrect user icons were shown in the Users table when a search filter had been applied.

1.9.1 (core) / 0.25.1 (libraries)#

New#

  • dagster project scaffold now has an option to create dagster projects from templates with excluded files/filepaths.
  • [ui] Filters in the asset catalog now persist when navigating subdirectories.
  • [ui] The Run page now displays the partition(s) a run was for.
  • [ui] Filtering on owners/groups/tags is now case-insensitive.
  • [dagster-tableau] The helper function parse_tableau_external_and_materializable_asset_specs is now available to parse a list of Tableau asset specs into a list of external asset specs and materializable asset specs. (See the sketch after this list.)
  • [dagster-looker] Looker assets now by default have owner and URL metadata.
  • [dagster-k8s] Added a per_step_k8s_config configuration option to the k8s_job_executor, allowing the k8s configuration of individual steps to be configured at run launch time. (Thanks, @Kuhlwein!)
  • [dagster-fivetran] Introduced DagsterFivetranTranslator to customize assets loaded from Fivetran.
  • [dagster-snowflake] dagster_snowflake.fetch_last_updated_timestamps now supports ignoring tables not found in Snowflake instead of raising an error.
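
A rough sketch of splitting Tableau asset specs; the TableauCloudWorkspace credential parameters are assumptions for the example:

    from dagster import Definitions, EnvVar
    from dagster_tableau import (
        TableauCloudWorkspace,
        load_tableau_asset_specs,
        parse_tableau_external_and_materializable_asset_specs,
    )

    workspace = TableauCloudWorkspace(
        connected_app_client_id=EnvVar("TABLEAU_CLIENT_ID"),
        connected_app_secret_id=EnvVar("TABLEAU_SECRET_ID"),
        connected_app_secret_value=EnvVar("TABLEAU_SECRET_VALUE"),
        username=EnvVar("TABLEAU_USERNAME"),
        site_name=EnvVar("TABLEAU_SITE_NAME"),
        pod_name=EnvVar("TABLEAU_POD_NAME"),
    )

    tableau_specs = load_tableau_asset_specs(workspace)

    # Split into external specs (e.g. data sources) and specs that can be
    # materialized by Dagster (e.g. sheets and dashboards).
    external_specs, materializable_specs = parse_tableau_external_and_materializable_asset_specs(tableau_specs)

    defs = Definitions(assets=[*external_specs, *materializable_specs])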

Bugfixes#

  • Fixed an issue which would cause a default_automation_condition_sensor to be constructed for user code servers running on dagster version < 1.9.0 even if the legacy auto_materialize: use_sensors configuration setting was set to False.
  • Fixed an issue where running dagster instance migrate on Dagster version 1.9.0 constructed a SQL query that exceeded the maximum allowed depth.
  • Fixed an issue where wiping a dynamically partitioned asset caused an error.
  • [dagster-polars] ImportErrors are no longer raised when bigquery libraries are not installed [#25708]

Documentation#

  • [dagster-dbt] A guide on how to use dbt defer with Dagster branch deployments has been added to the dbt reference.

0.8.1#

Bugfix

  • Fixed a file descriptor leak that caused OSError: [Errno 24] Too many open files when enough temporary files were created.
  • Fixed an issue where an empty config in the Playground would unexpectedly be marked as invalid YAML.
  • Removed "config" deprecation warnings for dask and celery executors.

New

  • Improved performance of the Assets page.

0.8.0 "In The Zone"#

Major Changes

Please see the 080_MIGRATION.md migration guide for details on updating existing code to be compatible with 0.8.0.

  • Workspace, host and user process separation, and repository definition. Dagit and other tools no longer load a single repository containing user definitions such as pipelines into the same process as the framework code. Instead, they load a "workspace" that can contain multiple repositories sourced from a variety of different external locations (e.g., Python modules and Python virtualenvs, with containers and source control repositories soon to come).

    The repositories in a workspace are loaded into their own "user" processes distinct from the "host" framework process. Dagit and other tools now communicate with user code over an IPC mechanism. This architectural change has a couple of advantages:

    • Dagit no longer needs to be restarted when there is an update to user code.
    • Users can use repositories to organize their pipelines, but still work on all of their repositories using a single running Dagit.
    • The Dagit process can now run in a separate Python environment from user code so pipeline dependencies do not need to be installed into the Dagit environment.
    • Each repository can be sourced from a separate Python virtualenv, so teams can manage their dependencies (or even their own Python versions) separately.

    We have introduced a new file format, workspace.yaml, in order to support this new architecture. The workspace yaml encodes what repositories to load and their location, and supersedes the repository.yaml file and associated machinery.

    As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have written scripts or tests in which a pipeline is defined and then passed across a process boundary (e.g., using the multiprocess_executor or dagstermill), you may now need to wrap the pipeline in the reconstructable utility function for it to be reconstructed across the process boundary (see the sketch at the end of this list).

    In addition, rather than instantiate the RepositoryDefinition class directly, users should now prefer the @repository decorator. As part of this change, the @scheduler and @repository_partitions decorators have been removed, and their functionality subsumed under @repository.

  • Dagit organization. The Dagit interface has changed substantially and is now oriented around pipelines. Within the context of each pipeline in an environment, the previous "Pipelines" and "Solids" tabs have been collapsed into the "Definition" tab; a new "Overview" tab provides summary information about the pipeline, its schedules, its assets, and recent runs; the previous "Playground" tab has been moved within the context of an individual pipeline. Related runs (e.g., runs created by re-executing subsets of previous runs) are now grouped together in the Playground for easy reference. Dagit also now includes more advanced support for display of scheduled runs that may not have executed ("schedule ticks"), as well as longitudinal views over scheduled runs, and asset-oriented views of historical pipeline runs.

  • Assets. Assets are named materializations that can be generated by your pipeline solids, which support specialized views in Dagit. For example, if we represent a database table with an asset key, we can now index all of the pipelines and pipeline runs that materialize that table, and view them in a single place. To use the asset system, you must enable an asset-aware storage such as Postgres.

  • Run launchers. The distinction between "starting" and "launching" a run has been effaced. All pipeline runs instigated through Dagit now make use of the RunLauncher configured on the Dagster instance, if one is configured. Additionally, run launchers can now support termination of previously launched runs. If you have written your own run launcher, you may want to update it to support termination. Note also that as of 0.7.9, the semantics of RunLauncher.launch_run have changed; this method now takes the run_id of an existing run and should no longer attempt to create the run in the instance.

  • Flexible re-execution. Pipeline re-execution from Dagit is now fully flexible. You may re-execute arbitrary subsets of a pipeline's execution steps, and the re-execution now appears in the interface as a child run of the original execution.

  • Support for historical runs. Snapshots of pipelines and other Dagster objects are now persisted along with pipeline runs, so that historical runs can be loaded for review with the correct execution plans even when pipeline code has changed. This prepares the system to be able to diff pipeline runs and other objects against each other.

  • Step launchers and expanded support for PySpark on EMR and Databricks. We've introduced a new StepLauncher abstraction that uses the resource system to allow individual execution steps to be run in separate processes (and thus on separate execution substrates). This has made extensive improvements to our PySpark support possible, including the option to execute individual PySpark steps on EMR using the EmrPySparkStepLauncher and on Databricks using the DatabricksPySparkStepLauncher. The emr_pyspark example demonstrates how to use a step launcher.

  • Clearer names. What was previously known as the environment dictionary is now called the run_config, and the previous environment_dict argument to APIs such as execute_pipeline is now deprecated. We renamed this argument to focus attention on the configuration of the run being launched or executed, rather than on an ambiguous "environment". We've also renamed the config argument to all user definitions to be config_schema, which should reduce ambiguity between the configuration schema and the value being passed in some particular case. We've also consolidated and improved documentation of the valid types for a config schema.

  • Lakehouse. We're pleased to introduce Lakehouse, an experimental, alternative programming model for data applications, built on top of Dagster core. Lakehouse allows developers to define data applications in terms of data assets, such as database tables or ML models, rather than in terms of the computations that produce those assets. The simple_lakehouse example gives a taste of what it's like to program in Lakehouse. We'd love feedback on whether this model is helpful!

  • Airflow ingest. We've expanded the tooling available to teams with existing Airflow installations that are interested in incrementally adopting Dagster. Previously, we provided only injection tools that allowed developers to write Dagster pipelines and then compile them into Airflow DAGs for execution. We've now added ingestion tools that allow teams to move to Dagster for execution without having to rewrite all of their legacy pipelines in Dagster. In this approach, Airflow DAGs are kept in their own container/environment, compiled into Dagster pipelines, and run via the Dagster orchestrator. See the airflow_ingest example for details!
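
As referenced in the workspace changes above, here is a minimal sketch of wrapping a pipeline in reconstructable so it can cross a process boundary, assuming the 0.8.0-era APIs; the pipeline, solid, and run_config shown are illustrative:

    from dagster import DagsterInstance, execute_pipeline, pipeline, reconstructable, solid

    @solid
    def do_something(_):
        return 1

    @pipeline
    def my_pipeline():
        do_something()

    # Wrapping the pipeline in reconstructable lets it be rebuilt in the
    # subprocesses spawned by the multiprocess executor.
    result = execute_pipeline(
        reconstructable(my_pipeline),
        run_config={
            "storage": {"filesystem": {}},
            "execution": {"multiprocess": {}},
        },
        instance=DagsterInstance.get(),
    )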

Breaking Changes

  • dagster

    • The @scheduler and @repository_partitions decorators have been removed. Instances of ScheduleDefinition and PartitionSetDefinition belonging to a repository should be specified using the @repository decorator instead.

    • Support for the Dagster solid selection DSL, previously introduced in Dagit, is now uniform throughout the Python codebase, with the previous solid_subset arguments (--solid-subset in the CLI) being replaced by solid_selection (--solid-selection). In addition to the names of individual solids, this argument now supports selection queries like *solid_name++ (i.e., solid_name, all of its ancestors, its immediate descendants, and their immediate descendants).

    • The built-in Dagster type Path has been removed.

    • PartitionSetDefinition names, including those defined by a PartitionScheduleDefinition, must now be unique within a single repository.

    • Asset keys are now sanitized for non-alphanumeric characters. All characters besides alphanumerics and _ are treated as path delimiters. Asset keys can also be specified using AssetKey, which accepts a list of strings as an explicit path. If you are running 0.7.10 or later and using assets, you may need to migrate your historical event log data for asset keys from previous runs to be attributed correctly. This event_log data migration can be invoked as follows:

      from dagster.core.storage.event_log.migration import migrate_event_log_data
      from dagster import DagsterInstance

      # Re-attribute asset keys in historical event log data on the current instance.
      migrate_event_log_data(instance=DagsterInstance.get())
      
    • The interface of the Scheduler base class has changed substantially. If you've written a custom scheduler, please get in touch!

    • The partitioned schedule decorators now generate PartitionSetDefinition names using the schedule name, suffixed with _partitions.

    • The repository property on ScheduleExecutionContext is no longer available. If you were using this property to pass to Scheduler instance methods, this interface has changed significantly. Please see the Scheduler class documentation for details.

    • The CLI option --celery-base-priority is no longer available for the command: dagster pipeline backfill. Use the tags option to specify the celery priority (e.g. dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }').

    • The execute_partition_set API has been removed.

    • The deprecated is_optional parameter to Field and OutputDefinition has been removed. Use is_required instead.

    • The deprecated runtime_type property on InputDefinition and OutputDefinition has been removed. Use dagster_type instead.

    • The deprecated has_runtime_type, runtime_type_named, and all_runtime_types methods on PipelineDefinition have been removed. Use has_dagster_type, dagster_type_named, and all_dagster_types instead.

    • The deprecated all_runtime_types method on SolidDefinition and CompositeSolidDefinition has been removed. Use all_dagster_types instead.

    • The deprecated metadata argument to SolidDefinition and @solid has been removed. Use tags instead.

    • The graphviz-based DAG visualization in Dagster core has been removed. Please use Dagit!

  • dagit

    • dagit-cli has been removed, and dagit is now the only console entrypoint.
  • dagster-aws

    • The AWS CLI has been removed.
    • dagster_aws.EmrRunJobFlowSolidDefinition has been removed.
  • dagster-bash

    • This package has been renamed to dagster-shell. The bash_command_solid and bash_script_solid solid factory functions have been renamed to create_shell_command_solid and create_shell_script_solid.
  • dagster-celery

    • The CLI option --celery-base-priority is no longer available for the command: dagster pipeline backfill. Use the tags option to specify the celery priority (e.g. dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }').
  • dagster-dask

    • The config schema for the dagster_dask.dask_executor has changed. The previous config should now be nested under the key local.
  • dagster-gcp

    • The BigQueryClient has been removed. Use bigquery_resource instead.
  • dagster-dbt

    • The dagster-dbt package has been removed. This was inadequate as a reference integration, and will be replaced in 0.8.x.
  • dagster-spark

    • dagster_spark.SparkSolidDefinition has been removed - use create_spark_solid instead.
    • The SparkRDD Dagster type, which only worked with an in-memory engine, has been removed.
  • dagster-twilio

    • The TwilioClient has been removed. Use twilio_resource instead.

New

  • dagster

    • You may now set asset_key on any Materialization to use the new asset system. You will also need to configure an asset-aware storage, such as Postgres. The longitudinal_pipeline example demonstrates this system.
    • The partitioned schedule decorators now support an optional end_time.
    • Opt-in telemetry now reports the Python version being used.
  • dagit

    • Dagit's GraphQL playground is now available at /graphiql as well as at /graphql.
  • dagster-aws

    • The dagster_aws.S3ComputeLogManager may now be configured to override the S3 endpoint and associated SSL settings.
    • Config string and integer values in the S3 tooling may now be set using either environment variables or literals.
  • dagster-azure

    • We've added the dagster-azure package, with support for Azure Data Lake Storage Gen2; you can use the adls2_system_storage or, for direct access, the adls2_resource resource. (Thanks @sd2k!)
  • dagster-dask

    • Dask clusters are now supported by dagster_dask.dask_executor. For full support, you will need to install extras with pip install dagster-dask[yarn, pbs, kube]. (Thanks @DavidKatz-il!)
  • dagster-databricks

    • We've added the dagster-databricks package, with support for running PySpark steps on Databricks clusters through the databricks_pyspark_step_launcher. (Thanks @sd2k!)
  • dagster-gcp

    • Config string and integer values in the BigQuery, Dataproc, and GCS tooling may now be set using either environment variables or literals.
  • dagster-k8s

    • Added the CeleryK8sRunLauncher to submit execution plan steps to Celery task queues for execution as k8s Jobs.
    • Added the ability to specify resource limits on a per-pipeline and per-step basis for k8s Jobs.
    • Many improvements and bug fixes to the dagster-k8s Helm chart.
  • dagster-pandas

    • Config string and integer values in the dagster-pandas input and output schemas may now be set using either environment variables or literals.
  • dagster-papertrail

    • Config string and integer values in the papertrail_logger may now be set using either environment variables or literals.
  • dagster-pyspark

    • PySpark solids can now run on EMR, using the emr_pyspark_step_launcher, or on Databricks using the new dagster-databricks package. The emr_pyspark example demonstrates how to use a step launcher.
  • dagster-snowflake

    • Config string and integer values in the snowflake_resource may now be set using either environment variables or literals.
  • dagster-spark

    • dagster_spark.create_spark_solid now accepts a required_resource_keys argument, which enables setting up a step launcher for Spark solids, like the emr_pyspark_step_launcher.

Bugfix

  • dagster pipeline execute now sets a non-zero exit code when pipeline execution fails.

0.7.16#

Bugfix

  • Enabled NoOpComputeLogManager to be configured as the compute_logs implementation in dagster.yaml.
  • Suppressed noisy error messages in logs from skipped steps.

0.7.15#

New

  • Improved dagster scheduler state reconciliation.

0.7.14#

New

  • Dagit now allows re-executing an arbitrary step subset via step selector syntax, regardless of whether the previous pipeline failed or not.
  • Added a search filter for the root Assets page.
  • Added tooltip explanations for disabled run actions.
  • The last output of the cron job command created by the scheduler is now stored in a file. A new dagster schedule logs {schedule_name} command will show the log file for a given schedule. This helps uncover errors like missing environment variables and import errors.
  • The Dagit schedule page will now show inconsistency errors between schedule state and the cron tab that were previously only displayed by the dagster schedule debug command. As before, these errors can be resolved using dagster schedule up.

Bugfix

  • Fixed an issue with config schema validation on Arrays.
  • Fixed an issue with initializing K8sRunLauncher when configured via dagster.yaml.
  • Fixed a race condition in Airflow injection logic that happens when multiple Operators try to create PipelineRun entries simultaneously.
  • Fixed an issue with schedules that had invalid config not logging the appropriate error.