Fixed a bug with load_assets_from_x functions where we began erroring when a spec and AssetsDefinition had the same key in a given module. We now only error in this case if include_specs=True.
[dagster-azure] Fixed a bug in 1.9.6 and 1.9.7 where the default behavior of the compute log manager switched from showing logs in the UI to showing a URL. You can set the show_url_only option to True to enable the URL-only behavior.
[dagster-dbt] Fixed an issue where group names set on partitioned dbt assets created using the @dbt_assets decorator would be ignored.
If you are launching runs using DagsterInstance.launch_run, this method now takes a run id
instead of an instance of PipelineRun. Additionally, DagsterInstance.create_run and
DagsterInstance.create_empty_run have been replaced by DagsterInstance.get_or_create_run and
DagsterInstance.create_run_for_pipeline.
If you have implemented your own RunLauncher, there are two required changes:
RunLauncher.launch_run takes a pipeline run that has already been created. You should remove
any calls to instance.create_run in this method.
Instead of calling startPipelineExecution (defined in
dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION) in the run launcher, you
should call startPipelineExecutionForCreatedRun (defined in
dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION).
Refer to the RemoteDagitRunLauncher for an example implementation.
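The new contract described above can be modeled with a minimal sketch. Everything in it is a stand-in written for illustration: FakeRun, FakeInstance, SketchRunLauncher, add_created_run and the submit callable are hypothetical and are not Dagster classes or methods, and the shape of the mutation variables is an assumption. The point it shows is that launch_run on the instance takes a run id, and the launcher receives a run that already exists rather than creating one.

```python
# Stand-in classes only: a model of the contract, not Dagster's implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FakeRun:
    run_id: str
    pipeline_name: str


class SketchRunLauncher:
    """Hypothetical launcher that only submits runs it is handed."""

    def __init__(self, submit: Callable[[str, dict], None]):
        # `submit` stands in for whatever sends the GraphQL mutation.
        self._submit = submit

    def launch_run(self, instance: "FakeInstance", run: FakeRun) -> FakeRun:
        # The run has already been created, so nothing is created here.
        # The variables shape below is an assumption for illustration.
        self._submit("startPipelineExecutionForCreatedRun", {"runId": run.run_id})
        return run


@dataclass
class FakeInstance:
    """Hypothetical stand-in for an instance that owns run storage."""

    launcher: SketchRunLauncher
    runs: Dict[str, FakeRun] = field(default_factory=dict)

    def add_created_run(self, run_id: str, pipeline_name: str) -> FakeRun:
        # Plays the role of run creation, which now happens before launching.
        run = FakeRun(run_id, pipeline_name)
        self.runs[run_id] = run
        return run

    def launch_run(self, run_id: str) -> FakeRun:
        # Callers pass a run id; the stored run is looked up and handed over.
        return self.launcher.launch_run(self, self.runs[run_id])


sent: List[Tuple[str, dict]] = []
launcher = SketchRunLauncher(submit=lambda name, variables: sent.append((name, variables)))
instance = FakeInstance(launcher=launcher)
instance.add_created_run("run-1", "my_pipeline")
launched = instance.launch_run("run-1")
```

In this model the launcher never touches run storage, which mirrors the instruction to remove run creation from a custom launch_run.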
New
Improvements to preset and solid subselection in the playground: subselection now shows an
inline preview of the pipeline instead of a modal, and the correct subselection is chosen when
selecting a preset.
Improvements to log searching: tokenization and autocompletion when searching for message types
and for specific steps.
You can now view the structure of pipelines from historical runs, even if that pipeline no longer
exists in the loaded repository or has changed structure.
Historical execution plans are now viewable, even if the pipeline has changed structure.
Added a metadata link to raw compute logs for all StepStart events in the PipelineRun view and
Step view.
Improved error handling for the scheduler. If a scheduled run has config errors, the errors are
persisted to the event log for the run and can be viewed in Dagit.
Bugfix
No longer manually dispose sqlalchemy engine in dagster-postgres
Made boto3 dependency in dagster-aws more flexible (#2418)
Fixed tooltip UI cleanup in partitioned schedule view
The execute_pipeline_with_mode and execute_pipeline_with_preset APIs have been dropped in
favor of new top level arguments to execute_pipeline, mode and preset.
The use of RunConfig to pass options to execute_pipeline has been deprecated, and RunConfig
will be removed in 0.8.0.
The execute_solid_within_pipeline and execute_solids_within_pipeline APIs, intended to support
tests, now take new top level arguments mode and preset.
New
The dagster-aws Redshift resource now supports providing an error callback to debug failed
queries.
We now persist serialized execution plans for historical runs. They will render correctly even if
the pipeline structure has changed or if it does not exist in the current loaded repository.
Clicking on a pipeline tag in the Runs view will apply that tag as a filter.
Bugfix
Fixed a bug where telemetry logger would create a log file (but not write any logs) even when
telemetry was disabled.
Experimental
The dagster-airflow package supports ingesting Airflow dags and running them as dagster pipelines
(see: make_dagster_pipeline_from_airflow_dag). This is in the early experimentation phase.
Improved the layout of the experimental partition runs table on the Schedules detailed view.
The default sqlite and dagster-postgres implementations have been altered to extract the
event step_key field as a column, to enable faster per-step queries. You will need to run
dagster instance migrate to update the schema. You may optionally migrate your historical event
log data to extract the step_key using the migrate_event_log_data function. This will ensure
that your historical event log data will be captured in future step-key based views. This
event_log data migration can be invoked as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
We have made pipeline metadata serializable and persist that along with run information.
While there are no user-facing features to leverage this yet, it does require an instance
migration. Run dagster instance migrate. If you have already run the migration for the
event_log changes above, you do not need to run it again. Any unforeseen errors related to the
new snapshot_id in the runs table or the new snapshots table are related to this migration.
dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and
ColumnDTypeInSetConstraint.
New
You can now specify that dagstermill output notebooks be yielded as an output from dagstermill
solids, in addition to being materialized.
You may now set the extension on files created using the FileManager machinery.
dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
The Dagit Playground has been restructured to make the relationship between Preset, Partition
Sets, Modes, and subsets clearer. All of these buttons have been reconciled and moved to the
left side of the Playground.
Config sections that are required but not filled out in the Dagit playground are now detected
and labeled in orange.
dagster-celery config now supports using env: to load from environment variables.
Bugfix
Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline
definition.
Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in
dagit.
Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of
the parameters cell of a dagstermill notebook could lead to unexpected behavior.
Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using
the Postgres-backed storages.
Experimental
Added a longitudinal view of runs on the Schedule tab for scheduled, partitioned pipelines.
Includes views of run status, execution time, and materializations across partitions. The UI is
in flux and is currently optimized for daily schedules, but feedback is welcome.
default_value in Field no longer accepts native instances of python enums. Instead
the underlying string representation in the config system must be used.
default_value in Field no longer accepts callables.
The dagster_aws imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
modules.
The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
now just use any boto3 S3 API directly on a S3 resource, e.g.
context.resources.s3.list_objects_v2. (#2292)
New
New Playground view in dagit showing an interactive config map
Improved storage and UI for showing schedule attempts
Added the ability to set default values in InputDefinition
Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
Added ability to specify pipeline run tags using the CLI
Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid
as context.pdb
Added PresetDefinition.with_additional_config to allow for config overrides
Added resource name to log messages generated during resource initialization
Added grouping tags for runs that have been retried / reexecuted.
Bugfix
Fixed a bug where date range partitions with a specified end date were clipping the last day
Fixed an issue where some schedule attempts that failed to start would be marked running forever.
Fixed the @weekly partitioned schedule decorator
Fixed timezone inconsistencies between the runs view and the schedules view
Integers are now accepted as valid values for Float config fields
Fixed an issue when executing dagstermill solids with config that contained quote characters.
dagstermill
The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel
flag.
dagster-dbt
dbt_solid now has a Nothing input to allow for sequencing
dagster-k8s
Added get_celery_engine_config to select celery engine, leveraging Celery infrastructure
Documentation
Improvements to the airline and bay bikes demos
Improvements to our dask deployment docs (Thanks jswaney!!)
Added the IntSource type, which lets integers be set from environment variables in config.
You may now set tags on pipeline definitions. These will resolve in the following cases:
Loading in the playground view in Dagit will pre-populate the tag container.
Loading partition sets from the preset/config picker will pre-populate the tag container with
the union of pipeline tags and partition tags, with partition tags taking precedence.
Executing from the CLI will generate runs with the pipeline tags.
Executing programmatically using the execute_pipeline api will create a run with the union
of pipeline tags and RunConfig tags, with RunConfig tags taking precedence.
Scheduled runs (both launched and executed) will have the union of pipeline tags and the tags
returned by the schedule's tags function, with the schedule tags taking precedence.
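The precedence rule in the cases above amounts to a dictionary union in which the more specific source wins on a key collision. A minimal sketch, assuming tags are plain string-to-string dicts; merge_tags and the sample tag names are hypothetical and are not part of the Dagster API:

```python
def merge_tags(pipeline_tags, override_tags):
    """Return the union of two tag dicts; on a shared key the override wins."""
    merged = dict(pipeline_tags)   # copy so the pipeline's own tags are not mutated
    merged.update(override_tags)   # partition / run / schedule tags take precedence
    return merged


pipeline_tags = {"team": "data", "priority": "low"}
partition_tags = {"priority": "high", "partition": "2020-01-01"}

run_tags = merge_tags(pipeline_tags, partition_tags)
```

Here run_tags keeps "team" from the pipeline, takes "priority" from the partition, and adds "partition".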
Output materialization configs may now yield multiple Materializations, and the tutorial has
been updated to reflect this.
We now export the SolidExecutionContext in the public API so that users can correctly type hint
solid compute functions.
Dagit
Pipeline run tags are now preserved when resuming/retrying from Dagit.
Scheduled run stats are now grouped by partition.
A "preparing" section has been added to the execution viewer. This shows steps that are in
the process of starting execution.
Markers emitted by the underlying execution engines are now visualized in the Dagit execution
timeline.
Bugfix
Resume/retry now works as expected in the presence of solids that yield optional outputs.
Fixed an issue where dagster-celery workers were failing to start in the presence of config
values that were None.
Fixed an issue with attempting to set threads_per_worker on Dask distributed clusters.
dagster-postgres
All postgres config may now be set using environment variables in config.
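As a rough illustration of what setting a value from an environment variable means, here and for the IntSource type mentioned earlier, the sketch below resolves a config value that is either a literal or an env: mapping. It is a simplified model and not Dagster's config system: resolve_source, the cast parameter, and the SKETCH_PG_PORT variable are hypothetical names chosen for the example.

```python
import os


def resolve_source(value, cast=str):
    """Resolve a config value given as a literal or as {"env": "VAR_NAME"}."""
    if isinstance(value, dict) and set(value) == {"env"}:
        name = value["env"]
        if name not in os.environ:
            raise KeyError(f"environment variable {name!r} is not set")
        # Environment variables are strings, so an integer field needs a cast.
        return cast(os.environ[name])
    return cast(value)


# Hypothetical variable name, set here only so the example is self-contained.
os.environ["SKETCH_PG_PORT"] = "5432"

port = resolve_source({"env": "SKETCH_PG_PORT"}, cast=int)  # read from the environment
host = resolve_source("localhost")                          # literal passes through
```

Failing loudly on a missing variable is a design choice of this sketch; it keeps a misconfigured deployment from silently falling back to a wrong value.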
dagster-aws
The s3_resource now exposes a list_objects_v2 method corresponding to the underlying boto3
API. (Thanks, @basilvetas!)
Added the redshift_resource to access Redshift databases.
dagster-k8s
The K8sRunLauncher config now includes the load_kubeconfig and kubeconfig_file options.
Documentation
Fixes and improvements.
Dependencies
dagster-airflow no longer pins its werkzeug dependency.
Community
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.
We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.
If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:
telemetry:
  enabled: true
Thanks to @basilvetas and @hspak for their contributions!