Fixed a bug with load_assets_from_x functions where we began erroring when a spec and AssetsDefinition had the same key in a given module. We now only error in this case if include_specs=True.
[dagster-azure] Fixed a bug in 1.9.6 and 1.9.7 where the default behavior of the compute log manager switched from showing logs in the UI to showing a URL. You can toggle the show_url_only option to True to enable the URL showing behavior.
[dagster-dbt] Fixed an issue where group names set on partitioned dbt assets created using the @dbt_assets decorator would be ignored
It is now possible to use Postgres to back schedule storage by configuring
dagster_postgres.PostgresScheduleStorage on the instance.
Added the execute_pipeline_with_mode API to allow executing a pipeline in test with a specific
mode without having to specify RunConfig.
Experimental support for retries in the Celery executor.
It is now possible to set run-level priorities for backfills run using the Celery executor by
passing --celery-base-priority to dagster pipeline backfill.
Added the @weekly schedule decorator.
Deprecations
The dagster-ge library has been removed from this release due to drift from the underlying
Great Expectations implementation.
dagster-pandas
PandasColumn now includes an is_optional flag, replacing the previous
ColumnExistsConstraint.
You can now pass the ignore_missing_values flag to PandasColumn in order to apply column
constraints only to the non-missing rows in a column.
dagster-k8s
The Helm chart now includes provision for an Ingress and for multiple Celery queues.
It is now possible to configure a Dagit instance to disable executing pipeline runs in a local
subprocess.
Resource initialization, teardown, and associated failure states now emit structured events
visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
consolidated and rationalized.
Support Redis queue provider in dagster-k8s Helm chart.
Support external postgresql in dagster-k8s Helm chart.
Bugfix
Fixed an issue with inaccurate timings on some resource initializations.
Fixed an issue that could cause the multiprocess engine to spin forever.
Fixed an issue with default value resolution when a config value was set using SourceString.
Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
Fixed an issue with where the CLI command dagster schedule up would fail in certain scenarios
with the SystemCronScheduler.
Pandas
Column constraints can now be configured to permit NaN values.
Dagstermill
Removed a spurious dependency on sklearn.
Docs
Improvements and fixes to docs.
Restored dagster.readthedocs.io.
Experimental
An initial implementation of solid retries, throwing a RetryRequested exception, was added.
This API is experimental and likely to change.
Other
Renamed property runtime_type to dagster_type in definitions. The following are deprecated
and will be removed in a future version.
InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
CompositeSolidDefinition.all_runtime_types is deprecated. Use
CompositeSolidDefinition.all_dagster_types instead.
SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types
instead.
PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type
instead.
PipelineDefinition.runtime_type_named is deprecated. Use
PipelineDefinition.dagster_type_named instead.
PipelineDefinition.all_runtime_types is deprecated. Use
PipelineDefinition.all_dagster_types instead.
dagster.readthedocs.io is currently stale due to availability issues.
New
Improvements to S3 Resource. (Thanks @dwallace0723!)
Better error messages in Dagit.
Better font/styling support in Dagit.
Changed OutputDefinition to take is_required rather than is_optional argument. This is to
remain consistent with changes to Field in 0.7.1 and to avoid confusion
with python's typing and dagster's definition of Optional, which indicates None-ability,
rather than existence. is_optional is deprecated and will be removed in a future version.
Added support for Flower in dagster-k8s.
Added support for environment variable config in dagster-snowflake.
Bugfixes
Improved performance in Dagit waterfall view.
Fixed bug when executing solids downstream of a skipped solid.
Improved navigation experience for pipelines in Dagit.
Fixed for the dagster-aws CLI tool.
Fixed issue starting Dagit without DAGSTER_HOME set on windows.
Fixed pipeline subset execution in partition-based schedules.
There are a substantial number of breaking changes in the 0.7.0 release.
Please see 070_MIGRATION.md for instructions regarding migrating old code.
Scheduler
The scheduler configuration has been moved from the @schedules decorator to DagsterInstance.
Existing schedules that have been running are no longer compatible with current storage. To
migrate, remove the scheduler argument on all @schedules decorators:
Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules
directory and run dagster schedule wipe && dagster schedule up to re-instatiate schedules in a
valid state.
The should_execute and environment_dict_fn argument to ScheduleDefinition now have a
required first argument context, representing the ScheduleExecutionContext
Config System Changes
In the config system, Dict has been renamed to Shape; List to Array; Optional to
Noneable; and PermissiveDict to Permissive. The motivation here is to clearly delineate
config use cases versus cases where you are using types as the inputs and outputs of solids as
well as python typing types (for mypy and friends). We believe this will be clearer to users in
addition to simplifying our own implementation and internal abstractions.
Our recommended fix is not to use Shape and Array, but instead to use our new condensed
config specification API. This allow one to use bare dictionaries instead of Shape, lists with
one member instead of Array, bare types instead of Field with a single argument, and python
primitive types (int, bool etc) instead of the dagster equivalents. These result in
dramatically less verbose config specs in most cases.
So instead of
from dagster import Shape, Field, Int, Array, String
# ... code
config=Shape({ # Dict prior to change
'some_int' : Field(Int),
'some_list: Field(Array[String]) # List prior to change
})
one can instead write:
config={'some_int': int, 'some_list': [str]}
No imports and much simpler, cleaner syntax.
config_field is no longer a valid argument on solid, SolidDefinition, ExecutorDefintion,
executor, LoggerDefinition, logger, ResourceDefinition, resource, system_storage, and
SystemStorageDefinition. Use config instead.
For composite solids, the config_fn no longer takes a ConfigMappingContext, and the context
has been deleted. To upgrade, remove the first argument to config_fn.
Field takes a is_required rather than a is_optional argument. This is to avoid confusion
with python's typing and dagster's definition of Optional, which indicates None-ability,
rather than existence. is_optional is deprecated and will be removed in a future version.
Required Resources
All solids, types, and config functions that use a resource must explicitly list that
resource using the argument required_resource_keys. This is to enable efficient
resource management during pipeline execution, especially in a multiprocessing or
remote execution environment.
The @system_storage decorator now requires argument required_resource_keys, which was
previously optional.
Dagster Type System Changes
dagster.Set and dagster.Tuple can no longer be used within the config system.
Dagster types are now instances of DagsterType, rather than a class than inherits from
RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just
create an instance of a DagsterType. The type checking function is now an argument to the
DagsterType, rather than an abstract method that has to be implemented in
a subclass.
RuntimeType has been renamed to DagsterType is now an encouraged API for type creation.
Core type check function of DagsterType can now return a naked bool in addition
to a TypeCheck object.
type_check_fn on DagsterType (formerly type_check and RuntimeType, respectively) now
takes a first argument context of type TypeCheckContext in addition to the second argument of
value.
define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType .
dagster_type has been renamed to usable_as_dagster_type.
as_dagster_type has been removed and similar capabilities added as
make_python_type_usable_as_dagster_type.
PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If
a custom type_check is needed, use DagsterType.
As a consequence of these changes, if you were previously using dagster_pyspark or
dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type
annotations to functions decorated with @solid to indicate that they are input or output types
for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in
order to map the Python types to the Dagster types, or just use the Dagster types
(dagster_pandas.DataFrame instead of pandas.DataFrame) directly.
Other
We no longer publish base Docker images. Please see the updated deployment docs for an example
Dockerfile off of which you can work.
step_metadata_fn has been removed from SolidDefinition & @solid.
SolidDefinition & @solid now takes tags and enforces that values are strings or
are safely encoded as JSON. metadata is deprecated and will be removed in a future version.
resource_mapper_fn has been removed from SolidInvocation.
New
Dagit now includes a much richer execution view, with a Gantt-style visualization of step
execution and a live timeline.
Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries
are now tested against 3.8. Note that several of our upstream dependencies have yet to publish
wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some
dependencies from source.
dagster/priority tags can now be used to prioritize the order of execution for the built-in
in-process and multiprocess engines.
dagster-postgres storages can now be configured with separate arguments and environment
variables, such as:
run_storage:
module: dagster_postgres.run_storage
class: PostgresRunStorage
config:
postgres_db:
username: test
password:
env: ENV_VAR_FOR_PG_PASSWORD
hostname: localhost
db_name: test
Support for RunLaunchers on DagsterInstance allows for execution to be "launched" outside of
the Dagit/Dagster process. As one example, this is used by dagster-k8s to submit pipeline
execution as a Kubernetes Job.
Added support for adding tags to runs initiated from the Playground view in Dagit.
Added @monthly_schedule decorator.
Added Enum.from_python_enum helper to wrap Python enums for config. (Thanks @kdungs!)
[dagster-bash] The Dagster bash solid factory now passes along kwargs to the underlying
solid construction, and now has a single Nothing input by default to make it easier to create a
sequencing dependency. Also, logs are now buffered by default to make execution less noisy.
[dagster-aws] We've improved our EMR support substantially in this release. The
dagster_aws.emr library now provides an EmrJobRunner with various utilities for creating EMR
clusters, submitting jobs, and waiting for jobs/logs. We also now provide a
emr_pyspark_resource, which together with the new @pyspark_solid decorator makes moving
pyspark execution from your laptop to EMR as simple as changing modes.
[dagster-pandas] Added create_dagster_pandas_dataframe_type, PandasColumn, and
Constraint API's in order for users to create custom types which perform column validation,
dataframe validation, summary statistics emission, and dataframe serialization/deserialization.
[dagster-gcp] GCS is now supported for system storage, as well as being supported with the
Dask executor. (Thanks @habibutsu!) Bigquery solids have also been updated to support the new API.
Bugfix
Ensured that all implementations of RunStorage clean up pipeline run tags when a run
is deleted. Requires a storage migration, using dagster instance migrate.
The multiprocess and Celery engines now handle solid subsets correctly.
The multiprocess and Celery engines will now correctly emit skip events for steps downstream of
failures and other skips.
The @solid and @lambda_solid decorators now correctly wrap their decorated functions, in the
sense of functools.wraps.
Performance improvements in Dagit when working with runs with large configurations.
The Helm chart in dagster_k8s has been hardened against various failure modes and is now
compatible with Helm 2.
SQLite run and event log storages are more robust to concurrent use.
Improvements to error messages and to handling of user code errors in input hydration and output
materialization logic.
Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow
pipelines.
We now handle our SQLAlchemy connections in a more canonical way (thanks @zzztimbo!).
Fixed an issue using S3 system storage with certain custom serialization strategies.
Fixed an issue leaking orphan processes from compute logging.
Fixed an issue leaking semaphores from Dagit.
Setting the raise_error flag in execute_pipeline now actually raises user exceptions instead
of a wrapper type.
Documentation
Our docs have been reorganized and expanded (thanks @habibutsu, @vatervonacht, @zzztimbo). We'd
love feedback and contributions!
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @habibutsu,
@kdungs, @vatervonacht, @zzztimbo.