You are viewing an unreleased or outdated version of the documentation

Changelog#

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted during the middle of being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.14.11#

Bugfixes#

  • Fixed an issue where schedules created from partition sets that launched runs for multiple partitions in a single schedule tick would sometimes time out while generating runs in the scheduler.
  • Fixed an issue where nested graphs would sometimes incorrectly determine the set of required resources for a hook.

0.14.10#

New#

  • [dagster-k8s] Added an includeConfigInLaunchedRuns flag to the Helm chart that can be used to automatically include configmaps, secrets, and volumes in any runs launched from code in a user code deployment. See https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#configure-your-user-deployment for more information.
  • [dagit] Improved display of configuration yaml throughout Dagit, including better syntax highlighting and the addition of line numbers.
  • The GraphQL input argument type BackfillParams (used for launching backfills), now has an allPartitions boolean flag, which can be used instead of specifying all the individual partition names.
  • Removed gevent and gevent-websocket dependencies from dagster-graphql
  • Memoization is now supported while using step selection
  • Cleaned up various warnings across the project
  • The default IO Managers now support asset partitions

Bugfixes#

  • Fixed sqlite3.OperationalError error when viewing schedules/sensors pages in Dagit. This was affecting dagit instances using the default SQLite schedule storage with a SQLite version < 3.25.0.

  • Fixed an issues where schedules and sensors would sometimes fail to run when the daemon and dagit were running in different Python environments.

  • Fixed an exception when the telemetry file is empty

  • fixed a bug with @graph composition which would cause the wrong input definition to be used for type checks

  • [dagit] For users running Dagit with --path-prefix, large DAGs failed to render due to a WebWorker error, and the user would see an endless spinner instead. This has been fixed.

  • [dagit] Fixed a rendering bug in partition set selector dropdown on Launchpad.

  • [dagit] Fixed the ‘View Assets’ link in Job headers

  • Fixed an issue where root input managers with resource dependencies would not work with software defined assets

Community Contributions#

  • dagster-census is a new library that includes a census_resource for interacting the Census REST API, census_trigger_sync_op for triggering a sync and registering an asset once it has finished, and a CensusOutput type. Thanks @dehume!
  • Docs fix. Thanks @ascrookes!

0.14.9#

New#

  • Added a parameter in dagster.yaml that can be used to increase the time that Dagster waits when spinning up a gRPC server before timing out. For more information, see https://docs.dagster.io/deployment/dagster-instance#code-servers.
  • Added a new graphQL field assetMaterializations that can be queried off of a DagsterRun field. You can use this field to fetch the set of asset materialization events generated in a given run within a GraphQL query.
  • Docstrings on functions decorated with the @resource decorator will now be used as resource descriptions, if no description is explicitly provided.
  • You can now point dagit -m or dagit -f at a module or file that has asset definitions but no jobs or asset groups, and all the asset definitions will be loaded into Dagit.
  • AssetGroup now has a materialize method which executes an in-process run to materialize all the assets in the group.
  • AssetGroups can now contain assets with different partition_defs.
  • Asset materializations produced by the default asset IO manager, fs_asset_io_manager, now include the path of the file where the values were saved.
  • You can now disable the max_concurrent_runs limit on the QueuedRunCoordinator by setting it to -1. Use this if you only want to limit runs using tag_concurrency_limits.
  • [dagit] Asset graphs are now rendered asynchronously, which means that Dagit will no longer freeze when rendering a large asset graph.
  • [dagit] When viewing an asset graph, you can now double-click on an asset to zoom in, and you can use arrow keys to navigate between selected assets.
  • [dagit] The “show whitespace” setting in the Launchpad is now persistent.
  • [dagit] A bulk selection checkbox has been added to the repository filter in navigation or Instance Overview.
  • [dagit] A “Copy config” button has been added to the run configuration dialog on Run pages.
  • [dagit] An “Open in Launchpad” button has been added to the run details page.
  • [dagit] The Run page now surfaces more information about start time and elapsed time in the header.
  • [dagster-dbt] The dbt_cloud_resource has a new get_runs() function to get a list of runs matching certain paramters from the dbt Cloud API (thanks @kstennettlull!)
  • [dagster-snowflake] Added an authenticator field to the connection arguments for the snowflake_resource (thanks @swotai!).
  • [celery-docker] The celery docker executor has a new configuration entry container_kwargs that allows you to specify additional arguments to pass to your docker containers when they are run.

Bugfixes#

  • Fixed an issue where loading a Dagster repository would fail if it included a function to lazily load a job, instead of a JobDefinition.
  • Fixed an issue where trying to stop an unloadable schedule or sensor within Dagit would fail with an error.
  • Fixed telemetry contention bug on windows when running the daemon.
  • [dagit] Fixed a bug where the Dagit homepage would claim that no jobs or pipelines had been loaded, even though jobs appeared in the sidebar.
  • [dagit] When filtering runs by tag, tag values that contained the : character would fail to parse correctly, and filtering would therefore fail. This has been fixed.
  • [dagster-dbt] When running the “build” command using the dbt_cli_resource, the run_results.json file will no longer be ignored, allowing asset materializations to be produced from the resulting output.
  • [dagster-airbyte] Responses from the Airbyte API with a 204 status code (like you would get from /connections/delete) will no longer produce raise an error (thanks @HAMZA310!)
  • [dagster-shell] Fixed a bug where shell ops would not inherit environment variables if any environment variables were added for ops (thanks @kbd!)
  • [dagster-postgres] usernames are now urlqouted in addition to passwords

Documentation#

0.14.8#

New#

  • The MySQL storage implementations for Dagster storage is no longer marked as experimental.
  • run_id can now be provided as an argument to execute_in_process.
  • The text on dagit’s empty state no longer mentions the legacy concept “Pipelines”.
  • Now, within the IOManager.load_input method, you can add input metadata via InputContext.add_input_metadata. These metadata entries will appear on the LOADED_INPUT event and if the input is an asset, be attached to an AssetObservation. This metadata is viewable in dagit.

Bugfixes#

  • Fixed a set of bugs where schedules and sensors would get out of sync between dagit and dagster-daemon processes. This would manifest in schedules / sensors getting marked as “Unloadable” in dagit, and ticks not being registered correctly. The fix involves changing how Dagster stores schedule/sensor state and requires a schema change using the CLI command dagster instance migrate. Users who are not running into this class of bugs may consider the migration optional.
  • root_input_manager can now be specified without a context argument.
  • Fixed a bug that prevented root_input_manager from being used with VersionStrategy.
  • Fixed a race condition between daemon and dagit writing to the same telemetry logs.
  • [dagit] In dagit, using the “Open in Launchpad” feature for a run could cause server errors if the run configuration yaml was too long. Runs can now be opened from this feature regardless of config length.
  • [dagit] On the Instance Overview page in dagit, runs in the timeline view sometimes showed incorrect end times, especially batches that included in-progress runs. This has been fixed.
  • [dagit] In the dagit launchpad, reloading a repository should present the user with an option to refresh config that may have become stale. This feature was broken for jobs without partition sets, and has now been fixed.
  • Fixed issue where passing a stdlib typing type as dagster_type to input and output definition was incorrectly being rejected.
  • [dagster-airbyte] Fixed issue where AssetMaterialization events would not be generated for streams that had no updated records for a given sync.
  • [dagster-dbt] Fixed issue where including multiple sets of dbt assets in a single repository could cause a conflict with the names of the underlying ops.

0.14.7#

New#

  • [helm] Added configuration to explicitly enable or disable telemetry.
  • Added a new IO manager for materializing assets to Azure ADLS. You can specify this IO manager for your AssetGroups by using the following config:
`from dagster import AssetGroup
from dagster_azure import adls2_pickle_asset_io_manager, adls2_resource
asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": adls2_pickle_asset_io_manager, "adls2": adls2_resource}
)`
  • Added ability to set a custom start time for partitions when using @hourly_partitioned_config , @daily_partitioned_config, @weekly_partitioned_config, and @monthly_partitioned_config
  • Run configs generated from partitions can be retrieved using the PartitionedConfig.get_run_config_for_partition_key function. This will allow the use of the validate_run_config function in unit tests.
  • [dagit] If a run is re-executed from failure, and the run fails again, the default action will be to re-execute from the point of failure, rather than to re-execute the entire job.
  • PartitionedConfig now takes an argument tags_for_partition_fn which allows for custom run tags for a given partition.

Bugfixes#

  • Fixed a bug in the message for reporting Kubernetes run worker failures
  • [dagit] Fixed issue where re-executing a run that materialized a single asset could end up re-executing all steps in the job.
  • [dagit] Fixed issue where the health of an asset’s partitions would not always be up to date in certain views.
  • [dagit] Fixed issue where the “Materialize All” button would be greyed out if a job had SourceAssets defined.

Documentation#

  • Updated resource docs to reference “ops” instead of “solids” (thanks @joe-hdai!)
  • Fixed formatting issues in the ECS docs