
Changelog#

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots (example after this list). Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.
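
A minimal sketch of looking up the asset key for a dbt seed with get_asset_key_for_model and depending on it downstream; the manifest path and the "raw_customers" seed name are placeholder assumptions:

```python
from dagster import AssetExecutionContext, asset
from dagster_dbt import DbtCliResource, dbt_assets, get_asset_key_for_model


@dbt_assets(manifest="target/manifest.json")  # placeholder manifest path
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


# Look up the asset key Dagster assigned to a dbt seed (works for snapshots too)
# and depend on it from a downstream asset.
@asset(deps=[get_asset_key_for_model([my_dbt_assets], "raw_customers")])
def customers_report():
    ...
```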

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest “dagster/data_time” metadata value is older than what’s allowed by the freshness policy (example after this list).
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
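
A hedged sketch combining both experimental additions above; the freshness_policy parameter name and the exact format of the “dagster/data_time” metadata value are assumptions based on these notes:

```python
import time

from dagster import FreshnessPolicy, ObserveResult, observable_source_asset


# Assumed parameter name: freshness_policy. The asset is considered "Overdue"
# when its reported "dagster/data_time" is older than the allowed 60-minute lag.
@observable_source_asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=60))
def raw_events():
    last_modified = time.time() - 30 * 60  # placeholder: query the source system
    # data_version is optional; only the data-time metadata is reported here.
    return ObserveResult(metadata={"dagster/data_time": last_modified})
```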

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if it was interrupted in the middle of being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • When a backfill or auto-materialization submits runs for partitions of the same assets, the runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys that wait for all parents to be updated after the latest tick of a given cron schedule (example after this list).
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
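
A hedged sketch of attaching the new rule to an eager policy; the cron string and asset names are placeholders, and the rule's exact argument signature may differ slightly:

```python
from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

# Skip auto-materialization until every parent has updated since the most
# recent midnight tick (placeholder cron string).
wait_for_all_parents = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("0 0 * * *")
)


@asset
def upstream():
    ...


@asset(auto_materialize_policy=wait_for_all_parents, deps=[upstream])
def daily_rollup():
    ...
```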

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Added the specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make the information about using I/O managers versus resources more prominent.

1.0.8 (core) / 0.16.8 (libraries)#

New#

  • With the new cron_schedule argument to TimeWindowPartitionsDefinition, you can now supply arbitrary cron expressions to define time window-based partition sets (example after this list).
  • Graph-backed assets can now be subsetted for execution via AssetsDefinition.from_graph(my_graph, can_subset=True).
  • RunsFilter is now exported in the public API.
  • [dagster-k8s] The dagster-user-deployments.deployments[].schedulerName Helm value for specifying custom Kubernetes schedulers will now also apply to run and step workers launched for the given user deployment. Previously it would only apply to the grpc server.
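
A minimal sketch of the new cron_schedule support; the cron expression, start date, and format string are placeholder values:

```python
from dagster import TimeWindowPartitionsDefinition, asset

# One partition window every 6 hours, driven by an arbitrary cron expression.
every_six_hours = TimeWindowPartitionsDefinition(
    cron_schedule="0 */6 * * *",
    start="2022-08-01-00:00",
    fmt="%Y-%m-%d-%H:%M",
)


@asset(partitions_def=every_six_hours)
def events():
    ...
```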

Bugfixes#

  • In some situations, default asset config was ignored when a subset of assets were selected for execution. This has been fixed.
  • Added a pin to grpcio in dagster to address an issue with the recent 1.48.1 grpcio release that was sometimes causing Dagster code servers to hang.
  • Fixed an issue where the “Latest run” column on the Instance Status page sometimes displayed an older run instead of the most recent run.

Community Contributions#

  • In addition to a single cron string, cron_schedule now also accepts a sequence of cron strings. If a sequence is provided, the schedule will run for the union of all execution times for the provided cron strings, e.g., ['45 23 * * 6', '30 9 * * 0'] for a schedule that runs at 11:45 PM every Saturday and 9:30 AM every Sunday (example after this list). Thanks @erinov1!
  • Added an optional boolean config install_default_libraries to databricks_pyspark_step_launcher. It allows running Databricks jobs without installing the default Dagster libraries. Thanks @nvinhphuc!
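
A minimal sketch of a schedule built from the union of the two cron strings above; the op and job are placeholders:

```python
from dagster import job, op, schedule


@op
def send_weekend_report():
    ...


@job
def weekend_report_job():
    send_weekend_report()


# Runs at 11:45 PM every Saturday and 9:30 AM every Sunday.
@schedule(cron_schedule=["45 23 * * 6", "30 9 * * 0"], job=weekend_report_job)
def weekend_report_schedule():
    return {}
```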

Experimental#

  • [dagster-k8s] Added additional configuration fields (container_config, pod_template_spec_metadata, pod_spec_config, job_metadata, and job_spec_config) to the experimental k8s_job_op that can be used to add additional configuration to the Kubernetes pod that is launched within the op (example below).
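
A hedged sketch of wiring a couple of the new fields into the experimental k8s_job_op; the image, commands, and config values are placeholders, and the exact config shape may vary by version:

```python
from dagster import job
from dagster_k8s import k8s_job_op

echo_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
        # New experimental fields for extra pod-level configuration.
        "container_config": {"resources": {"requests": {"cpu": "250m"}}},
        "pod_spec_config": {"node_selector": {"disktype": "ssd"}},
    },
    name="echo_op",
)


@job
def pod_job():
    echo_op()
```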

1.0.7 (core) / 0.16.7 (libraries)#

New#

  • Several updates to the Dagit run timeline view: your time window preference will now be preserved locally, there is a clearer “Now” label to delineate the current time, and upcoming scheduled ticks will no longer be batched with existing runs.
  • [dagster-k8s] ingress.labels is now available in the Helm chart. Any provided labels are appended to the default labels on each object (helm.sh/chart, app.kubernetes.io/version, and app.kubernetes.io/managed-by).
  • [dagster-dbt] Added support for two types of dbt nodes: metrics and ephemeral models.
  • When constructing a GraphDefinition manually, InputMapping and OutputMapping objects should be directly constructed (example below).
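
A hedged sketch of constructing the mappings directly while building a GraphDefinition by hand; the op and mapping names are placeholders, and the keyword names reflect the updated API as best understood here:

```python
from dagster import GraphDefinition, InputMapping, OutputMapping, op


@op
def double(x: int) -> int:
    return x * 2


# A one-node graph whose input and output are wired up with directly
# constructed InputMapping / OutputMapping objects.
doubling_graph = GraphDefinition(
    name="doubling_graph",
    node_defs=[double],
    input_mappings=[
        InputMapping(
            graph_input_name="value",
            mapped_node_name="double",
            mapped_node_input_name="x",
        )
    ],
    output_mappings=[
        OutputMapping(
            graph_output_name="doubled",
            mapped_node_name="double",
            mapped_node_output_name="result",
        )
    ],
)
```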

Bugfixes#

  • [dagster-snowflake] Pandas is no longer imported when dagster_snowflake is imported. Instead, it’s only imported when using functionality inside dagster-snowflake that depends on pandas.
  • Recent changes to run_status_sensors caused sensors that only monitored jobs in external repositories to also monitor all jobs in the current repository. This has been fixed.
  • Fixed an issue where "unhashable type" errors could be spawned from sensor executions.
  • [dagit] Clicking between assets in different repositories from asset groups and asset jobs now works as expected.
  • [dagit] The DAG rendering of composite ops with more than one input/output mapping has been fixed.
  • [dagit] Selecting a source asset in Dagit no longer produces a GraphQL error.
  • [dagit] Viewing “Related Assets” for an asset run now shows the full set of assets included in the run, regardless of whether they were materialized successfully.
  • [dagit] The Asset Lineage view has been simplified and now indicates when the view is clipped and more distant upstream/downstream assets exist.
  • Fixed erroneous experimental warnings being thrown when using with_resources alongside source assets.

Breaking Changes#

  • [dagit] The launchpad tab is no longer shown for Asset jobs. Asset jobs can be launched via the “Materialize All” button shown on the Overview tab. To provide optional configuration, hold shift when clicking “Materialize”.
  • The arguments to InputMapping and OutputMapping APIs have changed.

Community Contributions#

  • The ssh_resource can now accept configuration from environment variables. Thanks @cbini!
  • Spelling corrections in migrations.md. Thanks @gogi2811!

1.0.6 (core) / 0.16.6 (libraries)#

New#

  • [dagit] nbconvert is now installed as an extra in Dagit.
  • Multiple assets can be monitored for materialization using the multi_asset_sensor (experimental).
  • Run status sensors can now monitor jobs in external repositories.
  • The config argument of define_asset_job now works if the job contains partitioned assets.
  • When configuring sqlite-based storages in dagster.yaml, you can now point to environment variables.
  • When emitting RunRequests from sensors, you can now optionally supply an asset_selection argument, which accepts a list of AssetKeys to materialize from the larger job (example after this list).
  • [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support the exclude parameter, allowing you to specify more precisely which resources to load from your dbt project (thanks @flvndh!)
  • [dagster-k8s] schedulerName is now available for all deployments in the Helm chart for users who use a custom Kubernetes scheduler.
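
A minimal sketch of supplying asset_selection from a sensor; the job name, asset keys, and trigger condition are placeholders:

```python
from dagster import AssetKey, RunRequest, define_asset_job, sensor

all_assets_job = define_asset_job("all_assets_job", selection="*")


@sensor(job=all_assets_job)
def orders_file_sensor():
    new_file_arrived = True  # placeholder: poll an external system here
    if new_file_arrived:
        # Materialize only these two assets from the larger job.
        yield RunRequest(
            run_key=None,
            asset_selection=[AssetKey("orders"), AssetKey("orders_cleaned")],
        )
```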

Bugfixes#

  • Previously, types for multi-assets would display incorrectly in Dagit when specified. This has been fixed.
  • In some circumstances, viewing nested asset paths in Dagit could lead to unexpected empty states. This was due to incorrect slicing of the asset list, and has been fixed.
  • Fixed an issue in Dagit where the dialog used to wipe materializations displayed broken text for assets with long paths.
  • [dagit] Fixed the Job page so that the latest run tag and the related assets are bucketed by repository-specific job. Previously, runs from jobs with the same name in different repositories would be intermingled.
  • Previously, if you launched a backfill for a subset of a multi-asset (e.g. dbt assets), all assets would be executed on each run, instead of just the selected ones. This has been fixed.
  • [dagster-dbt] Previously, if you configured a select parameter on your dbt_cli_resource, this would not get passed into the corresponding invocations of certain context.resources.dbt.x() commands. This has been fixed.

1.0.4 (core) / 0.16.4 (libraries)#

New#

  • Assets can now be materialized to storage conditionally by setting output_required=False. If this is set and no result is yielded from the asset, Dagster will not create an asset materialization event, the I/O manager will not be invoked, downstream assets will not be materialized, and asset sensors monitoring the asset will not trigger (example after this list).
  • JobDefinition.run_request_for_partition can now be used inside sensors that target multiple jobs (Thanks Metin Senturk!)
  • The environment variable DAGSTER_GRPC_TIMEOUT_SECONDS now allows overriding the default timeout for communications between host processes (like dagit and the daemon) and user code servers.
  • Import time for the dagster module has been reduced, by approximately 50% in initial measurements.
  • AssetIn now accepts a dagster_type argument, for specifying runtime checks on asset input values.
  • [dagit] The column names on the Activity tab of the asset details page no longer reference the legacy term “Pipeline”.
  • [dagster-snowflake] The execute_query method of the snowflake resource now accepts a use_pandas_result argument, which fetches the result of the query as a Pandas dataframe. (Thanks @swotai!)
  • [dagster-shell] Made the execute and execute_script_file utilities in dagster_shell part of the public API (Thanks Fahad Khan!)
  • [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support the exclude parameter. (Thanks @flvndh!)
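
A minimal sketch of a conditionally materialized asset using output_required=False; the row-fetching logic is a placeholder:

```python
from dagster import Output, asset


@asset(output_required=False)
def filtered_signups():
    new_rows = []  # placeholder: fetch newly arrived rows from the source system
    if new_rows:
        # Only when an Output is yielded does Dagster record a materialization,
        # invoke the I/O manager, and allow downstream assets to run.
        yield Output(new_rows)
```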

Bugfixes#

  • [dagit] Removed the x-frame-options response header from Dagit, allowing the Dagit UI to be rendered in an iframe.
  • [fully-featured project example] Fixed the duckdb IO manager so the comment_stories step can load data successfully.
  • [dagster-dbt] Previously, if a select parameter was configured on the dbt_cli_resource, it would not be passed into invocations of context.resources.dbt.run() (and other similar commands). This has been fixed.
  • [dagster-ge] An incompatibility between dagster_ge_validation_factory and dagster 1.0 has been fixed.
  • [dagstermill] Previously, updated arguments and properties to DagstermillExecutionContext were not exposed. This has since been fixed.

Documentation#

  • The integrations page on the docs site now has a section for links to community-hosted integrations. The first linked integration is @silentsokolov’s Vault integration.

1.0.3 (core) / 0.16.3 (libraries)#

New#

  • Failure now has an allow_retries argument, providing a way to manually bypass retry policies (example after this list).
  • dagstermill.get_context and dagstermill.DagstermillExecutionContext have been updated to reflect stable dagster-1.0 APIs. pipeline/solid-referencing arguments and properties will be removed in the next major version bump of dagstermill.
  • TimeWindowPartitionsDefinition now exposes a get_cron_schedule method.
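
A minimal sketch of bypassing a retry policy with the new allow_retries argument; the op and its failure condition are placeholders:

```python
from dagster import Failure, RetryPolicy, op


@op(retry_policy=RetryPolicy(max_retries=3))
def load_config():
    config_is_malformed = True  # placeholder: detect a non-retryable error
    if config_is_malformed:
        # Bypass the retry policy: retrying will not fix a malformed config.
        raise Failure(
            description="Config file is malformed.",
            allow_retries=False,
        )
```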

Bugfixes#

  • In some situations where a materialized asset depended on a partitioned asset that wasn’t part of the run, the partition-related methods of InputContext returned incorrect values or failed erroneously. This has been fixed.
  • Schedules and sensors with the same names but in different repositories no longer affect each other’s idempotence checks.
  • In some circumstances, reloading a repository in Dagit could lead to an error that would crash the page. This has been fixed.

Community Contributions#

  • @will-holley added an optional key argument to GCSFileManager methods to set the GCS blob key, thank you!
  • Fixed the sensors in the fully-featured example. Thanks @pwachira!

Documentation#