
Changelog#

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots (see the sketch after this list). Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.
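
A minimal sketch of the get_asset_key_for_model change, assuming a hypothetical dbt project whose manifest lives at my_dbt_project/target/manifest.json and which contains a seed named raw_customers:

from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets, get_asset_key_for_model

# Hypothetical path to your dbt project's manifest.
MANIFEST_PATH = Path("my_dbt_project/target/manifest.json")

@dbt_assets(manifest=MANIFEST_PATH)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

# Lookups now also work for seeds and snapshots, e.g. a seed named "raw_customers".
raw_customers_key = get_asset_key_for_model([my_dbt_assets], "raw_customers")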

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version (see the sketch after this list).
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
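
To illustrate the ObserveResult change above, here is a minimal sketch of an observable source asset that yields an observation with metadata but no data version; the asset name and metadata values are hypothetical:

from dagster import ObserveResult, observable_source_asset

@observable_source_asset
def raw_events():
    # Hypothetical check against an external system; note that no data_version
    # is attached to the ObserveResult.
    yield ObserveResult(metadata={"row_count": 42})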

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or during auto-materialization, runs submitted for partitions of the same assets are now submitted in lexicographical order of partition key, rather than in an unpredictable order.
  • [dagster-k8s] k8s pod debug info is now included in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule (see the sketch after this list).
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
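
A sketch of the new skip rule; the cron string, asset names, and use of an eager base policy are illustrative, not prescribed:

from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

# Wait until all parents have updated after the latest 9:00 AM tick.
wait_for_parents = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("0 9 * * *"),
)

@asset
def raw_numbers():
    return [1, 2, 3]

@asset(auto_materialize_policy=wait_for_parents)
def daily_rollup(raw_numbers):
    return sum(raw_numbers)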

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Added specific capabilities of the Airflow integration to the Airflow integration page.
  • Rearranged sections in the I/O manager concept page to make information about when to use I/O managers versus resources more prominent.

0.12.6#

New#

  • [dagster-dbt] Added a new synchronous RPC dbt resource (dbt_rpc_sync_resource), which allows you to programmatically send dbt commands to an RPC server, returning only when the command completes (as opposed to returning as soon as the command has been sent). See the sketch after this list.
  • Specifying secrets in the k8s_job_executor now adds to the secrets specified in K8sRunLauncher instead of overwriting them.
  • The local_file_manager no longer uses the current directory as the default base_dir, instead defaulting to LOCAL_ARTIFACT_STORAGE/storage/file_manager. If you wish, you can configure LOCAL_ARTIFACT_STORAGE in your dagster.yaml file.
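
A rough sketch of wiring up the synchronous RPC resource; the host/port config fields and the run call mirror the existing asynchronous dbt RPC resource and are assumptions, not verified against this release:

from dagster import ModeDefinition, pipeline, solid
from dagster_dbt import dbt_rpc_sync_resource

# Assumed config fields; point these at your running dbt RPC server.
my_sync_dbt = dbt_rpc_sync_resource.configured({"host": "127.0.0.1", "port": 8580})

@solid(required_resource_keys={"dbt_rpc"})
def run_my_models(context):
    # Blocks until the RPC server reports the command as complete.
    context.resources.dbt_rpc.run(models=["my_model"])  # assumed client method

@pipeline(mode_defs=[ModeDefinition(resource_defs={"dbt_rpc": my_sync_dbt})])
def my_dbt_rpc_pipeline():
    run_my_models()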

Bugfixes#

  • Following the recent change to add strict Content-Security-Policy directives to Dagit, the CSP began to block the iframe used to render ipynb notebook files. This has been fixed and these iframes should now render correctly.
  • Fixed an error where large files would fail to upload when using the s3_pickle_io_manager for intermediate storage.
  • Fixed an issue where Kubernetes environment variables defined in pipeline tags were not being applied properly to Kubernetes jobs.
  • Fixed tick preview in the Recent live tick timeline view for Sensors.
  • Added more descriptive error messages for invalid sensor evaluation functions.
  • dagit will now write to a temp directory within the current working directory when launched without the DAGSTER_HOME environment variable set. This should resolve issues where the event log did not stay up to date when observing run progress live in dagit with no DAGSTER_HOME set.
  • Fixed an issue where retrying from a failed run sometimes failed if the pipeline was changed after the failure.
  • Fixed an issue with default config on to_job that would result in an error when using an enum config schema within a job.

Community Contributions#

  • Documentation typo fix for pipeline example, thanks @clippered!

Experimental#

  • Solid and resource versions will now be validated for consistency. Valid characters are A-Za-z0-9_.

Documentation#

  • The “Testing Solids and Pipelines” section of the tutorial now uses the new direct invocation functionality and tests a solid and pipeline from an earlier section of the tutorial.
  • Fixed the example in the API docs for EventMetadata.python_artifact.

0.12.5#

Bugfixes#

  • Fixed tick display in the sensor/schedule timeline view in Dagit.
  • Changed the dagster sensor list and dagster schedule list CLI commands to include schedules and sensors that have never been turned on.
  • Fixed the backfill progress stats in Dagit which incorrectly capped the number of successful/failed runs.
  • Improved query performance in Dagit on pipeline (or job) views, schedule views, and schedules list view by loading partition set data on demand instead of by default.
  • Fixed an issue in Dagit where re-executing a pipeline that shares an identical name and graph to a pipeline in another repository could lead to the wrong pipeline being executed.
  • Fixed an issue in Dagit where loading a very large DAG in the pipeline overview could sometimes lead to a render loop that repeated the same GraphQL query every few seconds, causing an endless loading state and never rendering the DAG.
  • Fixed an issue with execute_in_process where providing default executor config to a job would cause config errors.
  • Fixed an issue with default config for jobs where using an ops config entry in place of solids would cause a config error.
  • Dynamic outputs are now properly supported when using the adls2_io_manager.
  • ModeDefinition now validates the keys of resource_defs at definition time.
  • Failure exceptions no longer bypass the RetryPolicy if one is set.

Community Contributions#

  • Added serviceAccount.name to the user deployment Helm subchart and schema, thanks @jrouly!

Experimental#

  • To account for ECS’ eventual consistency model, the EcsRunLauncher will now exponentially backoff certain requests for up to a minute while waiting for ECS to reach a consistent state.
  • Memoization is now available from all execution entrypoints. This means that a pipeline tagged for use with memoization can be launched from dagit, the launch CLI, and other modes of external execution, whereas before, memoization was only available via execute_pipeline and the execute CLI.
  • Memoization now works with root input managers. In order to use a root input manager in a pipeline that utilizes memoization, provide a string value to the version argument on the decorator:
from dagster import root_input_manager

@root_input_manager(version="foo")
def my_root_manager(_):
    pass
  • The versioned_fs_io_manager now defaults to using the storage directory of the instance as a base directory.
  • GraphDefinition.to_job now accepts a tags dictionary with non-string values, which will be serialized to JSON. This makes job tags work similarly to pipeline tags and solid tags; see the sketch below.
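
A sketch of attaching non-string tags to a job built from a graph; the graph, op, and tag values are illustrative:

from dagster import graph, op

@op
def do_nothing():
    pass

@graph
def my_graph():
    do_nothing()

# Non-string values such as ints are serialized to JSON, matching pipeline/solid tag behavior.
my_job = my_graph.to_job(tags={"team": "data", "priority": 3})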

Documentation#

  • The guide for migrating to the experimental graph, job, and op APIs now includes an example of how to migrate a pipeline with a composite solid.

0.12.4#

New#

  • [helm] The compute log manager now defaults to a NoOpComputeLogManager. It did not make sense to default to the LocalComputeLogManager as pipeline runs are executed in ephemeral jobs, so logs could not be retrieved once these jobs were cleaned up. To have compute logs in a Kubernetes environment, users should configure a compute log manager that uses a cloud provider.
  • [helm] The K8sRunLauncher now supports environment variables to be passed in from the current container to the launched Kubernetes job.
  • [examples] Added a new dbt_pipeline to the hacker news example repo, which demonstrates how to run a dbt project within a Dagster pipeline.
  • Changed the default configuration of steps launched by the k8s_job_executor to match the configuration set in the K8sRunLauncher.

Bugfixes#

  • Fixed an issue where dagster gRPC servers failed to load if they did not have permissions to write to a temporary directory.
  • Enabled compression and raised the message receive limit for our gRPC communication. This prevents large pipelines from causing gRPC message limit errors. This limit can now be manually overridden with the DAGSTER_GRPC_MAX_RX_BYTES environment variable.
  • Fixed errors with dagster instance migrate when the asset catalog contains wiped assets.
  • Fixed an issue where backfill jobs with the “Re-execute from failures” option enabled were not picking up the solid selection from the originating failed run.
  • Previously, when using memoization, if every step was memoized already, you would get an error. Now, the run succeeds and runs no steps.
  • [dagster-dbt] If you specify --models, --select, or --exclude flags while configuring the dbt_cli_resource, it will no longer attempt to supply these flags to commands that don’t accept them.
  • [dagstermill] Fixed an issue where yield_result wrote output value to the same file path if output names are the same for different solids.

Community Contributions#

  • Added the ability to customize the TTL and backoff limit on Dagster Kubernetes jobs (thanks @Oliver-Sellwood!)

Experimental#

  • ops can now be used as a config entry in place of solids.
  • Fixed a GraphQL bug in ECS deployments by making the EcsRunLauncher more resilient to ECS’ eventual consistency model.

Documentation#

  • Fixed hyperlink display to be more visible within source code snippets.
  • Added documentation for Run Status Sensor on the Sensors concept page.

0.12.3#

New#

  • The Dagit web app now has a strict Content Security Policy.
  • Introduced a new decorator [@run_status_sensor](https://docs.dagster.io/_apidocs/schedules-sensors#dagster.run_status_sensor), which defines sensors that react to a given PipelineRunStatus (see the sketch after this list).
  • You can now specify a solid on build_hook_context. This allows you to access the hook_context.solid parameter.
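
A minimal sketch of the new decorator; the pipeline_run_status keyword and the context.pipeline_run attribute are assumptions based on the linked API docs, and send_alert is a hypothetical stand-in for an alerting integration:

from dagster import PipelineRunStatus, run_status_sensor

def send_alert(message: str) -> None:
    # Hypothetical stand-in for your alerting integration.
    print(message)

@run_status_sensor(pipeline_run_status=PipelineRunStatus.FAILURE)
def my_failure_sensor(context):
    send_alert(f"Run {context.pipeline_run.run_id} failed")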

Bugfixes#

  • dagster’s dependency on docstring-parser has been loosened.
  • @pipeline now pulls its description from the doc string on the decorated function if it is provided.
  • The sensor example generated via dagster new-project now no longer targets a non-existent mode.

Community Contributions#

  • Thanks for the docs typo fix @cvoegele!

Experimental#

  • The “jobs” key is now supported when returning a dict from @repository functions.
  • GraphDefinition.to_job now supports the description argument.
  • Jobs with nested Graph structures no longer fail to load in dagit.
  • Previously, the ECS reference deployment granted its tasks the AmazonECS_FullAccess policy. Now, the attached roles have been more narrowly scoped to only allow the daemon and dagit tasks to interact with the ECS actions required by the EcsRunLauncher.
  • The EcsRunLauncher launches ECS tasks by setting a command override. Previously, if the Task Definition it was using also defined an entrypoint, it would concatenate the entrypoint and the overridden command, causing launches to fail with Error: Got unexpected extra arguments. Now, it ignores the entrypoint and launches succeed.

Documentation#

  • Fixed a broken link in the sensor testing overview.

0.12.2#

New#

  • Improved Asset catalog load times in Dagit, for Dagster instances that have fully migrated using dagster instance migrate.
  • When using the ScheduleDefinition constructor to instantiate a schedule definition, if a schedule name is not provided, the name of the schedule will now default to the pipeline name, plus “_schedule”, instead of raising an error.

Bugfixes#

  • Fixed a bug where pipeline definition arguments description and solid_retry_policy were getting dropped when using a solid_hook decorator on a pipeline definition (#4355).
  • Fixed an issue where the Dagit frontend wasn’t disabling certain UI elements when launched in read-only mode.
  • Fixed a bug where directly invoking an async solid with type annotations would fail, if called from another async function.

Documentation#

  • Added a guide to migrating from the existing Pipeline, Mode, Preset, and Solid APIs to the new experimental Graph, Job, and Op APIs. Check out the guide here!