
Main Concepts

Dagster is a data orchestrator. It lets you define jobs in terms of the data flow between logical components called ops. These jobs can be developed locally and run anywhere.

How to use the Main Concepts section

Each page in this section contains:

  • Overview: Description of the concept.
  • Relevant APIs: Index of top-level Dagster APIs that are relevant to the concept.
  • Examples: Easy-to-copy code snippets for the concept.
  • Patterns: A list of advanced patterns to use and anti-patterns to avoid with the concept.

Sections

Ops, Jobs, and Graphs are the building blocks of Dagster code. This section covers how to define and use ops, jobs, and graphs.

Resources enable you to separate logic from its heavyweight external dependencies. This makes it possible to develop and test data jobs in different environments.

Dagster enables you to build testable and maintainable data applications. This section shows how to unit-test your data jobs, separate business logic from external dependencies, and run data quality tests.

Dagster provides a configuration system that allows you to document, schematize, and error-check your configuration. This section demonstrates how configurations work with different Dagster entities.

Dagster includes gradual, opt-in typing for the inputs and outputs of ops. This section explains how to define, use, and test types in Dagster.

IO Managers are user-provided objects that store op outputs and load them as inputs to downstream ops. This section explains how Dagster thinks about IO management and shows how to define and use IO managers and other IO-related features.

Dagit is a web-based interface for viewing and interacting with Dagster objects. This section walks you through Dagit's features and the GraphQL API used to interact with Dagster programmatically.

A workspace is a collection of user-defined repositories and information about where to find them. Dagster tools, like Dagit and the Dagster CLI, use workspaces to load user code. This section shows how to define and when to use repositories and workspaces.

Schedules launch runs on a fixed interval, while sensors launch runs in response to any external state change. This section demonstrates how to define them and covers related capabilities such as partitioning and backfilling.

Assets are data objects that you produce during a run. This section walks you through how to inform Dagster about these assets so that they can be tracked over time.

Dagster includes a rich and extensible logging system. This section showcases Dagster's built-in logger and shows how you can customize loggers to fit your logging and monitoring infrastructure.