IBM DataStage with Dagster

In this example, you'll build a pipeline with Dagster that:

- Wraps IBM DataStage replication jobs as Dagster multi-assets
- Runs inline data quality checks in the same step as materialization
- Uses a translator pattern to map DataStage tables to Dagster asset keys
- Configures everything with a YAML-based Dagster component
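To make the translator idea concrete, here is a minimal plain-Python sketch of mapping a DataStage table to the path components of an asset key. The names TableRef and TableKeyTranslator, and the "datastage" prefix, are hypothetical and chosen for illustration. They are not taken from this project or from the Dagster API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical description of one replicated DataStage table."""

    schema: str
    name: str


class TableKeyTranslator:
    """Hypothetical translator: turns a table into asset key path components."""

    def __init__(self, prefix: tuple[str, ...] = ("datastage",)):
        # The prefix is an assumption; a real project may use a different one.
        self.prefix = prefix

    def asset_key_path(self, table: TableRef) -> list[str]:
        # Lowercase the parts so keys stay stable regardless of source casing.
        return [*self.prefix, table.schema.lower(), table.name.lower()]


translator = TableKeyTranslator()
key_path = translator.asset_key_path(TableRef(schema="SALES", name="ORDERS"))
print("/".join(key_path))
```

Keeping the mapping in one small class means a naming change (for example, a different prefix per environment) only touches the translator, not every asset definition.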

Prerequisites

To follow the steps in this guide, you'll need:

- Basic Python knowledge
- Python 3.10+ installed on your system. For more information, see the Installation guide.
- Familiarity with IBM DataStage

Note: This example runs in demo mode and doesn't require the cpdctl CLI. If you want to run this example against a real DataStage instance, follow the IBM installation instructions and set demo_mode: false in the YAML configuration.
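One way to picture the demo_mode switch is a small pre-flight check that only looks for cpdctl when demo mode is off. This is a sketch under assumptions: the function name resolve_run_mode and the "demo"/"live" labels are hypothetical, and the project's actual handling of demo_mode may differ.

```python
import shutil
from typing import Callable, Optional


def resolve_run_mode(
    demo_mode: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Hypothetical helper: decide whether to simulate or call the real CLI."""
    if demo_mode:
        # Demo mode never touches cpdctl, so no installation is required.
        return "demo"
    if which("cpdctl") is None:
        # Fail early with a clear message instead of at job launch time.
        raise RuntimeError("demo_mode is false but cpdctl was not found on PATH")
    return "live"


print(resolve_run_mode(demo_mode=True))
```

The which parameter is injectable only so the check can be exercised without a real cpdctl installation.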

Step 1: Set up your Dagster environment

First, set up a new Dagster project.

Clone the Dagster repo and navigate to the project:
```
git clone https://github.com/dagster-io/dagster.git
cd dagster/examples/docs_projects/project_datastage
```
Install the required dependencies with uv:
```
uv sync
```
Activate the virtual environment:
- MacOS: source .venv/bin/activate
- Windows: .venv\Scripts\activate

Step 2: Launch the Dagster webserver

Navigate to the project root directory and start the Dagster webserver:

dg dev

Note: With demo_mode: true set in the YAML configuration, the project simulates a DataStage replication job locally without a cpdctl installation.
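As a rough illustration of what a locally simulated replication run with an inline quality check could look like, consider the sketch below. It is not the project's demo implementation: TableResult, simulate_replication, the fixed row count, and the "at least one row" check are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TableResult:
    """Hypothetical per-table outcome of a simulated replication run."""

    table: str
    rows_replicated: int
    check_passed: bool


def simulate_replication(tables: list[str], rows_per_table: int = 100) -> list[TableResult]:
    """Pretend to replicate each table and check it in the same step."""
    results = []
    for table in tables:
        rows = rows_per_table  # made-up count; no DataStage call happens here
        # Inline check: evaluated right after the simulated write.
        results.append(TableResult(table, rows, check_passed=rows > 0))
    return results


for result in simulate_replication(["orders", "customers"]):
    print(result.table, result.rows_replicated, result.check_passed)
```

Running the check in the same loop as the write mirrors the idea of reporting a check result in the same step as materialization, rather than in a separate downstream step.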

Next steps

Continue this example with defining assets
