Using Dagster pipes
In this guide, we’ll show you how to use Dagster Pipes with Dagster’s built-in subprocess PipesSubprocessClient
to run a local subprocess with a given command and environment. You can then send information such as structured metadata and logging back to Dagster from the subprocess, where it will be visible in the Dagster UI.
To get there, you'll:
- Create a Dagster asset that invokes a subprocess
- Modify existing code to work with Dagster Pipes to send information back to Dagster
- Learn about using Dagster Pipes with other entities in the Dagster system in the Reference section
This guide focuses on using an out-of-the-box PipesSubprocessClient
resource. For further customization with the subprocess invocation, use open_dagster_pipes
approach instead. Refer to Customizing Dagster Pipes protocols for more info.
Prerequisites
To use Dagster Pipes to run a subprocess, you’ll need to have Dagster (dagster
) and the Dagster UI (dagster-webserver
) installed. Refer to the Installation guide for more info.
You'll also need an existing Python script. We’ll use the following Python script to demonstrate. This file will be invoked by the Dagster asset that you’ll create later in this tutorial.
Create a file named external_code.py
and paste the following into it:
import pandas as pd
def main():
orders_df = pd.DataFrame({"order_id": [1, 2], "item_id": [432, 878]})
total_orders = len(orders_df)
print(f"processing total {total_orders} orders")
if __name__ == "__main__":
main()
Ready to get started?
When you've fulfilled all the prerequisites for the tutorial, you can get started by creating a Dagster asset that executes a subprocess.