Migrating an Airflow BashOperator to Dagster
In this page, we'll explain how to migrate an Airflow BashOperator to Dagster. If you're using the BashOperator to execute dbt commands, see "Migrating an Airflow BashOperator (dbt) to Dagster" instead.
About the Airflow BashOperator
The Airflow BashOperator is a common operator used to execute bash commands as part of a data pipeline. For example:
```python
from airflow.operators.bash import BashOperator

execute_script = BashOperator(
    task_id="execute_script",
    bash_command="python /path/to/script.py",
)
```
The BashOperator's functionality is very general, since it can be used to run any bash command, and Dagster offers richer integrations for many common BashOperator use cases. We'll explain how to do a one-to-one migration of a BashOperator-executed bash command to Dagster, and how to use the dagster-airlift library to proxy execution of the original task to Dagster. We'll also provide a reference for richer Dagster integrations that cover common BashOperator use cases.
Dagster equivalent
The direct Dagster equivalent to the BashOperator is to use the PipesSubprocessClient to execute a bash command in a subprocess.
Migrating the operator
Migrating the operator breaks down into a few steps:
- Ensure that the resources necessary for your bash command are available to both your Airflow and Dagster deployments.
- Write an asset that executes the bash command using the PipesSubprocessClient.
- Use dagster-airlift to proxy execution of the original task to Dagster.
- (Optional) Implement a richer integration for common BashOperator use cases.
Step 1: Ensure shared bash command access
First, you'll need to ensure that the bash command you're running is available for use in both your Airflow and Dagster deployments. What this entails will vary depending on the command. For example, if you're running a Python script, it's as simple as ensuring that the script exists in a shared location accessible to both Airflow and Dagster, and that all necessary environment variables are set in both environments.
Step 2: Writing an @asset-decorated function
You can write a Dagster asset-decorated function that runs your bash command. This is quite straightforward using the PipesSubprocessClient:
```python
from dagster import AssetExecutionContext, PipesSubprocessClient, asset


@asset
def script_result(context: AssetExecutionContext):
    return (
        PipesSubprocessClient()
        .run(context=context, command="python /path/to/script.py")
        .get_results()
    )
```
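To make the asset loadable by Dagster tools, include it in your Definitions object. A minimal sketch, continuing from the script_result asset above:

```python
from dagster import Definitions

# Registers the asset so `dagster dev` and deployments can load it.
defs = Definitions(assets=[script_result])
```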
Step 3: Using dagster-airlift to proxy execution
Finally, you can use dagster-airlift to proxy the execution of the original task to Dagster. For more information, see "Migrating from Airflow to Dagster".
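At a high level, proxying works by marking individual tasks as proxied in a YAML file that lives alongside your DAG, then calling dagster-airlift's proxying_to_dagster function at the end of your DAG file. A minimal sketch, assuming an Airflow DAG file that defines the execute_script task above (see the migration guide for the authoritative setup):

```python
# At the end of the Airflow DAG file that defines `execute_script`.
# Assumes a proxied_state/<dag_id>.yaml file next to the DAG, e.g.:
#
#   tasks:
#     - id: execute_script
#       proxied: True
from pathlib import Path

from dagster_airlift.in_airflow import proxying_to_dagster
from dagster_airlift.in_airflow.proxied_state import load_proxied_state_from_yaml

# Rewrites every task marked `proxied: True` so that, at runtime, it
# triggers the corresponding Dagster asset instead of running in Airflow.
proxying_to_dagster(
    global_vars=globals(),
    proxied_state=load_proxied_state_from_yaml(Path(__file__).parent / "proxied_state"),
)
```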
Step 4: Implementing richer integrations
Dagster has better options for many of the use cases the BashOperator typically covers. We'll detail some of those here.
Running a Python script
As mentioned above, you can use the PipesSubprocessClient to run a Python script in a subprocess. But you can also modify the script to send additional information and logging back to Dagster. See the Dagster Pipes tutorial for more information.
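For instance, the script itself can use the lightweight dagster-pipes package to stream logs and report metadata back to the orchestrating asset. A minimal sketch of what /path/to/script.py might look like (the logged message and row_count metadata are hypothetical):

```python
from dagster_pipes import open_dagster_pipes

with open_dagster_pipes() as pipes:
    # Log lines stream back to the Dagster run that launched this script.
    pipes.log.info("Starting script execution")

    rows = [1, 2, 3]  # stand-in for the script's real work

    # Attach structured metadata to the asset materialization in Dagster.
    pipes.report_asset_materialization(metadata={"row_count": len(rows)})
```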
Running a dbt command
We have a whole guide for switching from the BashOperator to the dbt integration in Dagster. For more information, see "Migrating an Airflow BashOperator (dbt) to Dagster".
Running S3 Sync or other AWS CLI commands
Dagster has a rich set of integrations for AWS services. For example, you can use the s3.S3Resource to interact with S3 directly.
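For example, instead of shelling out to the AWS CLI, an asset can request an S3Resource and use the underlying boto3 client directly. A minimal sketch (the bucket name, file paths, and region are hypothetical):

```python
from dagster import Definitions, asset
from dagster_aws.s3 import S3Resource


@asset
def uploaded_report(s3: S3Resource) -> None:
    # get_client() returns a standard boto3 S3 client.
    s3.get_client().upload_file("/path/to/report.csv", "my-bucket", "reports/report.csv")


defs = Definitions(
    assets=[uploaded_report],
    resources={"s3": S3Resource(region_name="us-east-1")},
)
```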