The dagster-aws integration library provides the PipesGlueClient resource which can be used to launch AWS Glue jobs from Dagster assets and ops. Dagster can receive regular events like logs, asset checks, or asset materializations from jobs launched with this client. Using it requires minimal code changes on the job side.
In the Dagster asset/op code, use the PipesGlueClient resource to launch the job:
import os

import boto3
from dagster_aws.pipes import PipesGlueClient

from dagster import AssetExecutionContext, asset


@asset
def glue_pipes_asset(
    context: AssetExecutionContext, pipes_glue_client: PipesGlueClient
):
    # Start the Glue job run and block until it finishes.
    return pipes_glue_client.run(
        context=context,
        start_job_run_params={
            "JobName": "Example Job",
            "Arguments": {"some_parameter": "some_value"},
        },
    ).get_materialize_result()
This will launch the AWS Glue job and monitor its status until it either fails or succeeds. A job failure will also cause the Dagster run to fail with an exception.
Dagster will now be able to launch the AWS Glue job from the glue_pipes_asset asset.
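To make the "monitor its status until it either fails or succeeds" behavior concrete, here is a minimal sketch of such a polling loop. It is not the PipesGlueClient implementation. It only assumes a client object exposing Glue's get_job_run call (as boto3.client("glue") does) and the JobRunState values documented for that API. The helper name wait_for_job_run and the FakeGlueClient stand-in are hypothetical and exist only so the example runs without AWS access.

```python
import time

# Terminal values of JobRun.JobRunState as returned by Glue's GetJobRun API.
SUCCESS_STATES = {"SUCCEEDED"}
FAILURE_STATES = {"FAILED", "ERROR", "TIMEOUT", "STOPPED", "EXPIRED"}


def wait_for_job_run(client, job_name, run_id, poll_interval=5.0):
    """Poll a Glue job run until it finishes; raise if it did not succeed.

    Hypothetical helper for illustration, not part of dagster-aws.
    """
    while True:
        response = client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in SUCCESS_STATES:
            return state
        if state in FAILURE_STATES:
            # Mirrors the documented behavior: a failed job surfaces as an exception.
            raise RuntimeError(
                f"Glue job {job_name!r} run {run_id} ended in state {state}"
            )
        time.sleep(poll_interval)


class FakeGlueClient:
    """Hypothetical stand-in for boto3.client("glue"), used only in this demo."""

    def __init__(self, states):
        self._states = iter(states)

    def get_job_run(self, JobName, RunId):
        # Return the next scripted state in the same shape as the real API.
        return {"JobRun": {"JobRunState": next(self._states)}}


if __name__ == "__main__":
    fake = FakeGlueClient(["STARTING", "RUNNING", "SUCCEEDED"])
    print(wait_for_job_run(fake, "Example Job", "jr_demo", poll_interval=0))
```

With a real boto3 Glue client in place of FakeGlueClient, the same loop would apply to an actual job run. In practice the asset above does not need this code, since pipes_glue_client.run already waits for the job to finish.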
By default, the client uses the CloudWatch log stream (.../output/<job-run-id>) created by the Glue job to receive Dagster events. The client will also forward the stream to stdout.
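As a rough illustration of how one log stream can carry both ordinary output and Dagster events, the sketch below separates the two. It rests on an assumption not stated in this page: that a Pipes message appears as a single-line JSON object containing a "__dagster_pipes_version" key. The function name split_log_lines is hypothetical, and the real client may parse and forward the stream differently.

```python
import json
import sys

# Assumption: key that marks a log line as a Dagster Pipes message.
PIPES_MARKER = "__dagster_pipes_version"


def split_log_lines(lines, out=sys.stdout):
    """Return parsed Pipes messages; write every other line to `out`.

    Hypothetical helper for illustration, not part of dagster-aws.
    """
    messages = []
    for line in lines:
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            payload = None  # Not JSON, so treat it as ordinary log output.
        if isinstance(payload, dict) and PIPES_MARKER in payload:
            messages.append(payload)
        else:
            print(line, file=out)
    return messages


if __name__ == "__main__":
    sample = [
        "Starting Glue job",
        '{"__dagster_pipes_version": "0.1", "method": "log", "params": {"message": "hi"}}',
    ]
    events = split_log_lines(sample)
    print(f"{len(events)} Pipes message(s) found")
```

The point of the sketch is only that the job needs no separate channel for events: structured messages and ordinary log lines share the stream and are told apart on the Dagster side.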