Skip to main content
Version: Next

Pyspark (dagster-pyspark)

dagster_pyspark.PySparkResource ResourceDefinition

This resource provides access to a PySpark Session for executing PySpark code within Dagster.

Example:

@op
def my_op(pyspark: PySparkResource)
spark_session = pyspark.spark_session
dataframe = spark_session.read.json("examples/src/main/resources/people.json")


@job(
resource_defs={
"pyspark": PySparkResource(
spark_config={
"spark.executor.memory": "2g"
}
)
}
)
def my_spark_job():
my_op()

Legacy

dagster_pyspark.pyspark_resource ResourceDefinition

This resource provides access to a PySpark SparkSession for executing PySpark code within Dagster.

Example:

@op(required_resource_keys={"pyspark"})
def my_op(context):
spark_session = context.resources.pyspark.spark_session
dataframe = spark_session.read.json("examples/src/main/resources/people.json")

my_pyspark_resource = pyspark_resource.configured(
{"spark_conf": {"spark.executor.memory": "2g"}}
)

@job(resource_defs={"pyspark": my_pyspark_resource})
def my_spark_job():
my_op()