It depends a lot on where your Spark cluster is running.
Databricks
Prefect is an official Databricks partner, so if you want to leverage Spark on Databricks, Prefect can help you orchestrate those workflows:
Docs
https://prefecthq.github.io/prefect-databricks/
Code Examples
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list


@flow
def example_execute_endpoint_flow():
    # load a saved DatabricksCredentials block and list the first 5 jobs in the workspace
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs


example_execute_endpoint_flow()
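Listing jobs is only one task in the collection; to actually kick off a Spark workload you can submit a job run. The sketch below is based on the collection's jobs_runs_submit_and_wait_for_completion flow and assumes a saved "my-block" credentials block - the notebook path, node type, Spark version, and cluster size are placeholder values you would swap for your own workspace settings:

from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.flows import jobs_runs_submit_and_wait_for_completion
from prefect_databricks.models.jobs import JobTaskSettings, NewCluster, NotebookTask


@flow
async def submit_notebook_run_flow(notebook_path):
    databricks_credentials = await DatabricksCredentials.load("my-block")

    # placeholder cluster spec - adjust node type, Spark version, and size for your workspace
    new_cluster = NewCluster(
        node_type_id="m5.large",
        spark_version="11.3.x-scala2.12",
        num_workers=1,
    )
    job_task_settings = JobTaskSettings(
        task_key="prefect-job",
        notebook_task=NotebookTask(notebook_path=notebook_path),
        new_cluster=new_cluster,
    )

    # submits the run and waits until Databricks reports a terminal state
    return await jobs_runs_submit_and_wait_for_completion(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings],
    )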
Fugue
Also, we integrate with Fugue:
Docs
https://fugue-project.github.io/prefect-fugue/
Code Examples
from prefect import flow
from prefect_fugue import fugue_engine, fsql


@flow
def hello_flow():
    # runs Fugue SQL on the default (local) engine
    fsql("""
    CREATE [[0]] SCHEMA a:int
    PRINT
    """)


hello_flow()
@flow
def world_flow(n, engine):
    # the same Fugue SQL can run on whichever engine is passed in (DuckDB, Spark, Dask, ...)
    with fugue_engine(engine):
        fsql("""
        CREATE [[0],[1]] SCHEMA a:int
        SELECT * WHERE a>{{n}}
        PRINT
        """, n=n)


world_flow(1, "duckdb")  # running using duckdb (assuming duckdb is installed)
AWS EMR
AWS's awswrangler library provides an extremely easy way to run EMR jobs:
https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/015%20-%20EMR.html
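Here is a minimal sketch based on that tutorial, wrapped in a Prefect flow - the subnet ID and the S3 path of the PySpark script are placeholders for your own resources:

import time

import awswrangler as wr
from prefect import flow


@flow
def emr_spark_flow(subnet_id: str, script_s3_path: str):
    # spin up a transient EMR cluster in the given subnet (placeholder value)
    cluster_id = wr.emr.create_cluster(subnet_id)

    # submit the PySpark script stored on S3 (placeholder path) as a step
    step_id = wr.emr.submit_step(
        cluster_id, command=f"spark-submit {script_s3_path}"
    )

    # poll until the step reaches a terminal state
    while wr.emr.get_step_state(cluster_id, step_id) not in (
        "COMPLETED", "FAILED", "CANCELLED"
    ):
        time.sleep(30)

    wr.emr.terminate_cluster(cluster_id)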
Self-hosted Spark on Kubernetes
Running a Spark job on a self-hosted Kubernetes cluster is more difficult because you have to submit jobs to the cluster yourself (even with Kubeflow). But it's possible - you need to run the spark-submit command to send the job to the cluster and poll for its status.
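A minimal sketch of that approach, calling spark-submit from a Prefect task - the Kubernetes master URL, container image, and application path below are placeholders you would replace with your own values:

import subprocess

from prefect import flow, task


@task
def spark_submit(master_url: str, app_path: str):
    # master URL, container image, and app path are placeholders - adjust for your cluster
    cmd = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        "--conf", "spark.kubernetes.container.image=my-registry/spark:3.4.0",
        app_path,
    ]
    # in cluster mode spark-submit waits on the driver pod and polls its status,
    # so a non-zero exit code fails the task (and the flow run)
    subprocess.run(cmd, check=True)


@flow
def spark_on_k8s_flow():
    spark_submit(
        master_url="k8s://https://kubernetes.default.svc:443",
        app_path="local:///opt/spark/examples/src/main/python/pi.py",
    )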