How does Prefect integrate with Spark?

It depends a lot on where your Spark cluster is running.


Prefect is an official Databricks partner, so if you want to leverage Spark on Databricks, Prefect can help you orchestrate those workflows:


Also, we integrate with Fugue:


AWS has an extremely easy way to run EMR jobs with awswrangler:
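
As a rough illustration of that flow, here is a minimal sketch that submits one step to an already running EMR cluster and polls until it reaches a final state. It assumes awswrangler is installed, AWS credentials are configured, and that the `wr.emr.submit_step` and `wr.emr.get_step_state` helpers behave as described in the awswrangler docs. The function name `run_emr_step`, the `emr` parameter (there only so the client can be swapped out), and the IDs and paths in the usage comment are hypothetical.

```python
import time

# Step states after which EMR will not change the step any further.
FINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def run_emr_step(cluster_id, command, emr=None, poll_seconds=30.0):
    """Submit one step to an existing EMR cluster and wait for a final state."""
    if emr is None:
        # Assumption: awswrangler is installed and AWS credentials are set up.
        import awswrangler as wr

        emr = wr.emr

    # Submit the command (for example a spark-submit call) as an EMR step.
    step_id = emr.submit_step(cluster_id=cluster_id, command=command)

    # Poll the step until EMR reports a final state, then return that state.
    while True:
        state = emr.get_step_state(cluster_id=cluster_id, step_id=step_id)
        if state in FINAL_STATES:
            return state
        time.sleep(poll_seconds)


# Hypothetical usage (placeholder cluster ID and script path):
# run_emr_step("j-XXXXXXXXXXXXX", "spark-submit s3://my-bucket/job.py")
```

A function like this could be called from a Prefect task, so that retries and state tracking are handled by the orchestrator rather than by the script itself.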

Self-hosted Spark on Kubernetes

Running a Spark job on a Kubernetes cluster is more difficult because you have to submit jobs to the cluster yourself (even with Kubeflow). But it’s possible: you need to run the spark-submit command to submit jobs to the cluster and then poll for status.
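
The submit-and-poll approach could look roughly like the sketch below. It is only one possible way to do it, under these assumptions: spark-submit and kubectl are on the PATH and already authenticated against the cluster, the job runs in cluster deploy mode, and polling is done by reading the driver pod's phase with kubectl. All function names, and the URL, image, and path in the usage comment, are hypothetical.

```python
import subprocess
import time

# Pod phases after which the driver pod will not change state any more.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def build_submit_cmd(master_url, image, app_path, pod_name, namespace="default"):
    """Build a spark-submit command for cluster mode on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{master_url}",
        "--deploy-mode", "cluster",
        "--name", pod_name,
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        # Fix the driver pod name so it can be looked up later.
        "--conf", f"spark.kubernetes.driver.pod.name={pod_name}",
        # Return right after submission; completion is tracked by polling.
        "--conf", "spark.kubernetes.submission.waitAppCompletion=false",
        app_path,
    ]


def get_pod_phase(pod_name, namespace="default"):
    """Read the driver pod phase (Pending, Running, Succeeded, ...) via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_driver(pod_name, namespace="default", poll_seconds=10.0,
                    get_phase=get_pod_phase):
    """Poll the driver pod until it reaches a terminal phase and return it."""
    while True:
        phase = get_phase(pod_name, namespace)
        if phase in TERMINAL_PHASES:
            return phase
        time.sleep(poll_seconds)


def run_spark_job(master_url, image, app_path, pod_name, namespace="default"):
    """Submit the job, wait for the driver pod, and raise if it did not succeed."""
    cmd = build_submit_cmd(master_url, image, app_path, pod_name, namespace)
    subprocess.run(cmd, check=True)
    phase = wait_for_driver(pod_name, namespace)
    if phase != "Succeeded":
        raise RuntimeError(f"Spark driver pod {pod_name} ended in phase {phase}")
    return phase


# Hypothetical usage (placeholder API server URL, image, and application path):
# run_spark_job("https://my-k8s-api:6443", "my-registry/spark-py:latest",
#               "local:///opt/spark/app/job.py", "my-spark-driver")
```

Wrapping `run_spark_job` in a Prefect task would let a flow fail or retry based on the driver's final phase. A production version would likely also want a timeout on the polling loop and cleanup of the finished driver pod, which this sketch leaves out.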