How does Prefect integrate with Spark?

It depends a lot on where your Spark cluster is running.

Databricks

Prefect is an official Databricks partner, so if you want to run Spark on Databricks, Prefect can help you orchestrate those workflows.

Fugue

Prefect also integrates with Fugue, an abstraction layer that lets you run Python, pandas, and SQL code on Spark.

AWS EMR

The awswrangler library (AWS Data Wrangler) provides an easy way to run EMR jobs:

https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/015%20-%20EMR.html

Self-hosted Spark on Kubernetes

Running a Spark job on a self-hosted Kubernetes cluster is more difficult because you have to submit jobs to the cluster yourself (even with Kubeflow). It is possible, though: you need to run the spark-submit command to submit the job to the cluster and then poll for its status.
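
The submit-and-poll approach could look roughly like the sketch below. It is not an official Prefect integration, just plain Python that you could wrap in Prefect tasks. The function names, the master URL, the container image, and the application path are all hypothetical placeholders. The sketch assumes spark-submit is on the PATH and that you supply your own get_phase callable, for example one that reads the driver pod phase with kubectl.

```python
import subprocess
import time
from typing import Callable, List

# Kubernetes pod phases that mean the driver has stopped running.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def build_submit_command(master: str, image: str, app: str,
                         name: str = "spark-job") -> List[str]:
    """Assemble a spark-submit call for a Kubernetes master in cluster mode."""
    return [
        "spark-submit",
        "--master", master,  # e.g. k8s://https://<api-server-host>:<port>
        "--deploy-mode", "cluster",
        "--name", name,
        "--conf", f"spark.kubernetes.container.image={image}",
        # Return right after submission so that we can poll ourselves.
        "--conf", "spark.kubernetes.submission.waitAppCompletion=false",
        app,
    ]


def submit(cmd: List[str]) -> str:
    """Run spark-submit and return its combined output (raises on failure)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


def poll_until_done(get_phase: Callable[[], str],
                    interval_seconds: float = 10.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_phase repeatedly until it reports a terminal phase.

    get_phase is a hypothetical hook: it should return the current phase of
    the driver pod as a string such as "Pending", "Running" or "Succeeded".
    """
    for _ in range(max_attempts):
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        sleep(interval_seconds)
    raise TimeoutError("Spark job did not finish within the polling window")
```

In a flow, one task would call submit(build_submit_command(...)) and a second task would call poll_until_done(...) and raise if the result is "Failed", so that the flow run reflects the outcome of the Spark job.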