Cloud Run Jobs - Possible to configure 'Number of retries per failed task'?

I’m deploying my flows to GCP Cloud Run from a push work pool, which I set up by following Serverless Push Work Pools - Prefect Docs.

Due to my own mistake, I was saving too much data to disk, and since Cloud Run's filesystem is held in memory, I ran into an out-of-memory error. (Nothing to do with Prefect, PEBCAK.)

However, it seems the default Cloud Run Jobs behaviour is to retry a failed ‘task’ (not a Prefect task, a task in the Cloud Run sense) 3 times before giving up, and the Prefect UI does not reflect these retries very well.

To give a specific example of a Prefect task that was restarted multiple times in the job UI:

You can see here that this task ‘Finished in state Completed()’ 4 times, even though it shows a run count of 1, because it was repeatedly restarted by Cloud Run itself.

Cloud Run permits setting the --max-retries field via the CLI / YAML. Is it possible to define a setting in the work pool ‘Base Job Template’ so that Cloud Run does not retry?
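
To make the question concrete, here is a minimal sketch of the kind of edit I have in mind. It assumes the work pool's job body follows the Cloud Run Admin API v1 Job layout, where `maxRetries` sits on the task spec at `spec.template.spec.template.spec`. The `base_job_template` fragment and the `with_max_retries` helper are hypothetical and trimmed for illustration; the real template has more fields, and whether the push work pool passes this field through is exactly what I am unsure about.

```python
import copy
import json

# Hypothetical, trimmed-down fragment of a work pool base job template.
# Assumption: the job body uses the Cloud Run Admin API v1 Job layout.
base_job_template = {
    "job_configuration": {
        "job_body": {
            "apiVersion": "run.googleapis.com/v1",
            "kind": "Job",
            "spec": {
                "template": {
                    "spec": {
                        "template": {
                            "spec": {
                                "containers": [{"image": "{{ image }}"}],
                            }
                        }
                    }
                }
            },
        }
    }
}


def with_max_retries(template: dict, retries: int) -> dict:
    """Return a copy of the template with maxRetries set on the task spec."""
    patched = copy.deepcopy(template)  # leave the original template untouched
    job_spec = patched["job_configuration"]["job_body"]["spec"]
    task_spec = job_spec["template"]["spec"]["template"]["spec"]
    task_spec["maxRetries"] = retries
    return patched


# 0 asks Cloud Run not to retry a failed task at all.
patched_template = with_max_retries(base_job_template, 0)
print(json.dumps(patched_template, indent=2))
```

The patched JSON would then be pasted into the work pool's ‘Base Job Template’ editor, if the field is honoured there.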

As a side note, it would make more sense for no retries to be the default when the work pool is created, since I would prefer a crashed Cloud Run Job to be retried according to the configuration of the @flow / @task rather than configuration that lives in GCP.
