Cloud Run Jobs - Possible to configure 'Number of retries per failed task'?

jackharrhy · February 9, 2024, 2:26pm

I’m deploying my flows to GCP Cloud Run from a push work pool, set up by following Serverless Push Work Pools - Prefect Docs

Due to my own mistake, I was saving too much data to disk, and since disk space in Cloud Run is backed by memory, I ran into an out-of-memory error. (Nothing to do with Prefect, PEBCAK)

However, it seems the default Cloud Run Jobs behaviour is to retry a failed ‘task’ (Not a prefect task, a task in the context of Cloud Run) 3 times before giving up, and Prefect does not seem to reflect this in the UI very well.

To give a specific example of a Prefect task that was restarted multiple times in the job UI:

You can see here that this task ‘Finished in state Completed()’ 4 times, even though it shows a run count of 1, since it was repeatedly restarted by Cloud Run itself.

Cloud Run permits setting the --max-retries field via the CLI / YAML. Is it possible to define settings in the work pool ‘Base Job Template’ so that Cloud Run does not retry?
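To make the question concrete, here is a rough sketch of the kind of change I have in mind, written as a small Python helper that patches a base job template dict. The nested path to the task spec is an assumption on my part, based on where maxRetries sits in a Cloud Run v1 Job YAML (spec.template.spec.template.spec); the `job_configuration` / `job_body` layout and the helper name `with_max_retries` are likewise hypothetical and should be checked against the actual template of the work pool.

```python
import copy

# Assumed location of the Cloud Run task spec inside the work pool's
# base job template. Verify this against your own template first.
TASK_SPEC_PATH = (
    "job_configuration", "job_body",
    "spec", "template", "spec", "template", "spec",
)


def with_max_retries(base_job_template: dict, max_retries: int = 0) -> dict:
    """Return a copy of the template with maxRetries set on the task spec."""
    patched = copy.deepcopy(base_job_template)  # leave the input untouched
    node = patched
    for key in TASK_SPEC_PATH:
        # Create missing intermediate levels rather than failing.
        node = node.setdefault(key, {})
    node["maxRetries"] = max_retries
    return patched


# Trimmed-down, hypothetical template used only to exercise the helper.
example_template = {
    "job_configuration": {
        "job_body": {
            "apiVersion": "run.googleapis.com/v1",
            "kind": "Job",
            "spec": {
                "template": {"spec": {"template": {"spec": {"containers": []}}}}
            },
        }
    }
}

patched_template = with_max_retries(example_template, max_retries=0)
```

If the template really does accept this field, the patched dict could be pasted into the work pool's ‘Base Job Template’ editor; whether the worker passes maxRetries through to Cloud Run unchanged is exactly what I am unsure about.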

As a side note, not retrying would make more sense as the default when the work pool is created, since I would prefer a crashed Cloud Run Job to be retried according to the configuration of the @flow / @task, rather than configuration that lives in GCP.
