Flow run status when agent gets killed

mercurialsolo · August 27, 2022, 4:59pm

How does a flow run status get updated if an agent gets killed while the tasks are executing?

Agents seem to be single points of failure in the system right now. Since they do not seem to have any recovery or retry mechanism built in, once a flow run enters a Running state it's essential that the agent which picked it up stays alive for the duration of the run.

Would love to know if there's an alternative view.

anna_geller · August 28, 2022, 12:34pm

If the agent dies, the flow run won't be able to complete. You should design your system so that agents run 24/7 to ensure this doesn't happen.

Some ways to accomplish that:

- running your agent as a Kubernetes Deployment
- running your agent as a supervisord/systemd process
- running your agent as an ECS service
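All three options share one idea: something outside the agent restarts it whenever it exits. systemd does this with `Restart=always`, supervisord with `autorestart=true`, and Kubernetes and ECS by keeping the desired number of replicas or tasks running. As a rough illustration only, here is a minimal Python sketch of that restart loop. It is not part of Prefect, and the `prefect agent start -q default` command in the usage comment is an assumption about a Prefect 2 setup (substitute your own work queue).

```python
import subprocess
import sys
import time
from typing import Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 5, backoff_s: float = 1.0) -> int:
    """Run `cmd` and restart it every time it exits, up to `max_restarts` restarts.

    Like systemd's Restart=always, it restarts regardless of the exit code.
    A real supervisor loops forever; the cap only makes this sketch terminate.
    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        proc = subprocess.run(list(cmd))  # blocks until the agent process exits
        print(f"agent exited with code {proc.returncode}", file=sys.stderr)
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_s)  # fixed delay so a crash loop doesn't spin the CPU


# Usage (assumed Prefect 2 CLI command, adjust the queue name):
# supervise(["prefect", "agent", "start", "-q", "default"], max_restarts=10**6)
```

Note that restarting the agent only keeps new flow runs flowing; it does not by itself recover a run that was in progress when the old agent died.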

mercurialsolo · August 30, 2022, 12:02am

Will an active flow run that was in progress when the agent was killed and auto-restarted resume processing?

My understanding is that if an agent goes down, any active flow runs basically go into no-man's land. If workers have already been handed the tasks, they will continue processing and complete, but no further downstream tasks will be executed.

For new flow runs to be picked up, we will need new agents, but the "active flow run" basically fails without signaling its status back to the server.
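If a run dies without reporting back, the server's record keeps showing it as Running. One generic way to surface such runs is a periodic check that flags runs still marked Running with no state update for longer than some threshold. The sketch below uses a hypothetical, simplified `FlowRunRecord` type rather than Prefect's actual API; in practice you would load the records from your own source and then decide whether to cancel or re-run the flagged runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class FlowRunRecord:
    # Hypothetical, simplified view of what the server stores for a run.
    id: str
    state: str             # e.g. "RUNNING", "COMPLETED"
    last_update: datetime  # when the run last reported a state change


def find_orphaned_runs(
    runs: Iterable[FlowRunRecord], now: datetime, max_silence: timedelta
) -> List[str]:
    """Return ids of runs still marked RUNNING that have been silent longer than max_silence."""
    return [
        r.id
        for r in runs
        if r.state == "RUNNING" and now - r.last_update > max_silence
    ]
```

The threshold has to be longer than the longest gap a healthy run can legitimately go without an update, otherwise slow but alive runs get flagged too.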

anna_geller · August 30, 2022, 1:24am

It depends on the type of infrastructure. For example, with DockerContainer or KubernetesJob, the Prefect agent spins up an individual container or pod per flow run. Those run as separate processes, so even if the agent dies, they may continue running, although the agent won't receive state updates about them.
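A rough way to see this locally, using plain processes instead of Docker or Kubernetes: the stand-in "agent" below launches a separate stand-in "flow run" process and is then killed, yet the flow run still finishes its work. Nothing here is Prefect code; both processes are hypothetical stand-ins, and the sketch only shows why the work can outlive the agent, not how Prefect tracks state. It assumes a POSIX-like OS.

```python
import os
import subprocess
import sys
import tempfile
import time


def demo_agent_death(run_seconds: float = 1.0) -> bool:
    """Kill a fake agent mid-run; return True if its fake flow run still completes."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "flow_run_done")
        # Stand-in for the container/pod: sleeps, then writes a marker file.
        flow_run_code = (
            f"import time; time.sleep({run_seconds}); "
            f"open({marker!r}, 'w').write('done')"
        )
        # Stand-in for the agent: launches the flow run in its own session
        # (so process-group signals aimed at the agent don't reach it),
        # reports that it did so, then idles.
        agent_code = (
            "import subprocess, sys, time; "
            f"subprocess.Popen([sys.executable, '-c', {flow_run_code!r}], "
            "stdout=subprocess.DEVNULL, start_new_session=True); "
            "print('launched', flush=True); time.sleep(60)"
        )
        agent = subprocess.Popen(
            [sys.executable, "-c", agent_code], stdout=subprocess.PIPE, text=True
        )
        agent.stdout.readline()  # wait until the flow run has been launched
        agent.kill()             # the agent dies mid-run
        agent.wait()
        agent.stdout.close()
        # Nobody is watching the flow run any more, but it keeps going.
        deadline = time.time() + run_seconds + 10
        while time.time() < deadline:
            if os.path.exists(marker):
                return True
            time.sleep(0.1)
        return False
```

Calling `demo_agent_death()` should return `True`: the work completes, but since the agent is gone, nothing reports that completion back.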
