Flow run status when agent gets killed

How does a flow run status get updated if an agent gets killed while the tasks are executing?

Agents seem to be points of failure in the system right now. Since they do not seem to have any recovery or retry mechanism baked into them, once a flow enters a running state it’s essential that the agent which has picked up the flow stays alive for the duration of the run.

Would love to know if there’s an alternate view?

If the agent dies, the flow run won’t be able to complete. You should design your system so that agents are up 24/7 to ensure that this doesn’t happen.

Some ways to accomplish that:

  • running your agent as a Kubernetes deployment
  • running your agent as a supervisord/systemd process
  • running your agent as an ECS service
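All three options rely on the same idea: something outside the agent watches the process and starts it again when it exits. A minimal Python sketch of that restart loop is below, purely for illustration. The function name `supervise` and its parameters are hypothetical, and in practice supervisord, systemd, Kubernetes, or ECS should do this job rather than a hand-rolled loop.

```python
import subprocess
import time


def supervise(cmd, max_restarts=None, backoff_seconds=5.0):
    """Run cmd and start it again whenever it exits.

    Returns the number of times the process was started.
    max_restarts=None means keep restarting forever.
    """
    starts = 0
    while True:
        starts += 1
        # Blocks until the child process exits (crash, kill, or clean exit).
        exit_code = subprocess.call(cmd)
        print(f"process exited with code {exit_code} (start #{starts})")
        if max_restarts is not None and starts > max_restarts:
            return starts
        # Short pause so a process that fails instantly does not spin the CPU.
        time.sleep(backoff_seconds)
```

Assuming a Prefect 2 agent polling a work queue named "default" (the queue name is only an example), calling `supervise(["prefect", "agent", "start", "-q", "default"])` would bring the agent back after a crash. Restarting only makes the agent available again for new flow runs; it does not by itself resume a run that was in progress when the agent died.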

Will a flow run that was in progress between the agent being killed and auto-restarted resume processing?

My understanding is that if an agent goes down, any active flow runs basically go into no-man’s land. If workers have already been handed tasks, they will continue processing and complete them, but no further downstream tasks will be executed.

For new flow runs to be picked up, we will need new agents, but the “active flow run” basically fails and doesn’t signal its status back to the server.

It depends on the type of infrastructure. For example, with DockerContainer or KubernetesJob, the Prefect agent spins up an individual container or pod per flow run. Those run as separate processes, so even if the agent dies, they may continue running even though the agent won’t get state updates about them.