Rolling upgrades of workers/agents: information for building a system that upgrades agent or worker versions while a steady stream of flow runs is being sent to a work queue or work pool.

This could also apply to a blue/green deployment of workers while flows are actively running.

Information that might be helpful when thinking about building a system that works for your team:

  • For workers and agents, upon receiving a SIGTERM, they should complete any ongoing flow runs before terminating the process. Stopping an agent or worker executing flow runs should be safe, provided they have sufficient time to finish their current tasks.
  • In infrastructure such as Kubernetes, where flow run execution occurs outside the agent or worker, the worst-case scenario is that the flow crashes or becomes unresponsive, leaving it in a running state, as the agent or worker is unavailable to detect the failure.
  • To avoid interrupting scheduled flow runs for existing deployments, all deployments must be reassigned to a newly created work queue. Alternatively, it might be simpler to start a new agent or worker on the existing work queue and then decommission the old one.
    – One could pause the work queue for a few minutes while it is going through the “maintenance” of stopping one agent and starting another.
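
To make the SIGTERM point above concrete, here is a minimal sketch of the "finish what you started, then exit" pattern in plain Python. It is not Prefect's actual implementation; `DrainingWorker`, `request_stop`, and `install_sigterm_handler` are hypothetical names made up for illustration, and the list of callables is only a stand-in for a work queue.

```python
import signal
import threading


class DrainingWorker:
    """Toy worker (hypothetical, not Prefect code): after a stop request it
    takes no new jobs but lets the job already in progress finish."""

    def __init__(self, jobs):
        self.jobs = list(jobs)  # pending callables, a stand-in for a work queue
        self.completed = []     # names of jobs that ran to completion
        self.stop_requested = threading.Event()

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: it only sets a flag and never
        # interrupts the job that is currently running.
        self.stop_requested.set()

    def run(self):
        # The flag is checked only between jobs, so an in-flight job
        # always completes before the loop exits.
        while self.jobs and not self.stop_requested.is_set():
            job = self.jobs.pop(0)
            job()
            self.completed.append(job.__name__)
        return self.completed


def install_sigterm_handler(worker):
    # Assumption: called from the main thread, as signal.signal requires.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

With this shape, whatever stops the old process needs to wait long enough for the current job to finish before it force-kills the process; in Kubernetes, for example, that window is the pod's `terminationGracePeriodSeconds`. Jobs left in `worker.jobs` are simply never picked up by the old process and remain available for the replacement.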