Horizontal scaling on Kubernetes & handling Worker termination

Hello,
I’m running a self-hosted Prefect instance (version 2.13.1) on a Kubernetes cluster, and I’m having trouble setting up horizontal scaling of my pods correctly. My current architecture consists of one Kubernetes Deployment running the Prefect Server, three separate work pools, and three Kubernetes Deployments, one per work pool, each running Prefect Workers (of process type) and scaled by an HPA (Horizontal Pod Autoscaler).

Scaling up and adding new pods with Prefect Workers is relatively straightforward; the problem arises when pods need to be scaled down and terminated. I’ve noticed (please correct me if I’m wrong) that there is no built-in mechanism to wait for currently running flows to finish, apart from a grace period. Another issue is that runs interrupted this way remain in the RUNNING state indefinitely and are never marked as CRASHED/CANCELLED.
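To make the setup concrete, here is a minimal sketch of one worker Deployment and its HPA, written as Python dicts that mirror the Kubernetes manifests. It is not my exact configuration: the helper `worker_manifests`, the resource names, the image tag, the replica bounds, the CPU target, and the 600-second grace period are all illustrative assumptions. The field that matters for the scale-down problem is `terminationGracePeriodSeconds`, which is how long Kubernetes waits between sending SIGTERM and SIGKILL to the worker pod.

```python
import json


def worker_manifests(pool_name: str, grace_seconds: int = 600):
    """Build a Deployment and an HPA for one work pool (illustrative values only)."""
    name = f"prefect-worker-{pool_name}"  # hypothetical naming scheme
    labels = {"app": name}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Time between SIGTERM and SIGKILL on scale-down.
                    # Flows running longer than this are killed mid-run.
                    "terminationGracePeriodSeconds": grace_seconds,
                    "containers": [
                        {
                            "name": "worker",
                            # Assumed image tag, adjust to your own build.
                            "image": "prefecthq/prefect:2.13.1-python3.10",
                            "command": [
                                "prefect", "worker", "start",
                                "--pool", pool_name,
                                "--type", "process",
                            ],
                        }
                    ],
                },
            },
        },
    }

    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": 1,  # assumed bounds
            "maxReplicas": 5,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": 70},
                    },
                }
            ],
        },
    }
    return deployment, hpa


if __name__ == "__main__":
    # Print both manifests as JSON, which kubectl apply -f also accepts.
    for manifest in worker_manifests("pool-a"):
        print(json.dumps(manifest, indent=2))
```

With a setup like this, any flow that runs longer than the grace period is killed when the HPA removes its pod, which is where the stuck RUNNING state comes from.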

My questions are:

  1. Is there any recommended way or some built-in solution for handling termination of Prefect Workers when they’re still processing flows?
  2. Do you have any general tips for horizontally scaling the Prefect instance or setting up a similar architecture, addressing the issues I’ve mentioned above?