My flow run in Prefect 2 is stuck in a Pending state - what can I do?

Explanation of the issue

If your flow run has transitioned from a Scheduled into a Pending state but never moves to the Running state, this indicates an issue in the execution layer, e.g.:

  • your agent can’t deploy your flow run to a given infrastructure
  • something is wrong in your Kubernetes or ECS cluster, or your VM

A Pending state means that the agent was able to pick up the scheduled flow run and submitted it for execution, but something went wrong while deploying the flow run to the infrastructure.

How to resolve it?

  1. Verify that the agent process is running (e.g. its Kubernetes deployment, ECS service, or the dockerd daemon)
  2. Check the agent logs to see if anything suspicious stands out there
  3. Verify that your execution layer is able to pull your flow run’s image, e.g. if the image needs to be pulled from a container registry, make sure your container can reach the Internet and has appropriate permissions to pull the image
  4. Verify that your execution layer has enough permissions to spin up the required resources (e.g. the right IAM roles and a valid Prefect API key)
  5. Verify that your execution layer has enough capacity on the cluster to deploy your flow run - we’ve seen similar issues when the agent is starved for resources, so try allocating more CPU and memory to the agent process and see whether that helps
  6. Check whether the agent is polling too frequently, which can consume so many resources that not enough are left to deploy runs to the infrastructure - try increasing the poll interval to, e.g., 30 seconds: prefect config set PREFECT_AGENT_QUERY_INTERVAL='30.0'
  7. Check if there is more than one agent polling for runs from the same work queue - we’ve seen cases where multiple agents polling the same work queue led to Pending runs that never got deployed
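To decide which runs are worth investigating with the checklist above, it can help to separate runs that only just entered Pending from runs that have been sitting there for a while. The following is a minimal sketch of that check in plain Python. FlowRunRecord, stuck_in_pending, and the 5-minute threshold are hypothetical names and values chosen for illustration, not part of Prefect; in practice the records would be filled from whatever your Prefect API or UI reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FlowRunRecord:
    """Hypothetical, simplified view of a flow run (not a Prefect class)."""
    name: str
    state: str             # e.g. "SCHEDULED", "PENDING", "RUNNING"
    state_since: datetime  # when the run entered its current state


def stuck_in_pending(runs, now, threshold=timedelta(minutes=5)):
    """Return the names of runs that have been Pending for longer than `threshold`."""
    return [
        run.name
        for run in runs
        if run.state == "PENDING" and now - run.state_since > threshold
    ]


# Illustrative data only; timestamps are made up.
now = datetime(2022, 12, 12, 18, 30, tzinfo=timezone.utc)
runs = [
    FlowRunRecord("fresh-run", "PENDING", now - timedelta(seconds=20)),
    FlowRunRecord("stuck-run", "PENDING", now - timedelta(minutes=12)),
    FlowRunRecord("busy-run", "RUNNING", now - timedelta(minutes=30)),
]
print(stuck_in_pending(runs, now))  # ['stuck-run']
```

A run that has been Pending for only a few seconds is usually just waiting for its infrastructure to start, so the threshold should be longer than your normal container or pod start-up time.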

Hi, I built a toy example and ran into the pending issue (the run was marked as “late” and then “pending”). I have checked everything listed above, but could not resolve the issue. Can you please shed more light on what could be going wrong there?

Here is my example:

from prefect import flow
from prefect.deployments import Deployment, run_deployment


@flow(name="my_flow", log_prints=True)
def my_flow():
    print("123")


def main():
    # Build the deployment and register it with the API (apply=True)
    Deployment.build_from_flow(
        flow=my_flow,
        name="test",
        work_queue_name="prod-test",
        skip_upload=False,
        apply=True,
    )

    # Create a flow run from the deployment; timeout=0 returns without waiting for the run to finish
    run_deployment(name="my_flow/test", timeout=0)


if __name__ == "__main__":
    main()

And I made sure Orion and the agent are running:

prefect orion start
prefect agent start --match 'prod-'

What I got from the UI: [screenshots of the work queue, the deployment, and the pending flow run]

PS: Both Orion and agent didn’t report any error.

perhaps try deploying with CLI as shown in this example, also using the match pattern?

Hi Anna, thanks for your reply. My actual use case involves a main flow accepting many dataclasses as arguments. The problem with prefect deployment cli command is that it does not accept dataclasses. Even if I refactored my main flow, converting all dataclasses into actual parameters, there would be too many parameters to type in the cli.

why would you type those parameters into CLI? the parameter schema should get automatically inferred from the flow function

Sorry I didn’t make it clear. I meant that, when building a deployment with the CLI, passing lots of parameters via the “--params” option might not be viable for me. So I am looking to programmatically generate the params under the “parameters” section in deployment.yml. I will let you know should the pending issue occur again.


passing params through CLI is only required if you want to override them for a custom deployment which uses different default values than those set on a flow function

perhaps changing those directly on the flow function would work better?

the alternative is to first build a deployment, this generates a YAML file, and then modify those default values in the YAML file.
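
For the case raised earlier in the thread (many dataclass arguments), a third option is to set the default parameters programmatically when building the deployment, instead of editing the YAML by hand. The sketch below is only an illustration under stated assumptions: DbConfig, RunConfig, to_parameters, and build_deployment are hypothetical names, and it assumes that your flow's parameters can be validated from plain dicts. The parameters field is part of the Deployment object, and dataclasses.asdict is from the Python standard library.

```python
from dataclasses import asdict, dataclass


@dataclass
class DbConfig:  # hypothetical example config
    host: str
    port: int


@dataclass
class RunConfig:  # hypothetical example config
    retries: int
    dry_run: bool


def to_parameters(**configs):
    """Convert dataclass instances into plain dicts, keyed by flow argument name."""
    return {arg_name: asdict(cfg) for arg_name, cfg in configs.items()}


def build_deployment(flow, parameters):
    """Build and apply a deployment that uses `parameters` as its defaults."""
    # Imported lazily so the helper above can be tried without Prefect installed.
    from prefect.deployments import Deployment

    return Deployment.build_from_flow(
        flow=flow,
        name="test",
        work_queue_name="prod-test",
        parameters=parameters,  # default parameter values stored on the deployment
        apply=True,
    )


params = to_parameters(db=DbConfig("localhost", 5432), run=RunConfig(3, False))
print(params)
# {'db': {'host': 'localhost', 'port': 5432}, 'run': {'retries': 3, 'dry_run': False}}
```

Calling build_deployment(my_flow, params) would then register the deployment with those defaults, so nothing has to be typed into the CLI. Whether nested dicts are accepted for dataclass-typed arguments depends on how the flow's parameters are annotated, so treat that part as something to verify against your own flow.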