Explanation of the issue
Any time you see that your flow run has transitioned from a Scheduled into a Pending state, but it doesn’t move to the Running state, this indicates an issue in the execution layer, e.g.:
- your agent can’t deploy your flow run to a given infrastructure
- something is wrong in your Kubernetes or ECS cluster, or your VM
A Pending state means that the agent was able to pick up the scheduled flow run and submitted it for execution, but something went wrong during the flow run deployment.
How to resolve that?
- Verify that the agent process is running (e.g. the Kubernetes deployment, ECS service, or dockerd daemon that hosts it)
- Check the agent logs to see if anything suspicious stands out there
- Verify that your execution layer is able to pull your flow run’s image, e.g. if the image needs to be pulled from a container registry, make sure your container can reach the Internet and has the permissions needed to pull the image
- Verify that your execution layer has enough permissions to spin up the required resources e.g. IAM roles, valid Prefect API key
- Verify that your cluster has enough capacity to deploy your flow run. We’ve seen similar issues when the agent is starved for resources; try allocating more CPU and memory to the agent process and see whether that helps
- Check whether the agent is polling too frequently. Frequent polling can consume so many resources that not enough is left to deploy runs to the infrastructure. Try decreasing the poll frequency to, e.g., once every 30 seconds:
prefect config set PREFECT_AGENT_QUERY_INTERVAL='30.0'
- Check whether more than one agent is polling for runs from the same work queue. We’ve seen issues when a user had multiple agents polling from the same work queue, which often led to Pending runs that couldn’t get deployed efficiently
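Before working through the checklist, it can help to confirm which runs are actually stuck. The following is a minimal sketch of that check, assuming you have already collected each run's current state and the time it entered that state. `RunSnapshot`, `find_stuck_pending`, the run names, and the 5-minute threshold are all hypothetical and not part of the Prefect API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunSnapshot:
    """Hypothetical record of a flow run's current state; not a Prefect type."""
    name: str
    state: str             # e.g. "Scheduled", "Pending", "Running"
    state_since: datetime  # when the run entered that state


def find_stuck_pending(runs, now, threshold=timedelta(minutes=5)):
    """Return the names of runs that have sat in Pending longer than `threshold`."""
    return [
        run.name
        for run in runs
        if run.state == "Pending" and now - run.state_since > threshold
    ]


# Illustrative data only: three made-up runs observed at a fixed point in time.
now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    RunSnapshot("etl-a", "Pending", now - timedelta(minutes=12)),
    RunSnapshot("etl-b", "Pending", now - timedelta(seconds=30)),
    RunSnapshot("etl-c", "Running", now - timedelta(minutes=40)),
]
stuck = find_stuck_pending(runs, now)
print(stuck)
```

A run that has only just entered Pending is normal, so the threshold is there to separate a brief hand-off from a run whose deployment never started. Pick a value that reflects how long your infrastructure usually takes to start a container or pod.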
Hi, I built a toy example and ran into the pending issue (the run was marked as “late” and then “pending”). I have checked everything listed above but could not resolve the issue. Can you please shed more light on what could be going wrong?
Here is my example:
from prefect import flow
from prefect.deployments import Deployment, run_deployment
aaa = Deployment.build_from_flow(flow=my_flow,
if __name__ == "__main__":
And I made sure Orion and the agent were running:
prefect orion start
prefect agent start --match 'prod-'
What I got from the UI:
Flow run pending
PS: Neither Orion nor the agent reported any errors.
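One thing worth double-checking in a setup like this is the work queue name: an agent started with --match 'prod-' only polls work queues whose names start with that prefix. Below is a minimal sketch of that prefix rule, not Prefect's actual implementation. The function `queues_matched` and the queue names are hypothetical.

```python
def queues_matched(prefix, queue_names):
    """Return the queue names an agent started with `--match <prefix>` would poll,
    assuming plain starts-with matching on the work queue name."""
    return [name for name in queue_names if name.startswith(prefix)]


# Hypothetical work queue names, for illustration only.
queue_names = ["prod-etl", "prod-ml", "dev-etl", "default"]
matched = queues_matched("prod-", queue_names)
print(matched)
```

If the deployment's work queue name does not start with the prefix, no agent will pick up its runs, and they would be expected to stay scheduled and eventually show as late rather than move to Pending.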
Perhaps try deploying with the CLI as shown in this example, also using the match pattern?
Hi Anna, thanks for your reply. My actual use case involves a main flow that accepts many dataclasses as arguments. The problem with the prefect deployment CLI command is that it does not accept dataclasses. Even if I refactored my main flow and converted all the dataclasses into individual parameters, there would be too many parameters to type into the CLI.
why would you type those parameters into CLI? the parameter schema should get automatically inferred from the flow function
Sorry, I didn’t make it clear. I meant that when building a deployment with the CLI, passing lots of parameters via the “--params” option might not be viable for me. So I am looking to programmatically generate the params under the “parameters” section in deployment.yml. I will let you know should the pending issue occur again.
passing params through CLI is only required if you want to override them for a custom deployment which uses different default values than those set on a flow function
perhaps changing those directly on the flow function would work better?
the alternative is to first build a deployment, which generates a YAML file, and then modify the default values in that file.
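For the case of many dataclass arguments, the default values can be generated in code rather than typed by hand. This is a minimal sketch using only the standard library. The dataclasses `DbConfig` and `TrainConfig`, the argument names `db` and `train`, and the helper `build_parameters` are all hypothetical. Whether a flow accepts plain dicts for dataclass-typed arguments depends on its type annotations and on Prefect's parameter validation, so treat that part as an assumption to verify.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DbConfig:  # hypothetical flow argument type
    host: str
    port: int


@dataclass
class TrainConfig:  # hypothetical flow argument type
    epochs: int
    learning_rate: float


def build_parameters(**configs):
    """Map each flow argument name to a plain dict built from its dataclass."""
    return {arg_name: asdict(cfg) for arg_name, cfg in configs.items()}


parameters = build_parameters(
    db=DbConfig(host="localhost", port=5432),
    train=TrainConfig(epochs=10, learning_rate=0.01),
)

# JSON is also valid YAML, so this output can be pasted into the YAML file.
print(json.dumps({"parameters": parameters}, indent=2))
```

The same dict could also be supplied when building the deployment from Python, since a deployment carries a parameters field, which avoids editing the YAML by hand.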
I haven’t found an issue on GitHub regarding problem number 7: “Check if there is more than one agent polling for runs from the same work queue…”.
I am currently having this problem. I have 5 agents all working from the same queue, each with a concurrency limit of 10. I prefer to have many agents instead of 1 with a higher concurrency limit, in case any of them crashes or the node it is running on crashes.
Is there a plan to fix this?
Thanks in advance