Pending flow-runs block execution in queue

davidp1404 · June 29, 2023, 6:14am

Hello, my current setup is:

Prefect v2.10.11
Infrastructure: kubernetes v1.24.6
I run 2 agents watching the same work queues for resiliency.
I run a simple flow every 10s, linked to a work queue with a limit of 10. It works fine at first, but eventually flow execution stops because 10 flow runs are stuck in the Pending state, blocking other runs from starting.
Neither the UI, the CLI, nor the agent/server logs reveal any error. When I exec into the agent pod, this is what I get:

$ prefect flow-run inspect 678db3d9-747a-4d50-9c22-3913d80dc751
05:59:49.195 | DEBUG | prefect.profiles - Using profile 'default'
05:59:49.319 | DEBUG | prefect.client - Connecting to API at https://****/api/
FlowRun(
    id='678db3d9-747a-4d50-9c22-3913d80dc751',
    created=DateTime(2023, 6, 28, 18, 48, 54, 179379, tzinfo=Timezone('+00:00')),
    updated=DateTime(2023, 6, 28, 19, 32, 36, 635283, tzinfo=Timezone('+00:00')),
    name='complex-aardwolf',
    flow_id='085c9387-e60f-4478-8649-95a503905f7e',
    state_id='69c3e7f8-891b-4d73-8672-4ea4fd89e38c',
    deployment_id='390398eb-4a14-4d1b-a372-120eb40a2e9a',
    work_queue_id='8bc6421c-4143-4303-b32a-f951e7954cce',
    work_queue_name='default',
    idempotency_key='scheduled 390398eb-4a14-4d1b-a372-120eb40a2e9a 2023-06-28T21:13:41.259000+02:00',
    tags=['auto-scheduled'],
    state_type=StateType.PENDING,
    state_name='Pending',
    expected_start_time=DateTime(2023, 6, 28, 19, 13, 41, 259000, tzinfo=Timezone('+00:00')),
    estimated_start_time_delta=datetime.timedelta(seconds=38768, microseconds=111511),
    auto_scheduled=True,
    infrastructure_document_id='8e418b68-d957-46ef-acc5-8656f769b37d',
    work_pool_id='a005d8d5-0dd0-4f0d-862f-9c57ef74e443',
    work_pool_name='default-agent-pool',
    state=State(
        id='69c3e7f8-891b-4d73-8672-4ea4fd89e38c',
        type=StateType.PENDING,
        name='Pending',
        timestamp=DateTime(2023, 6, 28, 19, 32, 36, 633732, tzinfo=Timezone('+00:00')),
        state_details=StateDetails(
            flow_run_id='678db3d9-747a-4d50-9c22-3913d80dc751',
            scheduled_time=DateTime(2023, 6, 28, 19, 13, 41, 259000, tzinfo=Timezone('+00:00'))
        )
    )
)
If I manually cancel the pending flow-run, everything starts working again until the next occurrence of the problem.
I understood that there is a watcher routine in the server that moves flow runs from Pending to Crashed after a timeout, but it does not seem to work.
Any hints on how to solve this?
Thanks in advance.
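
For anyone hitting the same symptom, here is a minimal sketch of the manual workaround described above: pick out runs that have sat in Pending too long and print a cancel command for each. It is plain Python and does not call the Prefect API. The record shape (`id`, `state_type`, `state_timestamp`), the 10-minute threshold, the helper names, and the second sample run are all hypothetical; only the first run's id and state timestamp are taken from the inspect output above. The `open_slots` helper assumes that both Pending and Running runs count against the work-queue limit, which matches the behaviour reported here but is not confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_pending(flow_runs, now, max_pending=timedelta(minutes=10)):
    """Return ids of runs that have been PENDING for longer than max_pending.

    max_pending is an arbitrary illustrative threshold, not a Prefect default.
    """
    return [
        run["id"]
        for run in flow_runs
        if run["state_type"] == "PENDING"
        and now - run["state_timestamp"] > max_pending
    ]


def open_slots(flow_runs, limit):
    """Slots left in the queue.

    Assumption: PENDING and RUNNING runs both occupy a slot.
    """
    busy = sum(1 for run in flow_runs if run["state_type"] in ("PENDING", "RUNNING"))
    return max(limit - busy, 0)


# Sample data: the first record mirrors the inspect output above,
# the second is a made-up healthy run for contrast.
runs = [
    {
        "id": "678db3d9-747a-4d50-9c22-3913d80dc751",
        "state_type": "PENDING",
        "state_timestamp": datetime(2023, 6, 28, 19, 32, 36, tzinfo=timezone.utc),
    },
    {
        "id": "hypothetical-running-run",
        "state_type": "RUNNING",
        "state_timestamp": datetime(2023, 6, 29, 5, 59, 40, tzinfo=timezone.utc),
    },
]

now = datetime(2023, 6, 29, 5, 59, 49, tzinfo=timezone.utc)
stuck = find_stuck_pending(runs, now)

# Print the CLI command to run for each stuck flow run.
for run_id in stuck:
    print(f"prefect flow-run cancel {run_id}")
```

With ten such stuck records and a limit of 10, `open_slots` returns 0, which is the blocked state described in this post. This only automates the workaround; it does not explain why the runs never leave Pending.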

voreh · July 18, 2023, 5:16pm

Same problem here. Did you fix it?
