Prefect 2 hangs when using ProcessPoolExecutor

Hello,
I am in the process of migrating from Prefect 1 to 2. I am using a virtual machine (CentOS 8, 4.18.0-365.el8.x86_64, python 3.9.7) to start a prefect agent and the prefect server UI. Some of my flows are calling a function from an external library that makes use of python’s ProcessPoolExecutor class to spawn some processes. In prefect v1 I used a SequentialTaskRunner and a limited concurrency and so far I got it working fine.

In Prefect v2 I can’t get the flow using this library to complete as it seems that the processes spawned by the ProcessPoolExecutor object are never joined.

I have made a small repo to test the problem also on my local linux machine and it seems that if I trigger a run with some dummy values the function containing the ProcessPoolExecutor object never returns: GitHub - grigolet/prefect-demo: Demo for troubleshooting prefect issue

Is this a known issue? Am I doing something wrong?

2 Likes

I think this can be boiled down to the simplest form of:

# flow.py
from concurrent.futures import ProcessPoolExecutor
import time


@flow(persist_result=False)
def simplest_hanging_flow():
    with ProcessPoolExecutor() as p:
        p.submit(time.sleep, 0.5)


if __name__ == "__main__":
    simplest_hanging_flow()

And running the script with python flow.py should result into a log trace like:

14:49:33.277 | DEBUG   | prefect.profiles - Using profile 'default'
14:49:34.914 | DEBUG   | prefect.client - Using ephemeral application with database at sqlite+aiosqlite:////home/grigolet/.prefect/orion.db
14:49:35.064 | INFO    | prefect.engine - Created flow run 'auspicious-muskrat' for flow 'simplest-hanging-flow'
14:49:35.065 | DEBUG   | Flow run 'auspicious-muskrat' - Starting 'ConcurrentTaskRunner'; submitted tasks will be run concurrently...
14:49:35.068 | DEBUG   | prefect.task_runner.concurrent - Starting task runner...
14:49:35.074 | DEBUG   | prefect.client - Using ephemeral application with database at sqlite+aiosqlite:////home/grigolet/.prefect/orion.db
14:49:35.346 | DEBUG   | Flow run 'auspicious-muskrat' - Executing flow 'simplest-hanging-flow' for flow run 'auspicious-muskrat'...
14:49:35.347 | DEBUG   | Flow run 'auspicious-muskrat' - Beginning execution...

with the flow run hanging

Having a similar problem - using multiprocess pool inside a task causes the task to hang and never return.
Killing the process causes the flow to succeed.
Removing the call to multiprocess pool also causes the tasks to succeed.