Flow could not be retrieved from deployment - intermittent error

hugocatlas · January 31, 2023, 10:17am

Hello everyone,

I’m facing a bizarre error in my deployments.
It looks like the one reported here, but not quite…

What I’m doing is the following:

We deploy 3 services on k8s: prefect-orion, an ETL service (let’s call it svc-A), and a fraud detection/data aggregation service (call it svc-B).
Both svc-A and svc-B are scheduled deployments that rely on prefect-orion.
When a service is (re)deployed on our k8s cluster with ArgoCD, it cancels any running workflows for that deployment, submits the new version of the scheduled deployment (using deployments.Deployment.build_from_flow(…)), and runs one flow straightaway (using deployments.run_deployment(…)).
Until svc-B appeared, everything was fine. Now we’re running tests to roll this service into production, and that’s where the problem arises.

Problem is:

sometimes, everything goes fine and as expected: the two services have their own schedule and share the same work queue;
other times svc-B just fails with an error:

Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/prefect/engine.py", line 262, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/dist-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/prefect/deployments.py", line 166, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=deployment.path, local_path=".")
  File "/usr/local/lib/python3.10/dist-packages/prefect/filesystems.py", line 143, in get_directory
    copytree(from_path, local_path, dirs_exist_ok=True)
  File "/usr/lib/python3.10/shutil.py", line 556, in copytree
    with os.scandir(src) as itr:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/mm_datascripts'

Then, very likely, the next scheduled execution of a flow for svc-B will work as expected. In the meantime, nothing has changed on either the application or the infrastructure side.

The intermittent nature of this issue is puzzling me. Has anyone faced this behavior before?

Cheers,

Hugo

Christopher_Boyd · February 22, 2023, 3:10pm

Hi hugocatlas -

Simplifying this problem a little bit, I think the construct and design of your system are not as relevant as the core problem -

Flow could not be retrieved from deployment.
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/mm_datascripts'

How and when is your deployment created, and applied?
Further, when this does occur:
What image is being used?
What do your sys.path and PYTHONPATH look like?
How does mm_datascripts get to this path?
What is the path and entrypoint for your deployment registration?
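
Several of the questions above can be answered from inside the failing container with a few lines of standard-library Python. This is only a sketch: the helper name describe_package is hypothetical, it is meant for top-level package names, and mm_datascripts is simply the package name taken from the traceback.

```python
import importlib.util
import os
import sys


def describe_package(package_name):
    """Collect what the current interpreter knows about a top-level package."""
    spec = importlib.util.find_spec(package_name)
    # Packages expose their directories here; plain modules expose None.
    locations = list(spec.submodule_search_locations or []) if spec else []
    return {
        "package": package_name,
        "importable": spec is not None,
        "locations": locations,
        "locations_exist": [os.path.isdir(p) for p in locations],
        "sys_path": list(sys.path),
        "pythonpath": os.environ.get("PYTHONPATH", ""),
    }


if __name__ == "__main__":
    # Package name taken from the traceback in this thread.
    for key, value in describe_package("mm_datascripts").items():
        print(f"{key}: {value}")
```

Running it in each service's image, rather than only the one that built the deployment, would show whether the package directory exists everywhere a flow run might land.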

hugocatlas · February 22, 2023, 3:41pm

Hi Christopher,

Thank you for replying.
As a matter of fact, the problem seems to have been solved yesterday, after I found this bug report: BUG: If multiple agents have same type + labels, only one shows up on "Agents" page · Issue #384 · PrefectHQ/ui · GitHub, and more particularly one of its comments.

The problem seemed to be that both services started agents on the same work queues (named after our clients). Hence, when flow runs are created from deployments, orion seems to send the run sometimes to one agent and sometimes to the other.
I solved it by removing this behavior: each service now starts its agents with unique work queue names, and so far everything seems to be working fine.
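
To illustrate why a shared queue could produce an intermittent failure and why unique names avoid it, here is a toy model in plain Python. It does not use the Prefect API. The Agent class, the queue_name and eligible_agents helpers, the client name acme, and the svc_a_pkg package name are all hypothetical. It also assumes that the svc-A image is the one missing mm_datascripts, and that any agent polling a queue may pick up a run from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Toy stand-in for an agent: the queues it polls and the code its image ships."""
    service: str
    queues: frozenset
    packages: frozenset


def queue_name(service, client):
    # Hypothetical naming scheme: prefix the client name with the service.
    return f"{service}-{client}"


def eligible_agents(agents, queue):
    """Every agent polling `queue` may end up picking up a run from it."""
    return [a for a in agents if queue in a.queues]


def can_load(agent, package):
    """A run succeeds only if the agent's image contains the flow's package."""
    return package in agent.packages


# Before: both services poll a queue named after the client only.
shared = [
    Agent("svc-A", frozenset({"acme"}), frozenset({"svc_a_pkg"})),
    Agent("svc-B", frozenset({"acme"}), frozenset({"mm_datascripts"})),
]
# After: each service polls its own queue.
unique = [
    Agent("svc-A", frozenset({queue_name("svc-A", "acme")}), frozenset({"svc_a_pkg"})),
    Agent("svc-B", frozenset({queue_name("svc-B", "acme")}), frozenset({"mm_datascripts"})),
]

before = [can_load(a, "mm_datascripts") for a in eligible_agents(shared, "acme")]
after = [can_load(a, "mm_datascripts") for a in eligible_agents(unique, "svc-B-acme")]
print(before)  # [False, True]
print(after)   # [True]
```

With the shared queue, one of the two eligible agents cannot find the package, so the outcome depends on which agent picks up the run. With per-service queues, only the agent whose image ships the package is eligible.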

Alternatively, being able to name the agents, as the comment suggests, would presumably also solve the problem for good (but the documentation no longer mentions this feature).

Cheers,

Hugo
