As I continue to break through walls on the way to a working custom Infrastructure block for a Render job, I have now reached a state where I can actually run a flow inside Render. The flow finishes in a success state, but then, weirdly enough, the agent says: “Infrastructure returned without reporting flow run”.
Let me explain what I currently have.
This is the Infrastructure block:
So, looking at the logs of the agent, everything actually looks and runs nicely.
The agent is getting the flow, and inside the Infrastructure block I am logging the current runId, starting a new Render instance and polling (blocking) until it’s finished:
So the Render instance is running and we are inside a loop, waiting and polling for it to finish. Then, 5 minutes later, when the instance has finished doing its stuff:
In the image above you can see my log line, “job finished successfully”, and right after it the “Infrastructure returned without…” error. But if you look at my code, right after my “success” log I return a successful InfrastructureResult:
So the question is: why is the agent complaining that I am not returning an InfrastructureResult?
(As always, I’ve looked through the GCP and AWS repos for their custom infra blocks, trying to find a line that reports status to the agent that I might be missing, but I could not find anything.)
Hi Yaron! Your Render infrastructure block is looking good.
The error message you see is happening because the agent wants you to call the task_status.started callback to let it know the flow has started running. See here for an example of how the Azure Container Instances block does it, or here in the ECS block.
The agent wants this because this callback gives you a chance to provide a unique ID that can be used to cancel the flow if the user requests cancellation. When that happens, the agent will call your block’s kill method and pass the unique ID in as the first parameter. See here for a look at how the ECS block handles it.
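To make the cancellation side concrete, here is a minimal sketch of what a `kill` method could look like. It deliberately does not import Prefect so it runs on its own: `RenderJobKiller` and `fake_cancel_job` are hypothetical stand-ins, and the `grace_seconds` parameter is an assumption about the method signature rather than something confirmed in this thread.

```python
import asyncio
from typing import Awaitable, Callable, List


class RenderJobKiller:
    """Hypothetical holder for the cancellation logic of a custom block."""

    def __init__(self, cancel_job: Callable[[str], Awaitable[None]]) -> None:
        # cancel_job stands in for whatever call stops a Render job.
        self._cancel_job = cancel_job

    async def kill(self, infrastructure_pid: str, grace_seconds: int = 30) -> None:
        # infrastructure_pid is the ID that was reported via task_status.started(...).
        # grace_seconds is accepted but unused in this sketch.
        if not infrastructure_pid:
            raise ValueError("no Render job ID was reported for this flow run")
        await self._cancel_job(infrastructure_pid)


# Demo with a fake cancel function that only records what it was asked to stop.
cancelled: List[str] = []


async def fake_cancel_job(job_id: str) -> None:
    cancelled.append(job_id)


asyncio.run(RenderJobKiller(fake_cancel_job).kill("job-123"))
```

In a real block, the body of `fake_cancel_job` would be replaced by whatever request actually stops the Render job.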
In your infrastructure block, you could call the callback with something like task_status.started(render_job_id).
I see you noted that render_start_job only returns after the Render instance finishes running. So perhaps you could pass task_status into render_start_job as a parameter, and then call it with the Render job ID after you create the job, but before you start polling to check for job completion.
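As a rough sketch of that ordering (create the job, report its ID, then poll), something like the following could work. It is not your actual code: `RecordingTaskStatus`, `create_job` and `is_finished` are hypothetical stand-ins so the example runs without Prefect or Render, and the real `render_start_job` will have its own parameters. The sketch is synchronous for brevity; inside an async `run` method you would presumably use a non-blocking sleep instead of `time.sleep`.

```python
import time
from typing import Callable, List


class RecordingTaskStatus:
    """Stand-in for the task_status object that the agent passes to run()."""

    def __init__(self, events: List[str]) -> None:
        self._events = events

    def started(self, value: str) -> None:
        # The real callback tells the agent which ID identifies this run.
        self._events.append(f"started:{value}")


def render_start_job(
    task_status,
    create_job: Callable[[], str],
    is_finished: Callable[[str], bool],
    poll_seconds: float = 0.0,
) -> int:
    """Create the job, report its ID, then block until the job is done."""
    job_id = create_job()
    # Report the job ID right after creation and before polling starts.
    if task_status is not None:
        task_status.started(job_id)
    while not is_finished(job_id):
        time.sleep(poll_seconds)
    return 0  # would become the status code of the InfrastructureResult


# Demo with fake Render calls that record the order of events.
events: List[str] = []
poll_results = iter([False, False, True])


def fake_create_job() -> str:
    events.append("create")
    return "job-123"


def fake_is_finished(job_id: str) -> bool:
    events.append("poll")
    return next(poll_results)


exit_code = render_start_job(RecordingTaskStatus(events), fake_create_job, fake_is_finished)
```

The point is only the ordering: `started` is called once, with the job ID, before the first poll.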
I hope this helps, but if you have any other questions, please feel free to post a follow-up message!
@ryan_peden Yes! It worked. So I believe everything is working now. One small thing that still bothers me is this warning when the instance starts working:
/usr/local/lib/python3.7/runpy.py:125: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
Nothing specific yet, but I’ve sometimes seen the same warning appear when running flows in other containerized environments like Azure Container Instances, and it’s on our list of issues to address.
For what it’s worth, this should not actually cause unpredictable behavior in prefect.engine, but the warning does look ominous so I would like to make it disappear.
That’s odd. Are you able to try it with Python 3.8 or above?
On 3.7 we use a custom version of copytree and it looks like that’s where the problem is starting. It’s hard to tell if our code is causing the problem, or if it’s something external. If the problem disappears on 3.8, that would help narrow things down.
/opt/render/project/python/Python-3.8.16/lib/python3.8/runpy.py:127: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
Mar 27 08:25:16 PM warn(RuntimeWarning(msg))
Mar 27 08:25:16 PM 17:25:16.709 | INFO | Flow run 'enlightened-marten' - Downloading flow code from storage at None
Mar 27 08:25:17 PM 17:25:17.868 | ERROR | Flow run 'enlightened-marten' - Flow could not be retrieved from deployment.
Mar 27 08:25:17 PM Traceback (most recent call last):
Mar 27 08:25:17 PM File "/opt/render/project/src/.venv/lib/python3.8/site-packages/prefect/engine.py", line 274, in retrieve_flow_then_begin_flow_run
Mar 27 08:25:17 PM flow = await load_flow_from_flow_run(flow_run, client=client)
Mar 27 08:25:17 PM File "/opt/render/project/src/.venv/lib/python3.8/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
Mar 27 08:25:17 PM return await fn(*args, **kwargs)
Mar 27 08:25:17 PM File "/opt/render/project/src/.venv/lib/python3.8/site-packages/prefect/deployments.py", line 194, in load_flow_from_flow_run
Mar 27 08:25:17 PM await storage_block.get_directory(from_path=deployment.path, local_path=".")
Mar 27 08:25:17 PM File "/opt/render/project/src/.venv/lib/python3.8/site-packages/prefect/filesystems.py", line 966, in get_directory
Mar 27 08:25:17 PM copytree(
Mar 27 08:25:17 PM File "/opt/render/project/python/Python-3.8.16/lib/python3.8/shutil.py", line 557, in copytree
Mar 27 08:25:17 PM return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
Mar 27 08:25:17 PM File "/opt/render/project/python/Python-3.8.16/lib/python3.8/shutil.py", line 513, in _copytree
Mar 27 08:25:17 PM raise Error(errors)
Mar 27 08:25:17 PM shutil.Error: [('/tmp/tmpg09rf7_lprefect/.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.pack', './.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.pack', "[Errno 13] Permission denied: './.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.pack'"), ('/tmp/tmpg09rf7_lprefect/.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.idx', './.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.idx', "[Errno 13] Permission denied: './.git/objects/pack/pack-0bfd76b4389ea94e4d681cb18045eac511b6db34.idx'")]
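The only files listed in the error above are under `.git/objects/pack`, and Git normally writes pack files as read-only. One way to check whether the `.git` directory is what trips `copytree` is to copy the tree while skipping it. This is purely a diagnostic sketch, not what Prefect's `get_directory` does: `copy_without_git` is a hypothetical helper, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
import tempfile
from pathlib import Path


def copy_without_git(src: str, dst: str) -> None:
    """Copy src into an existing dst, skipping any directory named .git."""
    shutil.copytree(
        src,
        dst,
        dirs_exist_ok=True,  # dst (for example ".") may already exist
        ignore=shutil.ignore_patterns(".git"),
    )


# Demo on a throwaway directory tree that mimics a cloned repository.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    dst = Path(tmp) / "dst"
    pack_dir = src / ".git" / "objects" / "pack"
    pack_dir.mkdir(parents=True)
    (pack_dir / "pack-demo.pack").write_text("packed")
    (src / "flow.py").write_text("print('hello')\n")
    dst.mkdir()

    copy_without_git(str(src), str(dst))
    copied = sorted(p.name for p in dst.iterdir())
```

If the copy succeeds once `.git` is skipped, that would point at the read-only pack files rather than at the flow code itself.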
But what is interesting, I figured out the exact line that causes the error.