View in #prefect-community on Slack
@Michael_Hadorn: Hi there
I can't get Orion to run flows with the Docker flow runner (more info in the thread).
I followed this tutorial (all on the same machine):
https://orion-docs.prefect.io/tutorials/docker-flow-runner/
But it crashes when I run this flow. It looks like the Orion API is not listening to requests coming from anything other than localhost (in this case, the docker network).
$ ss -tulpe4 | grep 4200
tcp LISTEN 0 2048 127.0.0.1:4200 0.0.0.0:* users:(("uvicorn",pid=718461,fd=7)) uid:1001 ino:2627707 sk:1006 <->
How can I solve this?
The error message:
11:10:22.762 | ERROR | prefect.engine - Engine execution of flow run '566acdf3-1cd6-4f78-a75c-9e45819bc3a6' exited with unexpected exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 127, in try_connect
stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 1518, in connect_tcp
await get_running_loop().create_connection(StreamProtocol, host, port,
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1056, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1041, in create_connection
sock = await self._connect_sock(
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 955, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 502, in sock_connect
return await fut
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 537, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.17.0.1', 4200)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
yield
File "/usr/local/lib/python3.9/site-packages/httpcore/backends/asyncio.py", line 101, in connect_tcp
stream: anyio.abc.ByteStream = await anyio.connect_tcp(
File "/usr/local/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 184, in connect_tcp
raise OSError('All connection attempts failed') from cause
OSError: All connection attempts failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
resp = await self._pool.handle_async_request(req)
File "/usr/local/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
raise exc
File "/usr/local/lib/python3.9/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
response = await connection.handle_async_request(request)
File "/usr/local/lib/python3.9/site-packages/httpcore/_async/connection.py", line 86, in handle_async_request
raise exc
File "/usr/local/lib/python3.9/site-packages/httpcore/_async/connection.py", line 63, in handle_async_request
stream = await self._connect(request)
File "/usr/local/lib/python3.9/site-packages/httpcore/_async/connection.py", line 111, in _connect
stream = await self._network_backend.connect_tcp(**kwargs)
File "/usr/local/lib/python3.9/site-packages/httpcore/backends/auto.py", line 23, in connect_tcp
return await self._backend.connect_tcp(
File "/usr/local/lib/python3.9/site-packages/httpcore/backends/asyncio.py", line 101, in connect_tcp
stream: anyio.abc.ByteStream = await anyio.connect_tcp(
File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.9/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
raise to_exc(exc)
httpcore.ConnectError: All connection attempts failed
@Anna_Geller: I can confirm that using the command “prefect deployment run my-flow/example” also didn’t work for me. However, when I started the run through the UI, it worked as expected. Can you try triggering the flow run through the UI?
@Michael_Hadorn: I got the same error with the command and over the GUI.
@Anna_Geller: what’s your prefect --version?
@Michael_Hadorn: current 2.0a12
@Anna_Geller: Did you try starting the agent separately as shown in the last section of the tutorial?
PREFECT_API_URL="http://127.0.0.1:4200/api/" prefect agent start
@Michael_Hadorn: Yes. I will provide you a full example.
So you also used the current version?
@Anna_Geller: exactly, yes, I used the same version, that’s why I was wondering why it worked for me through the UI
@Michael_Hadorn: So I did basically exactly what's described in https://orion-docs.prefect.io/tutorials/docker-flow-runner/
# using: Ubuntu 20.04.3 LTS, conda, python3.9
# new env with only these deps
conda create -n o python=3.9
conda activate o
pip install -U "prefect>=2.0a"
# to clean up everything from testing before
prefect orion database reset
# Tab: 1
prefect orion start --no-agent
# Tab: 2
PREFECT_API_URL="http://127.0.0.1:4200/api/" prefect agent start
# using this code in example-deployment.py:
from prefect import flow
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner
@flow
def my_flow():
    print("Hello from Docker!")

DeploymentSpec(
    name="example",
    flow=my_flow,
    flow_runner=DockerFlowRunner()
)
prefect deployment create ./example-deployment.py
# to make the deployment show up in the GUI (this already fails)
prefect deployment run my-flow/example
# the same happens after a quick run in the GUI
16:23:46.763 | INFO | prefect.agent - Submitting flow run 'aa1daeeb-7f62-4688-a0fc-f2f658420a2c'
16:23:46.849 | INFO | prefect.flow_runner.docker - Flow run 'calculating-hummingbird' has container settings = {'image': 'prefecthq/prefect:2.0a12-python3.9', 'network': None, 'command': ['python', '-m', 'prefect.engine', 'aa1daeeb-7f62-4688-a0fc-f2f658420a2c'], 'environment': {'PREFECT_API_URL': 'http://host.docker.internal:4200/api/'}, 'auto_remove': False, 'labels': {'io.prefect.flow-run-id': 'aa1daeeb-7f62-4688-a0fc-f2f658420a2c'}, 'extra_hosts': {'host.docker.internal': 'host-gateway'}, 'name': 'calculating-hummingbird', 'volumes': []}
16:23:47.442 | INFO | prefect.agent - Completed submission of flow run 'aa1daeeb-7f62-4688-a0fc-f2f658420a2c'
16:23:47.467 | INFO | prefect.flow_runner.docker - Flow run container 'calculating-hummingbird' has status 'running'
15:23:48.837 | ERROR | prefect.engine - Engine execution of flow run 'aa1daeeb-7f62-4688-a0fc-f2f658420a2c' exited with unexpected exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 127, in try_connect
stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 1518, in connect_tcp
await get_running_loop().create_connection(StreamProtocol, host, port,
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1056, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1041, in create_connection
sock = await self._connect_sock(
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 955, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 502, in sock_connect
return await fut
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 537, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.17.0.1', 4200)
...
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 950, in <module>
enter_flow_run_engine_from_subprocess(flow_run_id)
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 134, in enter_flow_run_engine_from_subprocess
return anyio.run(retrieve_flow_then_begin_flow_run, flow_run_id)
File "/usr/local/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 56, in run
return asynclib.run(func, *args, **backend_options)
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 233, in run
return native_run(wrapper(), debug=debug)
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 228, in wrapper
return await func(*args)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 69, in with_injected_client
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 188, in retrieve_flow_then_begin_flow_run
flow_run = await client.read_flow_run(flow_run_id)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 720, in read_flow_run
response = await self.get(f"/flow_runs/{flow_run_id}")
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 214, in get
response = await self._client.get(route, **kwargs)
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1729, in get
return await self.request(
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1506, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1593, in send
response = await self._send_handling_auth(
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1621, in _send_handling_auth
response = await self._send_handling_redirects(
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1658, in _send_handling_redirects
response = await self._send_single_request(request)
File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1695, in _send_single_request
response = await transport.handle_async_request(request)
File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
resp = await self._pool.handle_async_request(req)
File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed
16:23:49.268 | INFO | prefect.flow_runner.docker - Flow run container 'calculating-hummingbird' has status 'exited'
Very interesting that this is working on your computer.
Do you also use python 3.9 on ubuntu?
FTR: Docker version 20.10.12, build e91ed57
@Anna_Geller And besides that, I don't really understand why I have to call prefect deployment run xx
just to make a deployment appear in the GUI. Why isn't the create enough?
I can't always run the full flow just to register it as a deployment.
@Anna_Geller: Thanks for the excellent write up of the issue! Will try to dive deeper soon.
- I’m running it on Mac.
- You don’t need to run “prefect deployment run flow-name/deployment-name” when you start the flow run from the UI - it’s either CLI or UI
Will check this soon
@Michael_Hadorn: But if I only create the deployment, then it's not visible in the GUI under Deployments. It looks like it needs at least one run to be displayed.
@Anna_Geller: Correct. An alternative is to use this “Show all deployments” button
@Michael_Hadorn: Ah cool! Thank you very much. Didn’t look at this…
@Anna_Geller: Did you figure this out by now? I just recreated my environment completely from scratch and everything from the tutorial seems to work, see the image. What I did was:
• clearing my browser history and closing all tabs
• clearing cache in my Pycharm
• deleting and resetting the DB: prefect orion database reset
• recreating new Conda environment with Python 3.9
• installing Orion with no cache: pip install -U "prefect>=2.0a" --no-cache-dir
• starting Orion again - all services at once: prefect orion start
• creating the deployment for this Docker flow from the tutorial: prefect deployment create flows/docker_flow.py
Doing that seems to fix it. LMK if the tutorial still doesn’t work for you after following all this
not sure why you’re getting a different IP instead of localhost here:
ConnectionRefusedError: [Errno 111] Connect call failed ('172.17.0.1', 4200)
Are you running it on some remote instance? If so, maybe you need to set this variable before starting Orion?
export PREFECT_ORION_API_HOST=172.17.0.1
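If you go that route, one quick way to confirm the new binding (a sketch, assuming the setting is picked up by prefect orion start on its next launch) is to repeat the ss check from earlier in the thread:
export PREFECT_ORION_API_HOST=172.17.0.1   # or 0.0.0.0 to listen on all interfaces
prefect orion start --no-agent
# in a second terminal: the listener should no longer be pinned to 127.0.0.1
ss -tulpe4 | grep 4200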
@Michael_Hadorn: @Anna_Geller Thanks a lot for your tests.
Is your test still on the Mac?
No, I use everything on my local machine (nothing remote). It's really the same as in the example.
I guess on Linux, localhost gets replaced with the host IP from the docker network (172…) via host.docker.internal.
I also tested bashing into the running container: I can ping my host, but the port is not reachable via telnet.
As far as I know, an application bound to the loopback address shown in the ss output above is only reachable from the host itself, not from containers running on that host.
$ ss -tulpe4 | grep 4200
tcp LISTEN 0 2048 127.0.0.1:4200 0.0.0.0:* users:(("uvicorn",pid=718461,fd=7)) uid:1001 ino:2627707 sk:1006 <->
I know from Prefect Core that there was a switch to run the server publicly (0.0.0.0:4200) so that we were able to connect (we used a proxy anyway, so the GUI was not accessible from outside).
And there was also a way to set the docker network for the flow, but that didn't work for me either.
That’s what I did in the container (where ef66e610ea96 is the prefect image):
docker run -it ef66e610ea96 bash
apt update
apt install iputils-ping telnet
root@69e086bf56ae:/usr/src/app# ping 172.17.0.1
PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.123 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.069 ms
root@69e086bf56ae:/usr/src/app# telnet 172.17.0.1 4200
Trying 172.17.0.1...
telnet: Unable to connect to remote host: Connection refused
I also tried with --network host.
I guess the easiest fix would be if we could also run the GUI in a container. Then we could use the same docker network.
@Anna_Geller: Wait, are you running Orion in the container? The process I described is that everything runs locally and the local Orion service spins up a container for a flow run
@Michael_Hadorn: @Anna_Geller No, my setup is what I described (the plain local default).
Everything is local, apart from the flows running with the Docker runner.
My docker output is from the flow container trying to connect to the host (because that's where the GUI is listening on 4200).
@Anna_Geller: I will ask some Orion engineers. If you want to run the UI in a container, maybe you can try the setup for a local Kubernetes cluster until then?
@Michael_Hadorn: Ok thanks a lot!
I'm a little afraid of Kubernetes…
@Anna_Geller: Don't Panic - I asked the Orion team and I'm sure you will be able to use the DockerFlowRunner on Ubuntu; we are probably missing something obvious here
@Michael_Adkins: What happens if you run prefect orion start --no-agent --host 0.0.0.0
By default we bind to 127.0.0.1, which may not allow access from the container.
@Michael_Hadorn: @Michael_Adkins Thanks a lot for your hint about setting the host.
I guess there should be a way to only allow connections from the local Docker network, not from the whole network.
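One possible middle ground, sketched here as an assumption rather than something confirmed in this thread: bind the API to the docker0 bridge address instead of 0.0.0.0, so it is reachable from containers on the default bridge (and from the host via that address) but not from other machines. The agent URL then has to point at the same address:
# find the host-side address of the default docker bridge (typically 172.17.0.1)
ip -4 addr show docker0
# bind Orion to that address only, instead of all interfaces
prefect orion start --no-agent --host 172.17.0.1
# the agent must then use the same address instead of 127.0.0.1
PREFECT_API_URL="http://172.17.0.1:4200/api/" prefect agent start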
Anyway, it works partially but crashes later:
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://host.docker.internal:4200/api/data/retrieve'
First I ran it with a separate agent. (The host IP is correct.)
08:35:59.295 | INFO | prefect.agent - Submitting flow run '8c1d94b3-44f5-4b28-a056-d87099f9f2ac'
08:35:59.656 | INFO | prefect.flow_runner.docker - Flow run 'valiant-degu' has container settings = {'image': 'prefecthq/prefect:2.0a12-python3.9', 'network': 'host', 'command': ['python', '-m', 'prefect.engine', '8c1d94b3-44f5-4b28-a056-d87099f9f2ac'], 'environment': {'PREFECT_API_URL': 'http://host.docker.internal:4200/api/'}, 'auto_remove': False, 'labels': {'io.prefect.flow-run-id': '8c1d94b3-44f5-4b28-a056-d87099f9f2ac'}, 'extra_hosts': {'host.docker.internal': 'host-gateway'}, 'name': 'valiant-degu', 'volumes': []}
08:35:59.891 | INFO | prefect.agent - Completed submission of flow run '8c1d94b3-44f5-4b28-a056-d87099f9f2ac'
08:35:59.913 | INFO | prefect.flow_runner.docker - Flow run container 'valiant-degu' has status 'running'
07:36:01.796 | ERROR | prefect.engine - Engine execution of flow run '8c1d94b3-44f5-4b28-a056-d87099f9f2ac' exited with unexpected exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 950, in <module>
enter_flow_run_engine_from_subprocess(flow_run_id)
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 134, in enter_flow_run_engine_from_subprocess
return anyio.run(retrieve_flow_then_begin_flow_run, flow_run_id)
File "/usr/local/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 56, in run
return asynclib.run(func, *args, **backend_options)
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 233, in run
return native_run(wrapper(), debug=debug)
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 228, in wrapper
return await func(*args)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 69, in with_injected_client
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 193, in retrieve_flow_then_begin_flow_run
flow = await load_flow_from_deployment(deployment, client=client)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 69, in with_injected_client
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/prefect/deployments.py", line 329, in load_flow_from_deployment
maybe_flow = await client.resolve_datadoc(deployment.flow_data)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 1244, in resolve_datadoc
return await resolve_inner(datadoc)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 1237, in resolve_inner
data = await self.retrieve_data(data)
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 805, in retrieve_data
response = await self.post(
File "/usr/local/lib/python3.9/site-packages/prefect/client.py", line 157, in post
response.raise_for_status()
File "/usr/local/lib/python3.9/site-packages/httpx/_models.py", line 1510, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://host.docker.internal:4200/api/data/retrieve'
For more information check: https://httpstatuses.com/500
08:36:02.135 | INFO | prefect.flow_runner.docker - Flow run container 'valiant-degu' has status 'exited'
Then I also tried with the server's built-in agent. It takes the API URL from the server's host setting (which makes sense, but not here). But even if that worked (it would need support for two URLs), it's not clear whether it would later crash with the same error as before.
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://0.0.0.0:4200/api/data/retrieve'
Do you recommend using it with Kubernetes then?
@Anna_Geller: I only meant: you can use the Kubernetes flow runner until we know what's wrong with the Docker flow runner. We are using Ubuntu in our CI and the Docker flow runner was working there, so it's definitely something we can figure out
@Michael_Hadorn: Ah ok. Thanks for the clarification. So it looks like it's something I messed up…
@Michael_Adkins: So it looks like we've successfully connected here. This was an issue in Prefect v1 as well; networking with Docker is a bit more restrictive on Linux.
I’m not sure what the 500 error is caused by. You should see logs on your server process with more details.
@Michael_Hadorn: @Michael_Adkins Thanks a lot for your response!
So do you already have a solution for this problem?
Because we can't really leave this bound to 0.0.0.0.
Yes, in the server I see:
Encountered exception in request:
Traceback (most recent call last):
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
await self.app(scope, receive, _send)
...
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/prefect/orion/serializers.py", line 77, in loads
return read_blob(path)
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/prefect/orion/utilities/filesystem.py", line 20, in read_blob
with fsspec.open(path, mode="rb") as fp:
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/fsspec/core.py", line 103, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/fsspec/spec.py", line 1009, in open
f = self._open(
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/fsspec/implementations/local.py", line 155, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/fsspec/implementations/local.py", line 250, in __init__
self._open()
File "/home/michi/miniconda3/envs/o/lib/python3.9/site-packages/fsspec/implementations/local.py", line 255, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/8a4676a9102e4216b68fda52a4ba283d'
Do you have some hints?
@Michael_Adkins: It looks like your temporary files have been cleaned up and the deployment is gone from the persisted location
You’ll have to deploy it again
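In this thread's setup, that just means re-running the create step from earlier so the flow data is persisted again:
prefect deployment create ./example-deployment.py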
We’ll have to do something similar to https://github.com/PrefectHQ/prefect/pull/5182 where you start the server in a container with a docker network and run the flows on the same docker network.
GitHub: Allow override of Prefect API url in docker runs and improve inference by madkinsz · Pull Request #5182 · PrefectHQ/prefect
Since prefect orion start doesn't use containers, it's a bit more work. We might need to look for alternative solutions.
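A rough sketch of that container-based setup, based only on the PR description above (the image tag is the one from the logs in this thread; none of this is verified here): run the API server in a container on a user-defined docker network, then have the flow containers join the same network and reach the API by container name:
# create a shared network and run the Orion server on it
docker network create prefect-net
docker run -d --name orion --network prefect-net -p 4200:4200 \
  prefecthq/prefect:2.0a12-python3.9 \
  prefect orion start --host 0.0.0.0
# containers attached to prefect-net could then use PREFECT_API_URL=http://orion:4200/api/
# (the flow runner would also need to attach its flow containers to prefect-net)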
@Michael_Hadorn: COOL! Now it’s working.
Ok, just so I understand correctly: I have to wait. ^^
How can I avoid having the deployment specs written to /tmp? Is there something like storages from Prefect Core?
Thank you anyway for your awesome support! @Anna_Geller, @Michael_Adkins
@Michael_Adkins: We’ll have more storage methods in the next release
You can change the data location that the server uses with PREFECT_ORION_DATA_BASE_PATH, which defaults to /tmp.
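A minimal sketch of pointing the server at a persistent location instead of /tmp (assuming the setting is read from the environment at startup; the path here is just an example):
mkdir -p "$HOME/.prefect/data"
export PREFECT_ORION_DATA_BASE_PATH="$HOME/.prefect/data"
prefect orion start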