Flow fails with error: "'api.prefect.cloud' does not appear to be an IPv4 or IPv6 address"

Hello everyone. Hope you’re doing well.

I have 5 or 6 flows scheduled to run once a day, and a few times one of them has crashed right at the start with this not very telling trace:

Crash detected! Request to https://api.prefect.cloud/api/accounts/{my_workflow_route} failed: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 186, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/local/lib/python3.9/ipaddress.py", line 53, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'api.prefect.cloud' does not appear to be an IPv4 or IPv6 address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
  File "/usr/local/lib/python3.9/site-packages/httpcore/backends/asyncio.py", line 111, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/local/lib/python3.9/site-packages/anyio/_core/_sockets.py", line 189, in connect_tcp
    gai_res = await getaddrinfo(
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

Has anyone encountered this error before and knows what it might be referring to? I’m having a hard time understanding what I can do to avoid it in the future.

I can’t really help, but I’m having the exact same problem: the same flow works some days and fails on others, with the same error message. Maybe there are other people affected out there.

For self-hosted deployments, a mismatch between the deployment network and the agent network causes this error.
In my case, using the same network_mode and name in the deployment file resolved the issue.
For Cloud, you might have to set network_mode="bridge" explicitly, although the docs say it is set by default.
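
To check whether the problem is flaky DNS inside the container rather than anything specific to the flow, here is a rough sketch that retries the same socket.getaddrinfo call that fails at the bottom of the trace. The function name resolve_with_retry, its parameters, and the backoff values are made up for illustration and are not part of Prefect; it only uses the Python standard library.

```python
import socket
import time


def resolve_with_retry(host, port=443, attempts=4, base_delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host`, retrying on socket.gaierror with exponential backoff.

    Hypothetical helper for diagnosis only. Returns a sorted list of the
    resolved IP addresses, or re-raises the last gaierror if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            results = resolver(host, port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the IP address.
            return sorted({info[4][0] for info in results})
        except socket.gaierror as exc:
            last_error = exc
            # Wait 1s, 2s, 4s, ... between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


if __name__ == "__main__":
    # Run this inside the same container or network the flow runs in.
    print(resolve_with_retry("api.prefect.cloud"))
```

If this fails intermittently when run from inside the flow's container but works from the host, that would point at the container's network or DNS configuration, which is consistent with the network_mode suggestion above.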