Timeout error for long-running tasks when running Prefect 2.0 in WSL

When a task runs for 12 minutes, it fails as soon as it finishes executing, with: “Task run ‘test1-7c8b7bdd-0’ - Crash detected! Request to http://host.docker.internal:4200/api/blocks/get_default_storage_block timed out.”

I am running the Orion server on WSL2 and the Prefect tasks inside a Docker container. It works fine for hundreds of other tasks, loops, etc.; the issue only occurs with long-running tasks.

Test script:

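```python
from time import sleep

from prefect import flow, task
from tqdm.auto import tqdm


@task
def test1():
    # Simulate a long-running task: 12 minutes of one-second sleeps.
    for _ in tqdm(range(12 * 60)):
        sleep(1)
    return ""


@flow
def testflow():
    test1()


if __name__ == "__main__":
    testflow()
```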

What storage do you use? With DockerFlowRunner, you need to use S3 storage (or other cloud object storage) to make it work.

We are currently working on making the storage blocks experience better, but for now, you can follow this approach:

Sorry, I'm not using a Docker runner. I am running my application in a Docker container using the SequentialTaskRunner, and the Orion server is running on the host.

It works fine for hundreds of tasks and only fails when a task that took over 5 minutes finishes. The issue does not occur when executing outside Docker.

The error occurs within the async event loop. It is just an outbound request, so there should not be any timeout issues.
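If it helps narrow things down, here is a minimal stdlib-only sketch (not part of Prefect) that reproduces just the step that fails in the traceback, a plain TCP connect to the API host, after a configurable idle period. `connect_after_idle` is a hypothetical helper, and the idea that idle time alone triggers the failure is an assumption this is meant to test, not a confirmed cause.

```python
import socket
import time


def connect_after_idle(host, port, idle_seconds, timeout=5.0):
    """Sleep for `idle_seconds`, then attempt a plain TCP connect to host:port.

    Hypothetical diagnostic helper (not a Prefect API). Returns a tuple
    (ok, connect_seconds) so the connect can be compared inside vs outside
    Docker. It mirrors the connect_tcp step that raises ConnectTimeout in
    the traceback, without involving Prefect or httpx.
    """
    time.sleep(idle_seconds)
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections and socket.timeout (an OSError subclass).
        return False, time.monotonic() - start
```

For example, calling `connect_after_idle("host.docker.internal", 4200, 6 * 60)` inside the container, and `connect_after_idle("localhost", 4200, 6 * 60)` on the WSL2 host, would show whether a fresh connect after about 6 idle minutes times out only in the Docker case.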

I tested this using S3 as the default storage and got the same error. FYI, the full (very long) traceback is below.

09:57:31.115 | INFO    | Task run 'test1-e6c6c25b-0' - Crash detected! Request to http://host.docker.internal:4200/api/blocks/get_default_storage_block timed out.
09:58:01.144 | ERROR   | Flow run 'aboriginal-seahorse' - Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 180, in connect_tcp
    await event.wait()
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 574, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 611, in _run_wrapped_task
    await coro
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 127, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1519, in connect_tcp
    local_addr=local_addr)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 949, in create_connection
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 473, in sock_connect
    return await fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_tasks.py", line 103, in __exit__
    raise TimeoutError
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 86, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 63, in handle_async_request
    stream = await self._connect(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 111, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/auto.py", line 24, in connect_tcp
    host, port, timeout=timeout, local_address=local_address
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc)
httpcore.ConnectTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/engine.py", line 459, in orchestrate_flow_run
    flow_run_context.task_run_futures, client=client
  File "/usr/local/lib/python3.7/site-packages/prefect/engine.py", line 820, in wait_for_task_runs_and_report_crashes
    task_run_id=future.task_run.id, state=state, force=True
  File "/usr/local/lib/python3.7/site-packages/prefect/client.py", line 1581, in set_task_run_state
    json=dict(state=state_data_json, force=force),
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 151, in post
    raise_for_status=raise_for_status,
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 61, in request
    request, auth=auth, follow_redirects=follow_redirects
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1597, in send
    history=[],
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1624, in _send_handling_auth
    history=history,
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1658, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1695, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectTimeout
--- Orion logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 180, in connect_tcp
    await event.wait()
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 574, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 611, in _run_wrapped_task
    await coro
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 127, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1519, in connect_tcp
    local_addr=local_addr)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 949, in create_connection
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 473, in sock_connect
    return await fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_tasks.py", line 103, in __exit__
    raise TimeoutError
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 86, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 63, in handle_async_request
    stream = await self._connect(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 111, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/auto.py", line 24, in connect_tcp
    host, port, timeout=timeout, local_address=local_address
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc)
httpcore.ConnectTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/logging/handlers.py", line 133, in send_logs
    await client.create_logs(self._pending_logs)
  File "/usr/local/lib/python3.7/site-packages/prefect/client.py", line 1614, in create_logs
    await self._client.post(f"/logs/", json=serialized_logs)
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 151, in post
    raise_for_status=raise_for_status,
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 61, in request
    request, auth=auth, follow_redirects=follow_redirects
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1597, in send
    history=[],
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1624, in _send_handling_auth
    history=history,
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1658, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1695, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectTimeout
Worker information:
    Approximate queue length: 1
    Pending log batch length: 1
    Pending log batch size: 376
The log worker will attempt to send these logs again in 2.0s
09:58:31.239 | ERROR   | Flow run 'aboriginal-seahorse' - Crash detected! Request to http://host.docker.internal:4200/api/blocks/get_default_storage_block timed out.
--- Orion logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 180, in connect_tcp
    await event.wait()
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 574, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 611, in _run_wrapped_task
    await coro
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_sockets.py", line 127, in try_connect
    stream = await asynclib.connect_tcp(remote_host, remote_port, local_address)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 1519, in connect_tcp
    local_addr=local_addr)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 949, in create_connection
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 473, in sock_connect
    return await fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/site-packages/anyio/_core/_tasks.py", line 103, in __exit__
    raise TimeoutError
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 86, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 63, in handle_async_request
    stream = await self._connect(request)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 111, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/auto.py", line 24, in connect_tcp
    host, port, timeout=timeout, local_address=local_address
  File "/usr/local/lib/python3.7/site-packages/httpcore/backends/asyncio.py", line 104, in connect_tcp
    local_host=local_address,
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc)
httpcore.ConnectTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/prefect/logging/handlers.py", line 133, in send_logs
    await client.create_logs(self._pending_logs)
  File "/usr/local/lib/python3.7/site-packages/prefect/client.py", line 1614, in create_logs
    await self._client.post(f"/logs/", json=serialized_logs)
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 151, in post
    raise_for_status=raise_for_status,
  File "/usr/local/lib/python3.7/site-packages/prefect/utilities/httpx.py", line 61, in request
    request, auth=auth, follow_redirects=follow_redirects
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1597, in send
    history=[],
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1624, in _send_handling_auth
    history=history,
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1658, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1695, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectTimeout
Worker information:
    Approximate queue length: 1
    Pending log batch length: 2
    Pending log batch size: 752
The log worker will attempt to send these logs again in 2.0s

Could you open a GitHub issue and describe it there? I would really appreciate it. There’s a special Orion Preview issue type here:

A temporary workaround is to make regular requests to the Orion server in the background to keep the connection alive.
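A minimal sketch of that workaround, assuming that periodic HTTP GETs are enough to keep the Docker/WSL2 path to the server alive (an untested assumption). `KeepAlive` is a hypothetical helper built only on the standard library, not a Prefect API; the ping URL is whatever cheap endpoint you choose.

```python
import threading
import urllib.request


class KeepAlive:
    """Hypothetical helper: GET `url` every `interval` seconds on a daemon
    thread while the `with` block runs. Not part of Prefect; the URL and
    interval are assumptions to tune for your setup."""

    def __init__(self, url, interval=30.0, timeout=10.0):
        self.url = url
        self.interval = interval
        self.timeout = timeout
        self.successes = 0
        self.failures = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (time to ping) and True once
        # stop is requested, so the first ping happens after one interval.
        while not self._stop_event.wait(self.interval):
            try:
                with urllib.request.urlopen(self.url, timeout=self.timeout) as resp:
                    resp.read()
                self.successes += 1
            except OSError:
                # URLError and socket timeouts are OSError subclasses; count
                # the failure and keep going rather than crash the task.
                self.failures += 1

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop_event.set()
        self._thread.join()
        return False
```

For example, the long-running body of `test1` could run inside `with KeepAlive("http://host.docker.internal:4200/api/health", interval=60):`, where `/api/health` is an assumed endpoint. Note that these pings open their own connections instead of reusing Prefect's client connection pool, so whether they help depends on what is actually timing out.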


Thank you so much for the detailed description. We’ll investigate and give updates directly on the GitHub issue.

I also got this timeout exception when a SQL query that ran for more than 5 minutes finished. Is there any solution now?

Hi @dmquant, welcome to Prefect Discourse! The best way to follow up is to track the GitHub issue.

I see there have been a few updates on the GitHub issue, but I’m still experiencing this problem on WSL2.

Let’s follow up on the GitHub issue if possible. Sorry to hear about the problem; until then, perhaps you could try out Prefect Cloud? https://app.prefect.cloud/


Well, I can run short tasks regardless of whether they were triggered locally or from Prefect Cloud, but the agent running within WSL2 periodically times out and reconnects afterwards.

So, a flow will run successfully unless it happens to overlap with the periodic disconnect on the agent.

Gotcha. Maybe you want to run your agent in a serverless container instead, e.g., on AWS ECS Fargate? You can check out this recipe, with a blog post and video linked in the README:

For the same setup on Azure, you could try out: