I’m seeing the below when tinkering with a fresh dev setup, installed from pip just now. It seems to always be 25-31 seconds, which makes me think some code is doing a sleep
and not that my box is that overloaded. My web UI view shows no entries on the default page until these log messages pop up. My box is otherwise not having issues, htop shows plenty of free cores and memory and nothing else fails to be interactive.
* % RecentDeploymentsScheduler took 30.828482 seconds to run, which is longer than its loop interval of 5 seconds.
MarkLateRuns took 30.873706 seconds to run, which is longer than its loop interval of 5.0 seconds.
FlowRunNotifications took 29.957938 seconds to run, which is longer than its loop interval of 4 seconds.
RecentDeploymentsScheduler took 31.069303 seconds to run, which is longer than its loop interval of 5 seconds.
Edit:
This turned out to be because of the default location of the orion sqlite db being ~
, which is on an NFS for me. I switched it to postgres and am seeing far better performance and way fewer 30k line tracebacks when trying to submit a bunch of tasks. The default thinigs to homedir
scheme was so slow that I was getting these errors when submitting tasks to a DaskTaskRunner
.
2022-10-09 08:50:30,800 - distributed.worker - ERROR - Timed out during handshake while connecting to tcp://127.0.0.1:41479 after 30 s
Traceback (most recent call last):
File "/root//.conda/envs/plotting-v2/lib/python3.8/site-packages/distributed/comm/core.py", line 328, in connect
handshake = await asyncio.wait_for(comm.read(), time_left())
File "/root//.conda/envs/plotting-v2/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
OSError: Timed out during handshake while connecting to tcp://127.0.0.1:41479 after 30 s
2022-10-09 08:50:30,802 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object a
t 0x7f304b806c40>>, <Task finished name='Task-16' coro=<Worker.heartbeat() done, defined at /root//.conda/envs/plotting-v2/lib/python3.8/site-packages/distributed/worker.py:1
185> exception=OSError('Timed out during handshake while connecting to tcp://127.0.0.1:41479 after 30 s')>)
Edit 2
I give up. I’m trying to run about a thousand tasks with the default executor since I earlier gave up on getting Dask to work. They were always failing at about 10 tasks started, despite moving PREFECT_HOME to a local disk. I reordered the sqlite statement which increases the timeout to after the switch to WAL, that got me to about 30 tasks started before crashing. When connecting to postgres for orion I can get about 100 tasks submitted before crashing, I also observe my own single sqlalchemy connection to postgres starts to have issues just before prefect crashes.
10:39:18.023 | ESC[1;31mERRORESC[1;0m | Flow run 'simple-hummingbird' - Crash detected! Execution was interrupted by an unexpected exception.
so instead I am skipping prefect logging of tasks, using one big flow that uses GNU parallel to start the 1000 subprocesses to do the work (but these cannot be annotated as tasks or I crash). My actual application data logs just fine to postgres this way, and it should be many orders of magnitude more traffic than Prefect should be responsible for. I should also note that overnight I ran this program where this task was instead a flow, and this ran just fine as far as I can tell, so maybe the code which submits flows is more resilient than the code which submits tasks. Or maybe it’s because they were all separate processes and now I’m trying to do it all in one process. But too many hours sank now to continue. Would take some serious performance testing / battle-hardening before I try this again.
# How I replaced the calling myfun.submit(arg1, arg2), write a big file with all args then
check_call(shlex.split("parallel --colsep ' ' -a workspec.txt python main.py used-to-be-a-task-that-i-could-call-from-python-but-its-unreliable {1} {2}"))
Example traceback:
10:39:11.037 | ESC[1;31mERRORESC[1;0m | prefect.orion - Encountered exception in request:
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
raise exc
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
await self.app(scope, receive, sender)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 680, in __call__
await route.handle(scope, receive, send)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 275, in handle
await self.app(scope, receive, send)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 65, in app
response = await func(request)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/utilities/server.py", line 101, in handle_response_scoped_depends
response = await default_handler(request)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/routing.py", line 231, in app
raw_response = await run_endpoint_function(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
return await dependant.call(**values)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/api/task_runs.py", line 198, in set_task_run_state
orchestration_result = await models.task_runs.set_task_run_state(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/models/task_runs.py", line 285, in set_task_run_state
run = await models.task_runs.read_task_run(session=session, task_run_id=task_run_id)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/database/dependencies.py", line 117, in async_wrapper
return await fn(*args, **kwargs)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/models/task_runs.py", line 103, in read_task_run
model = await session.get(db.TaskRun, task_run_id)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/ext/asyncio/session.py", line 299, in get
return await greenlet_spawn(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 126, in greenlet_spawn
result = context.throw(*sys.exc_info())
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2805, in get
return self._get_impl(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2912, in _get_impl
return db_load_fn(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
conn = self._connection_for_bind(bind)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
conn = bind.connect()
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/future/engine.py", line 406, in connect
return super(Engine, self).connect()
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
else engine.raw_connection()
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3394, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
return fn()
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 320, in connect
return _ConnectionFairy._checkout(self)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 884, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 486, in checkout
rec = pool._do_get()
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
10:39:18.023 | ESC[1;31mERRORESC[1;0m | Flow run 'simple-hummingbird' - Crash detected! Execution was interrupted by an unexpected exception.
or with sqlite (there are many more logs but they seem to be the futures not being properly closed before the garbage collector comes for them at process exit. which is also not a good sign.)
09:19:19.514 | ESC[1;31mERRORESC[1;0m | prefect.orion - Encountered exception in request:
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 100, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 228, in _handle_exception
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 82, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 68, in await_only
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 121, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/cursor.py", line 37, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/cursor.py", line 31, in _execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 129, in _execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 102, in run
sqlite3.OperationalError: unable to open database file
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 680, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 275, in handle
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/starlette/routing.py", line 65, in app
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/utilities/server.py", line 101, in handle_response_scoped_depends
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/routing.py", line 231, in app
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/api/task_runs.py", line 49, in create_task_run
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/database/dependencies.py", line 117, in async_wrapper
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/models/task_runs.py", line 61, in create_task_run
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/ext/asyncio/session.py", line 215, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 126, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/future/engine.py", line 406, in connect
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 120, in __init__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 334, in __call__
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/database/configurations.py", line 262, in setup_sqlite
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/future/engine.py", line 280, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 100, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 228, in _handle_exception
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 82, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 68, in await_only
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 121, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/cursor.py", line 37, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/cursor.py", line 31, in _execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 129, in _execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 102, in run
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
[SQL: PRAGMA journal_mode = WAL;]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
09:19:19.582 | ESC[1;31mERRORESC[1;0m | sqlalchemy.pool.impl.NullPool - Exception closing connection <AdaptedConnection <Connection(Thread-82, started daemon 140182263150336)>>
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/database/interface.py", line 102, in session_context
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/api/task_runs.py", line 49, in create_task_run
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/database/dependencies.py", line 117, in async_wrapper
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/prefect/orion/models/task_runs.py", line 61, in create_task_run
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/ext/asyncio/session.py", line 215, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 126, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1712, in execute
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2138, in _handle_dbapi_exception
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 744, in invalidate
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 1087, in invalidate
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 597, in invalidate
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 660, in __close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 255, in _close_connection
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 689, in do_terminate
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 692, in do_close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 214, in close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 68, in await_only
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 121, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 171, in close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 129, in _execute
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 214, in close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 68, in await_only
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 121, in greenlet_spawn
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 171, in close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/aiosqlite/core.py", line 67, in _conn
ValueError: no active connection
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 255, in _close_connection
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 689, in do_terminate
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 692, in do_close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 216, in close
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 223, in _handle_exception
File "/root/.conda/envs/plotting-v2/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
sqlite3.OperationalError: no active connection
09:19:19.681 | ESC[1;31mERRORESC[1;0m | sqlalchemy.pool.impl.NullPool - Exception closing connection <AdaptedConnection <Connection(Thread-337, started daemon 140163489453824)>>
This is all with prefect 2.5.0
from pip
.