View in #prefect-server on Slack
Problem
@Alex_F: Hi All , I am having issue with prefect server. Then server is starting on server ip and config.toml is set to apollo_url=“http://10.1.x.x:4200/graphql” everything starts normally and no errors during start , the ui shows green connection to API, however 1-2 minutes later it turns yellow and stops working , i turn debug=true but I am not able to find logs are written or how to troubleshoot this problem.
Any help would be appreciated (this is local install on ubuntu)
@Kevin_Kho: How did you start your server? prefect server start or helm?
@Alex_F: like this - > prefect server start --detach
@Kevin_Kho: Maybe you need the expose flag? prefect server start --expose
@Alex_F: I tried , no log entries coming up from graphql or port 4200 , all the log entries are ui_1 or
towel_1, trying wget on port 4200 gets error 200
I did see this in startup
graphql_1 | Could not upgrade the database!
graphql_1 | Error: HTTPConnectionPool(host=‘hasura’, port=3000): Max retries exceeded with url: /v1/query (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7ff7e3b873d0>: Failed to establish a new connection: [Errno 111] Connection refused’))
@Kevin_Kho: Are your ports open for access on the server?
@Alex_F: yes , as far as i know
systemd-r 415 systemd-resolve 13u IPv4 19227 0t0 TCP 127.0.0.53:53 (LISTEN)
sshd 771 root 3u IPv4 24314 0t0 TCP *:22 (LISTEN)
sshd 771 root 4u IPv6 24325 0t0 TCP *:22 (LISTEN)
container 17698 root 12u IPv4 114583 0t0 TCP 127.0.0.1:37907 (LISTEN)
docker-pr 1294360 root 4u IPv4 4606111 0t0 TCP 127.0.0.1:5432 (LISTEN)
docker-pr 1294817 root 4u IPv4 4609192 0t0 TCP *:4200 (LISTEN)
docker-pr 1295053 root 4u IPv4 4608779 0t0 TCP *:8080 (LISTEN)
docker-pr 1295269 root 4u IPv4 4611242 0t0 TCP 127.0.0.1:3000 (LISTEN)
docker-pr 1295281 root 4u IPv4 4611253 0t0 TCP 127.0.0.1:4201 (LISTEN)
@Kevin_Kho: Ohh this isn’t even about remote connection. Looks like hasura is not even spinning up properly? I should say the logs are noisy though so you can test better by trying to go to port 4200 to the graphql playground. Does that load for you?
@Alex_F: no 4200 does not load at all I see bunch of errors on hasura start
ui_1 | 2022/03/09 17:29:46 [notice] 13#13: start worker process 15
apollo_1 | Checking GraphQL service at http://graphql:4201/health …
towel_1 | {“severity”: “ERROR”, “name”: “prefect-server.Scheduler”, “message”: “Unexpected error: ConnectError(gaierror(-3, ‘Temporary failure in name resolution’))”, “exc_info”: “Traceback (most recent call last):\n File "/prefect-server/src/prefect_server/services/loop_service.py", line 60, in run\n await self.run_once()\n File "/prefect-server/src/prefect_server/services/towel/scheduler.py", line 47, in run_once\n offset=500 * iterations,\n File "/prefect-server/src/prefect_server/database/orm.py", line 501, in get\n as_box=not apply_schema,\n File "/prefect-server/src/prefect_server/database/hasura.py", line 85, in execute\n as_box=as_box,\n File "/prefect-server/src/prefect_server/utilities/graphql.py", line 64, in execute\n timeout=30,\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1385, in post\n timeout=timeout,\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1148, in request\n request, auth=auth, allow_redirects=allow_redirects, timeout=timeout,\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1169, in send\n request, auth=auth, timeout=timeout, allow_redirects=allow_redirects,\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1196, in send_handling_redirects\n request, auth=auth, timeout=timeout, history=history\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1232, in send_handling_auth\n response = await self.send_single_request(request, timeout)\n File "/usr/local/lib/python3.7/site-packages/httpx/_client.py", line 1269, in send_single_request\n timeout=timeout.as_dict(),\n File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection_pool.py", line 153, in request\n method, url, headers=headers, stream=stream, timeout=timeout\n File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 65, in request\n self.socket = await self._open_socket(timeout)\n File "/usr/local/lib/python3.7/site-packages/httpcore/_async/connection.py", line 86, in _open_socket\n hostname, port, ssl_context, timeout\n File "/usr/local/lib/python3.7/site-packages/httpcore/_backends/auto.py", line 38, in open_tcp_stream\n return await self.backend.open_tcp_stream(hostname, port, ssl_context, timeout)\n File "/usr/local/lib/python3.7/site-packages/httpcore/_backends/asyncio.py", line 234, in open_tcp_stream\n stream_reader=stream_reader, stream_writer=stream_writer\n File "/usr/local/lib/python3.7/contextlib.py", line 130, in exit\n self.gen.throw(type, value, traceback)\n File "/usr/local/lib/python3.7/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions\n raise to_exc(exc) from None\nhttpcore._exceptions.ConnectError: [Errno -3] Temporary failure in name resolution”}
after server restart I see it’s up and running but 1-2 minutes late it’s crashed with no errors
ui_1 | 192.168.192.7 - - [09/Mar/2022:17:46:08 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.192.4:52928 - “GET /health HTTP/1.1” 200 OK
graphql_1 | INFO: 192.168.192.4:52930 - “GET /health HTTP/1.1” 200 OK
ui_1 | 192.168.192.7 - - [09/Mar/2022:17:46:38 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.192.4:52932 - “GET /health HTTP/1.1” 200 OK
ui_1 | 192.168.192.7 - - [09/Mar/2022:17:47:09 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.192.4:52934 - “GET /health HTTP/1.1” 200 OK
@Kevin_Kho: That is very weird behavior. What are the specs of the underlying VM?
@Alex_F: it’s AWS Ubuntu instance… if you tell me command you want me to run i can give you more details , I see docker instances are running 8GB.
@Kevin_Kho: is the container not even around to get logs from?
@Alex_F: container is up and running but on 192.168 subnet , but for some strange reason the doker is not mapping to 10.1 (local subnet) it’s like network between docker and local network is not working properly
56ecc673097a prefecthq/ui:core-1.0.0 “/docker-entrypoint.…” 11 minutes ago Up 11 minutes (healthy) 80/tcp, 0.0.0.0:8080->8080/tcp tmp_ui_1
516b57f11ba7 prefecthq/server:core-1.0.0 “tini -g – python s…” 11 minutes ago Up 11 minutes tmp_towel_1
3e11f00441f7 prefecthq/apollo:core-1.0.0 “tini -g – bash -c …” 11 minutes ago Up 11 minutes (healthy) 0.0.0.0:4200->4200/tcp tmp_apollo_1
0a7b1510a618 prefecthq/server:core-1.0.0 “tini -g – bash -c …” 11 minutes ago Up 11 minutes (healthy) 127.0.0.1:4201->4201/tcp tmp_graphql_1
1f7b76dc022f hasura/graphql-engine:v2.1.1 “graphql-engine serve” 11 minutes ago Up 11 minutes (healthy) 127.0.0.1:3000->3000/tcp tmp_hasura_1
41b8009f4001 postgres:11 “docker-entrypoint.s…” 11 minutes ago Up 11 minutes (healthy) 127.0.0.1:5432->5432/tcp tmp_postgres_1
ubuntu@i-0fbe8a4a692337a4b:~$
now i see all the logs but on docker subnets
ui_1 | 192.168.208.7 - - [09/Mar/2022:17:58:10 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.208.4:43936 - “GET /health HTTP/1.1” 200 OK
graphql_1 | INFO: 192.168.208.4:43938 - “GET /health HTTP/1.1” 200 OK
ui_1 | 192.168.208.7 - - [09/Mar/2022:17:58:40 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.208.4:43940 - “GET /health HTTP/1.1” 200 OK
ui_1 | 192.168.208.7 - - [09/Mar/2022:17:59:11 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.208.4:43942 - “GET /health HTTP/1.1” 200 OK
towel_1 | {“severity”: “DEBUG”, “name”: “prefect-server.ZombieKiller”, “message”: “Heartbeat from ZombieKiller: next run at 2022-03-09T18:01:39+00:00”}
ui_1 | 192.168.208.7 - - [09/Mar/2022:17:59:41 +0000] “HEAD / HTTP/1.1” 200 0 “-” “curl/7.74.0” “-”
graphql_1 | INFO: 192.168.208.4:43944 - “GET /health HTTP/1.1” 200 OK
Kevin , I want to say thank you , I got my network guy review AWS security setup and it looks like it started working , Thank you for your help to point me in the right direction
@Kevin_Kho: What!? I just got off a call. What did they do? Would like to know lol
Solution
@Alex_F: lol all they did is open VM to be able to communicate on VM IP:4200 , only 8080 was open as i was configuring UI on 10.1.x.x:8080 IP and API needs to communicate on same IP 10.1.x.x:4200 in order for it to work
@Anna_Geller: that’s true! thanks for updating us on that and good job!