The Prefect Cloud UI accepts flow run names that make the run fail

Hello all!

We just found a failed flow run in the UI without any error log. So we dug into our infrastructure and found the error in the agent's log (x-ed out for anonymization):

"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","field":"metadata.labels"}]},"code":422}

Steps to reproduce:

  1. We create a flow run through the UI (custom run) and give it a name. The name is “xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_” or “xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_abc” (the trailing “abc” gets truncated, so the label again ends in “_”).
  2. The flow run fails.

Desired behavior
There is proper input checking on the frontend side for the name field.
→ Operators are not able to choose a name which results in an error.
OR
There is a message in the Flow Run Log.
→ Operator can try to execute the flow again with another name.
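As an illustration of what such input checking could look like, here is a minimal Python sketch that turns an arbitrary run name into a value the Kubernetes label regex accepts. The function name `sanitize_label_value` is hypothetical and not part of Prefect; the regex is the one quoted in the error message, and 63 characters is the Kubernetes limit for label values.

```python
import re

# Kubernetes label values may be at most 63 characters long.
MAX_LABEL_LENGTH = 63

# Validation regex as quoted in the Kubernetes error message.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def sanitize_label_value(name: str) -> str:
    """Hypothetical helper: make a run name safe to use as a label value."""
    # Replace every run of disallowed characters with a single '-'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "-", name)
    # Truncate first, then strip, so the result cannot end in '-', '_' or '.'.
    cleaned = cleaned[:MAX_LABEL_LENGTH]
    return cleaned.strip("-_.")


if __name__ == "__main__":
    for raw in ["x" * 62 + "_abc", "nightly load #3"]:
        safe = sanitize_label_value(raw)
        print(repr(safe), LABEL_VALUE_RE.fullmatch(safe) is not None)
```

Stripping after truncation matters here: cutting a name such as `..._abc` at 63 characters can leave a trailing `_`, which is exactly the case that failed for us.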

Thank you


It looks like this is a Kubernetes error rather than one coming from Prefect.

I assume that when the agent deploys your flow, it attaches the flow run name as one of the labels on your Kubernetes job.

To solve the problem, you may need to check which flow run name failed and whether it indeed contains characters that Kubernetes does not allow.
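One way to check a name by hand is to test it against the regex that Kubernetes quotes in the error message. The following is only a sketch: `is_valid_label_value` is a hypothetical helper, and the example name (62 `x` characters plus a suffix) is an assumption standing in for the anonymized one.

```python
import re

# Validation regex as quoted in the Kubernetes error message.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")

# Kubernetes label values may be at most 63 characters long.
MAX_LABEL_LENGTH = 63


def is_valid_label_value(value: str) -> bool:
    """Hypothetical helper: True if Kubernetes would accept the label value."""
    return (
        len(value) <= MAX_LABEL_LENGTH
        and LABEL_VALUE_RE.fullmatch(value) is not None
    )


if __name__ == "__main__":
    # Assumed stand-in for the anonymized flow run name.
    name = "x" * 62 + "_abc"
    truncated = name[:MAX_LABEL_LENGTH]  # plain cut at 63 characters

    print(is_valid_label_value(name))       # too long
    print(is_valid_label_value(truncated))  # ends in '_' after the cut
    print(is_valid_label_value("my_value"))
```

Under that assumption, both the full name and its truncated form are rejected: the first is too long, and the second ends in `_` instead of an alphanumeric character.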

Another option would be to run the same flow on a local Process block with the same flow run name and confirm that it works there.

Thank you for your feedback, Anna!

I just re-checked it and you are right: the error results from executing the flow via Kubernetes. Submitting the flow run fails; see the log output below.

Nevertheless: wouldn’t it be beneficial to show more information from the agent's logs for a flow run in the Prefect UI, perhaps only in the event of a failure? That would help our colleagues find the cause of errors and, in the best case, fix them themselves. What do you think, is it worth opening a feature request?

Example: in our case, the operator wants to run a task. He defines some parameters, gives the run a descriptive name, and starts it. The only feedback he gets: flow failed, no error message.
His next step is to write to the flow engineer. The flow engineer does not know the cause either, so he asks the infrastructure team to check the agent's logs for that flow run. There they find the error message, and the operator can fix the name. If the error message were shown in the Prefect UI, the operator would be able to fix it himself.

Have a nice day!

08:24:59.349 | INFO    | prefect.agent - Submitting flow run '5c134e62-f096-4ae7-b6c7-c78614452706'
08:24:59.840 | ERROR   | prefect.agent - Failed to submit flow run '5c134e62-f096-4ae7-b6c7-c78614452706' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/agent.py", line 259, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.8/site-packages/prefect/infrastructure/kubernetes.py", line 276, in run
    job_name = await run_sync_in_worker_thread(self._create_job, manifest)
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 68, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.8/site-packages/prefect/infrastructure/kubernetes.py", line 505, in _create_job
    job = batch_client.create_namespaced_job(self.namespace, job_manifest)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/batch_v1_api.py", line 309, in create_namespaced_job_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 276, in POST
    return self.request("POST", url,
  File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': '62c167b2-c4ac-4968-af15-92434696bc1b', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b51d1d39-7475-49d3-b2e5-1de1a29c3c61', 'X-Kubernetes-Pf-Prioritylevel-Uid': '79331a16-8739-42bd-9620-ae1e51712722', 'Date': 'Wed, 09 Nov 2022 08:24:59 GMT', 'Content-Length': '1096'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Job.batch \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-4zgb5\" is invalid: metadata.labels: Invalid value: \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"Invalid","details":{"name":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-4zgb5","group":"batch","kind":"Job","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","field":"metadata.labels"}]},"code":422}


08:24:59.846 | INFO    | prefect.agent - Completed submission of flow run '5c134e62-f096-4ae7-b6c7-c78614452706'     

Nobody has ever complained about having more information :smile:

It’s already on our radar. For now, we have added a feature for tracking work queue health, and in the future we’ll release a feature providing more infrastructure-level information.