Deployment fails trying to get namespace

Hello, I have a deployment that runs on an AKS cluster. Flow runs are failing with the following error:

{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect:default\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}

I’ve followed the documentation and examples I’ve found for configuring permissions for the Kubernetes service account, but I don’t understand why it needs permission to get namespaces in the kube-system namespace. Can anyone shed some light on why it’s trying to do this, or on what additional permissions the service account needs? I’m hesitant to give it get permission on namespaces in the kube-system namespace, so any insight or guidance is greatly appreciated!

You can see which minimal permissions are needed for an agent by running: prefect kubernetes manifest agent

You provide the namespace explicitly, and the Prefect agent operates only within that namespace and within the permission boundary set by your service account, role, and role binding.

That’s what I would have expected. Which is why this error confuses me. I deployed the agent, the roles and role bindings exactly as detailed in the manifest generated by that command, explicitly defining what namespace to use. The error above seems to indicate the job is trying to use the service account to query [all? some? specific?] namespaces but of course it is scoped to only the namespace it runs within so it can’t. Why is it trying to do that?

I’m sorry, but I don’t know what you are referring to - could you share more details e.g. part of the code where you see this?

Do you run this on Cloud or self-hosted Orion?

Sure. We’re running Prefect Cloud, agent is 2.7.8-python3.9

I created the deployment via the CLI just as you suggested in another post and everything looked good. I manually set up my own kubernetes-job infrastructure block by following guidance from this article. When I kick off a quick run of the deployment, it downloads the flow, installs my python packages but errors out. I see the error when I check the agent container logs as well as in the UI under flow runs on the right side under “State Message”.

Here’s the full error:

Submission failed.
kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'cde075a5-295d-4ae6-91d3-ac77bb1b54a7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e1dae13e-5384-4153-ab9d-600428fb2323', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'af721c74-081b-44a4-aafd-cbe63a8a4ff6', 'Date': 'Fri, 13 Jan 2023 18:02:30 GMT', 'Content-Length': '340'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect:default\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403} 

The only thing the Flow Run logs show is:

Downloading flow code from storage at 'flows'

The container logs show a little more, particularly at kubernetes.py line 397 in _get_cluster_uid:

18:02:30.398 | INFO    | prefect.agent - Submitting flow run 'edbbab26-07ab-4719-b008-b4b7c3995f9e'
18:02:30.903 | ERROR   | prefect.agent - Failed to submit flow run 'edbbab26-07ab-4719-b008-b4b7c3995f9e' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/agent.py", line 424, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.9/site-packages/prefect/infrastructure/kubernetes.py", line 279, in run
    pid = await run_sync_in_worker_thread(self._get_infrastructure_pid, job)
  File "/usr/local/lib/python3.9/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/site-packages/prefect/infrastructure/kubernetes.py", line 361, in _get_infrastructure_pid
    cluster_uid = self._get_cluster_uid()
  File "/usr/local/lib/python3.9/site-packages/prefect/infrastructure/kubernetes.py", line 397, in _get_cluster_uid
    namespace = client.read_namespace("kube-system")
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 22476, in read_namespace
    return self.read_namespace_with_http_info(name, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 22555, in read_namespace_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'cde075a5-295d-4ae6-91d3-ac77bb1b54a7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e1dae13e-5384-4153-ab9d-600428fb2323', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'af721c74-081b-44a4-aafd-cbe63a8a4ff6', 'Date': 'Fri, 13 Jan 2023 18:02:30 GMT', 'Content-Length': '340'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:prefect:default\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}

18:02:30.908 | INFO    | prefect.agent - Completed submission of flow run 'edbbab26-07ab-4719-b008-b4b7c3995f9e'

My guess is that we query namespaces as a sanity check, e.g. to ensure we don’t try to deploy a flow run as a K8s job to a namespace that doesn’t exist, so that we can give more helpful error messages.

I asked the team to validate, thanks for adding more info

I got a very thorough answer from our internal Kubernetes expert:

The kube-system namespace is the only one guaranteed to exist (it cannot be deleted), so we use the UID of the kube-system namespace object itself as a form of cluster identity. We just read the namespace object itself:

We don’t need any other permissions than the namespace object.
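To make the lookup concrete, here is a minimal sketch of the behavior described in this thread. It is not Prefect's actual implementation. The `read_namespace("kube-system")` call matches the traceback above, and the environment variable name is the one mentioned in this thread; the function names, and the assumption that the override is checked before the API call, are illustrative only.

```python
import os
from typing import Callable, Mapping

# Name taken from this thread; everything else here is a hypothetical sketch.
ENV_VAR = "PREFECT_KUBERNETES_CLUSTER_UID"


def read_kube_system_uid() -> str:
    """Read the UID of the kube-system namespace object from inside a pod."""
    # Imported lazily so the override path works without the kubernetes package.
    from kubernetes import client, config

    config.load_incluster_config()
    # This is the call that needs `get` on the "namespaces" resource.
    namespace = client.CoreV1Api().read_namespace("kube-system")
    return namespace.metadata.uid


def get_cluster_uid(
    read_uid: Callable[[], str] = read_kube_system_uid,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return a manual override if one is set, else ask the API server."""
    override = environ.get(ENV_VAR)
    if override:
        return override
    return read_uid()
```

With the variable set in the agent's environment, this sketch never contacts the API server, so no permission on kube-system would be needed. Without it, the single `get` on the namespace object is the only request made.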

It’s possible to override this behavior and provide a unique ID manually with an environment variable, PREFECT_KUBERNETES_CLUSTER_UID. This is admittedly a little hacky, but a lot of people are using this trick. Joe Beda is one of the founders of Kubernetes (along with Craig McLuckie and Brendan Burns). Here’s one of his tweets about it:

Another thread between a bunch of the kube folks (Tim did a lot of work on Kubernetes networking):

Thanks for the in-depth response! I tried using the environment variable but it didn’t work, and I’m not sure why. So I went ahead and granted the service account the get namespace permission in the kube-system namespace, and it was able to complete the job successfully. When I have more time I might explore why the environment variable didn’t work, but for now I have more pressing tasks ahead.
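For anyone taking the same route, here is a sketch of one narrow way to grant that permission. It is an assumption on my part, not the manifest used above: it builds a ClusterRole limited by `resourceNames` to `get` on the single namespace object named kube-system, plus a binding for the service account named in the error message. The object name `prefect-cluster-uid-reader` is hypothetical.

```python
import json
from typing import Any, Dict


def kube_system_reader_manifests(
    service_account: str = "default",  # identity from the error message in this thread
    namespace: str = "prefect",  # namespace from the error message in this thread
    name: str = "prefect-cluster-uid-reader",  # hypothetical object name
) -> Dict[str, Any]:
    """Build RBAC objects that allow only `get` on the kube-system namespace object."""
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["namespaces"],
                # Restrict the rule to one named object instead of all namespaces.
                "resourceNames": ["kube-system"],
                "verbs": ["get"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": name,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    # A v1 List lets both objects be emitted as a single JSON document.
    return {"apiVersion": "v1", "kind": "List", "items": [cluster_role, binding]}


if __name__ == "__main__":
    print(json.dumps(kube_system_reader_manifests(), indent=2))
```

The printed JSON can be reviewed and then applied with `kubectl apply -f -`. The rule does not grant `list`, and it does not grant access to anything inside kube-system, only to reading the namespace object itself.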

On a side note, I wonder whether it might work better to get the UID by querying the namespace configured in the infrastructure block, or in the job manifest if one is specified. If no namespace is configured, it could use the default namespace, since that’s what it defaults to if none is given. I don’t know how the UID is used or the nuances of how everything works under the covers, so it’s just a suggestion. Giving the service account in my namespace a permission in the kube-system namespace isn’t my first choice (though it seems necessary at this point) and, as you said, the environment variable seems a bit hacky.

Whether you’re able to use my suggestion or not though, I really appreciate all of your help!


Thanks for the suggestion. I won’t be able to top the advice from those experts, so I shared all I know about it. Keep us posted when you try the env var.

I used the code from the MR to add the role bindings that allow the agent to read the UID from the kube-system namespace. However, I still encountered the error the OP was getting. Upon further debugging, I realized that the agent was still using the default service account to access the kube-system namespace, despite my having defined a custom service account in the Kubernetes deployment and specified it in the Kubernetes Job block in Prefect Cloud. I was able to fix the issue by changing the ClusterRoleBinding service account from the custom one to default. However, it would be beneficial to have the Kubernetes team review this and make it possible to use custom service accounts.

The Kubernetes deployment for reference:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefect-agent
  namespace: data-eng
  labels:
    app: prefect-agent
spec:
  selector:
    matchLabels:
      app: prefect-agent
  replicas: 1
  template:
    metadata:
      labels:
        app: prefect-agent
    spec:
      containers:
      - name: agent
        image: prefecthq/prefect:2.7.8-python3.9
        command: ["prefect", "agent", "start", "-q", "k8s-us-west-2-prod"]
        imagePullPolicy: "IfNotPresent"
        env:
          - name: PREFECT_API_URL
            value: https://api.prefect.cloud/api/accounts/Y/workspaces/X        
          - name: PREFECT_API_KEY
            valueFrom:
              secretKeyRef:
                name: prefect-cloud-api-token
                key: prefect-cloud-api-token-value
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: data-eng
  name: prefect-agent
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prefect-agent-role-binding
  namespace: data-eng
subjects:
- kind: ServiceAccount
  name: prefect-agent # Must be the name of the ServiceAccount.
  namespace: data-eng
roleRef:
  kind: Role
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io

---
# The default manifest generated by `prefect kubernetes manifest agent` uses the default sa in the namespace.
# We will explicitly create a sa and attach the annotation required.

# Add `eks.amazonaws.com/role-arn` annotation to the service account to inject credentials into the pod.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prefect-agent 
  namespace: data-eng
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::Z:role/k8s-staging-irsa-phq-prefect

---
# The following is to allow the service account to list and read the uid of the kube-system namespace
# https://discourse.prefect.io/t/deployment-fails-trying-to-get-namespace/2199/6
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prefect-agent
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-agent-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: default # Using default instead of prefect-agent here because it doesn't work otherwise
    namespace: data-eng
roleRef:
  kind: ClusterRole
  name: prefect-agent
  apiGroup: rbac.authorization.k8s.io

You can see that I have defined prefect-agent as the service account in Prefect Cloud, but I was still facing the error.
[Screenshot: Screen Shot 2023-01-26 at 11.32.39 AM]

The error for reference:

Submission failed.
kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '3f0e7b59-771e-48d9-879a-e489d5dbfb73', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'b3eb2c14-2628-49eb-8784-a06280ad964a', 'X-Kubernetes-Pf-Prioritylevel-Uid': '61cf3235-f37a-4ec1-9178-2dc1ec34a568', 'Date': 'Wed, 25 Jan 2023 05:53:01 GMT', 'Content-Length': '341'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"namespaces \"kube-system\" is forbidden: User \"system:serviceaccount:data-eng:default\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"kube-system\"","reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},"code":403}
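One observation that may help with debugging: the response body names the identity that actually made the request, here system:serviceaccount:data-eng:default. The traceback earlier in this thread shows the kube-system lookup running in the agent process (agent.py calling infrastructure.run), and the Deployment above does not set spec.template.spec.serviceAccountName, so the agent pod would run as the namespace's default service account regardless of what the Kubernetes Job block specifies for flow-run pods. That is a likely explanation rather than a confirmed one. The sketch below, with hypothetical helper names, pulls the denied identity out of a 403 body so it can be compared against the RBAC subjects.

```python
import json
import re
from typing import Optional, Tuple

# Matches the identity string the API server puts in a Forbidden message.
SA_PATTERN = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<name>[^:"]+)"'
)


def denied_service_account(response_body: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, name) of the service account named in a 403 body."""
    message = json.loads(response_body).get("message", "")
    match = SA_PATTERN.search(message)
    if match is None:
        return None
    return match.group("namespace"), match.group("name")


if __name__ == "__main__":
    # Response body copied from the error above.
    body = (
        r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
        r'"message":"namespaces \"kube-system\" is forbidden: User '
        r'\"system:serviceaccount:data-eng:default\" cannot get resource '
        r'\"namespaces\" in API group \"\" in the namespace \"kube-system\"",'
        r'"reason":"Forbidden","details":{"name":"kube-system","kind":"namespaces"},'
        r'"code":403}'
    )
    print(denied_service_account(body))
```

If the identity printed is not the service account you bound the ClusterRole to, the binding and the pod's service account are out of sync, and either side can be changed to match the other.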