Use AKS Workload Identity

In Azure Kubernetes Service (AKS), Workload Identity is a feature that enables you to associate specific pods with specific access rights. This is particularly beneficial when multiple teams are using Prefect for their jobs on the same AKS cluster. With Workload Identity, you can achieve this multi-team setup while ensuring that each team’s Prefect jobs are isolated from accessing resources belonging to other teams.

In simpler terms, Workload Identity allows you to create a secure boundary between teams using Prefect on AKS. Each team’s jobs can only access the resources they are supposed to, without having permissions to interfere with or access resources owned by other teams. This ensures better security and isolation for your workloads in a shared AKS environment.

A quick setup guide

Azure has a number of different guides, but the best one I have found is the one on their GitHub: Introduction - Azure AD Workload Identity

Before we start, I will assume you have the following:

  • An AKS cluster, and a basic familiarity with Kubernetes.
  • Azure RBAC enabled on most of your Azure resources, and some comfort with managed identities and granting access.
  • A Key Vault, and a user-assigned managed identity that has access to the Key Vault. Having the managed identity be user-assigned is key.

I am using a Key Vault here as an example, but this of course works with any other type of resource that a managed identity can access.

I won’t repeat the entire Azure walkthrough here, but the general idea is that you federate the Azure concept of a managed identity with the Kubernetes concept of a service account. The basic steps are as follows:

  1. OpenID Connect (OIDC) must be enabled. This is a simple command:
az aks update -g <my-resource-group> -n <my-AKS-cluster> --enable-oidc-issuer

While you are at it, grab the OIDC service account issuer, which you will need later:

az aks show --resource-group <my-resource-group> --name <my-AKS-cluster> --query "oidcIssuerProfile.issuerUrl" -otsv
  2. A Mutating Admission Webhook must be installed. This is a very simple Helm chart; all you need to bring is your Azure tenant ID:

    helm repo add azure-workload-identity https://azure.github.io/azure-workload-identity/charts
    helm repo update 
    helm install workload-identity-webhook azure-workload-identity/workload-identity-webhook \
      --namespace azure-workload-identity-system \
      --create-namespace \
      --set azureTenantID="<my-tenant-id>"
    
  3. Create and deploy a Kubernetes service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    # Client ID of the user-assigned managed identity
    azure.workload.identity/client-id: <managed-identity-client-id>
  name: <a-name-for-my-service-account>
  namespace: <my-namespace>
  4. Create a federated credential in Azure. You need the OIDC service account issuer from step 1:
    az identity federated-credential create \
    --name "<kubernetes-federated-credential-name>" \
    --identity-name <user-assigned-identity-name> \
    --resource-group <resource-group-of-the-managed-identity> \
    --issuer <oidc-service-account-issuer> \
    --subject "system:serviceaccount:<namespace>:<service-account-name>"
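
The subject of the federated credential has to match the namespace and name of the Kubernetes service account exactly, so it can help to build that string in one place. Here is a minimal Python sketch. The function names and the example values are hypothetical, and it only assembles the flags shown in the command above without calling Azure:

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the subject string for a Kubernetes service account."""
    if not namespace or not service_account:
        raise ValueError("namespace and service_account must be non-empty")
    return f"system:serviceaccount:{namespace}:{service_account}"


def federated_credential_args(name, identity_name, resource_group,
                              issuer, namespace, service_account):
    """Return the argument list for `az identity federated-credential create`."""
    return [
        "az", "identity", "federated-credential", "create",
        "--name", name,
        "--identity-name", identity_name,
        "--resource-group", resource_group,
        "--issuer", issuer,
        "--subject", federated_subject(namespace, service_account),
    ]


if __name__ == "__main__":
    # Hypothetical example values
    print(federated_subject("team-a", "team-a-sa"))
```

The argument list could be passed to subprocess.run, but printing it first is a cheap way to spot a typo in the namespace or service account name.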
    

That’s it as far as AKS configuration goes. In order to use it in Prefect (or elsewhere), our jobs need two things: a service account name, and a label that tells Azure to use workload identity. In your Python code, make sure you use ManagedIdentityCredential instead of DefaultAzureCredential. ManagedIdentityCredential takes an argument, client_id, which is the same client ID you already used in step 3.
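
When a pod has the label and the service account, the webhook should inject a handful of environment variables into it, among them AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE and AZURE_AUTHORITY_HOST. If authentication fails, a quick check like the following can tell you whether the webhook touched the pod at all. This is a debugging sketch, and the function name is hypothetical:

```python
import os

# Variables the workload identity webhook is expected to inject
EXPECTED_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_vars(env=None):
    """Return the expected variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_workload_identity_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected workload identity variables are set")
```

If everything is reported missing, the label or the service account name most likely never made it onto the pod.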

Using Workload Identity in Prefect

Ideally, you could define a work pool that ensured that both the label and the service account name got added to any Prefect run, but in my trials I had trouble getting the label to attach to the pods and not just the job. I am certain someone with a little more experience customizing work pools will be able to do this properly.

But what worked was to define a job specification with a customization:

from prefect.infrastructure import KubernetesJob

  
namespace = "<my-teams-namespace>"
image_name = "<my-sweet-image>"

patch = [
	{
		"op": "add",
		"path": "/spec/template/metadata",
		"value": {
			"labels": {
				"azure.workload.identity/use": "true"
			}
		}
	}
]

k8s_job = KubernetesJob(
	namespace=namespace,
	image=image_name,
	customizations=patch,
	finished_job_ttl=3600,
	job_watch_timeout_seconds=600,
	pod_watch_timeout_seconds=600,
	service_account_name="<my-service-account-name>",
	labels={"azure.workload.identity/use":"true"},
)

k8s_job.save("<my-image-name>", overwrite=True)
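
To see what the customization does, it may help to apply the patch by hand to a stripped-down manifest. The sketch below is not Prefect's implementation: apply_add is a hypothetical helper that handles only the JSON Patch "add" operation on object paths, and the job dict is a simplified stand-in for the real base manifest.

```python
import copy


def apply_add(document: dict, path: str, value) -> dict:
    """Apply one JSON Patch "add" operation (object paths only) to a copy."""
    result = copy.deepcopy(document)
    *parents, last = path.strip("/").split("/")
    target = result
    for key in parents:
        target = target[key]  # parent objects must already exist
    target[last] = value  # an existing member at this key is replaced
    return result


# Simplified stand-in for a Kubernetes Job manifest
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"labels": {}},
    "spec": {"template": {"spec": {"containers": []}}},
}

patch = [
    {
        "op": "add",
        "path": "/spec/template/metadata",
        "value": {"labels": {"azure.workload.identity/use": "true"}},
    }
]

patched = job
for op in patch:
    patched = apply_add(patched, op["path"], op["value"])

# The label now sits on the pod template, not only on the job
print(patched["spec"]["template"]["metadata"]["labels"])
```

One thing to keep in mind: an "add" on an object member that already exists replaces it, so if your manifest already has pod template metadata, this patch overwrites it.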

Deploying a simple flow that accesses the key vault now works:

from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient
from prefect import flow

  
@flow(name="Workload Identity Secret")
def wid_secret():

	secret_name = 'test-secret'
	kv_url = "<my-keyvault>"
	
	credential = ManagedIdentityCredential(client_id='<managed-identity-client-id>')
	client = SecretClient(vault_url=kv_url, credential=credential)
	secret_value = client.get_secret(secret_name).value

	print(f"Connected to Azure Key Vault and retrieved secret '{secret_name}' with value '{secret_value}'")

Hopefully, this should print your secret to the log (and hopefully you aren’t using something that is actually secret to test it).

Scaling this out, you will probably create several managed identities with different permissions, and connect each of them to a Kubernetes service account via an Azure federated credential. This means repeating steps 3 and 4 for each managed identity you create and want to connect to a workload identity.
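
With more than a couple of teams, generating the service account manifests from a small mapping can save some copy-pasting. A minimal sketch, where the team names, service account names and client ID placeholders are all hypothetical. It prints JSON, which kubectl accepts as well as YAML:

```python
import json


def service_account_manifest(name: str, namespace: str, client_id: str) -> dict:
    """Build a ServiceAccount manifest annotated with a managed identity client ID."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"azure.workload.identity/client-id": client_id},
        },
    }


# Hypothetical mapping: one namespace, service account and identity per team
TEAMS = {
    "team-a": {"service_account": "team-a-sa", "client_id": "<client-id-for-team-a>"},
    "team-b": {"service_account": "team-b-sa", "client_id": "<client-id-for-team-b>"},
}


def manifests_for(teams: dict) -> list:
    """Return one ServiceAccount manifest per team namespace."""
    return [
        service_account_manifest(cfg["service_account"], namespace, cfg["client_id"])
        for namespace, cfg in teams.items()
    ]


if __name__ == "__main__":
    for manifest in manifests_for(TEAMS):
        print(json.dumps(manifest, indent=2))
```

Each manifest still needs its own federated credential on the Azure side, with a subject that matches the namespace and service account name.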

A few notes on permissions

In Kubernetes, service accounts are allowed to run in any namespace by default. When defining service accounts in the job specification, be aware that whoever has permission to create Prefect blocks also determines the access level of the deployments.

To fix this, you can restrict a service account to a specific namespace. This approach lets teams have separate namespaces with dedicated agents, and freely create blocks within their own namespace. Because agents aren’t able to start jobs in other namespaces, this ensures that teams can only use their own dedicated managed identity.

sweet! thank you @radbrt for the post

adding the marvin tag so the slackbot can use this as context