View in #prefect-server on Slack
Problem Description
@Alexander_Melkoff: Hello! I’ve just deployed Prefect Server into Kubernetes on AWS using the official helm chart, and I’m trying to get my first hello-world flow to work. I am using GitLab storage, and I’m wondering what the correct (and secure) way to pass GITLAB_ACCESS_TOKEN is?
@Anna_Geller: Currently, GitLab storage expects the name of a Prefect Secret, so you would need to set a local secret on your agent:
prefect agent xxx start --env PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN="your_token"
For a KubernetesAgent, you can set it using an environment variable:
env:
  - name: PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
    value: 'your_token'
and then reference it in your storage:
from prefect import Flow
from prefect.storage import GitLab

flow = Flow(
    "gitlab-flow",
    # storage must be passed by keyword: Flow's second positional argument is the schedule
    storage=GitLab(
        repo="org/repo",  # name of repo
        path="flows/my_flow.py",  # location of flow file in repo
        access_token_secret="GITLAB_ACCESS_TOKEN",  # name of personal access token secret
    ),
)
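The secret named `GITLAB_ACCESS_TOKEN` is found through the `PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN` variable because `__` separates nested config keys. The sketch below is a simplified, hypothetical model of that lookup. The helper names are ours, and Prefect's real config loader handles more cases, such as type casting. It raises the same error text that appears later in this thread when the variable is missing from the process that runs the flow.

```python
import os
from typing import Any, Dict, Mapping, Optional

ENV_PREFIX = "PREFECT__"


def nest_prefect_env(env: Mapping[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: turn PREFECT__A__B__C=v into {"A": {"B": {"C": v}}}.

    Simplification: keys keep their original case and values stay strings.
    """
    nested: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(ENV_PREFIX):
            continue  # only PREFECT__-prefixed variables feed the config
        *parents, leaf = name[len(ENV_PREFIX):].split("__")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


def get_local_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: look up CONTEXT -> SECRETS -> name in the nested env."""
    config = nest_prefect_env(os.environ if env is None else env)
    secrets = config.get("CONTEXT", {}).get("SECRETS", {})
    if name not in secrets:
        raise ValueError(f'Local Secret "{name}" was not found.')
    return secrets[name]


if __name__ == "__main__":
    flow_run_env = {"PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN": "your_token"}
    print(get_local_secret("GITLAB_ACCESS_TOKEN", flow_run_env))  # your_token
```

The point of the model is that the lookup only sees the environment of the process where the flow actually runs.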
@Alexander_Melkoff: For some reason I can’t get that to work… I patched the Prefect agent deployment with the new env value using
kubectl -n prefect set env deployment.apps/prefect-agent PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN="XYZ"
and checked that the agent pod was restarted and has the new variable alongside the other environment variables:
Environment:
  PREFECT__CLOUD__API: http://prefect-apollo.prefect:4200/graphql
  NAMESPACE: prefect
  IMAGE_PULL_SECRETS: []
  PREFECT__CLOUD__AGENT__LABELS: []
  JOB_MEM_REQUEST:
  JOB_MEM_LIMIT:
  JOB_CPU_REQUEST:
  JOB_CPU_LIMIT:
  IMAGE_PULL_POLICY:
  SERVICE_ACCOUNT_NAME: prefect-serviceaccount
  PREFECT__BACKEND: server
  PREFECT__CLOUD__AGENT__AGENT_ADDRESS: http://0.0.0.0:8080
  PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN: XYZ
but the flow run still fails with this error:
Failed to load and execute Flow's environment: ValueError('Local Secret "GITLAB_ACCESS_TOKEN" was not found.')
@Anna_Geller: I think you may have to set it when you start the agent (or before that). For example:
@Alexander_Melkoff: Oh, I see. I rely on the helm chart to start the agent, so I can’t change how it is started without forking the entire helm chart. Are there any other workarounds? Will a custom job template work?
@Anna_Geller: I asked the team, and you should also be able to set it in the job template for your flow runs. You can populate the environment variable from a Kubernetes Secret, as described here.
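Our reading of the thread is that the flow-run job takes its environment from the job template plus any `--env` values given when the agent starts. It does not inherit the agent container's own environment, which would explain why `kubectl set env` on the agent deployment had no effect. The sketch below models that assumption with plain dictionaries. `build_flow_run_env` and its arguments are hypothetical names, not Prefect's API.

```python
from typing import Dict, List, Mapping

EnvEntry = Dict[str, object]


def build_flow_run_env(
    agent_container_env: Mapping[str, str],
    agent_cli_env: Mapping[str, str],
    job_template_env: List[EnvEntry],
) -> List[EnvEntry]:
    """Hypothetical model of the env list on a flow-run job's "flow" container.

    Assumption: entries come from the job template and from the agent's --env
    values; agent_container_env (e.g. set via `kubectl set env`) is ignored
    on purpose, because in this model it is not copied onto the job.
    """
    env: List[EnvEntry] = [dict(entry) for entry in job_template_env]
    env.extend({"name": name, "value": value} for name, value in agent_cli_env.items())
    return env


def env_names(env: List[EnvEntry]) -> set:
    """Collect the variable names present in a container env list."""
    return {str(entry["name"]) for entry in env}


if __name__ == "__main__":
    var = "PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN"
    from_secret = {
        "name": var,
        "valueFrom": {"secretKeyRef": {"name": "prefect-secrets", "key": "gitlab-access-token"}},
    }
    print(var in env_names(build_flow_run_env({var: "XYZ"}, {}, [])))  # False
    print(var in env_names(build_flow_run_env({}, {var: "XYZ"}, [])))  # True
    print(var in env_names(build_flow_run_env({}, {}, [from_secret])))  # True
```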
@Sam_Werbalowsky: I have a ConfigMap with secrets in it, and I reference that ConfigMap in the job_template.
In the UI, you can edit the job_template directly, so you can at least test whether it works.
@Alexander_Melkoff: @Anna_Geller, @Sam_Werbalowsky thank you for your advice!
Well, it took me a while to figure out, so I’ll share my solution here in case anyone is interested. In addition to the GitLab access token, I also wanted to store AWS secrets to provide access to datasets stored in S3 and custom Docker images stored in ECR. I created a user account for Prefect with all the necessary permissions, and I use the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values created for that account.
Kubernetes Secret
We use Terraform to manage our Kubernetes infrastructure as code, so we also use it to create the secret:
# prefect.tf
resource "kubernetes_secret" "prefect-secrets" {
  metadata {
    name      = "prefect-secrets"
    namespace = kubernetes_namespace.prefect.metadata.0.name
  }

  data = {
    "gitlab-access-token"           = local.secrets.gitlab_access_token
    "prefect-aws-access-key-id"     = local.secrets.prefect_aws_access_key_id
    "prefect-aws-secret-access-key" = local.secrets.prefect_aws_secret_access_key
  }
}
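In the Kubernetes API, a Secret's `data` values are base64-encoded. As far as we know, the Terraform `kubernetes_secret` resource accepts plain strings in `data` and encodes them for you. The sketch below builds a roughly equivalent Secret manifest as a Python dict to show what ends up in the cluster. The placeholder values are made up, and the namespace is assumed to be `prefect`, as elsewhere in this thread.

```python
import base64
import json
from typing import Dict


def secret_manifest(name: str, namespace: str, values: Dict[str, str]) -> dict:
    """Build a v1 Secret manifest whose data values are base64-encoded strings."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    manifest = secret_manifest(
        "prefect-secrets",
        "prefect",  # assumed namespace
        {
            "gitlab-access-token": "placeholder-token",  # made-up values
            "prefect-aws-access-key-id": "placeholder-key-id",
            "prefect-aws-secret-access-key": "placeholder-secret",
        },
    )
    print(json.dumps(manifest, indent=2))
```

Base64 is only an encoding, not encryption, so access to the Secret itself still needs to be restricted, for example with RBAC.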
Instead of storing the values in the code, we reference AWS Secrets Manager.
AWS Secrets Manager
Since storing secret values in code is a bad idea, we keep them in AWS Secrets Manager. Terraform retrieves them when the configuration is applied to the K8s cluster and creates or updates the Kubernetes secret:
# secrets.tf
data "aws_secretsmanager_secret_version" "secrets" {
  secret_id = "k8-secrets"
}

locals {
  secrets = jsondecode(data.aws_secretsmanager_secret_version.secrets.secret_string)
}
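The `jsondecode` call assumes that the `k8-secrets` entry in Secrets Manager stores a JSON object whose keys match the `local.secrets.*` references in `prefect.tf`. The Python sketch below shows that expected shape and adds a check for missing keys. The `parse_k8_secrets` helper and the example values are hypothetical.

```python
import json
from typing import Dict

# Keys referenced as local.secrets.<key> in prefect.tf
REQUIRED_KEYS = (
    "gitlab_access_token",
    "prefect_aws_access_key_id",
    "prefect_aws_secret_access_key",
)


def parse_k8_secrets(secret_string: str) -> Dict[str, str]:
    """Decode the SecretString JSON and fail early if an expected key is absent."""
    secrets = json.loads(secret_string)
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    if missing:
        raise KeyError(f"k8-secrets is missing keys: {', '.join(missing)}")
    return secrets


if __name__ == "__main__":
    example = json.dumps({
        "gitlab_access_token": "placeholder-token",  # made-up values
        "prefect_aws_access_key_id": "placeholder-key-id",
        "prefect_aws_secret_access_key": "placeholder-secret",
    })
    print(sorted(parse_k8_secrets(example)))
```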
Custom Job Template
Now we add references to the Kubernetes secret in a custom job template:
# prefect-job-template.yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: flow
          env:
            - name: PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: prefect-secrets
                  key: gitlab-access-token
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: prefect-secrets
                  key: prefect-aws-access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: prefect-secrets
                  key: prefect-aws-secret-access-key
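Every entry in the template follows the same `valueFrom.secretKeyRef` pattern, so the env list could also be generated from a mapping of variable names to Secret keys. The Python sketch below rebuilds the same job template as a dict. The `job_template` helper is hypothetical. The container name `flow` is kept from the template above; as far as we know, that is the container Prefect's Kubernetes agent fills in.

```python
import json
from typing import Dict

SECRET_NAME = "prefect-secrets"

# Environment variable name -> key inside the prefect-secrets Kubernetes Secret
ENV_FROM_SECRET: Dict[str, str] = {
    "PREFECT__CONTEXT__SECRETS__GITLAB_ACCESS_TOKEN": "gitlab-access-token",
    "AWS_ACCESS_KEY_ID": "prefect-aws-access-key-id",
    "AWS_SECRET_ACCESS_KEY": "prefect-aws-secret-access-key",
}


def job_template(secret_name: str, env_from_secret: Dict[str, str]) -> dict:
    """Build a batch/v1 Job template whose "flow" container reads env vars from a Secret."""
    env = [
        {"name": var, "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}}}
        for var, key in env_from_secret.items()
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {"template": {"spec": {"containers": [{"name": "flow", "env": env}]}}},
    }


if __name__ == "__main__":
    # JSON output is also valid YAML, so it could be saved as a .yaml template
    print(json.dumps(job_template(SECRET_NAME, ENV_FROM_SECRET), indent=2))
```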
Now we could pass that as job_template in KubernetesRun, but it is more convenient to store it in S3 and use job_template_path instead. For that to work, Prefect’s Kubernetes agent also needs AWS credentials, since it reads the template from S3. That means we need to add references to the Kubernetes secret to Prefect’s helm_release resource in Terraform:
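Because the agent reads the template from an `s3://` path, it needs the bucket and key plus working credentials. The sketch below shows two hypothetical pre-flight helpers: one splits the path, and one reports which of the two AWS variables used in this thread are missing from the agent's environment. It only covers the env-var approach used here. AWS SDKs can also pick up credentials from other sources, which this check ignores.

```python
import os
from typing import List, Mapping, Optional, Tuple
from urllib.parse import urlparse

AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split s3://bucket/key into ("bucket", "key")."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def missing_agent_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the AWS env vars from this setup that are unset or empty in the agent's env."""
    env = os.environ if env is None else env
    return [name for name in AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    print(split_s3_path("s3://bucket-name/prefect-job-template.yaml"))
    print(missing_agent_credentials())
```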
# prefect.tf
...
  set {
    name  = "agent.env[0].name"
    value = "AWS_ACCESS_KEY_ID"
  }
  set {
    name  = "agent.env[0].valueFrom.secretKeyRef.name"
    value = kubernetes_secret.prefect-secrets.metadata.0.name
  }
  set {
    name  = "agent.env[0].valueFrom.secretKeyRef.key"
    value = "prefect-aws-access-key-id"
  }
  set {
    name  = "agent.env[1].name"
    value = "AWS_SECRET_ACCESS_KEY"
  }
  set {
    name  = "agent.env[1].valueFrom.secretKeyRef.name"
    value = kubernetes_secret.prefect-secrets.metadata.0.name
  }
  set {
    name  = "agent.env[1].valueFrom.secretKeyRef.key"
    value = "prefect-aws-secret-access-key"
  }
...
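The six `set` blocks follow a single indexing pattern: `agent.env[i].name` plus `agent.env[i].valueFrom.secretKeyRef.{name,key}`. The Python sketch below generates those name/value pairs from a mapping, which makes it easier to add more variables without mis-numbering the indices. `helm_env_sets` is a hypothetical helper. The Secret name is written as the literal `prefect-secrets`, which is what `kubernetes_secret.prefect-secrets.metadata.0.name` resolves to in this setup.

```python
from typing import Dict, List, Tuple


def helm_env_sets(secret_name: str, env_from_secret: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, value) pairs equivalent to the agent.env[i] `set` blocks."""
    pairs: List[Tuple[str, str]] = []
    for i, (var, key) in enumerate(env_from_secret.items()):
        prefix = f"agent.env[{i}]"
        pairs.append((f"{prefix}.name", var))
        pairs.append((f"{prefix}.valueFrom.secretKeyRef.name", secret_name))
        pairs.append((f"{prefix}.valueFrom.secretKeyRef.key", key))
    return pairs


if __name__ == "__main__":
    agent_env = {
        "AWS_ACCESS_KEY_ID": "prefect-aws-access-key-id",
        "AWS_SECRET_ACCESS_KEY": "prefect-aws-secret-access-key",
    }
    for name, value in helm_env_sets("prefect-secrets", agent_env):
        # Same strings as the Terraform set blocks, e.g. for `helm upgrade --set name=value`
        print(f"{name}={value}")
```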
Older versions of Prefect’s helm chart don’t have the agent.env section! It took me some time to notice that.
Flow
Now we can use this simple flow to check if everything works.
# workflow.py
import prefect
from prefect import task, Flow
from prefect.storage import GitLab
from prefect.run_configs import KubernetesRun
from prefect.executors import LocalExecutor

gitlab_storage = GitLab(
    repo="gitlab-group-name/prefect-test-flow-repo-name",
    ref="main",
    path="workflow.py",
    access_token_secret="GITLAB_ACCESS_TOKEN",
)

kubernetes_run_config = KubernetesRun(job_template_path="s3://bucket-name/prefect-job-template.yaml")

local_executor = LocalExecutor()


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello, cloud!")


with Flow("hello-flow") as flow:
    hello_task()

flow.storage = gitlab_storage
flow.run_config = kubernetes_run_config
flow.executor = local_executor
@Anna_Geller: This is great! Thank you so much for sharing!
@Sam_Werbalowsky: Awesome, nice stuff! Are you planning to use the Dask executor at any point? I have set that up, and there are some things you can do with variables and secrets there as well, so I’m happy to share some of that if you go that route.
@Alexander_Melkoff: My plan is to go through the basic stuff first and then experiment with Dask. I’m not sure whether Prefect’s ability to create temporary Dask clusters will be enough for us or whether we’ll have to spin up a permanent Dask cluster. We’ll probably get there in a couple of weeks.