View in #prefect-community on Slack
@Gaurav: Hello, I have successfully deployed a Kubernetes Prefect Agent on an Azure Kubernetes Service (AKS) cluster. I am trying to run a simple flow that uses a LocalDaskExecutor on AKS virtual nodes. For this, I am using a custom job template for the pod, because it needs the customized node selectors and tolerations that Azure publishes. The following is a snippet of my job_template:
job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {
                "labels": {
                    "execution-model": "serverless"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "flow"
                    }
                ],
                "nodeSelector": {
                    "execution-model": "serverless"
                },
                "tolerations": [
                    {
                        "key": "virtual-kubelet.io/provider",
                        "operator": "Exists"
                    }
                ]
            }
        }
    }
}
However, the flow fails. When I ran kubectl get events, I noticed the following output:
Warning ProviderCreateFailed pod/prefect-job-XXXXX-XXXXX ACI does not support providing args without specifying the command. Please supply both command and args to the pod spec.
Just some more information - I also ran the same flow successfully on an alternate deployment on AWS EKS Fargate, using an AWS Kubernetes Agent.
Any guidance is really appreciated
@Kyle_McChesney: Not sure if this is directly a Prefect issue, seems like maybe some kind of k8s “protection” logic. It doesn’t like when you pass args but don’t specify a command
it may be cloud-specific (i.e. not enabled in AWS)
@Gaurav: Interesting, thanks for your prompt response!
@Kevin_Kho: Yeah, Kyle looks right here; this is not necessarily a Prefect thing
@Matthias: This is not related to Prefect, at least not directly. This is a specific issue that pops up when using Azure AKS virtual nodes (see the link posted by Kyle above). The reason it works on AWS Fargate is that these two services work differently under the hood.
@Anna_Geller: There are some great answers here already, but if you still haven’t solved it yet Gaurav, could you share the Dockerfile of the image you use, as well as your flow object configuration, i.e., storage, run config, and executor? The SO answer doesn’t seem right to me, as it discusses an issue with dask KubeCluster, while you mentioned you are using a LocalDaskExecutor, which should just use local threads and processes rather than a KubeCluster.
For troubleshooting, I’d recommend taking it more step-by-step and trying a simple hello world flow on AKS with no custom job template first (just using the Prefect base image) to ensure your Prefect AKS setup is working fine before moving to custom job templates and Dask.
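A minimal sketch of the hello-world flow Anna suggests, assuming Prefect 1.x with a KubernetesRun run config and the Prefect base image (the flow name and image tag here are placeholders, and running it requires a Prefect 1.x installation plus a registered Kubernetes Agent):

```python
# Hello-world flow sketch for testing the AKS setup with no custom
# job template: default run config, Prefect base image, LocalDaskExecutor.
from prefect import Flow, task
from prefect.run_configs import KubernetesRun
from prefect.executors import LocalDaskExecutor

@task
def say_hello():
    print("Hello from AKS!")

with Flow("aks-hello-world") as flow:
    say_hello()

# No custom job template, node selectors, or tolerations yet:
# this should land on a regular AKS node, not a virtual node.
flow.run_config = KubernetesRun(image="prefecthq/prefect:latest")
flow.executor = LocalDaskExecutor()
```

Once this runs cleanly, the custom job template and virtual-node scheduling can be layered on top.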
@Matthias: The SO answer discusses the underlying issue, but indeed, the issue popped up in a different context (which just happens to be dask KubeCluster). The real problem is that if you want to use AKS with virtual nodes, not only do you have to add custom annotations/node selectors, but you also have to supply both a command and args to the pod spec (which is the exact error message you got). Apparently, ACI uses the combination under the hood, which would lead to unexpected behaviour if you supplied one without the other.
So you have to add these to the custom job template.
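A sketch of the job template with both "command" and "args" supplied on the flow container, as the ACI error message asks for. The exact entrypoint is an assumption here: "prefect execute flow-run" is the usual Prefect 1.x flow-run invocation, but check the default job template shipped with your Prefect version before copying it:

```python
# Job template sketch with command AND args set on the container,
# since ACI rejects pods that supply args without a command.
job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {"labels": {"execution-model": "serverless"}},
            "spec": {
                "containers": [
                    {
                        "name": "flow",
                        # ACI maps command/args onto the container entrypoint;
                        # the invocation below assumes Prefect 1.x defaults.
                        "command": ["/bin/sh", "-c"],
                        "args": ["prefect execute flow-run"],
                    }
                ],
                "nodeSelector": {"execution-model": "serverless"},
                "tolerations": [
                    {"key": "virtual-kubelet.io/provider", "operator": "Exists"}
                ],
            },
        }
    },
}
```

The node selector and toleration are unchanged from the snippet above; only the container entry gains the command/args pair.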