What are common pitfalls and quick tips when setting up Prefect flow execution in ECS?
First of all, don’t get confused by the overlapping terminology. An ECS task is not the same thing as a Prefect task. ECS tasks run as part of an ECS cluster and launch the container(s) defined in an ECS task definition. The task definition is the blueprint for the ECS task: it describes which Docker container(s) to run and what should happen inside them.
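To make the "blueprint" idea concrete, here is a minimal sketch of a task definition for the worker, written as the keyword arguments you could pass to boto3's ecs.register_task_definition. The family name, CPU/memory sizes, work pool name, and API URL are placeholders I made up for illustration, and the image is assumed to already have prefect-aws installed. An execution role and log configuration are left out to keep it short.

```python
def worker_task_definition(work_pool: str, api_url: str) -> dict:
    """Build kwargs shaped for boto3's ecs.register_task_definition.

    All names and sizes below are illustrative assumptions, not required values.
    """
    return {
        "family": "prefect-worker",  # hypothetical family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks must use awsvpc networking
        "cpu": "512",
        "memory": "1024",
        "containerDefinitions": [
            {
                "name": "prefect-worker",
                # Assumes this image (or one derived from it) includes prefect-aws.
                "image": "prefecthq/prefect:2-latest",
                "essential": True,
                # What should happen inside the container: start a worker.
                "command": [
                    "prefect", "worker", "start",
                    "--pool", work_pool,
                    "--type", "ecs",
                ],
                "environment": [
                    {"name": "PREFECT_API_URL", "value": api_url},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # With AWS credentials configured, this dict could be registered with:
    #   boto3.client("ecs").register_task_definition(**definition)
    definition = worker_task_definition("my-ecs-pool", "https://example.invalid/api")
    print(definition["containerDefinitions"][0]["command"])
```

Note that nothing runs when you register a task definition. It only becomes a running ECS task once a service or a one-off task execution launches it.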
The ECS task running the Prefect worker should be set up as an ECS service, since it is a long-running process and you want it to be restarted automatically if it ever stops unexpectedly. An ECS service guarantees that a specified number of tasks is running at all times. For example, if a task’s container exits due to an error, or the underlying EC2 instance fails and is replaced, the ECS service will replace the failed task. This makes ECS services a good fit for managing a long-running process like the Prefect worker.
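As a rough sketch of what "run the worker as a service" means, these are the keyword arguments you could pass to boto3's ecs.create_service. The cluster name, service name, task definition name, and subnet ID are hypothetical. The key part is desiredCount, which is the number of copies ECS keeps alive.

```python
def worker_service(cluster: str, task_definition: str, subnets: list) -> dict:
    """Build kwargs shaped for boto3's ecs.create_service.

    Names are illustrative assumptions; adjust them to your own setup.
    """
    return {
        "cluster": cluster,
        "serviceName": "prefect-worker",  # hypothetical service name
        "taskDefinition": task_definition,
        # ECS replaces the task whenever fewer than this many are running.
        "desiredCount": 1,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                # A public IP is one way to let the worker reach the Prefect API;
                # a NAT gateway on private subnets is another.
                "assignPublicIp": "ENABLED",
            }
        },
    }


if __name__ == "__main__":
    # With AWS credentials configured, this could be created with:
    #   boto3.client("ecs").create_service(**service)
    service = worker_service("prefect-cluster", "prefect-worker", ["subnet-00000000"])
    print(service["desiredCount"])
```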
An ECS task, on the other hand, is a single running instance of a task definition. Running a standalone task (as opposed to running it through a service) launches the container(s) defined in the task definition and lets them run until they are stopped or exit on their own; nothing restarts them afterwards. This makes standalone ECS tasks a good fit for an ephemeral process like a Prefect flow run.
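For comparison with a service, here is a sketch of the arguments for a one-off task execution via boto3's ecs.run_task. You normally do not make this call yourself: the worker launches an ECS task for each flow run on your behalf. The cluster, task definition, and subnet values are hypothetical.

```python
def flow_run_task(cluster: str, task_definition: str, subnets: list) -> dict:
    """Build kwargs shaped for boto3's ecs.run_task.

    Illustrative only; the real request for a flow run is assembled by the worker.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # One task per flow run. There is no desiredCount here, so ECS will
        # not restart the task once its container exits.
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "ENABLED",
            }
        },
    }


if __name__ == "__main__":
    # With AWS credentials configured, this could be submitted with:
    #   boto3.client("ecs").run_task(**request)
    request = flow_run_task("prefect-cluster", "my-flow", ["subnet-00000000"])
    print(sorted(request))
```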
You have two options for compute capacity: EC2 or Fargate. Fargate makes it easier to get started because there are no instances to manage, but it increases the time it takes to spin up infrastructure for each flow run. Provisioning EC2 instances for the ECS cluster ahead of time can reduce this lead time.
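The choice also shows up in the task definition itself. A small sketch of the fields that differ, assuming Linux containers; the helper name and the CPU/memory sizes are made up for illustration:

```python
def launch_settings(launch_type: str) -> dict:
    """Return the task definition fields that depend on the launch type."""
    if launch_type == "FARGATE":
        return {
            "requiresCompatibilities": ["FARGATE"],
            # Fargate requires awsvpc networking and task-level CPU/memory.
            "networkMode": "awsvpc",
            "cpu": "512",
            "memory": "1024",
        }
    if launch_type == "EC2":
        return {
            "requiresCompatibilities": ["EC2"],
            # bridge is the default mode for Linux containers on EC2;
            # awsvpc also works. Task-level CPU/memory are optional here.
            "networkMode": "bridge",
        }
    raise ValueError(f"unsupported launch type: {launch_type!r}")


if __name__ == "__main__":
    for kind in ("FARGATE", "EC2"):
        print(kind, launch_settings(kind))
```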
This Terraform module will help you get started quickly.
Just adding onto Taylor’s post. There is also an ECS Fargate recipe that instead uses the Prefect Worker.