Switching from agents to workers in a monorepo

jeremy_thomas · March 20, 2024, 1:56pm

Hello,

We have been using Prefect since 1.2 and have been using a Prefect agent to run our infrastructure blocks in Prefect 2 since migration. I see that agents are being deprecated, so I started looking into workers, and I see some significant issues with the design patterns involved that are preventing us from making a smooth migration.

First of all, in the upgrade guide, I see this text near the bottom:

With agents, you might have had multiple deployment.yaml files, but under worker deployment patterns, each repo will have a single prefect.yaml file located at the root of the repo that contains deployment configuration for all flows in that repo.

We work in a monorepo with dozens of projects, and using a single file to maintain everyone’s separate deployments sounds like a minor version of hell on Earth. How does this work with CI/CD processes? When I want to deploy one new flow, is the entire repository’s worth of flows deployed again?

Another issue with this pattern is storage. We use 2 separate storage blocks: GCS cloud storage for dev and GitHub for prod. With projects scoped to their folder within the repo, this means the GCS stored code can be lightweight and stored in separate folders for each flow, meaning uploading / downloading code is fast. By moving everything to the repository root, does that mean every dev deployment will have to upload / download the entire repository to run 1 flow? This is unsustainable and a waste of resources, including cloud storage space. We cannot use GitHub for dev flows, because we would need a separate storage block for every branch someone was developing in, which seems counter-intuitive to the block pattern.

Overall, workers seem like an unnecessary degradation in quality of life for Prefect, and if agents are no longer going to be supported, I am not sure I see a reasonable path to upgrading to workers in our future, unless we can solve these issues. I would appreciate any guidance or help in solving these issues for our team, and if anyone has an example of how they have made workers function efficiently in a monorepo setting, I would love to speak with you about how you made that work.

desertaxle · March 21, 2024, 1:58pm

Hey @jeremy_thomas! We have a monorepo at Prefect that we use to manage the deployments of all of our internal flows.

The line you referenced from the documentation looks like it needs to be updated because we’ve since added support for multiple prefect.yaml files in a repository.

In our monorepo, we have different domains for our flows, and each domain has its own prefect.yaml file that defines how to deploy flows in that domain. We then deploy flows in our CI pipeline from those domains via prefect deploy with the --prefect-file flag to point to one of our domain prefect.yaml files. Moving the prefect.yaml files into sub-directories allows us to control code upload and download at a more granular level because those operations use the location of the prefect.yaml file as an anchor.
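For illustration, a CI step along these lines could be sketched as below. This is a minimal sketch under stated assumptions, not Prefect's internal pipeline: the `domains/...` layout, the function names, and the choice to pass `--all` are hypothetical. Only `prefect deploy` and the `--prefect-file` flag come from the description above.

```python
from pathlib import PurePosixPath


def find_prefect_file(changed_file, prefect_files):
    """Return the closest enclosing prefect.yaml for a changed file, or None."""
    known = {PurePosixPath(p) for p in prefect_files}
    # Walk upward from the changed file's directory toward the repo root.
    for parent in PurePosixPath(changed_file).parents:
        candidate = parent / "prefect.yaml"
        if candidate in known:
            return str(candidate)
    return None


def deploy_commands(changed_files, prefect_files):
    """Build one `prefect deploy` command per domain touched by a change."""
    affected = set()
    for changed in changed_files:
        match = find_prefect_file(changed, prefect_files)
        if match is not None:
            affected.add(match)
    # Assumption: redeploy every deployment defined in an affected file.
    return [
        ["prefect", "deploy", "--prefect-file", path, "--all"]
        for path in sorted(affected)
    ]


if __name__ == "__main__":
    # Hypothetical inputs; in CI these might come from `git diff --name-only`.
    changed = ["domains/sales/flows/etl.py", "README.md"]
    files = ["domains/sales/prefect.yaml", "domains/ops/prefect.yaml"]
    for cmd in deploy_commands(changed, files):
        print(" ".join(cmd))
```

Each returned command could then be handed to `subprocess.run`. Files that sit outside every domain directory (such as a top-level README) produce no command, so only the domains that actually changed get redeployed.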

This pattern works well for us and may work well for your team too. Please let me know if you have any other questions or concerns, would like me to go into more detail, or would like to talk about different patterns of deployment management!

jeremy_thomas · March 21, 2024, 3:02pm

Thanks for your reply! That documentation change would be very appreciated.

So in the case of dev and prod flows with workers, using them in the way you have described, each sub-directory would have 2 prefect.yaml files, one for each worker (we set some env vars in our dev infrastructure block to limit the ability to modify prod data from dev flows), is that correct? This would be very similar to how we already use deployments, so that would work for us.

In this case, to generate the yaml files, we would navigate to that sub-directory, and run prefect deploy the first time to use the wizard and generate the first yaml file, but then to create the 2nd file, running the prefect deploy command again would default to running that deployment, because the yaml file would be found, right?

My main concern is ease of use for those on my team that are not as engineering savvy. I need them to have a simple process to generate the files they need, and it seems like the default behavior of Prefect may make this overly complicated for them.

desertaxle · March 21, 2024, 4:09pm

You can have one prefect.yaml file in each subdirectory with multiple deployment configurations. When you run prefect deploy in a directory with an existing prefect.yaml file, the deployment wizard will give you the choice to deploy using an existing config or create a new one. You can save that new config to the prefect.yaml file for future use. It should be reasonably intuitive, but if it isn’t, we’re open to enhancement requests to make it easier to use!

jeremy_thomas · March 21, 2024, 5:30pm

Yeah, the more I read about the deployment files, the easier it seems like it could be.

One minor enhancement I can already see is custom recipes. If we could make our own recipe that allows users to generate an almost complete yaml, specific to our deployments, that would be the best case scenario.

I will work on transferring some flows over to workers with your advice in mind, and will reach back out if I encounter further issues. Thank you again for always being quick to answer my questions, whether here or in GitHub!

merlin · March 23, 2024, 5:57am

Hi Jeremy, having recently migrated off of agents to prefect.yaml worker deployments, I can confirm that it is overall a pretty sane way to organize deployments and run the deployment process (as long as you feel YAML is sane, of course). I rejoiced when they added the --prefect-file option to the CLI prefect deploy command.

I use a single GitLab repo for storage, but I think the docs are fairly clear about establishing different pull: steps for GCS or GitHub depending on the deployment/worker combo.

For my small team’s monorepo, I organize deployments into only two environments. Prod calls our Prefect Cloud account, and ‘dev’ calls a self-hosted Prefect instance on EC2.

The worker+yaml configuration model is more amenable to CI/CD. Also, you don’t need to run the deployment steps except when the deployment configuration changes. If the flow code changes and gets pushed, the existing deployments will run with the HEAD of the defined branch.

So when it comes to deploying from a feature branch for testing, I’m still deploying to the same ‘dev’ Prefect server. The pull step can be branch specific, and if you append the branch name to the deployment name, it would avoid the trouble of multiple people deploying to the same dev Prefect instance. I suppose you could spin up a Prefect server for each branch getting tested. I use Docker Compose for the DB storage and server containers; it’s pretty easy to manage once it’s set up and could scale to many instances for testing branch-specific deployments.
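As a rough sketch of the branch-suffix idea, a small helper could derive a per-branch deployment name. The function name and the slug rules are hypothetical, not something Prefect provides. The sketch replaces slashes and other punctuation because deployments are addressed as flow-name/deployment-name, so a raw branch such as feature/foo would be awkward in a deployment name.

```python
import re


def branch_deployment_name(base_name, branch):
    """Append a sanitized branch name to a deployment name (hypothetical helper)."""
    # Lowercase, then collapse every run of non-alphanumeric characters to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"{base_name}-{slug}" if slug else base_name


if __name__ == "__main__":
    # The branch could come from `git symbolic-ref --short HEAD`.
    print(branch_deployment_name("nightly-etl", "feature/JIRA-12_fix"))
```

Two people working on different branches would then get distinct deployment names on the shared dev server, while an empty branch value falls back to the base name.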

Instead of all that, we test our flow code and then merge to the ‘dev’ branch, where deployments can be tested as well. So feature branches are not getting deployed individually.

Here’s a prefect.yaml snippet I’m using now:

build:
# Capture the short commit hash at deploy time
- prefect.deployments.steps.run_shell_script:
    id: current-commit
    script: git rev-parse --short HEAD
    stream_output: false
# Capture the name of the branch being deployed from
- prefect.deployments.steps.run_shell_script:
    id: branch-name
    script: git symbolic-ref --short HEAD
    stream_output: false
...
pull:
# Clone the same branch that the deployment was built from
- prefect.deployments.steps.git_clone:
    repository: "{{ prefect.blocks.github.myproject.repository }}"
    access_token: "{{ prefect.blocks.github.myproject.access_token }}"
    branch: "{{ branch-name.stdout }}"
    include_submodules: true
...
  version_label:
    commit_hash: &version "{{ current-commit.stdout }}"

That builds the deployment based on the branch I’m working in (which is always the ‘dev’ branch for deployments in my case).

I hope that helps.
