I manage a custom workflow manager stack that handles the scheduling and execution of (mostly) ETL processes that download and store data. I’ve been investigating Prefect OSS and have decided to take the next step and start testing whether it can provide improvements for my team without the management and upkeep our custom build requires. In beginning to map things out, I’ve come to a fork in the road that I’d love some guidance on.
Our current system has a persistent scheduler that stores schedules in a MongoDB collection and creates future jobs based upon those schedules. Those schedules persist between deployments because the MongoDB data is untouched during deployment and just reused every time we deploy/rebuild. They are editable in a UI.
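For illustration, a schedule document in that collection might look something like this (the field names, database, and connection string here are hypothetical, not our actual schema):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
schedules = client["workflows"]["schedules"]       # placeholder db/collection names

# One schedule entry; the scheduler reads these and creates future jobs.
schedules.insert_one(
    {
        "job_name": "daily_prices_etl",    # which ETL job to run
        "cron": "0 6 * * *",               # when the scheduler should create jobs
        "enabled": True,                   # toggled from the UI
        "params": {"source": "vendor_a"},  # passed to the job at run time
    }
)
```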
When I first started researching Prefect a year ago (2.0 was still in beta), deployments were, I think, handled or at least presented much differently. Now, it appears that a deployment is more 1:1 with a flow, or is at least presented that way. The goal from a scheduling standpoint is to let non-developers hop into the Prefect UI to add or edit a schedule, and then have that schedule persist when we deploy a new build.
With that said, when is it appropriate to deploy a flow again? Never? Changing only the flow file allows the schedule to persist. How does flow.serve() play into this? The quickstart guidance explains that it is the easiest way to deploy a flow, but easiest rarely means correct.
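For reference, the quickstart pattern I’m asking about looks roughly like this (flow name, deployment name, and cron are placeholders):

```python
from prefect import flow

@flow(log_prints=True)
def my_etl():
    print("downloading and storing data...")

if __name__ == "__main__":
    # .serve() registers a deployment and then runs a long-lived process
    # that executes scheduled runs of this flow.
    my_etl.serve(name="my-etl", cron="0 6 * * *")
```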
I appreciate any insight you all can provide on this topic and look forward to following along here. Thanks again.
I’m not the right person to answer, as I find the updates to the deployment documentation since around Prefect 2.6 to be confusing.
Here’s what I do (Prefect OSS) to manage deployment, which creates a persistent schedule. When I change flow code I have to rerun the deployment command, because with Filesystem storage the build step saves a fresh copy of the flow code to the deployed code location.
Anyway, I deploy by calling a bash script that contains the Prefect CLI command ‘prefect deployment build …etc’ with parameters and environment variables as part of the script file. It works pretty well in a CI/CD arrangement, where a merge to the code repo triggers pytest and the deployment build step for all the build script files in a folder. The schedule can be set this way, and it can also be changed in the UI.
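If it helps, here’s a rough Python-API equivalent of what my build script does (names, paths, and the cron expression are placeholders, and the CronSchedule import path varies a bit across 2.x versions):

```python
from prefect.deployments import Deployment
from prefect.filesystems import LocalFileSystem
from prefect.server.schemas.schedules import CronSchedule

from my_flows import my_etl  # placeholder import for the flow being deployed

# Build the deployment: registers flow metadata with the Prefect API
# and copies the flow code into the filesystem storage location.
deployment = Deployment.build_from_flow(
    flow=my_etl,
    name="my-etl-prod",
    storage=LocalFileSystem(basepath="/opt/prefect/deployments"),  # placeholder path
    schedule=CronSchedule(cron="0 6 * * *"),
)
deployment.apply()  # push the deployment definition to the Prefect server
```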
But a real developer could do a lot better and would probably prefer to manage many, many deployments in a different way. I like the deployment-as-configuration style, so the ‘my_flow.serve(…)’ way baffles me.
This is great, thanks. I wish they would allow decoupling the scheduling from deployment because, for my ETL purposes at least, I’d want it to be more persistent.
I’ll have to look at the deployment-as-configuration style you mentioned and probably create a deployment manager app. This would also be needed if I’m going to run multiple schedules on one deployment… something I’m surprised isn’t native.
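The workaround I’m imagining is one deployment per schedule over the same flow, something like this sketch (flow import, names, and crons are all hypothetical):

```python
from prefect.deployments import Deployment
from prefect.server.schemas.schedules import CronSchedule

from my_flows import my_etl  # placeholder flow import

# One deployment per schedule over the same flow -- a workaround for
# the lack of native multi-schedule deployments.
for name, cron in [
    ("my-etl-hourly", "0 * * * *"),
    ("my-etl-nightly", "30 2 * * *"),
]:
    Deployment.build_from_flow(
        flow=my_etl,
        name=name,
        schedule=CronSchedule(cron=cron),
    ).apply()
```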
You can ignore the branch checking; it relates to running two Prefect servers with docker compose, one for a dev branch and one for a main branch. The dev one doesn’t get a scheduled job time.
Anyway, you can programmatically set the schedule for deployed jobs with prefect deployment set-schedule '<flow>/<deployment>' --cron '<cron exp>'. In my conception, the deployment command does two important things: it loads the flow information into the Prefect DB for scheduling or triggering job runs, and it updates the flow code, which is simply stored in a filesystem block location on the same machine. TBH it all works pretty well; I can manage a lot of deployed jobs, and keeping them organized in a stack of .sh files (or you can do the same with the Python API) is OK. It seems to lack that next level of elegance, though: storing it all as .yaml configuration files as source code for deployment definitions would be better.
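Something like this is what I have in mind: a small script that reads deployment definitions from a yaml file and applies them. The file layout and field names are just a sketch, not an existing Prefect feature:

```python
# deployments.yaml (sketch):
#   - flow: my_flows:my_etl
#     name: my-etl-prod
#     cron: "0 6 * * *"
import importlib

import yaml  # PyYAML

from prefect.deployments import Deployment
from prefect.server.schemas.schedules import CronSchedule

with open("deployments.yaml") as f:
    definitions = yaml.safe_load(f)

for d in definitions:
    # Resolve the flow object from a "module:flow_function" string.
    module_name, flow_name = d["flow"].split(":")
    flow = getattr(importlib.import_module(module_name), flow_name)
    Deployment.build_from_flow(
        flow=flow,
        name=d["name"],
        schedule=CronSchedule(cron=d["cron"]),
    ).apply()
```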