Why does Prefect 2.0 no longer have Git storage (GitHub, Bitbucket, CodeCommit, ...)?

Prefect 1.0 has Git-based storage classes that clone the user’s repository at runtime and execute the flow run using the code from the cloned repository.

However, we realized that this caused a lot of confusion and raised questions such as:

  • What is the difference between build-time and runtime?
    Isn’t pushing code to a Git repository supposed to trigger an application build through a CI/CD system such as a GitHub Actions workflow? Why is Prefect doing that at runtime every time instead?
  • If Prefect is pulling the flow code at runtime, what should be part of CI/CD, and what should be part of the Prefect flow run? If Prefect is already pulling the repository at runtime, will it also build my Docker image and my custom Python packages within my environment at runtime?

  • Where am I supposed to version my flow code: is the Git commit SHA or my flow version the single source of truth?
  • Why is Prefect not using my custom modules and only using the flow code? If Prefect already pulls the entire repository, why can it not also install my modules into my environment at runtime and build Docker images for me using the requirements.txt and Dockerfile in my GitHub repository? Why can it not automatically figure out what to package within this repository, and how?
  • If Prefect is already pulling the code from the Git repo, can it determine when it needs to pull based on changes made to this repo?
  • How does it differ for a monorepo vs. a per-project or even per-flow repo setup? Does Prefect have best-practice recommendations for the “correct” way of using that? (Prefect cannot provide that from a single Git storage abstraction, as the setup differs so much between individual use cases.)
  • Why is Prefect using the wrong version of my flow? Oh shoot, I forgot to commit and push the latest changes to my code.

Our answer to this problem in Prefect 2.0

We want to support GitHub as a way for users to store and version their code, but we believe the problem users are trying to solve here is:

I’d like to push my code to GitHub and have my code packaged and deployed to Prefect.

Our answer to this:

We think a better solution is a robust set of CI/CD recipes (and potentially GitHub Actions provided in the Marketplace) that let you handle packaging your code (e.g., as a Docker image) and building deployments when needed (i.e., when your code has actually changed).

Implementing this as storage is not the right approach because the problem involves more than just storage: it’s a CI/CD problem.

This is why we will provide official CI/CD recipes that let you deploy your flows and package their dependencies as part of a reliable engineering process, redeploying only when needed. This cleanly separates build-time concerns (packaging dependencies and deploying flows) from runtime concerns (executing your code with runtime parameters, etc.).
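To make the "redeploy only when needed" idea concrete, here is a minimal sketch of a check a CI/CD job could run before rebuilding anything. It is not one of the official recipes and does not use any Prefect API: the function names (`fingerprint`, `needs_redeploy`, `record_build`) and the state file are hypothetical, and a real pipeline might rely on its CI system's own change detection instead.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return a SHA-256 hex digest over the names and contents of the given files."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the files were listed in.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_redeploy(paths, state_file):
    """Return True if the files differ from the fingerprint of the last recorded build."""
    state_file = Path(state_file)
    if not state_file.exists():
        # Nothing recorded yet, so treat this as the first build.
        return True
    return fingerprint(paths) != state_file.read_text().strip()


def record_build(paths, state_file):
    """Store the current fingerprint; call this only after a successful build and deploy."""
    Path(state_file).write_text(fingerprint(paths))
```

A build step would call `needs_redeploy(...)` with the flow code and packaging files (for example `requirements.txt` and the `Dockerfile`), run the packaging and deployment only when it returns `True`, and then call `record_build(...)`. The flow run itself never performs this check, which keeps the build-time decision out of the runtime.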

To follow along, subscribe to the “DataOps Tutorials for Prefect 2.0” category on Discourse to get notified about new recipes, especially those tagged ci-cd.

Note: for now, there are no immediate plans to introduce GitHub as storage, for the reasons above, but we may reconsider this decision later.


The only benefit of the GitHub storage seemed to be the ability to have a boilerplate runtime image that you just drop code into. That would have allowed some level of DRY but didn’t really add much flexibility beyond what you can achieve with CI/CD and Git practices like git-flow.

Note that as of Prefect 2.3.0, GitHub storage is available as a read-only storage block.
