GitHub storage block and dynamically setting branches

jeremy_thomas · July 24, 2023, 3:45pm

I would like to use the GitHub storage block for our flows, but it seems like the branch name (called reference in the block) is only respected at the block level, not the deployment level. I have the following code to create my deployment:

...
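# Load the saved GitHub block (it is stored in Prefect without a reference)
storage = await GitHub.load("repo")
# Point this in-memory copy at the locally checked-out branch; the saved block is not updated
storage.reference = Repository(".").head.shorthand
infra = await VertexAICustomTrainingJob.load("vertex")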

deploy = await Deployment.build_from_flow(
    flow=hello_flow,
    name="hello-flow",
    storage=storage,
    infrastructure=infra,
)
...

The YAML file created from this deployment has the correct reference:

...
storage:
  repository: https://github.com/<redacted>.git
  reference: prefect2-git-storage
  access_token: '**********'
  include_git_objects: true
...

But when the flow runs, the code is running from the main branch. The block is saved in Prefect without a reference, and I was hoping to be able to set it at the deployment level for development purposes, but I’m not sure if this is possible. Any help would be greatly appreciated!

nate · July 24, 2023, 9:01pm

hi @jeremy_thomas - if you’re just getting started with your deployment strategy, I would recommend taking a look at these docs and this example.

You can define a git_clone pull step and template in the branch that you need for a given deployment, whether via a block value, an env var, or the result of a shell script.
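As a rough sketch of that idea (not taken from the linked docs), a pull step can be thought of as a small mapping whose branch value is filled in when the deployment is defined. The Python below builds such a mapping and reads the branch from an environment variable. The step name `prefect.deployments.steps.git_clone` and its `repository` and `branch` keys are written from memory of the Prefect 2 docs, so check them against your version; `DEPLOY_BRANCH`, the helper names, and the example repository URL are hypothetical.

```python
import os


def resolve_branch(default: str = "main") -> str:
    """Pick the branch to clone. DEPLOY_BRANCH is a hypothetical variable name."""
    return os.environ.get("DEPLOY_BRANCH", default)


def git_clone_pull_step(repository: str, branch: str) -> dict:
    """Build one entry for a `pull:` list, shaped like a prefect.yaml step.

    Assumption: the step is keyed by `prefect.deployments.steps.git_clone`
    and accepts `repository` and `branch`.
    """
    return {
        "prefect.deployments.steps.git_clone": {
            "repository": repository,
            "branch": branch,
        }
    }


# Placeholder URL, used only for illustration.
EXAMPLE_REPO = "https://github.com/example-org/example-repo.git"

pull_section = [git_clone_pull_step(EXAMPLE_REPO, resolve_branch())]
```

With `DEPLOY_BRANCH` unset, the sketch falls back to `main`; setting it to a feature branch before defining the deployment changes only that deployment's pull step, not any saved block.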

Let me know if you develop any questions

On your actual question

but it seems like the branch name (called reference in the block) is only respected at the block level, not the deployment level

this is correct - if I’m understanding your point, this is as designed.

jeremy_thomas · July 25, 2023, 1:05pm

Nate,

We have our deployments working using GCS as our storage backend, but this is a very slow solution when a flow has a lot of code. It also means our code is duplicated: once in GitHub and once in a GCS bucket.

I understand the design idea of having blocks being somewhat static, but we have infrastructure overrides - why not storage overrides? It would allow us to make blocks be just as reusable, and give us the ability to tweak settings per deployment if needed.

nate · July 28, 2023, 6:30pm

hi @jeremy_thomas - the resources I linked above show how you can define deployments in the prefect.yaml file. In particular, you can define a generic pull step (how the worker gets the flow code for a given deployment's flow run, which can be a shell script, a git clone, whatever you want) that can be overridden on a per-deployment basis.

In effect, this should accomplish what you are after with “storage overrides”.
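To make the override idea concrete, here is a minimal Python model of a prefect.yaml-style layout: one generic `pull` section at the top level, plus a deployment that carries its own `pull`. This only illustrates the lookup described above and is not Prefect's implementation. The key names mirror what I recall of the prefect.yaml layout and should be verified; the deployment names, entrypoint string, repository URL, and `effective_pull` helper are hypothetical.

```python
def clone_step(repository: str, branch: str) -> dict:
    """One git_clone pull step (key names assumed from the Prefect 2 docs)."""
    return {
        "prefect.deployments.steps.git_clone": {
            "repository": repository,
            "branch": branch,
        }
    }


REPO = "https://github.com/example-org/example-repo.git"  # placeholder URL

CONFIG = {
    # Generic pull step shared by every deployment that does not override it.
    "pull": [clone_step(REPO, "main")],
    "deployments": [
        {"name": "hello-flow", "entrypoint": "flows/hello.py:hello_flow"},
        {
            "name": "hello-flow-dev",
            "entrypoint": "flows/hello.py:hello_flow",
            # Per-deployment override: clone a development branch instead.
            "pull": [clone_step(REPO, "prefect2-git-storage")],
        },
    ],
}


def effective_pull(config: dict, deployment_name: str) -> list:
    """Return the deployment's own pull steps, else the top-level ones."""
    for deployment in config["deployments"]:
        if deployment["name"] == deployment_name:
            return deployment.get("pull", config["pull"])
    raise KeyError(deployment_name)
```

Here `hello-flow` would clone `main` while `hello-flow-dev` would clone `prefect2-git-storage`, which is the per-deployment branch selection the original question asks for.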

worth noting that the prefect.yaml + prefect deploy story is our main recommendation for deployment management, in contrast to the infra block / build_from_flow story
