How to add extra metadata to a flow so that it is shown in the UI

View in #prefect-community on Slack

@Ben_Ayers-Glassey: Hello all! Is there a way to store arbitrary data on a flow, in such a way that you can get it back from the server (i.e. via prefect describe flows ...)?
What I want to do is mark each flow with the Git repo and commit from which the flow was registered.
I’m thinking of situations where a flow run fails and you want to see the source code. In an organization where many people build their own flows in different Git repos, you may not know which repo a flow came from, so you want an easy way to find the source for a given flow.
We’ve been asking people to manually add a link to the Git repo in every flow’s “README” section in the Prefect Cloud UI, but as far as I can tell, you can’t populate that README section via Python or the command line.

@Anna_Geller: Doing this via README is actually quite a cool idea! One user from the community contributed a script to do that programmatically - see this Discourse topic.

Another option would be to add this as an environment variable to your run configuration (those are displayed in the UI in the flow page):

from prefect.run_configs import UniversalRun
flow.run_config = UniversalRun(env={"GIT_PATH": "https://github.com/PrefectHQ/prefect/tree/orion"})

Alternatively, although it could be a bit tedious, you could use a Parameter task to display such information in the UI, e.g.:

from prefect import Flow, Parameter

with Flow("your-flow") as flow:
    git_parameter = Parameter("git-path", default="https://github.com/PrefectHQ/prefect/tree/orion")
    # No task consumes this parameter, so add it to the flow explicitly.
    flow.add_task(git_parameter)

The environment variable approach seems to be the easiest.

Prefect Community: How to automatically upload a README to the Prefect Cloud UI from a Python script

@Ben_Ayers-Glassey: > Doing this via README is actually quite a cool idea! One user from the community contributed a script to do that programmatically - see this Discourse topic.
:dizzy_face: Hitting the GraphQL directly?? That’s very fancy!

> Another option would be to add this as an environment variable to your run configuration (those are displayed in the UI in the flow page)
Ah ok, that’s a much simpler approach, more in line with what I had in mind.
Thank you!

Edit: Ah, it seems the environment variable trick doesn’t appear in prefect describe flows -n my_flow.
But the parameter does.

Out of curiosity, is there any plan (or would you consider) to add support for arbitrary metadata? Just a mapping from strings to strings, whose sole purpose is to live on a flow and be accessible via e.g. flow.metadata in Python, or prefect describe flows -n my_flow | jq .metadata?
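
Since the parameter does show up in the prefect describe flows output, it can be read back from a script. The following is only a rough sketch: it assumes the command prints a JSON object with a top-level "parameters" list whose entries carry "name" and "default" keys, which this thread does not confirm. The function name parameter_defaults and the sample input are made up for illustration.

```python
import json
from typing import Any, Dict


def parameter_defaults(describe_output: str) -> Dict[str, Any]:
    """Map parameter name -> default value from describe-style JSON text.

    ASSUMPTION: the JSON has a top-level "parameters" list of objects with
    "name" and "default" keys. Check this against your own CLI output.
    """
    flow_info = json.loads(describe_output)
    return {
        param["name"]: param.get("default")
        for param in flow_info.get("parameters") or []
    }


# Hand-written sample input for illustration; this is not real CLI output.
sample = json.dumps(
    {
        "name": "my_flow",
        "parameters": [
            {
                "name": "git-path",
                "default": "https://github.com/PrefectHQ/prefect/tree/orion",
            }
        ],
    }
)

print(parameter_defaults(sample)["git-path"])
```

In practice the input string would be whatever prefect describe flows -n my_flow prints, piped into a small script like this one.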

@Anna_Geller: Not in Prefect 1.0, but we could think about some way of attaching additional metadata in Prefect 2.0. I’ll forward this to the Product team.
I thought your intention was to display that in the UI?

@Ben_Ayers-Glassey: Ideally both UI and command line.
> Not in Prefect 1.0 but we could think about some way of attaching additional metadata in Prefect 2.0 - I’ll forward this to the Product team
Thank you! :slightly_smiling_face:

@Anna_Geller: Sure! If you had to choose one, which would you consider more important? Probably the UI, correct?

@Ben_Ayers-Glassey: :thinking_face: Personally, I prefer the command line. And I tend to assume that getting something via an API will usually give me at least as much information as a UI, if not more. I was fairly surprised the README was only in the UI!

I think the most important part is that whatever this thing is (README or env var or parameter or generic metadata), we should be able to write it automatically. That way, we can have a flow-registering script which we can run in a Git repo, and have it automatically add the repo’s URL and current commit to the flow, in a way that we can find it later.
And I generally assume that data which can be written from the command line or Python can also be read that way. It would be kind of surprising otherwise. Although with Prefect’s flows, sometimes information is stored in the serialized flow object, which is on “our” side of the “hybrid data model” (i.e. in the Storage) but not in the Cloud. For instance, that seems to be the case with the env vars on the RunConfig. But the purpose of metadata would be to have it stored in the Cloud, so you can grab it quickly without having to deserialize the flow from Storage.
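
A flow-registering script like the one described above could be sketched roughly as follows. This is a minimal illustration, not an official Prefect feature: the helper names collect_git_metadata and add_git_parameters and the parameter names git-repo and git-commit are hypothetical, and it assumes the repo has a remote called origin. It reuses the Parameter approach from earlier in the thread, since that is the one reported to appear in prefect describe flows.

```python
import subprocess
from typing import Callable, Dict, Sequence


def _run_git(args: Sequence[str]) -> str:
    """Run a git command in the current directory and return its trimmed output."""
    return subprocess.check_output(["git", *args], text=True).strip()


def collect_git_metadata(
    run_git: Callable[[Sequence[str]], str] = _run_git,
) -> Dict[str, str]:
    """Return the origin URL and current commit of the repo we are in.

    `run_git` is injectable so the function can be exercised without a repo.
    ASSUMPTION: the remote is named "origin".
    """
    return {
        "git-repo": run_git(["remote", "get-url", "origin"]),
        "git-commit": run_git(["rev-parse", "HEAD"]),
    }


def add_git_parameters(flow, metadata: Dict[str, str]) -> None:
    """Attach each metadata entry to a Prefect 1.x flow as a Parameter default."""
    from prefect import Parameter  # imported lazily; requires Prefect 1.x

    for name, value in metadata.items():
        # No task consumes these parameters, so add them explicitly.
        flow.add_task(Parameter(name, default=value))
```

The idea would be to call add_git_parameters(flow, collect_git_metadata()) just before flow.register(project_name=...). One caveat of this design is that parameters can be overridden per flow run, so the stored default is a convention rather than a guarantee.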

@Anna_Geller: Thanks for responding. So the CLI seems to be your preference.

@Ben_Ayers-Glassey: Yes! :slightly_smiling_face:
Ah, I see there is already an issue for this: https://github.com/PrefectHQ/prefect/issues/4154

GitHub: Allow to add flow Readme when registering flow · Issue #4154 · PrefectHQ/prefect

(With a good snippet showing how to use Client to automatically update the README from a .md file living beside the flow definition: https://github.com/PrefectHQ/prefect/issues/4154#issuecomment-930245627)

@Anna_Geller: This is quite cool, indeed! Well found, and thanks for sharing!