Is it possible to pass a file to a Prefect flow via a parameter? It is not a large file (~2KB), but it contains metadata needed for one of the tasks and will change every time the flow is run, so it is a hassle to ask the user to upload it somewhere on S3 every time they want to run the flow and then copy that path. But maybe that is the only option?
Interesting, how is this flow being triggered? You may consider implementing this process separately from Prefect, and the flow could just grab the file from wherever that process stored it.
“hassle to ask the user to upload it to somewhere”
Passing a file to a Prefect flow would require putting it somewhere specific first, meaning it would need a fixed location at some point.
Having the flow preconfigured to read that path (presumably where the updated/output file lives normally) would be easiest.
I’m curious what your approach to passing a file to a flow would be otherwise, even just conceptually, ignoring whether it is possible in prefect or not.
Interesting, I wouldn’t want to give any generic recommendations before I know what problem you’re trying to solve. Could you describe your use case a bit more? Are you planning to process some files on a cadence, e.g. for raw data ingestion? Or do you trigger such workflows ad hoc? Is this a static or a more event-driven use case?
My use case is that I am automating a pipeline which analyzes data coming off of a lab instrument. The scientist triggering the workflow has a metadata file that they generate for each run of the instrument (probably stored on their laptop). They somehow need to get that information to the Prefect flow. The only way I can think of to do this is to have them upload that metadata file to S3 and then pass the S3 path to the flow to download. But it would be great if I could skip the uploading-to-S3 part. The workflow will need to be triggered each time the instrument is run.
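One option worth noting for a file this small: Prefect flow parameters just need to be JSON-serializable, so the ~2KB file's *contents* could be read on the scientist's machine and passed as an ordinary parameter, skipping the upload entirely. A minimal sketch of the idea (a plain function stands in for the `@flow`-decorated function so the snippet is self-contained; the file name and metadata keys are made up for illustration):

```python
import json

# In real code this function would carry Prefect's @flow decorator;
# a plain function stands in here so the sketch runs without Prefect.
def analyze_run(metadata: dict) -> str:
    # Hypothetical key names, purely for illustration.
    return f"analyzing run {metadata['run_id']} on {metadata['instrument']}"

# The scientist's small metadata file, parsed client-side and passed
# as a parameter instead of an S3 path. A real caller would do
# json.loads(Path("run_metadata.json").read_text()) on their laptop.
metadata = json.loads('{"run_id": "R123", "instrument": "HPLC-7"}')
print(analyze_run(metadata))  # prints: analyzing run R123 on HPLC-7
```

Whether this fits depends on how the flow is triggered; it works best when the scientist kicks off the run from a machine that can see the file.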
Thanks so much for explaining that use case. There are many ways to approach that problem. Probably the easiest would be to have those scientists put the files into some shared network file system (say, AWS EFS). You could then have a flow that periodically polls that directory for files and moves those that have been processed into a “processed” directory or something like that. Instead of NFS, they could also upload those files to S3, and you could still follow the same pattern: periodically poll the directory where the files are supposed to land and move those that have been processed.
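The polling pattern described above can be sketched with the standard library alone (directory names are hypothetical; in a real deployment the function body would run on a schedule as a Prefect flow, and for S3 you would list objects with boto3 instead of globbing a path):

```python
import shutil
from pathlib import Path

def poll_inbox(inbox: Path, processed: Path) -> list[str]:
    """Process every metadata file waiting in `inbox`, then move each
    one into `processed` so the next polling run skips it."""
    processed.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(inbox.glob("*.json")):
        # ... run the analysis pipeline on `path` here ...
        shutil.move(str(path), processed / path.name)
        handled.append(path.name)
    return handled

# Example: one file lands in the inbox and gets picked up and moved.
import tempfile
root = Path(tempfile.mkdtemp())
inbox = root / "inbox"
inbox.mkdir()
(inbox / "run_001.json").write_text('{"run_id": "001"}')
print(poll_inbox(inbox, root / "processed"))  # prints: ['run_001.json']
```

Moving files out of the inbox after processing is what makes the poll idempotent: a file is either waiting or done, never handled twice.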
Just some ideas for you to start exploring the problem space.
Ok, thanks Anna! We are currently setting up SharePoint as our shared file system. I’m not sure how easy it is to interact with in that way, but I will look into it.