I’ve been using Prefect 1 for the last couple of years, and now I’m trying to upgrade to 2.
In my first attempt I kept the same organization as my Prefect 1 code and only updated the relevant syntax. In a simplified version of my attempt, I have a repo where prefect is one directory in the root, and under it I have:
from pprint import pprint

from prefect import flow
from prefect.filesystems import S3

S3_BUCKET = "my_s3_name"  # just the bucket name (see the discussion below)

wf_storage = S3(
    bucket_path=S3_BUCKET + "/flows/myflow",
    # Set creds in my local env
)
wf_storage.save("prefect-s3-block", overwrite=True)


@flow()
def myflow(
    param1: str,
    param2: int,
):
    pprint(locals())
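main.py then builds and applies the deployment from this flow and storage block, roughly like this (a simplified sketch; the deployment name is a placeholder):

from prefect.deployments import Deployment

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deployment",  # placeholder name
    storage=wf_storage,
)
deployment.apply()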
When I run this main.py, I see the deployment appear in the Prefect Cloud UI. Then I start my agent (after setting the AWS env) on my local machine, in the same repo root where I have access to the prefect directory.
Now if I start a flow run from the UI, I get this in the flow run log:
Note that I renamed the flow in the simplified code above. I don’t know why my agent doesn’t have access to the flow code. Its current directory is the same as the directory where I call my main.py, and it also has the AWS env set, so it should be able to get the code from AWS S3 as well.
You want S3_BUCKET to be just the bucket name. Is that what it’s set to? Does the storage block look correct if you inspect it in the UI?
At runtime, the agent isn’t finding the flow code storage. Prefect isn’t using the environment variable at runtime: it looks at the deployment, sees that the flow code is in the storage block, and the agent tries to pull that code down from S3.
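Conceptually, what the agent does at run time is closer to this (a sketch for illustration, not Prefect internals; the block name is the one saved above):

from prefect.filesystems import S3

# Load the storage block recorded on the deployment...
storage = S3.load("prefect-s3-block")
# ...and pull the flow code down from the bucket; local files are not consulted.
storage.get_directory(from_path=None, local_path=".")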
The new Prefect projects feature is in beta and will be the preferred way to create deployments. There’s an example showing S3 here. Make sure you have 2.10.3, the latest Prefect version.
Version: 2.10.2
API version: 0.8.4
Python version: 3.11.0
Git commit: 16747be2
Built: Fri, Apr 7, 2023 10:19 AM
OS/Arch: linux/x86_64
Profile: default
Server type: cloud
want S3_BUCKET to be just the bucket name
Yes, I missed updating that bit in the problem description. S3_BUCKET stored the name of the bucket in my original script. This is how it looks in the UI:
Also, I checked the S3 bucket, and the flow script has been copied to the location in the path.
Prefect isn’t using the environment variable at runtime
I see, so does that mean I need to have my block defined with the AWS credentials? I thought that since my agent is running in an environment that knows the credentials, it just needs to know the path from the block, not the credentials.
The problem I have is that I’m using MFA on AWS, and setting the key ID, key secret, and token didn’t work with the S3 block, but setting the values in the env and then running the code without credentials on the block just worked!
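In other words, what works for me is roughly this (a sketch; the values are placeholders, and it relies on boto3/s3fs falling back to the standard AWS environment variables when the block itself carries no credentials):

import os

os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_SESSION_TOKEN"] = "..."  # the temporary-credentials token the S3 block has no field for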
I used a similar approach in Prefect 1, where I used the same bucket for “registering” my flows and then made sure both my agent’s environment and the registering environment had the credentials.
I really prefer to get this working using the minimum number of concepts possible. If it should work without the projects concept, I’d rather avoid it for now.
Hi @SorooshMani-NOAA, my interpretation here is that there’s a misalignment of paths between the remote storage and the local storage. Try passing path="flows/myflow" to your build_from_flow call, and change bucket_path=S3_BUCKET + "/flows/myflow" to just bucket_path=S3_BUCKET.
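Applied to the code above, that would look roughly like this (a sketch; the deployment name is a placeholder):

from prefect.deployments import Deployment
from prefect.filesystems import S3

wf_storage = S3(bucket_path=S3_BUCKET)  # bucket name only

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deployment",  # placeholder
    storage=wf_storage,
    path="flows/myflow",  # the sub-path moves here, out of bucket_path
)
deployment.apply()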
By default, we use a temporary directory for each run, which disappears after the run. If you want to check what actually gets pulled down from S3, you can set an explicit Process infrastructure with a working directory (docs):
from prefect.deployments import Deployment
from prefect.infrastructure import Process

deployment = Deployment.build_from_flow(
    ...,
    infrastructure=Process(working_dir="/path/to/some/directory"),
    ...,
)
Thanks, this was really helpful for debugging. This is what I see now after adding path and updating the bucket_path as you suggested. I also added the Process infrastructure:
Let’s say I set working_dir="/some/path/workdir/" and the deployment’s path to flows/myflow. When I run the Prefect code to apply the deployment, I see that the flow code is copied to my S3 bucket under flows/myflow, i.e. s3://my_s3_name/flows/myflow/prefect/workflow/*.
Then, when trying to run this deployment, I see that a myflow directory is created in the specified working directory with all the code under it, i.e. /some/path/workdir/myflow/prefect/workflow/*. But the problem is that the running agent tries to import the code from the specified work-dir rather than from the actual download path, so it looks for the flow in /some/path/workdir instead of /some/path/workdir/myflow/. Is this a Prefect bug, or is it something I’m not setting correctly?
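To recap the mismatch with the actual paths from this run:

# Uploaded to S3:      s3://my_s3_name/flows/myflow/prefect/workflow/*
# Downloaded to:       /some/path/workdir/myflow/prefect/workflow/*
# Agent imports from:  /some/path/workdir/   <- one directory level too high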
Please let me know if I can provide more information. Thanks again!
I don’t have any solutions yet; I’m still waiting for a reply from the Prefect team. I’m not sure whether the issue is how my code is organized or something that needs to be fixed in Prefect. I’d really appreciate it if someone from Prefect (or anyone else) could give us a hello-world example of working with S3 flow storage successfully!
I was simply following the tutorial for S3 storage with a sub-path. The deployment stores the code in the sub-path on S3 fine, and the path is described in the deploy.yaml file:
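The relevant part of that YAML looks something like this (an abridged sketch; the entrypoint value is a guess based on the layout above, and other generated fields are omitted):

# abridged deploy.yaml excerpt
path: flows/myflow
entrypoint: prefect/workflow/main.py:myflow  # guessed; points at the flow function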
In all the examples and tutorials I never saw S3 storage used with MFA: it only asks for the key ID and secret, but not the token. Is it possible that the issue I’m facing is due to trying to use S3 with a key ID and secret that also require a token, i.e. temporary credentials? Now that I think about it again, if the path were correct it probably should have worked, so maybe the issue is still how path is processed!
If you do not currently use path in your S3 storage block on an affected version, then set path=/
If you already use path in your S3 storage block, then ensure it has a trailing / - like path=/custom/s3/path/flows/ - note the trailing / at the end.
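Assuming the path being referred to is the deployment’s path parameter (as used earlier in this thread), applying that workaround would look something like this (a sketch; the deployment name is a placeholder):

from prefect.deployments import Deployment
from prefect.filesystems import S3

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deployment",  # placeholder
    storage=S3.load("prefect-s3-block"),
    path="flows/myflow/",  # note the trailing slash, per the workaround above
)
deployment.apply()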