Flow Code Not Found

I’ve been using Prefect 1 for the last couple of years, and now I’m trying to upgrade to 2.

In my first attempt I kept the same organization as my Prefect 1 code and just updated the relevant syntax. In a simplified version of my attempt, I have a repo where prefect is one directory in the root, and under it I have:

prefect/
└── workflow
    ├── flows
    │   └── end_to_end.py
    └── main.py

My flow code is (in end_to_end.py):

from pprint import pprint

from prefect import flow
from prefect.filesystems import S3

# S3_BUCKET holds the bucket name and is defined elsewhere in the original script
wf_storage = S3(
    bucket_path=S3_BUCKET + "/flows/myflow",
    # Set creds in my local env
)
wf_storage.save("prefect-s3-block", overwrite=True)

@flow()
def myflow(
    param1: str,
    param2: int
):
    pprint(locals())

and in main.py I have (not the complete file):

from prefect.deployments import Deployment
from prefect.filesystems import S3

from flows.end_to_end import myflow  # import path is approximate; it depends on the package layout

wf_storage = S3.load("prefect-s3-block")  # load a pre-defined block

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deploy",
    work_queue_name="test",
    storage=wf_storage,
)

if __name__ == "__main__":
    deployment.apply(upload=True)

When I run this main.py, I see the deployment show up in the Prefect Cloud UI. Then I start my agent (after setting the AWS env) on my local machine, in the same repo root where I have access to the prefect directory.

Now, if I start a flow run from the UI, the flow run log shows an error saying the flow code could not be found.

Note that I renamed the flow in the simplified code above. I don’t know why my agent doesn’t have access to the flow code. Its current directory is the same as the directory where I call my main.py, and it also has the AWS env set, so it should be able to get the code from AWS S3 as well.

2 Likes

What version of Prefect are you using?

prefect version will give you all the details.

You want S3_BUCKET to be just the bucket name. Is that what it’s set to? Does the storage block look correct if you inspect it in the UI?
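For reference, a storage block defined with only the bucket name would look roughly like this (the bucket name is a placeholder):

from prefect.filesystems import S3

# bucket name only: no "s3://" prefix and no sub-folder in bucket_path
wf_storage = S3(bucket_path="my-bucket-name")
wf_storage.save("prefect-s3-block", overwrite=True)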

At runtime, the agent isn’t finding the flow code in storage. Prefect isn’t using the environment variable at run time: it looks at the Deployment, sees that the flow code is in the storage block, and the agent tries to pull that code down from S3.

1 Like

The new Prefect Projects feature is in beta and will be the preferred way to create a deployment. There’s an example showing S3 here. Make sure you have 2.10.3, the latest Prefect version.

This is the version information:

Version:             2.10.2
API version:         0.8.4
Python version:      3.11.0
Git commit:          16747be2
Built:               Fri, Apr 7, 2023 10:19 AM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         cloud

want S3_BUCKET to be just the bucket name

Yes, I missed updating that bit in the problem description. S3_BUCKET stored the name of the bucket in my original script. This is how it looks in the UI:

I also checked the S3 bucket, and the flow script has been copied to the location in the path.

Prefect isn’t using the environment variable at run time

I see, so does that mean I need to define my block with the AWS credentials? I thought that since my agent runs in an environment that knows the credentials, it only needs the path from the block, not the credentials.

The problem I have is that I’m using MFA on AWS: setting the key ID, key secret, and token on the S3 block didn’t work, but setting the values in the env and then running the code without the block credentials just worked!

I used a similar approach in Prefect 1, where I used the same bucket for “registering” my flows and made sure that both the agent’s environment and the registering environment had the credentials.
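For reference, the Prefect 2 version of that workaround looks roughly like this (the credential values and bucket name are placeholders, and the same variables are exported in the shell where the agent runs):

import os

from prefect.filesystems import S3

# Temporary MFA credentials, e.g. from `aws sts get-session-token`; values are placeholders
os.environ["AWS_ACCESS_KEY_ID"] = "<temporary key id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<temporary secret>"
os.environ["AWS_SESSION_TOKEN"] = "<session token>"

# No credentials stored on the block itself, so s3fs falls back to the standard
# AWS credential chain, which does understand session tokens
wf_storage = S3(bucket_path="my-bucket-name")
wf_storage.save("prefect-s3-block", overwrite=True)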

1 Like

I would really prefer to get this working using the minimum number of concepts possible. If it should work without the Projects concept, I’d prefer to avoid it for now.

1 Like

Hi @SorooshMani-NOAA :wave: My interpretation here is that there’s a misalignment of paths between the remote storage and the local storage. Try passing path="flows/myflow" to your build_from_flow call and change bucket_path=S3_BUCKET + "/flows/myflow" to just bucket_path=S3_BUCKET.
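Roughly, the change would look like this (flow, block, and deployment names are taken from your snippets; the bucket name is a placeholder and the flow import is illustrative):

from prefect.deployments import Deployment
from prefect.filesystems import S3

from flows.end_to_end import myflow  # illustrative import path

S3_BUCKET = "my-bucket-name"  # bucket name only

wf_storage = S3(bucket_path=S3_BUCKET)  # no sub-path on the block
wf_storage.save("prefect-s3-block", overwrite=True)

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deploy",
    work_queue_name="test",
    storage=S3.load("prefect-s3-block"),
    path="flows/myflow",  # sub-folder inside the bucket
)
deployment.apply(upload=True)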

By default, we use a temporary directory for each run, which disappears after the run. If you want to check what actually gets pulled down from S3, you can set an explicit Process infrastructure with a working directory (docs):

from prefect.infrastructure import Process

deployment = Deployment.build_from_flow(
    ...,  # your other deployment arguments
    infrastructure=Process(working_dir="/path/to/some/directory"),
)

Thanks, this was really helpful for debugging. This is what I see now after adding path and updating the bucket_path as you suggested; I also added the Process infrastructure.

Let’s say I set working_dir="/some/path/workdir/" and the Deployment’s path to flows/myflow. When I run the Prefect code to apply the deployment, I see that the flow code is copied to my S3 bucket under flows/myflow, i.e. s3://my_s3_name/flows/myflow/prefect/workflow/*.

Then, when trying to run this deployment, I see that a myflow directory is created in the specified working directory with all the code under it, i.e. /some/path/workdir/myflow/prefect/workflow/*. But the problem is that the agent tries to import the code from the specified working directory rather than the actual download path, so it looks for the flow in /some/path/workdir instead of /some/path/workdir/myflow/. Is this a Prefect bug, or is it something I’m not setting correctly?

Please let me know if I can provide more information. Thanks again!

1 Like

I have the same problem. Is there a resolution or answer anywhere?

1 Like

I don’t have any solutions yet; I’m still waiting for a reply from the Prefect team. I’m not sure if the issue is how my code is organized or if it’s something that needs to be fixed in Prefect. I would really appreciate it if someone from Prefect (or anyone else) could give us a hello-world example of working with S3 flow storage successfully!

1 Like

I was simply following the tutorial for S3 storage with a sub-path. The deployment stores the code in the sub-path on S3 fine, and the path is described in the deploy.yaml file:

storage:
  bucket_path: junction-schedules-export/Test/test_data/prefect
  aws_access_key_id: null
  aws_secret_access_key: null
  _block_document_id: d57996c5-9ba0-4506-96ec-1d8c922628d8
  _block_document_name: log-test
  _is_anonymous: false
  block_type_slug: s3
  _block_type_slug: s3
path: s3-subdir

But the run always has the error:

FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/td/y1rqqq0925ngzqgpxkr8ntf80000gn/T/tmpgsknvnqaprefect/log_flow.py'

i.e. it is not downloading the code from the specified sub-directory. It looks like either
a) a missing parameter in the run, or
b) a bug in Prefect!

This is becoming a potential blocker on a live deployment where I work; a response would be appreciated.

P.S. It works perfectly without the S3 sub-path.

2 Likes

Hi,
How exactly do you work with your code without “path”? Is all your source code in the root of the bucket?
Please let me know your workaround.

I’m struggling with the same issue and can’t figure it out.
Thank you!

1 Like

In all the examples and tutorials I never saw S3 storage used with MFA: it only asks for the key ID and secret, but not the token. Is it possible that the issue I’m facing is due to trying to use S3 with a key ID and secret that also require a token, i.e. temporary credentials? Now that I think about it again, if the path were correct it probably should have worked, so maybe the issue is still how the path is processed!

1 Like

If all your code is in the root, you can use path=/ - this has been an unfortunate byproduct of an upstream change. You can see the issue here: Flow could not be retrieved from deployment with s3fs==2023.3.0 · Issue #8710 · PrefectHQ/prefect · GitHub

If you do not currently use path in your S3 storage block on an affected version, then set path=/
If you already use path in your S3 storage block, then ensure it has a trailing / - like path=/custom/s3/path/flows/ - note the trailing / at the end.
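In build_from_flow terms, a sketch of the second case would be (flow and block names follow the earlier examples in this thread; the import path is illustrative):

from prefect.deployments import Deployment
from prefect.filesystems import S3

from flows.end_to_end import myflow  # illustrative import path

deployment = Deployment.build_from_flow(
    flow=myflow,
    name="myflow-deploy",
    work_queue_name="test",
    storage=S3.load("prefect-s3-block"),
    path="flows/myflow/",  # note the trailing slash; use "/" if the code sits in the bucket root
)
deployment.apply(upload=True)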

That’s it! The added trailing / fixed my issue. I can rest in peace now! Thank you!

Awesome!