How to properly manage external code dependencies in a Prefect 1.0 flow?

Hello!
I would like to implement one central Prefect project where, over time, it will be possible to add flows that are independent of each other. The structure of the project is something like this:

prefect/
├── src/
│   ├── flows/
│   │   ├── test_pack1/
│   │   │   ├── common/
│   │   │   │   ├── __init__.py
│   │   │   │   └── test_module.py
│   │   │   ├── .env
│   │   │   ├── __init__.py
│   │   │   ├── requirements.txt
│   │   │   └── test_pack1_flow.py
│   │   ├── test_pack2/
│   │   │   ├── __init__.py
│   │   │   ├── .env
│   │   │   ├── requirements.txt
│   │   │   └── test_pack2_flow.py
│   │   ├── __init__.py
│   │   └── Dockerfile
│   ├── utilities/
│   │   ├── __init__.py
│   │   ├── storage.py
│   │   ├── builder.py
│   │   ├── executor.py
│   │   └── run_config.py
│   ├── .env
│   ├── __init__.py
│   └── main.py
├── .gitignore
├── poetry.lock
└── pyproject.toml

I would like each flow in the flows/ folder to be independent of the central project and built as its own Docker container.

At startup, builder.py searches for all flows in the flows/ folder, sets a specific configuration, and registers them on the server.

But I ran into a problem with importing third-party packages. Let's say test_pack1/requirements.txt contains SQLAlchemy==1.4.34, test_pack1/common/test_module.py does import sqlalchemy, and test_pack1/test_pack1_flow.py has a @task that uses a function from test_module.py. When the FlowBuilder class looks for a flow variable in test_pack1_flow.py, it does so via flow = extract_flow_from_file(str(flow_module)). At this step a ModuleNotFoundError occurs, since that dependency does not exist in the central Prefect application (in pyproject.toml). But once the Docker container is built, after flow.register(), it will of course be there. How can I handle this step? Or maybe I'm doing something wrong?
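
To illustrate, the registration step is roughly the following (a simplified sketch, not the exact builder.py; the discovery loop and project name are placeholders, only the extract_flow_from_file call is the real one):

# Simplified sketch of the flow-discovery/registration step described above
from pathlib import Path

from prefect.utilities.storage import extract_flow_from_file

flows_root = Path("src/flows")
for flow_module in flows_root.rglob("*_flow.py"):
    # extract_flow_from_file executes the flow file in the current Python
    # environment, so every import in it (e.g. sqlalchemy pulled in via
    # common/test_module.py) must already be installed here; this is the
    # point where ModuleNotFoundError is raised
    flow = extract_flow_from_file(str(flow_module))
    flow.register(project_name="my-project")  # placeholder project name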

I use Docker storage, DockerRun, and LocalExecutor.
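
For reference, this is roughly how those pieces fit together on a flow (a generic Prefect 1.0 sketch, not my actual utilities/ code; registry URL, image name, dependency pin, and labels are placeholders):

# Generic sketch of attaching Docker storage, a DockerRun run config,
# and the LocalExecutor to a flow; all names/values below are placeholders
from prefect import Flow, task
from prefect.executors import LocalExecutor
from prefect.run_configs import DockerRun
from prefect.storage import Docker

@task
def do_something():
    ...

with Flow("test_pack1_flow") as flow:
    do_something()

flow.storage = Docker(
    registry_url="registry.example.com",          # placeholder registry
    image_name="test_pack1_flow",
    python_dependencies=["SQLAlchemy==1.4.34"],   # the per-flow requirements.txt pins
)
flow.run_config = DockerRun(labels=["docker-agent"])
flow.executor = LocalExecutor()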


Central project

Can you explain what you mean by a central project? Do you mean that the central project is a root project containing shared modules?

Looping over flows to register them

This is not really needed if you leverage the prefect register CLI command, which already contains that functionality.

Attaching different storage and run_config based on environment

To set custom storage and run configs, you could use a simple function that sets a different storage or run config depending on your environment. Here is an example:
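
For illustration, here is a minimal sketch of such a function, assuming Prefect 1.0's Local/Docker storage and LocalRun/DockerRun run configs (the PREFECT_ENV variable, registry URL, and labels are made-up names):

# Sketch: pick storage and run_config based on an environment variable
import os

from prefect.run_configs import DockerRun, LocalRun
from prefect.storage import Docker, Local


def attach_environment(flow):
    # PREFECT_ENV is an assumed convention, not a Prefect built-in
    if os.environ.get("PREFECT_ENV", "dev") == "prod":
        flow.storage = Docker(
            registry_url="registry.example.com",  # placeholder registry
            image_name=flow.name,
        )
        flow.run_config = DockerRun(labels=["prod"])
    else:
        flow.storage = Local()
        flow.run_config = LocalRun(labels=["dev"])
    return flow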

Ensuring the same dependencies in local development environment and in production Docker image

The way I usually handle that is by installing the same package within your local environment as well as within a Docker image. Here is an example repo:

If you do:

pip install .

It will install this package (as defined in setup.py) locally.

Then, we do the same pip install . in the Dockerfile.
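
For illustration, a minimal setup.py for one such flow package could look like the sketch below; the package name and pinned dependencies are placeholders, not taken from the example repo:

# Hypothetical setup.py for a single flow package (e.g. src/flows/test_pack1)
from setuptools import find_packages, setup

setup(
    name="test_pack1",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "SQLAlchemy==1.4.34",  # the same pins otherwise kept in requirements.txt
    ],
)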


LMK if you still have any questions about it.

I mean a project containing only Prefect itself in its dependencies (in pyproject.toml). It will be stored on the VPS; its area of responsibility includes launching the Prefect server and a local (Docker) agent, and checking for and registering new flows, which are planned to be added somehow via CI/CD. The root project will not be associated with individual flows and their dependencies; it is only responsible for finding and registering flows.

So you mean that I have to install all the dependencies from each individual flow into the root project? That is, in my case, transfer all the dependencies from the small requirements.txt files into one main pyproject.toml?
Then what should I do in this case: test_pack1_flow.py uses, for example, pandas==1.4.2, while test_pack2_flow.py uses pandas==1.2.4?

Okay, I get it. Thanks!

Why do you think this would be required? Your Prefect base image will already contain Prefect, so there's no need to install it separately.

To start Prefect Server, it’s just one command. I’m not sure it requires a separate project:

prefect server start --expose

The same for a Prefect Docker agent:

prefect agent docker start --label yourlabel

Registering your flows can also be done with a single CLI command. If you need some resources about CI, check out these Discourse topics:
https://discourse.prefect.io/tag/ci-cd

I think your best option is to use the CLI command. This way you don't need to build any extra functionality for that yourself, which makes it that much easier to maintain your flows.

prefect register --project xyz -p flows/

Providing the path to flows/ will loop over all flows for you and register them only when changes have been made; if a flow's metadata didn't change, it won't be re-registered, so you can safely use this in your CI pipeline.

This topic dives deeper into it:

Not necessarily - you can install only the package you need. I'm only saying that packaging your code dependencies per flow/project with setup.py allows you to install them the same way both locally AND within your production Docker image (dev/prod parity). I don't use poetry, so I can't say how this should be done in pyproject.toml - probably the same way.

And btw, if you’re just getting started with Prefect, perhaps it makes sense to start directly with Prefect 2.0? We don’t have a public timeline on when it’s out of beta, but this would be more beneficial for you long-term:
https://discourse.prefect.io/t/should-i-start-with-prefect-2-0-orion-skipping-prefect-1-0/544

OK, thanks for the answers. I’ll explore all the options.
