Hello everyone,
pretty new Prefect 2 user here. I have the following situation:
- I am running Prefect (2.7.7) via docker-compose (don’t think it matters in this situation, but worth mentioning). The repository I’m using for this is https://github.com/rpeden/prefect-docker-compose, and it has worked for another example that I’ve tried.
- I have a Docker infrastructure block set up (this works - docker socket is mounted) with the block name docker-data-export, so docker-container/docker-data-export.
- I have a flow definition set up which uses e.g. pandas.
# extract_data.py
import tempfile
from shutil import rmtree

import pandas as pd
from prefect import flow, task

@task
def save_df(df: pd.DataFrame):
    ...

@flow(timeout_seconds=300, log_prints=True)
def read_and_save_data(folder: str):
    df = ...
    save_df(df)

if __name__ == '__main__':
    # Run the flow locally against a throwaway directory
    temp_dir = tempfile.mkdtemp()
    try:
        read_and_save_data(temp_dir)
    finally:
        rmtree(temp_dir)
In my docker-data-export infrastructure definition, I have EXTRA_PIP_PACKAGES defined, which includes e.g. pandas==1.5.2.
From the prefect CLI container in the Docker compose example, I’m running the following command:
prefect deployment build -n "Data export" \
    -q default_queue \
    -sb "remote-file-system/local-flows-minio" \
    -ib "docker-container/docker-data-export" \
    "data_exports/extract_data.py:read_and_save_data"
Unfortunately, this yields the following error:
Script at 'data_exports/extract_data.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
If I do e.g. pip install pandas, the issue is gone (and it complains about the next library).
Why does the CLI need to know about pandas (or any other library that I’m using)?
Does this mean that I should set up a different venv for every one of my deployments?
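
On the first question, my working assumption (based on the "Script at ... encountered an exception" wording) is that the build step loads the script as a Python module in order to find the flow object, which runs every top-level import in the CLI's own environment. Below is a minimal sketch of that behaviour using only the standard library. It is not Prefect's code; the helper names and the made-up module name some_missing_library_xyz (standing in for pandas) are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for a flow script. The imported module name is made up
# so that the import is guaranteed to fail, like pandas in the CLI container.
SCRIPT = '''
import some_missing_library_xyz  # plays the role of "import pandas"

def read_and_save_data(folder):
    pass
'''


def load_script(path: str):
    """Load a script by file path; this executes all of its top-level code."""
    spec = importlib.util.spec_from_file_location("flow_script", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level imports run here
    return module


def try_load():
    """Write the script to a temp dir, load it, and report the missing module."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "extract_data.py"
        script_path.write_text(SCRIPT)
        try:
            load_script(str(script_path))
        except ModuleNotFoundError as exc:
            return exc.name
    return None


missing = try_load()
print(f"Could not load script, missing module: {missing}")
```

If that assumption holds, the function body never has to run for the error to appear: just locating read_and_save_data in the file is enough to trigger the import.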