Pretty new Prefect 2 user here. I have the following situation:
- I am running Prefect (2.7.7) via docker-compose (I don't think it matters in this situation, but worth mentioning). The repository I'm using for this is https://github.com/rpeden/prefect-docker-compose, and it has worked for another example that I've tried.
- I have a Docker infrastructure block set up (this works; the Docker socket is mounted) with the block name `docker-data-export`.
- I have a flow definition set up which uses e.g.

```python
# extract_data.py
import tempfile
from shutil import rmtree

import pandas as pd
from prefect import flow, task


@task
def save_df(df: pd.DataFrame):
    ...


@flow(timeout_seconds=300, log_prints=True)
def read_and_save_data(folder: str):
    df = ...
    save_df(df)


if __name__ == '__main__':
    temp_dir = tempfile.mkdtemp()
    try:
        read_and_save_data(temp_dir)
    finally:
        rmtree(temp_dir)
```
- In the `docker-data-export` infrastructure definition, I have `EXTRA_PIP_PACKAGES` defined, which includes e.g. `pandas`.
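For context, my understanding of why the *flow run* container side works: the official `prefecthq/prefect` image installs whatever is in `EXTRA_PIP_PACKAGES` when the container starts. A hedged sketch of that behaviour (this is my approximation, not the actual entrypoint script):

```shell
# Assumption: the image's entrypoint does something roughly like this before
# running the flow, so pandas is available inside the flow run container.
EXTRA_PIP_PACKAGES="pandas"   # what I set on the infrastructure block

if [ -n "${EXTRA_PIP_PACKAGES}" ]; then
    install_cmd="pip install ${EXTRA_PIP_PACKAGES}"
    echo "+ ${install_cmd}"
    # ${install_cmd}   # the real image would actually run this
fi
```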
From the Prefect CLI container in the Docker Compose example, I'm running the following command:

```
prefect deployment build -n "Data export" \
  -q default_queue \
  -sb "remote-file-system/local-flows-minio" \
  -ib "docker-container/docker-data-export" \
  "data_exports/extract_data.py:read_and_save_data"
```
Unfortunately, this yields the following error:

```
Script at 'data_exports/extract_data.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
```
If I do e.g. `pip install pandas` in the CLI container, the issue goes away (and it then complains about the next missing library).

Why does the CLI need to have `pandas` (or any other library that I'm using) installed?
Does this mean that I should set up a different venv for every one of my deployments?
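To illustrate my (possibly wrong) understanding of what's happening: it looks like `deployment build` has to import the script in order to find the flow object, which means every top-level import in the script runs on the CLI machine. A minimal sketch of that behaviour without Prefect (the module and function names here are made up for illustration):

```python
import importlib.util
import pathlib
import tempfile
import textwrap

# A stand-in for my flow script: the import at the top fails, just like
# pandas did in the CLI container.
src = textwrap.dedent("""
    import definitely_not_installed_pkg  # fails at import time
    def read_and_save_data():
        ...
""")
script = pathlib.Path(tempfile.mkdtemp()) / "extract_data.py"
script.write_text(src)

# Loading the script to discover the flow executes the whole module,
# which is (I assume) roughly what `prefect deployment build` has to do.
spec = importlib.util.spec_from_file_location("extract_data", script)
module = importlib.util.module_from_spec(spec)

error = None
try:
    spec.loader.exec_module(module)
except ModuleNotFoundError as exc:
    error = exc

print(error)  # No module named 'definitely_not_installed_pkg'
```

If that's right, it would explain why the missing libraries surface at build time even though the flow only ever *runs* inside the Docker container.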