Pretty new Prefect 2 user here. I have the following situation:
- I am running Prefect (2.7.7) via docker-compose (I don't think it matters in this situation, but worth mentioning). The repository I'm using for this is https://github.com/rpeden/prefect-docker-compose, and it has worked for another example that I've tried.
- I have a Docker infrastructure block set up (this works; the Docker socket is mounted) with the block name `docker-data-export`.
- I have a flow definition set up which uses e.g.

```python
# extract_data.py
import tempfile
from shutil import rmtree

import pandas as pd
from prefect import flow, task


@task
def save_df(df: pd.DataFrame):
    ...


@flow(timeout_seconds=300, log_prints=True)
def read_and_save_data(folder: str):
    df = ...
    save_df(df)


if __name__ == '__main__':
    temp_dir = tempfile.mkdtemp()
    try:
        read_and_save_data(temp_dir)
    finally:
        rmtree(temp_dir)
```
- In the `docker-data-export` infrastructure definition, I have `EXTRA_PIP_PACKAGES` defined, which includes e.g. `pandas`.
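For context, my understanding of why the *flow run* container side works: the official `prefecthq/prefect` image installs whatever is in `EXTRA_PIP_PACKAGES` when the container starts. A hedged sketch of that behaviour (this is my approximation, not the actual entrypoint script):

```shell
# Assumption: the image's entrypoint does something roughly like this before
# running the flow, so pandas is available inside the flow run container.
EXTRA_PIP_PACKAGES="pandas"   # what I set on the infrastructure block

if [ -n "${EXTRA_PIP_PACKAGES}" ]; then
    install_cmd="pip install ${EXTRA_PIP_PACKAGES}"
    echo "+ ${install_cmd}"
    # ${install_cmd}   # the real image would actually run this
fi
```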
From the Prefect CLI container in the Docker Compose example, I'm running the following command:

```
prefect deployment build -n "Data export" \
  -q default_queue \
  -sb "remote-file-system/local-flows-minio" \
  -ib "docker-container/docker-data-export" \
  "data_exports/extract_data.py:read_and_save_data"
```
Unfortunately, this yields the following error:

```
Script at 'data_exports/extract_data.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
```
If I do e.g. `pip install pandas` in the CLI container, the issue goes away (and it then complains about the next missing library).

Why does the CLI need to have `pandas` (or any other library that I'm using) installed?
Does this mean that I should set up a different venv for every one of my deployments?
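To illustrate my (possibly wrong) understanding of what's happening: it looks like `deployment build` has to import the script in order to find the flow object, which means every top-level import in the script runs on the CLI machine. A minimal sketch of that behaviour without Prefect (the module and function names here are made up for illustration):

```python
import importlib.util
import pathlib
import tempfile
import textwrap

# A stand-in for my flow script: the import at the top fails, just like
# pandas did in the CLI container.
src = textwrap.dedent("""
    import definitely_not_installed_pkg  # fails at import time
    def read_and_save_data():
        ...
""")
script = pathlib.Path(tempfile.mkdtemp()) / "extract_data.py"
script.write_text(src)

# Loading the script to discover the flow executes the whole module,
# which is (I assume) roughly what `prefect deployment build` has to do.
spec = importlib.util.spec_from_file_location("extract_data", script)
module = importlib.util.module_from_spec(spec)

error = None
try:
    spec.loader.exec_module(module)
except ModuleNotFoundError as exc:
    error = exc

print(error)  # No module named 'definitely_not_installed_pkg'
```

If that's right, it would explain why the missing libraries surface at build time even though the flow only ever *runs* inside the Docker container.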