Calling `Deployment.build_from_flow` from jupyterhub notebook doesn't create deployments

Hi all,

I hope you can help me with creating deployments from jupyterhub notebooks.

As my colleagues prefer to run everything from notebooks, they asked me to come up with a method for scheduling their notebooks in Prefect (for its excellent scheduling capabilities), and to do so from the comfort of their other notebooks. For this, they only need to run a notebook via papermill in a specific conda environment, get the result as HTML, and mail it to designated email addresses.

Toward that goal, I created a small package that contains a flow for running a notebook, plus an entry function. I installed this package in the environment where Prefect Orion and the agents run, as well as in all the conda environments. The entry function looks like this:

# imports needed by the snippet (run_report and _get_flow_name live in my package)
from typing import Any, Dict, List, Optional, Union

from prefect.deployments import Deployment
from prefect.orion import schemas
from prefect.settings import PREFECT_API_URL, temporary_settings
from pydantic import AnyHttpUrl, NonNegativeInt, validate_arguments

@validate_arguments
def schedule(
    name: str,
    notebook_url: AnyHttpUrl,
    tags: List[str],
    queue: str,
    schedule: schemas.schedules.SCHEDULE_TYPES,
    parameters: Optional[Dict[str, Any]] = None,
    retries: NonNegativeInt = 0,
    timeout_seconds: Optional[Union[int, float]] = None,
    email_to: Optional[Union[List[str], str]] = None,
) -> Deployment:
    flow_name = _get_flow_name() # decorate the name with the calling user's name

    with temporary_settings(updates={PREFECT_API_URL: "..."}):
        rflow = run_report.with_options(
            retries=retries,
            name=flow_name,
            retry_delay_seconds=60,
        )
        if timeout_seconds:
            rflow = rflow.with_options(
                timeout_seconds=timeout_seconds,
            )
        d = Deployment.build_from_flow(
            flow=rflow,
            parameters={
                "name": name,
                "notebook_url": notebook_url,
                "parameters": parameters,  # what we pass to the notebook
                "retries": retries,
                "email_to": email_to,
            },
            name=name,
            schedule=schedule,
            version=1,
            work_queue_name=queue,
            tags=tags,
            skip_upload=True,
            # must have path and entrypoint set to avoid https://github.com/PrefectHQ/prefect/issues/6777
            path=".",
            entrypoint="nikolas_prefect_utils.core:run_report",
        )

        return d

When I call the above from the command line, it creates the flow and the deployment. Of course, I also have to call d.apply().

However, this fails for the intended use case: running it from JupyterHub notebooks. First, I don’t get a Deployment back, but a coroutine. Second, if I do this in a cell:

   d = await nikolas_prefect_utils.schedule(....)
   await d.apply()

I get back a UUID, yet there is no flow or deployment on our Prefect instance.

Does anyone have an idea what I’m doing wrong here? How can I debug where the issue is coming from?
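One thing I noticed while staring at this: in the notebook, the coroutine is created inside the `with temporary_settings(...)` block but only awaited after the block has exited. A minimal sketch with plain contextvars (a hypothetical simplification of Prefect’s settings context, not actual Prefect code) shows how that ordering can silently drop the API URL override, which would match the symptom of a UUID coming back but nothing appearing on our instance:

```python
import asyncio
from contextlib import contextmanager
from contextvars import ContextVar

# Stand-in for Prefect's settings context (hypothetical simplification):
API_URL = ContextVar("API_URL", default="http://ephemeral-local-api")

@contextmanager
def temporary_settings(url):
    token = API_URL.set(url)
    try:
        yield
    finally:
        API_URL.reset(token)

async def apply():
    # The setting is read when the coroutine RUNS, not when it is created
    return API_URL.get()

def schedule():
    with temporary_settings("http://our-prefect-instance"):
        return apply()  # a bare coroutine: nothing has executed yet

async def notebook_cell():
    coro = schedule()
    # By the time we await, temporary_settings has already been unwound:
    return await coro

print(asyncio.run(notebook_cell()))  # -> http://ephemeral-local-api
```

If Prefect’s temporary_settings behaves like this (I haven’t verified), the awaited apply() would talk to the default API instead of the one I set.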


AFAIK we have an open internal issue for this, but there are some complications due to Jupyter limitations. I’d recommend not using Jupyter for deployments and leveraging Python scripts instead, e.g. iterate in Jupyter for local development, but use scripts for production deployments.

One limitation is that you would need to run everything async to make that work.

Hi Anna,

Thanks, I’ll pass that suggestion on to my colleagues.

Just to make sure I understand: for creating deployments from Jupyter to work, I need to run everything async. But in my example I do use async (that is, I invoke await twice). I even get a UUID back, but no deployment. Could this be a bug?


I was generally trying to persuade you not to do it from Jupyter at all :smile:

Sorry for not being more helpful here, but I don’t know enough about this area.

if you suspect this might be a bug, it’s best to submit a bug report as a GitHub issue - the integrations team engineers will give you much better guidance than me here

This repo would be a great place to open an issue: GitHub - PrefectHQ/prefect-jupyter: Prefect integrations interacting with Jupyter.

Have you thought about recommending a process where those data scientists/Jupyter users commit their flows and CI/CD takes care of deployments? That would be cleaner for them (no clutter in notebooks) and less complicated than deploying from Jupyter.

You can always add a schedule from the UI.