Hi all,
I hope you can help me with creating deployments from JupyterHub notebooks.
My colleagues prefer to run everything from notebooks, so they asked me for a way to schedule their notebooks in Prefect (because of its awesome scheduling capabilities) from the comfort of their other notebooks. All they need is to run a notebook with papermill in a specific conda environment, get the result as HTML, and mail it to designated email addresses.
Toward that goal, I created a small package that contains a flow for running a notebook, plus the entry function shown below. I installed this package in the environment where Prefect Orion and the agents run, as well as in all conda environments. The entry function is this one:
# Imports are assumed for Prefect 2.x (Orion); run_report and _get_flow_name
# are defined elsewhere in the package.
from typing import Any, Dict, List, Optional, Union

from prefect.deployments import Deployment
from prefect.orion import schemas
from prefect.settings import PREFECT_API_URL, temporary_settings
from pydantic import AnyHttpUrl, NonNegativeInt, validate_arguments


@validate_arguments
def schedule(
    name: str,
    notebook_url: AnyHttpUrl,
    tags: List[str],
    queue: str,
    schedule: schemas.schedules.SCHEDULE_TYPES,
    parameters: Optional[Dict[str, Any]] = None,
    retries: NonNegativeInt = 0,
    timeout_seconds: Optional[Union[int, float]] = None,
    email_to: Optional[Union[List[str], str]] = None,
) -> Deployment:
    flow_name = _get_flow_name()  # decorate the name with the calling user's name
    with temporary_settings(updates={PREFECT_API_URL: "..."}):
        rflow = run_report.with_options(
            retries=retries,
            name=flow_name,
            retry_delay_seconds=60,
        )
        if timeout_seconds:
            rflow = rflow.with_options(
                timeout_seconds=timeout_seconds,
            )
        d = Deployment.build_from_flow(
            flow=rflow,
            parameters={
                "name": name,
                "notebook_url": notebook_url,
                "parameters": parameters,  # what we pass to the notebook
                "retries": retries,
                "email_to": email_to,
            },
            name=name,
            schedule=schedule,
            version=1,
            work_queue_name=queue,
            tags=tags,
            skip_upload=True,
            # must have path and entrypoint set to avoid https://github.com/PrefectHQ/prefect/issues/6777
            path=".",
            entrypoint="nikolas_prefect_utils.core:run_report",
        )
    return d
When I call the above from the command line, it creates the flow and the deployment. Of course, I have to call d.apply() on the result.
However, this fails for the use case I was asked for: running it from JupyterHub notebooks. First, I don't get a deployment back, but a coroutine. Second, if I do this in a cell:
d = await nikolas_prefect_utils.schedule(....)
await d.apply()
I get back a UUID, yet neither the flow nor the deployment shows up on our Prefect instance.
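The first symptom (a coroutine instead of a deployment) can be reproduced without Prefect. The sketch below assumes that the call in question is wrapped so that it runs to completion when no event loop is active, but returns a coroutine when one is already running, as is the case inside a Jupyter kernel. The `sync_compatible` decorator here is a hypothetical stand-in written for illustration, not Prefect's actual implementation, and `build_deployment` is a made-up placeholder for `Deployment.build_from_flow`.

```python
import asyncio
import functools
import inspect


def sync_compatible(async_fn):
    """Hypothetical stand-in for a 'sync-compatible' wrapper (illustration only)."""

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        coro = async_fn(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running (plain script or command line):
            # run the coroutine to completion and return its result.
            return asyncio.run(coro)
        # An event loop is already running (as in a notebook kernel):
        # hand the coroutine back so that the caller has to await it.
        return coro

    return wrapper


@sync_compatible
async def build_deployment(name):
    # Made-up placeholder for Deployment.build_from_flow.
    return {"name": name}


def from_script():
    # Called with no running loop: returns the finished object.
    return build_deployment("report")


async def from_notebook_cell():
    # Called inside a running loop: returns a coroutine that must be awaited.
    result = build_deployment("report")
    was_coroutine = inspect.iscoroutine(result)
    value = await result
    return was_coroutine, value
```

Under that assumption, the same `schedule(...)` call would return a finished object from the command line and an awaitable from a notebook cell, which matches what I see.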
Does anyone have an idea what I'm doing wrong here? How can I debug where the issue is coming from?