How can I change the number of Dask workers in a DaskExecutor based on a custom parameter value?

anna_geller · January 31, 2022, 10:24pm

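You can pass a callable as the cluster_class of the DaskExecutor. The callable creates the Dask cluster when the flow run starts, so it can size the cluster dynamically at runtime, e.g. based on a parameter value.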

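import prefect
from prefect import Flow, Parameter
from prefect.executors import DaskExecutor

def dynamic_executor():
    # Called when the flow run starts the executor; this relies on the flow's
    # parameter values being available in prefect.context.parameters at that point
    from distributed import LocalCluster
    # could instead be another cluster class, e.g. from dask_cloudprovider.aws import FargateCluster
    return LocalCluster(n_workers=prefect.context.parameters["n_workers"])

with Flow("example", executor=DaskExecutor(cluster_class=dynamic_executor)) as flow:
    # Register the parameter so its value can be set per flow run
    flow.add_task(Parameter("n_workers", default=5))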
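Because the callable reads the parameter only when the cluster is being created, a bad value (zero, a negative number, a non-numeric string) would only surface at that point. A minimal stdlib-only sketch of a guard, assuming a hypothetical helper name n_workers_from_parameters and a hypothetical upper bound max_workers that you would pick for your own infrastructure:

```python
from typing import Any, Mapping


def n_workers_from_parameters(
    parameters: Mapping[str, Any], default: int = 5, max_workers: int = 32
) -> int:
    """Return a validated worker count from a parameters mapping.

    "n_workers" matches the Parameter in the flow; max_workers is a
    hypothetical cap, not a Prefect or Dask setting.
    """
    value = parameters.get("n_workers", default)
    # bool is a subclass of int, so reject it before the int conversion below
    if isinstance(value, bool):
        raise ValueError(f"n_workers must be an integer, got {value!r}")
    # Reject floats like 2.5 (and inf/nan) instead of silently truncating them
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"n_workers must be a whole number, got {value!r}")
    try:
        n = int(value)  # also accepts numeric strings such as "8"
    except (TypeError, ValueError):
        raise ValueError(f"n_workers must be an integer, got {value!r}") from None
    if n < 1:
        raise ValueError(f"n_workers must be at least 1, got {n}")
    # Clamp rather than fail, so an accidental huge value can't request a huge cluster
    return min(n, max_workers)
```

Inside the callable above, the LocalCluster line could then read LocalCluster(n_workers=n_workers_from_parameters(prefect.context.parameters)). Clamping is a design choice; raising on values above the cap is equally reasonable if you'd rather have the flow run fail loudly.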

anna_geller · April 12, 2022, 11:20am

Example use case, from a thread in #prefect-community on Slack:

@Eddie_Atkinson: This is a really silly question, but I can’t quite figure out the answer from Dask’s and Prefect’s documentation. My aim is to use Prefect to orchestrate flows with varying memory requirements using a Dask cluster.

As an MVP I’ve set up an ECSRun flow using the LocalDaskExecutor with 30GB of RAM. For large jobs this flow OOMs and gets killed by the Prefect scheduler. My question is this: If I set up a Dask cluster to run these jobs would it gracefully handle memory issues?

That is to say if I had 30GB of RAM in the cluster and a job that required 50GB would Dask OOM or would it simply run slower? Do I need to modify my code to use Dask dataframes or is there some smarts here I’m not quite across?

@Kevin_Kho: It would OOM. Dask does have memory spillover, but I think the default is that 30% can be shuffled to disk. Your memory requirements would likely be a lot more. It is also not performant once you hit this, so you really need to bump up resources. What you can do, though, is parameterize the size of the Dask cluster. See this:
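To turn a job's memory needs into an n_workers value, a rough sizing rule helps. A minimal sketch, assuming a hypothetical helper workers_needed and assuming the job's data is split into chunks that Dask spreads evenly across workers (e.g. a Dask DataFrame). Adding workers does not help a single task that loads 50GB into one pandas DataFrame, because one task runs on one worker. The usable_fraction default of 0.6 is also an assumption, loosely modeled on Dask's distributed.worker.memory.target setting (the fraction of a worker's memory limit at which it starts spilling to disk):

```python
import math


def workers_needed(
    peak_memory_gb: float, worker_memory_gb: float, usable_fraction: float = 0.6
) -> int:
    """Smallest worker count whose combined usable memory covers the job's peak.

    Assumes even distribution of the data across workers; usable_fraction
    keeps each worker below the point where it would start spilling to disk.
    """
    if peak_memory_gb <= 0 or worker_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Memory each worker can hold before spilling starts
    usable_per_worker = worker_memory_gb * usable_fraction
    # Round up: a partial worker still means one more worker
    return max(1, math.ceil(peak_memory_gb / usable_per_worker))


# A 50GB job on 8GB workers: 8 * 0.6 = 4.8GB usable each, 50 / 4.8 ≈ 10.4, so 11 workers
print(workers_needed(50, 8))  # 11
```

The result could then be passed as the n_workers parameter for that flow run. Treat it as a starting point; real peak memory depends on how the computation is partitioned and on intermediate results.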

Prefect Community: How can I change the number of Dask workers in a DaskExecutor based on a custom parameter value?

Not a silly question btw

@Eddie_Atkinson: The parameterisation is really cool
Thanks

Related topics:
- How to allocate more memory or more worker nodes on a per flow run basis?
- How to pass custom worker resource annotations to Dask scheduler when using DaskExecutor?
- How can I configure a specific Dask cluster class?
- How can I parallelize execution across 8 CPU cores?
- How can I configure my flow to run with Dask?
