2.0b10 much slower than 2.0b7?

When I run some code directly, or in 2.0b7, it does 30+ iterations a second and completes in a few seconds. On 2.0b10, however, it is massively slower, at around 8 seconds per iteration. I posted a similar issue on 2.0b7 when using ConcurrentTaskRunner. It worked fine after switching to SequentialTaskRunner, but on the new release this is also really, really slow.

Example code is below. I will try to create an example that does not include spaCy. However, I wonder whether there is a simple explanation and I just need to tweak my code or settings for the new release?

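```python
import logging
import os

from prefect import flow, task
from prefect.task_runners import SequentialTaskRunner, ConcurrentTaskRunner
from tqdm.auto import tqdm
import spacy

log = logging.getLogger(__name__)


@task
def task1():
    log.warning("running the task")
    nlp = spacy.load("en_core_web_sm")
    log.warning("loaded data")
    # Parse 100 copies of a long repeated sentence, with a tqdm progress bar
    out = [nlp(x) for x in tqdm(["the cat sat on the mat." * 20] * 100)]


# Pass task_runner=ConcurrentTaskRunner() instead to compare the two runners
@flow(task_runner=SequentialTaskRunner())
def flow1():
    log.warning("running the flow ******** ")
    task1()


if __name__ == "__main__":
    # Run the undecorated function directly, without Prefect
    task1.fn()
    log.warning("completed raw function")
    # Run the same work as a Prefect task inside a flow
    flow1()
```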
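The tqdm bar already shows a rate, but logging the throughput through the logger gives one comparable line for the raw run and the flow run, which is also handy when sharing logs. A minimal stdlib-only sketch; `time_iterations` is a hypothetical helper, not part of Prefect, spaCy or tqdm:

```python
import logging
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

log = logging.getLogger(__name__)


def time_iterations(fn: Callable[[T], R], items: Iterable[T], label: str) -> List[R]:
    """Apply fn to every item and log the achieved iterations per second.

    Hypothetical helper for comparing a raw run against a Prefect run.
    """
    batch = list(items)
    start = time.perf_counter()
    results = [fn(x) for x in batch]
    elapsed = time.perf_counter() - start
    # Guard against a zero duration on very fast inputs
    rate = len(batch) / elapsed if elapsed > 0 else float("inf")
    log.warning("%s: %d iterations in %.2fs (%.1f it/s)", label, len(batch), elapsed, rate)
    return results
```

Inside `task1` this would replace the list comprehension, e.g. `out = time_iterations(nlp, texts, "flow run")`, and the same call in the raw run gives a directly comparable number.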

Sometimes I just need to post the question for the answer to appear in my head 😀 For spaCy, the fix was to force it to run as a single process:

```python
nlp.pipe(x, n_process=1)
```
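Applied to `task1` from the example, the change might look roughly like this. A sketch assuming `nlp` is a loaded spaCy pipeline; `parse_single_process` is a hypothetical helper name. Note that `Language.pipe` also processes texts in batches, unlike calling `nlp(x)` once per text as the original task does:

```python
from typing import Any, Iterable, List


def parse_single_process(nlp: Any, texts: Iterable[str]) -> List[Any]:
    """Parse texts with spaCy's Language.pipe, keeping everything in one process.

    Hypothetical helper; n_process=1 means spaCy starts no worker processes.
    """
    # pipe() returns a generator; materialise it so all parsing happens here
    return list(nlp.pipe(texts, n_process=1))


# Inside task1 (requires spaCy and the en_core_web_sm model):
# nlp = spacy.load("en_core_web_sm")
# out = parse_single_process(nlp, ["the cat sat on the mat." * 20] * 100)
```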

In general, packages that do their own multiprocessing may conflict with Prefect 2, so it is necessary to force them to run as a single process.

For the same reason, for Hugging Face pipelines I set:

```python
import os

os.environ["TOKENIZERS_PARALLELISM"] = "False"
os.environ["OMP_NUM_THREADS"] = "1"
```
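Thread-count variables such as `OMP_NUM_THREADS` are typically read once, when the numerical library is loaded, so the safest place to set them is the very top of the flow script, before importing torch or transformers. A minimal sketch that bundles the two settings above; `limit_library_threads` and `SINGLE_THREAD_ENV` are hypothetical names:

```python
import os
from typing import Dict

# The two settings from the post above
SINGLE_THREAD_ENV: Dict[str, str] = {
    "TOKENIZERS_PARALLELISM": "False",
    "OMP_NUM_THREADS": "1",
}


def limit_library_threads() -> Dict[str, str]:
    """Force the thread-limiting env vars and return their effective values."""
    for name, value in SINGLE_THREAD_ENV.items():
        os.environ[name] = value
    return {name: os.environ[name] for name in SINGLE_THREAD_ENV}


# Call this before importing torch / transformers so the values are seen at load time
limit_library_threads()
```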

It would be useful, if possible, to detect this issue and raise an error rather than just hanging. If not, a note in the docs would help. It is not obvious what restrictions are necessary, e.g. on threads, processes, async functions…

I'm not sure why it worked fine in the previous version but not in the latest; I guess something changed.

I was going to say that an alternative solution might be to use Prefect 2 tags to force Prefect to execute some tasks without multiprocessing. However, that may not work, because this issue still occurs with SequentialTaskRunner, even though that presumably does not use any multiprocessing.

This worked for spaCy. However, when I run a PyTorch model I have the same issue: it worked in the previous version of Prefect 2, but in the new version it is very slow, even with SequentialTaskRunner running just a single task.
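For the PyTorch case, the analogous knob is `torch.set_num_threads`, which caps the number of threads PyTorch uses for intra-op parallelism on CPU. Whether it resolves the slowdown in this thread is untested; the sketch below, with the hypothetical helper `limit_torch_threads`, simply skips itself when torch is not installed:

```python
import importlib.util


def limit_torch_threads(n: int = 1) -> bool:
    """Cap PyTorch's CPU intra-op threads to n.

    Hypothetical helper; returns False when torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    torch.set_num_threads(n)  # intra-op thread pool used by CPU kernels
    return torch.get_num_threads() == n
```

It would be called at the start of the task, before the model is loaded and run.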

Thanks for testing! I was wondering, what about 2.0b9? I assume it behaves the same?

And would you be able to share some output logs?


@simon_mackenzie check out this announcement about the current status of Prefect 2.0:

I would hold off on any performance optimizations until the stable release of 2.0 on Wednesday. Thanks for understanding!