Subflows and backfilling

Hi all, I’m completely new to Prefect. Actually, I’m not even using it yet, as I’m looking for an orchestration system to escape the hell of many cron jobs executing papermill. I’m trying to understand how to tackle something that is an integral part of my workflow in Prefect, and I hope you can help me clarify.

All the processing I’m doing is on day granularity. That is, I have one job that produces data for one day, then I have several other jobs that read the parquet file produced by the first job, and produce some other parquet files. Then the third level of jobs consumes those parquet files and produces reports that are emailed and/or HTTP POSTed to an internal service.

All of this was running on a schedule, with ample time between the first, second, and third layers. If I understand the Prefect model, I could have one deployment with a top-level flow that calls tasks to produce the first parquet file, then define flows for all the other jobs and call them from the top-level flow.

However, I’m not clear how backfill would work in this case. Sometimes I need to change the top-level job and then regenerate 3 months’ worth of parquet files, and I would like to trigger all dependent jobs. Other times I only need to change the third-level job, and in that case I don’t need to regenerate the top-level parquets. If I understand correctly, for backfilling I would have to create a new Python file and call the right flows (with the right parameters, like execution date), for example the third-level job, in a loop. Would that also invoke the parent flows?


The easiest way would be to pass the dates as flow parameters and ingest data based on those dates. We have a blog post on the roadmap to tackle that.

EDIT: this blog post is available here
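For anyone reading later, here is a minimal sketch of the parameter-driven backfill idea discussed above. It is plain Python so it runs without Prefect installed; the function names (`produce_base_parquet`, `derive_parquets`, `send_reports`), the output paths, and the `first_layer` argument are all hypothetical. In a real setup, each layer would presumably be a Prefect flow or task that takes the date as a parameter.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple


def daily_range(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Hypothetical stand-ins for the three layers of jobs. Each one takes the
# day to process as a parameter instead of assuming "today".
def produce_base_parquet(day: date) -> str:
    return f"base/{day.isoformat()}.parquet"


def derive_parquets(day: date) -> str:
    return f"derived/{day.isoformat()}.parquet"


def send_reports(day: date) -> str:
    return f"report for {day.isoformat()}"


# Layers in dependency order: index 0 is the top-level job.
LAYERS: List[Callable[[date], str]] = [
    produce_base_parquet,
    derive_parquets,
    send_reports,
]


def backfill(start: date, end: date, first_layer: int = 0) -> List[Tuple[date, str]]:
    """Re-run the layers from first_layer onward for each day in the range.

    Returns the (day, layer name) pairs in the order they were executed.
    """
    executed = []
    for day in daily_range(start, end):
        for layer in LAYERS[first_layer:]:
            layer(day)
            executed.append((day, layer.__name__))
    return executed


if __name__ == "__main__":
    # Only the third-level job changed: skip the first two layers.
    for day, name in backfill(date(2023, 1, 1), date(2023, 1, 3), first_layer=2):
        print(day, name)
```

Calling `backfill(..., first_layer=0)` regenerates everything for each day, while `first_layer=2` re-runs only the reports. Calling a lower-level function directly does not invoke the layers above it; the same holds for ordinary Python function calls, and as far as I understand it for flows called this way too.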

Thanks! I’ve managed to build a setup and I’m now trying out the backfilling as described. It seems to work, but I find it really hard to follow the executions because the flow runs have generated names. Is there a way to assign more descriptive names, for example to reflect the date being backfilled?


Excellent question! We do have an open issue for it here:

but I’ll circle back with the product team to see how best we can support it. Thanks for bringing this up and great job figuring out the backfill pattern. If you would like to share your solution here, that would be great!