Replicating Prefect 1 Result Target Functionality in Prefect 2

Hi There,

We’re exploring what an upgrade path from 1.x to 2.x would look like for us.

We currently extend Prefect 1’s Result class and use the Task ‘target’ functionality as a way to achieve tiered result caching for our tasks. This “TieredResult” can be given two URIs: a read_write_uri and a read_only_uri. When running things in your local environment, the read_write_uri points to a result output directory that you have write access to, whereas the read_only_uri points to a directory to which you only have read access.

This read_only_uri lets us have an architecture where our main shared research environment (which is constantly running) can produce artifacts that all researchers can reference, so they can avoid re-running every flow in their local environment (which is not feasible). People can check out the code and modify small sections of the flow graph to ensure their changes still fit into the overall research graph. When they re-run the entire flow locally, only the parts they changed end up with ‘target’ file cache misses.
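For context, here is a minimal sketch of the kind of tiered Result described above, written against Prefect 1’s Result API (exists/read/write). The TieredResult name, the use of LocalResult for both tiers, and the preference order are illustrative assumptions, not our exact implementation:

```python
from prefect.engine.result import Result
from prefect.engine.results import LocalResult


class TieredResult(Result):
    """Illustrative two-tier result: a writable local tier plus a shared read-only tier."""

    def __init__(self, read_write_uri: str, read_only_uri: str, **kwargs):
        # Delegate to two inner results: one we can write to locally,
        # one pointing at the shared, read-only artifact store.
        self._read_write = LocalResult(dir=read_write_uri)
        self._read_only = LocalResult(dir=read_only_uri)
        super().__init__(**kwargs)

    def exists(self, location: str, **kwargs) -> bool:
        # A 'target' hit in either tier counts as a cache hit.
        return self._read_write.exists(location, **kwargs) or self._read_only.exists(
            location, **kwargs
        )

    def read(self, location: str) -> "Result":
        # Prefer a locally produced artifact, fall back to the shared tier.
        if self._read_write.exists(location):
            return self._read_write.read(location)
        return self._read_only.read(location)

    def write(self, value_, **kwargs) -> "Result":
        # New artifacts only ever go to the writable tier.
        return self._read_write.write(value_, **kwargs)
```

Tasks then use it together with a `target` template, e.g. `@task(target="{task_name}.pkl", checkpoint=True, result=TieredResult(...))`.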

I dug into Prefect 2 caching a bit and don’t yet see a way to achieve something like this, but I could be missing something. In Prefect 2, the concept of “targets” seems to have disappeared, and instead there is only caching logic around the usage of “cache_key_fn”. The caching layer works by proposing a “Running” state change to the Orion server, which then looks up the proposed cache key in a SQL database. If it exists, the server rejects the “Running” state change and moves the task run to Completed (Cached) instead. I tried looking for ways to hook into the database layer and possibly extend OrionDBInterface so that Orion can work with multiple databases (your local one plus the shared one) and use the shared database as one of the sources for cache hits, but it looks like that would be very difficult to maintain.
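For anyone following along, this is the built-in Prefect 2 pattern being described; the flow and task names here are just placeholders:

```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def transform(x: int) -> int:
    # The cache key is computed client-side from the task run context and
    # parameters; the Orion server decides whether it has seen it before.
    return x * 2


@flow
def my_flow():
    # Later runs with the same input resolve to Completed (Cached),
    # as long as the key exists in the server's database.
    return transform(21)
```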

For the ‘result_storage’ side of flows, it is easy to get the functionality I’m looking for by creating a custom Block that references both WritableFileSystem and ReadableFileSystem blocks. But I’m having difficulty finding a way to apply the same concept to cache hits.
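Here is a rough sketch of that kind of two-tier result_storage block, assuming Prefect 2’s filesystem block interfaces (read_path/write_path) and glossing over block registration details. TieredFileSystem is an illustrative name, not a built-in block type:

```python
from prefect.filesystems import LocalFileSystem, ReadableFileSystem, WritableFileSystem


class TieredFileSystem(WritableFileSystem, ReadableFileSystem):
    """Illustrative two-tier storage block; not a built-in Prefect block."""

    _block_type_name = "Tiered File System"

    # Local, writable result directory (also readable).
    read_write: LocalFileSystem
    # Shared artifact store the researcher can only read from.
    read_only: ReadableFileSystem

    async def write_path(self, path: str, content: bytes) -> None:
        # New results only ever land in the writable tier.
        await self.read_write.write_path(path, content)

    async def read_path(self, path: str) -> bytes:
        # Prefer locally produced results, fall back to the shared tier.
        try:
            return await self.read_write.read_path(path)
        except Exception:
            # Which exception signals a missing file depends on the underlying block.
            return await self.read_only.read_path(path)
```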

Any help would be greatly appreciated.

Thank you,

Haddon

you could implement a cache_key_fn function that checks whether a given file exists (e.g. looking it up using a file system block) and, if so, returns its path as the cache_key
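A sketch of that suggestion might look like the following; the artifact path layout, parameter names, and task are assumptions made up for illustration:

```python
from pathlib import Path
from typing import Any, Dict, Optional

from prefect import task
from prefect.context import TaskRunContext


def shared_artifact_cache_key(
    context: TaskRunContext, parameters: Dict[str, Any]
) -> Optional[str]:
    # Build the expected artifact path from the task name and its parameters.
    artifact = Path("/mnt/shared/results") / context.task.name / str(parameters["date"])
    if artifact.exists():
        # Returning the path means every run that maps to this pre-computed
        # artifact shares the same cache key.
        return str(artifact)
    # Returning None disables caching for this run, so the task recomputes.
    return None


@task(cache_key_fn=shared_artifact_cache_key)
def build_report(date: str) -> str:
    ...
```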

you are definitely on the right track here in using cache_key_fn and file system blocks

I opened a GitHub issue for you here:

Thanks for helping, Anna.

If I return a path from cache_key_fn that points to one of these pre-computed shared artifacts, wouldn’t it still trigger a cache miss when running locally, because my local Orion DB doesn’t have the corresponding row in the task_run_state_cache table? When I was stepping through the code, it looked like the only way to re-use one of these Result objects was if I had a cache hit.

you probably know more about it than I do, so feel free to add any more context to the GH issue

TTBOMK it’s mainly about the cache key: when it matches, the task run is cached; if not, it’s recomputed