Replicating Prefect 1 Result Target Functionality in Prefect 2

Hi There,

We’re exploring what an upgrade path from 1.x to 2.x would look like for us.

We currently extend Prefect 1’s Result class and use the Task ‘target’ functionality as a way to achieve tiered result caching for our tasks. This “TieredResult” can be given two URIs: a read_write_uri and a read_only_uri. When running things in your local environment, the read_write_uri points to a result output directory that you have write access to, whereas the read_only_uri points to a directory to which you only have read access.

This read_only_uri lets us have an architecture where our main shared research environment (which is constantly running) can produce artifacts that all researchers can reference, so they can avoid re-running every flow in their local environment (which is not feasible). People can check out the code and modify small sections of the flow graph to ensure their changes still fit into the overall research graph. When they re-run the entire flow locally, only the parts they changed end up with ‘target’ file cache misses.
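For context, here is a minimal sketch of the kind of tiered Result described above, written against Prefect 1’s Result API (exists/read/write). The TieredResult name, the use of LocalResult for both tiers, and the preference order are illustrative assumptions, not our exact implementation:

```python
from prefect.engine.result import Result
from prefect.engine.results import LocalResult


class TieredResult(Result):
    """Illustrative two-tier result: a writable local tier plus a shared read-only tier."""

    def __init__(self, read_write_uri: str, read_only_uri: str, **kwargs):
        # Delegate to two inner results: one we can write to locally,
        # one pointing at the shared, read-only artifact store.
        self._read_write = LocalResult(dir=read_write_uri)
        self._read_only = LocalResult(dir=read_only_uri)
        super().__init__(**kwargs)

    def exists(self, location: str, **kwargs) -> bool:
        # A 'target' hit in either tier counts as a cache hit.
        return self._read_write.exists(location, **kwargs) or self._read_only.exists(
            location, **kwargs
        )

    def read(self, location: str) -> "Result":
        # Prefer a locally produced artifact, fall back to the shared tier.
        if self._read_write.exists(location):
            return self._read_write.read(location)
        return self._read_only.read(location)

    def write(self, value_, **kwargs) -> "Result":
        # New artifacts only ever go to the writable tier.
        return self._read_write.write(value_, **kwargs)
```

Tasks then use it together with a `target` template, e.g. `@task(target="{task_name}.pkl", checkpoint=True, result=TieredResult(...))`.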

I dug into Prefect 2 caching a bit and don’t yet see a way to achieve something like this, but I could be missing something. In Prefect 2, the concept of “targets” seems to have disappeared, and instead there is only caching logic around the usage of “cache_key_fn”. The caching layer works by proposing a “Running” state change to the Orion server, which then looks up the proposed cache key in a SQL database. If it exists, the server rejects the “Running” state change and moves the task run to Completed (Cached) instead. I tried looking for ways to hook into the database layer and possibly extend OrionDBInterface so that Orion can work with multiple databases (your local one plus the shared one) and use the shared database as one of the sources for cache hits, but it looks like that would be very difficult to maintain.
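For anyone following along, this is the built-in Prefect 2 pattern being described; the flow and task names here are just placeholders:

```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=1))
def transform(x: int) -> int:
    # The cache key is computed client-side from the task run context and
    # parameters; the Orion server decides whether it has seen it before.
    return x * 2


@flow
def my_flow():
    # Later runs with the same input resolve to Completed (Cached),
    # as long as the key exists in the server's database.
    return transform(21)
```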

For the ‘result_storage’ side of flows, it is easy to get the functionality I’m looking for by creating a custom Block that references both WritableFileSystem and ReadableFileSystem blocks. But I’m having difficulty finding a way to apply the same concept to cache hits.
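Here is a rough sketch of that kind of two-tier result_storage block, assuming Prefect 2’s filesystem block interfaces (read_path/write_path) and glossing over block registration details. TieredFileSystem is an illustrative name, not a built-in block type:

```python
from prefect.filesystems import LocalFileSystem, ReadableFileSystem, WritableFileSystem


class TieredFileSystem(WritableFileSystem, ReadableFileSystem):
    """Illustrative two-tier storage block; not a built-in Prefect block."""

    _block_type_name = "Tiered File System"

    # Local, writable result directory (also readable).
    read_write: LocalFileSystem
    # Shared artifact store the researcher can only read from.
    read_only: ReadableFileSystem

    async def write_path(self, path: str, content: bytes) -> None:
        # New results only ever land in the writable tier.
        await self.read_write.write_path(path, content)

    async def read_path(self, path: str) -> bytes:
        # Prefer locally produced results, fall back to the shared tier.
        try:
            return await self.read_write.read_path(path)
        except Exception:
            # Which exception signals a missing file depends on the underlying block.
            return await self.read_only.read_path(path)
```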

Any help would be greatly appreciated.

Thank you,

Haddon

you could implement a cache_key_fn function that checks whether a given file exists (e.g. looking it up using a file system block) and, if so, returns its path as the cache_key
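A sketch of that suggestion might look like the following; the artifact path layout, parameter names, and task are assumptions made up for illustration:

```python
from pathlib import Path
from typing import Any, Dict, Optional

from prefect import task
from prefect.context import TaskRunContext


def shared_artifact_cache_key(
    context: TaskRunContext, parameters: Dict[str, Any]
) -> Optional[str]:
    # Build the expected artifact path from the task name and its parameters.
    artifact = Path("/mnt/shared/results") / context.task.name / str(parameters["date"])
    if artifact.exists():
        # Returning the path means every run that maps to this pre-computed
        # artifact shares the same cache key.
        return str(artifact)
    # Returning None disables caching for this run, so the task recomputes.
    return None


@task(cache_key_fn=shared_artifact_cache_key)
def build_report(date: str) -> str:
    ...
```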

you are definitely on the right track here in using cache_key_fn and file system blocks

I opened a GitHub issue for you here:

Thanks for helping, Anna.

If I return a path from cache_key_fn that points to one of these pre-computed shared artifacts, wouldn’t it still trigger a cache miss when running locally, because my local Orion DB doesn’t have the corresponding row in the task_run_state_cache table? When I was stepping through the code, it looked like the only way to re-use one of these Result objects was if I had a cache hit.

you probably know more about it than I do, so feel free to add any more context to the GH issue

TTBOMK it’s mainly about the cache key: when it matches, the task run is cached; if not, it’s recomputed