Hi there,
We’re exploring what an upgrade path from 1.x to 2.x would look like for us.
We currently extend Prefect 1’s Result class and use the Task ‘target’ functionality as a way to achieve tiered result caching for our tasks. This “TieredResult” can be given two URIs: read_write_uri and read_only_uri. When you run things in your local environment, the read_write_uri points to a result output directory that you have write access to, whereas the read_only_uri points to a directory that you can only read from. The read_only_uri gives us an architecture where our main shared research environment (which is constantly running) produces artifacts that all researchers can reference, so nobody has to re-run every flow in their local environment (which is not feasible). People can check out the code and modify small sections of the flow graph to ensure their changes still fit into the overall research graph. When they re-run the entire flow locally, only the parts they changed end up with ‘target’ file cache misses.
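To make the lookup order concrete, here is a simplified sketch of the idea. It is not our actual implementation and does not use any Prefect API; the `TieredTarget` class and its `locate`/`write` methods are names I made up for illustration, and plain directories stand in for the two URIs.

```python
import tempfile
from pathlib import Path
from typing import Optional


class TieredTarget:
    """Hypothetical stand-in for our TieredResult; not a Prefect class."""

    def __init__(self, read_write_uri: str, read_only_uri: str) -> None:
        self.read_write_dir = Path(read_write_uri)
        self.read_only_dir = Path(read_only_uri)

    def locate(self, target: str) -> Optional[Path]:
        """Return the first tier that holds `target`, or None on a cache miss."""
        # The writable tier is checked first so local re-runs win over shared artifacts.
        for tier in (self.read_write_dir, self.read_only_dir):
            candidate = tier / target
            if candidate.exists():
                return candidate
        return None

    def write(self, target: str, data: bytes) -> Path:
        """New results only ever land in the writable tier."""
        path = self.read_write_dir / target
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path


def demo() -> dict:
    """Walk through a shared hit, a miss, and a local re-run."""
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as shared:
        # Pretend the shared research environment already produced this artifact.
        (Path(shared) / "features.bin").write_bytes(b"shared")
        tiers = TieredTarget(read_write_uri=local, read_only_uri=shared)

        hit = tiers.locate("features.bin")   # unchanged task: found in the shared tier
        miss = tiers.locate("model.bin")     # changed task: nothing cached anywhere
        tiers.write("model.bin", b"local")   # the re-run writes to the local tier only
        after = tiers.locate("model.bin")

        return {
            "hit_from_shared": hit is not None and hit.parent == Path(shared),
            "miss_before_rerun": miss is None,
            "rerun_in_local": after is not None and after.parent == Path(local),
        }
```

The point of the ordering is that a researcher's own outputs shadow the shared ones, while the shared directory is never written to.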
I dug into the Prefect 2 caching a bit and don’t yet see a way to achieve something like this, but I could be missing something. In Prefect 2, the concept of “targets” seems to have disappeared; instead, the only caching logic is built around “cache_key_fn”. The caching layer works by proposing a “Running” state change to the Orion server, which then looks up the requested cache key in a SQL database. If the key exists, the server rejects the “Running” state change and moves the task run to Completed (Cached) instead. I tried looking for ways to hook into the database layer and possibly extend OrionDBInterface so that Orion can work with multiple databases (your local one + the shared one) and use the shared database as one of the sources for cache hits. But that approach looks like it would be very difficult to maintain.
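As a toy model of the behavior I’m after (this is not Orion’s code, just my reading of the flow described above), the lookup would consult the local store first and then the shared one. The function name `propose_running` and the plain dicts standing in for the two databases are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def propose_running(
    cache_key: str,
    local_db: Dict[str, str],
    shared_db: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """Decide the next state for a task run that asks to enter Running.

    Each dict maps a cache key to a reference to a stored result.
    Returns (state_name, result_reference).
    """
    # Check the local store first, then fall back to the shared one.
    for db in (local_db, shared_db):
        if cache_key in db:
            # Key found: reject Running and report a cached completion instead.
            return "Cached", db[cache_key]
    # No store knows this key, so the task actually has to run.
    return "Running", None
```

With `local_db = {}` and `shared_db = {"k1": "shared-result"}`, proposing `"k1"` comes back as Cached with the shared reference, while an unknown key is allowed to run.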
For ‘result_storage’ in flows, it is very easy to get the functionality I’m looking for by creating a custom Block that references both a WritableFileSystem block and a ReadableFileSystem block. But I’m having difficulty finding a way to apply the same concept to cache hits.
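For reference, the storage side is roughly the following. This is a sketch with in-memory stand-ins rather than real Prefect blocks: `InMemoryStore` and `TieredStorage` are hypothetical names, and I’m assuming the async `read_path`/`write_path` shape that the file system blocks expose.

```python
import asyncio
from typing import Dict


class InMemoryStore:
    """Stand-in for a file system block; only mimics read_path/write_path."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    async def read_path(self, path: str) -> bytes:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

    async def write_path(self, path: str, content: bytes) -> None:
        self.files[path] = content


class TieredStorage:
    """Hypothetical composite: writes go to one store, reads fall back to a second."""

    def __init__(self, writable: InMemoryStore, readable: InMemoryStore) -> None:
        self.writable = writable
        self.readable = readable

    async def read_path(self, path: str) -> bytes:
        try:
            # Prefer results produced locally.
            return await self.writable.read_path(path)
        except FileNotFoundError:
            # Otherwise use what the shared environment produced.
            return await self.readable.read_path(path)

    async def write_path(self, path: str, content: bytes) -> None:
        # The read-only store is never written to.
        await self.writable.write_path(path, content)
```

This covers reading and writing results, but it does nothing for the cache-key lookup itself, which is the part I’m stuck on.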
Any help would be greatly appreciated.
Thank you,
Haddon