Does Prefect 2.0 support Redis for flow storage?

View in #prefect-community on Slack

Zhiyuan_Ma @Zhiyuan_Ma: Hi, I was wondering if anyone has tried using Redis as the storage backend for orion (prefect 2)? I see there is no built-in support for it. https://orion-docs.prefect.io/concepts/storage/
The closest we get is KV Server Storage. To use redis, does that mean I need to expose an HTTP API in between Orion and the redis server?

Jeremiah @Jeremiah: Redis support for storage is definitely possible! It is on our roadmap but if you want to implement it yourself, the KV storage is a good model.

Zhiyuan_Ma @Zhiyuan_Ma: Thank you for the prompt reply! I am glad to hear that redis support is on the way. My project involves a large data processing pipeline that is currently built with redis + celery. We use redis + rejson (https://oss.redis.com/redisjson/) as our kv store for keeping track of the celery tasks, celery beat schedules, and a cache of intermediate data products.

I see that the KV storage expects a JSON data blob. Is the blob always queried as a whole on the KV server, or do the internal calls to the KV server API ever query a subtree of the JSON data?

I am asking because the JSON blobs we store in redis are sometimes quite large, and the nice thing about ReJSON is that it can query deeply nested values in a large JSON blob.

Anna_Geller @Anna_Geller: > We use redis + rejson (https://oss.redis.com/redisjson/) as our kv store for keeping track of the celery tasks
Given that Prefect 2.0 ships with work queues, it’s possible that you no longer need to use Redis for that. Queued runs are stored in the backend.
In general, I think you don’t need to do much to configure storage. Prefect will take care of serializing the flow as needed - you only need to build your flow and configure storage via CLI or on the DeploymentSpec. Not sure if this answers your question though.

Zhiyuan_Ma @Zhiyuan_Ma: I see what you mean. With Redis + Celery, we use Redis to do ALL the data management, including the queue and task metadata and task results.

With prefect2, like you said, the queue, flows, tasks and task scheduling are built in (stored in the sqlite or postgres database), so there is no need to use redis for those any more. However, our use case still needs redis as the data store, to hold application state and intermediate data products.

What our celery tasks do is create/update these data products stored in redis in real time, triggered by external events (e.g., a new observation request, new data files appearing on the filesystem, etc.).
As far as I understand, the prefect2 store is exactly to do this, managing task results.

I can imagine that with the KV store, I can run a task and post the result as JSON to the KV store. In our case, the JSON data blob can be large. For performance reasons, a lot of the tasks only update a single nested value of the large JSON. That is why I am asking whether the KV store queries used by prefect2 are always expected to be for the whole JSON, or whether prefect2 could handle partial updates of the stored values.
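
To make the distinction in the question concrete, here is a minimal sketch in plain Python (no Redis or Prefect involved) contrasting what a whole-blob write and a path-based write would have to send. The helper `set_nested`, the key names, and the sample document are hypothetical and only for illustration; with RedisJSON, the path-based update would be done server-side with the `JSON.SET` command and a path, rather than in client memory as here.

```python
import json


def set_nested(doc, path, value):
    """Update one nested value in place; `path` is a list of dict keys."""
    node = doc
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value


# Hypothetical application state kept as one large JSON document.
state = {"obs_42": {"status": "queued", "files": ["a.dat", "b.dat"]}}

# A task changes a single nested field.
set_nested(state, ["obs_42", "status"], "reduced")

# A whole-blob KV write has to ship the entire document again...
whole_blob_payload = json.dumps(state)
# ...while a path-based write only ships the new value (plus the path).
partial_payload = json.dumps("reduced")

print(len(whole_blob_payload), len(partial_payload))
```

The gap between the two payload sizes grows with the size of the document, which is the performance concern raised above.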

Anna_Geller @Anna_Geller: Thanks for explaining, sounds like a perfect use case for Prefect 2.0!

> prefect2 store is exactly to do this, managing task results.
not exactly. Storage is to store flow code. Task run result storage is a bit different and this is something we are still working on, so it’s best if you watch our announcements for new releases and wait a bit until this feature is out.

Zhiyuan_Ma @Zhiyuan_Ma: I see. I’ll keep an eye out for announcements. Thank you for the answers.

In this case, I guess that at the least we can bypass the prefect storage mechanism for results by always returning None, or some status code (success/fail), from all the tasks. In the task body, we can then do whatever we want to talk to the redis database and update the application state.
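
A rough sketch of that workaround, under stated assumptions: `InMemoryStore` is a hypothetical stand-in for a Redis client so the example runs without a server, and `mark_file_processed` is a made-up task body. In a real Prefect 2 project the function would presumably be decorated with `@task` from `prefect` and the store would be a real Redis connection; both are left out here so the snippet stays self-contained.

```python
class InMemoryStore:
    """Hypothetical stand-in for a Redis client holding JSON-like documents."""

    def __init__(self):
        self._data = {}

    def set_path(self, key, path, value):
        # Walk (and create) nested dicts, then set only the leaf value.
        node = self._data.setdefault(key, {})
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value

    def get(self, key):
        return self._data.get(key)


def mark_file_processed(store, obs_id, filename):
    """Task body: write application state to the external store directly.

    Only a small status string is returned, so the orchestrator never has
    to persist the large data product itself. Exceptions are left to
    propagate so a failure is still visible to whatever runs the task.
    """
    store.set_path(f"obs:{obs_id}", ["files", filename, "status"], "processed")
    return "success"


store = InMemoryStore()
status = mark_file_processed(store, "42", "a.dat")
print(status, store.get("obs:42"))
```

The design choice here is that the return value carries no data, only an outcome, while the data itself lives in the external store under a key the next task can look up.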