I would like to create program which does the following:
At the frequency of once a day, generate a list of 100 or so random usernames
After the above task is complete, at the frequency of once every hour, generate a random score and assign it to each of the users total score (by summation to previous total).
How would I go about constructing a DAG for this? Will I necessarily need 2 flows, or sub flows?
Essentially, is there a way to create 2 flows, where one is the parent and another is a child, and have them both run on different schedules, i.e the child flow getting executed more frequently?
I imagine in python without prefect, I would schedule these tasks as individual scripts running at repeated intervals. The main flow, by my understanding, would write to a database and the sub-flow which is scheduled to happen more frequently would read from this database. I am a beginner with dataflow pipeline tools such as prefect or airflow, and would like to proceed with prefect2.0.
Perhaps I can rephrase my problem. In prefect 1.0, what I would have done in this case would be to have a single flow, with let us say a starter task which happens once a day, and once this task has triggered there would be sub tasks which trigger every 15 minutes. I would set the schedule of this flow to happen once in every 15 minutes, with the starter task having a cache duration of 1 day, hence it is only executed once a day. The positive about this is that the data from the cache persists across multiple flow runs. This does not seem to the case with prefect 2.0, as whenever I schedule a flow, each instance seems to be independent of the previous one, and tasks do not carry over their caches information across flow runs. Should I go about solving this problem in a different way?
This is just from a sample project, but let me give an example maybe. I have 3 pipelines.
i) Sync pipeline - Syncs all user login data from a source
ii) Aggregate pipeline - At regular intervals throughtout the day, we accumulate which users spent time using the app and how much time they spent on it.
iii) Cache pipeline - At the end of the day, we’d like to store this information into a database.
I am just a beginner too, so I apologize if my explanations are vague. How would I go writing a prefect program to do the above?