I’ve been using Prefect to run the data analysis workflows for my research in pharmacoepidemiology. In my projects I often have to leave Python land to use packages that are only available in R (e.g. the MICE package for multiple imputation). In Snakemake (another workflow tool), which I used before, running R scripts is built in, but in Prefect I’ve had to create my own integration.
Since this has proven very useful to me, I wanted to share it with the rest of the community. If you have any feedback or a better way of doing it, please let me know!
A minimal working example is available here:
Basically what it does is:
- Create a new task class RTask that inherits from Task
- The run method of RTask takes an R script file and any arguments
- Any Pandas DataFrames or Series passed as arguments are written to temp files, and the paths to these files are passed to the script in their place
- The R script is run with the supplied arguments and the resulting dataframe (if any) is loaded into a Pandas DataFrame and returned
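
For anyone who wants the gist without opening the example, the steps above can be roughly sketched as two plain functions. This is a simplified illustration and not the exact code from the example: the names `prepare_args` and `run_r_script`, the use of CSV as the exchange format, and the convention that the R script writes its result to the path given as its last argument are all assumptions made for this sketch. In the actual integration this logic would sit inside the run method of RTask.

```python
import subprocess
import tempfile
from pathlib import Path

import pandas as pd


def prepare_args(args, tmp_dir):
    """Swap every DataFrame/Series in args for the path of a CSV written to tmp_dir.

    Hypothetical helper: CSV is an assumed exchange format for this sketch.
    """
    prepared = []
    for i, arg in enumerate(args):
        if isinstance(arg, (pd.DataFrame, pd.Series)):
            path = Path(tmp_dir) / f"arg_{i}.csv"
            arg.to_csv(path, index=False)
            prepared.append(str(path))
        else:
            # Everything else is passed through as a plain string argument.
            prepared.append(str(arg))
    return prepared


def run_r_script(script, *args, rscript="Rscript"):
    """Run an R script and load its result (if any) into a DataFrame.

    Assumed convention: the script receives the output path as its last
    command-line argument and may write a CSV there.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = Path(tmp_dir) / "result.csv"
        cmd = [rscript, str(script), *prepare_args(args, tmp_dir), str(out_path)]
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
        # Read the result before the temporary directory is cleaned up.
        if out_path.exists():
            return pd.read_csv(out_path)
        return None
```

With this sketch, a call could look like `run_r_script("impute.R", df, 5)`, where `impute.R` is a made-up script name. On the R side the script would pick up the file paths and other values with `commandArgs(trailingOnly = TRUE)`, read the input CSV, and write its result to the last path it was given.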
All the best,