Running R scripts with Prefect

Hi!

I’ve been using Prefect to run the data analysis workflows for my research in pharmacoepidemiology. In my projects I often have to leave Python land to use packages that are only available in R (e.g. the MICE package for multiple imputation). In Snakemake (another workflow tool), which I used before, running R scripts was built in, but in Prefect I’ve had to create my own integration.

Since this has proven very useful to me, I wanted to share it with the rest of the community. If you have any feedback or a better way of doing it, please let me know!

A minimal working example is available here:

Basically what it does is:

  1. Create a new task class RTask that inherits from Task
  2. The run method of RTask takes an R script file and any arguments
  3. Any Pandas DataFrames or Series passed as arguments are written to temp files, and those arguments are replaced with the paths to these files
  4. The R script is run with the supplied arguments, and the resulting data frame (if any) is loaded into a Pandas DataFrame and returned
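
To give a rough idea of steps 2 to 4, here is a minimal sketch. It is not the code from the linked example, just an illustration under some assumptions: the names `stage_args` and `run_r_script` are made up for this post, data is exchanged as CSV, and the R script is assumed to read its inputs with `commandArgs(trailingOnly = TRUE)` and to write its result (if any) to the path given as the last argument. In a Prefect task class, the body of `run_r_script` would go in the `run` method.

```python
import subprocess
import tempfile
from pathlib import Path


def stage_args(args, workdir):
    """Replace DataFrame-like arguments with paths to CSV files in workdir."""
    staged = []
    for i, arg in enumerate(args):
        # Pandas DataFrames and Series both expose to_csv, so duck-type on it.
        if hasattr(arg, "to_csv"):
            path = Path(workdir) / f"arg_{i}.csv"
            arg.to_csv(path, index=False)
            staged.append(str(path))
        else:
            # Everything else is passed to the script as a plain string.
            staged.append(str(arg))
    return staged


def run_r_script(script, *args, rscript="Rscript"):
    """Run an R script; return its output as a DataFrame, or None if it wrote none."""
    import pandas as pd  # imported here so stage_args has no pandas dependency

    with tempfile.TemporaryDirectory() as workdir:
        # Assumed convention: the last argument is where the script writes its result.
        out_path = Path(workdir) / "result.csv"
        cmd = [rscript, str(script), *stage_args(args, workdir), str(out_path)]
        subprocess.run(cmd, check=True)  # raises CalledProcessError if R fails
        if out_path.exists():
            return pd.read_csv(out_path)
        return None
```

With a hypothetical script `impute.R`, a call would look like `imputed = run_r_script("impute.R", df, 5)`. CSV is the simplest exchange format but loses column types; a format such as Feather or Parquet may be a better fit for larger or more strongly typed data.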

All the best,
Peter

Wow, this is amazing! Seamlessly passing data from a Python task to an R task - so cool! :100:

I’ll ask our integrations team whether they can add the RTask to the task library.

Thank you so much for sharing, this will help so many people.

Awesome @peter, thanks for sharing!

Glad you like it!
That would be awesome!

I don’t know how common R is in this part of the data science sphere, but if more people are using it with Prefect, would it be possible to add an “R” tag to the forum? Since it’s a one-letter word, it’s not possible to search for just “R” using the search box.

Good catch - I added tags, including “r” :slight_smile:

Adding for tracking:
