Parallel R code on SLURM

From HPCwiki

Using R code on SLURM for embarrassingly parallel calculations

The best-known R packages that provide parallel functionality, e.g. doParallel or doSNOW, do not work properly on Anunna. They are particularly problematic when you try to run (array) jobs across multiple nodes. The rslurm package, however, lets you run embarrassingly parallel calculations on SLURM. It automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from the different nodes, as well as wrappers for common SLURM commands.

Example code

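library(rslurm)
# test_func is called once per row of the data frame pars;
# the column names of pars must match the argument names of test_func.
# submit = FALSE only writes the job scripts; use submit = TRUE to queue the job.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                   nodes = 2, cpus_per_node = 2, submit = FALSE)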
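
The call above does not define test_func or pars. The sketch below shows what they could look like and how the results might be collected afterwards. The function body and the parameter values are illustrative assumptions, not taken from this page, and run_on_cluster is a hypothetical wrapper name. slurm_apply, get_slurm_out and cleanup_files are rslurm functions; the cluster part is wrapped in a function so that nothing is submitted until you call it on a machine with SLURM available.

```r
# Assumed example inputs (illustrative only, not part of the original page).
# Each call draws a large normal sample and returns its mean and sd.
test_func <- function(par_mu, par_sd) {
  samp <- rnorm(10^6, par_mu, par_sd)
  c(s_mu = mean(samp), s_sd = sd(samp))
}

# One row per task; column names must match the argument names of test_func.
pars <- data.frame(par_mu = 1:10,
                   par_sd = seq(0.1, 1, length.out = 10))

# Local dry run on the first row, to catch errors before anything is queued.
local_check <- do.call(test_func, as.list(pars[1, ]))

# Hypothetical wrapper: calling it requires rslurm and a working SLURM setup.
run_on_cluster <- function() {
  sjob <- rslurm::slurm_apply(test_func, pars, jobname = "test_apply",
                              nodes = 2, cpus_per_node = 2, submit = TRUE)
  # Wait for the job to finish, then combine the per-node output
  # into a single data frame (one row per row of pars).
  res <- rslurm::get_slurm_out(sjob, outtype = "table", wait = TRUE)
  # Remove the temporary files that rslurm wrote for this job.
  rslurm::cleanup_files(sjob)
  res
}
```

Running test_func on a single row first is a cheap way to find bugs before they are multiplied across nodes. With submit = FALSE, as in the example above, rslurm only writes the scripts, so you can inspect them before submitting the job yourself.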

Please be aware that every new SLURM job needs a few seconds of lead time before it starts executing, so make sure that your tasks run long enough to make this worthwhile (about 60 s is a good minimum). Otherwise most of your 'compute' time will be spent waiting for jobs to start and stop.
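
One way to act on this is to time a single task locally and choose the number of nodes so that each node still has roughly a minute of work. The helper below is a rough sketch under simple assumptions: suggest_nodes is a hypothetical function (not part of rslurm), all tasks are assumed to take about the same time, and the cap of 10 nodes is an arbitrary example value.

```r
# Hypothetical helper: pick a node count so that each node runs for at
# least min_job_secs, assuming equal task times and cpus_per_node tasks
# running concurrently on every node.
suggest_nodes <- function(n_tasks, secs_per_task, cpus_per_node = 2,
                          min_job_secs = 60, max_nodes = 10) {
  total_secs <- n_tasks * secs_per_task
  # Wall time per node is roughly total_secs / (nodes * cpus_per_node),
  # so this is the largest node count that keeps it above min_job_secs.
  nodes <- floor(total_secs / (min_job_secs * cpus_per_node))
  # Always use at least one node and never more than max_nodes.
  max(1, min(nodes, max_nodes))
}

# Example: 1000 tasks of about 0.5 s each, 2 CPUs per node.
n_nodes <- suggest_nodes(n_tasks = 1000, secs_per_task = 0.5)
```

The result can be passed as the nodes argument of slurm_apply. If even a single node would finish in well under a minute, it may be simpler to run the calculation in one ordinary job.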

External links

Vignette for the rslurm package