Parallel R code on SLURM

From HPCwiki
Latest revision as of 20:07, 19 February 2019

Using R code on SLURM for embarrassingly parallel calculations

The best-known R packages for parallel computing, such as doParallel and doSNOW, do not work properly on Anunna. They are particularly problematic when you run (array) jobs across multiple nodes. The rslurm package, however, lets you run embarrassingly parallel calculations on SLURM: it automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from the different nodes, as well as wrappers for common SLURM commands.

Example code

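library(rslurm)
# test_func: the function to apply; pars: a data frame with one row per call,
# whose column names match the argument names of test_func.
# submit = FALSE only writes the submission scripts; use submit = TRUE to submit.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)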
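The call above assumes that test_func and pars already exist. A minimal, self-contained sketch is given below; the function body and the parameter values are illustrative assumptions modelled on the rslurm vignette, not code from this wiki. The call to slurm_apply is guarded so that the script also runs on a machine where rslurm is not installed.

```r
# Hypothetical example function: draw a large sample and return its summary statistics.
test_func <- function(par_mu, par_sd) {
  samp <- rnorm(10^6, par_mu, par_sd)
  c(s_mu = mean(samp), s_sd = sd(samp))
}

# One row per function call; the column names must match the arguments of test_func.
pars <- data.frame(par_mu = 1:10,
                   par_sd = seq(0.1, 1, length.out = 10))

# Quick local check on the first parameter row before involving SLURM.
local_result <- do.call(test_func, as.list(pars[1, ]))

if (requireNamespace("rslurm", quietly = TRUE)) {
  # submit = FALSE only writes the job files; nothing is sent to the scheduler.
  sjob <- rslurm::slurm_apply(test_func, pars, jobname = "test_apply",
                              nodes = 2, cpus_per_node = 2, submit = FALSE)
}
```

With submit = FALSE, rslurm should write its files to a folder named _rslurm_test_apply in the working directory, including a submit.sh script that can be inspected and then submitted with sbatch. With submit = TRUE the job is submitted directly; in recent rslurm versions the results can then be collected with get_slurm_out(sjob, outtype = "table") and the temporary files removed with cleanup_files(sjob).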

Be aware that each new SLURM job takes a few seconds to start, so make sure that your tasks run long enough (about 60 seconds is a good minimum). Otherwise, most of your 'compute' time will be spent waiting for jobs to start and stop.
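One way to apply this rule of thumb is to estimate how long a single call takes (for example with system.time) and then limit the number of nodes so that each array task still has about a minute of work. The helper below is a rough sketch: suggest_nodes is a hypothetical name, not part of rslurm, and the default max_nodes = 16 is an arbitrary assumption. It also assumes that the calls are divided evenly over nodes * cpus_per_node workers and it ignores start-up overhead.

```r
# Hypothetical helper: choose how many nodes to request so that each array task
# keeps running for at least `min_secs` seconds.
# Assumes an even split of the calls over nodes * cpus_per_node workers.
suggest_nodes <- function(n_calls, secs_per_call, cpus_per_node,
                          min_secs = 60, max_nodes = 16) {
  total_secs <- n_calls * secs_per_call
  nodes <- floor(total_secs / (cpus_per_node * min_secs))
  # Never request fewer than one node or more than max_nodes.
  max(1, min(max_nodes, nodes))
}

# 1000 calls of roughly 0.5 s each, with 2 CPUs per node.
n_nodes <- suggest_nodes(n_calls = 1000, secs_per_call = 0.5, cpus_per_node = 2)
```

The resulting value can be passed as the nodes argument of slurm_apply. If the whole workload takes less than a minute, the helper returns 1, and in that case running the code in a single job is likely the better choice.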

External links

Vignette for the rslurm package: https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html