Parallel R code on SLURM

Using R code on SLURM for embarrassingly parallel calculations

The most well-known R packages that provide parallel functionality, e.g. doParallel or doSNOW, do not work properly on the HPC. These packages are particularly problematic when you try to run (array) jobs across multiple nodes. However, the [https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html rslurm package] allows you to do [https://en.wikipedia.org/wiki/Embarrassingly_parallel embarrassingly parallel] calculations on SLURM. The package automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from the different nodes, as well as wrappers for common SLURM commands.
=== Example code ===
<code lang='R'>library(rslurm)

# Apply test_func in parallel to each row of the pars data frame.
# submit = FALSE writes the submission scripts without submitting the job.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)</code>
Please be aware that new SLURM jobs have a few seconds of lead time before they start executing, so make sure that each task is appropriately long-running (~60s is a good minimum). Otherwise most of your 'compute' time will be spent waiting for jobs to start and stop.
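
For reference, a complete workflow might look as follows. This is a minimal sketch adapted from the rslurm vignette; test_func and pars are illustrative examples. After the job finishes, get_slurm_out() collects and combines the per-node output, and cleanup_files() removes the temporary files that rslurm generates.

<code lang='R'>library(rslurm)

# Illustrative function: its arguments match the column names of pars,
# and each row of pars becomes one function call
test_func <- function(par_mu, par_sd) {
  samp <- rnorm(10^6, par_mu, par_sd)
  c(s_mu = mean(samp), s_sd = sd(samp))
}
pars <- data.frame(par_mu = 1:10,
                   par_sd = seq(0.1, 1, length.out = 10))

# Divide the 10 calls over 2 nodes and submit the job to SLURM
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = TRUE)

# Wait for the job to finish, collect the combined output as a data frame,
# and remove the temporary files created by rslurm
res <- get_slurm_out(sjob, outtype = 'table')
cleanup_files(sjob)</code>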


== External links ==
[https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html Vignette for the rslurm package]
