R: Difference between revisions

From HPCwiki
Phase 1 § 5 P1.5.6: merge Installing R packages locally + Control R environment using modules + Parallel R code on SLURM into R. Promoted R_LIBS_USER/.Renviron method, folded in Parallel R (rslurm), dropped stale pre-Lmod module-building content. (via update-page on MediaWiki MCP Server)
 
R is a language and environment for statistical computing and graphics, widely used across the life sciences. On Anunna you can use R from the command line in batch jobs, or interactively through [[RStudio]] in the Apps Portal. This page covers the provided R modules, installing your own packages into a local library, running R in a SLURM job, and parallelising R across nodes.


== Modules ==


Anunna provides one R version per [[Environment Modules | module bucket]]. Load the bucket first, then the R version for that year:


<syntaxhighlight lang="bash">
module load 2023
module load R/4.3.2
</syntaxhighlight>


Bundle modules add a large set of commonly-used packages on top of base R. The module files list the packages they contain, so you can search them with <code>module key</code>.


=== Finding a package ===


To find which module provides a particular package, load a bucket and search with <code>module key</code>:


<syntaxhighlight lang="bash">
module load 2023
module key terra
</syntaxhighlight>


which prints:

<syntaxhighlight lang="text">
The following modules match your search criteria: "terra"
-----------------------------------------------------------------------------------------------------------

  R-bundle-CRAN: R-bundle-CRAN/2023.12-foss-2023a
    Bundle of R packages from CRAN

-----------------------------------------------------------------------------------------------------------
</syntaxhighlight>
 
So the <code>terra</code> package is in <code>R-bundle-CRAN/2023.12-foss-2023a</code>. Loading that bundle gives you the matching R version plus the bundled packages.
 
== Local package library ==
 
If you need a package that is not in a bundle, install it into a personal library. The cleanest way is to point R at a library directory with the <code>R_LIBS_USER</code> variable in <code>~/.Renviron</code>, so it is used automatically every time R starts.
 
Create the directory and register it once:
 
<syntaxhighlight lang="bash">
mkdir -p ~/R/library
echo 'R_LIBS_USER="~/R/library"' >> ~/.Renviron
</syntaxhighlight>
 
From then on, packages you install go there. Start R and install from CRAN:


<syntaxhighlight lang="r">
install.packages("ggplot2", repos = "https://cran.r-project.org", dependencies = TRUE)
</syntaxhighlight>


Check which library paths R is using with:


<syntaxhighlight lang="r">
.libPaths()
</syntaxhighlight>


To add a library for the current session only (without editing <code>~/.Renviron</code>), prepend it to the search path:


<syntaxhighlight lang="r">
.libPaths(c("~/R/library", .libPaths()))
</syntaxhighlight>
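
Because <code>echo ... >></code> appends a new line each time it runs, repeating the registration step leaves duplicate entries in <code>~/.Renviron</code>. The following is a minimal sketch of a version that is safe to re-run. The function name <code>register_r_lib</code> and its two optional arguments are hypothetical, introduced here only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: register a personal R library in an .Renviron file
# without adding a duplicate entry on repeated runs.
register_r_lib() {
    local lib_dir="${1:-$HOME/R/library}"     # library directory to create
    local renviron="${2:-$HOME/.Renviron}"    # file that R reads at startup

    mkdir -p "$lib_dir"
    touch "$renviron"

    # Append the setting only if no R_LIBS_USER line is present yet.
    if ! grep -q '^R_LIBS_USER=' "$renviron"; then
        echo "R_LIBS_USER=\"$lib_dir\"" >> "$renviron"
    fi
}
```

Calling <code>register_r_lib</code> with no arguments uses <code>~/R/library</code> and <code>~/.Renviron</code>, matching the manual commands above. An existing <code>R_LIBS_USER</code> line is left untouched, so check it by hand if it points somewhere else.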


== Submitting R jobs ==


A SLURM script is interpreted by bash, so it cannot run R code directly. To run R in a [[Batch Jobs | batch job]] you need two files: an R script, and an sbatch script that loads the R module and runs it.


The following minimal R script lists all installed packages. Give it an <code>Rscript</code> shebang and make it executable so that it runs as a program. The <code>.r</code> extension is just a label; what matters is the interpreter line:


<syntaxhighlight lang="r">
#!/usr/bin/env Rscript

installed.packages()[, 1]
</syntaxhighlight>
 
Save it as <code>~/myRScripts/list_ext.r</code> and make it executable:
 
<syntaxhighlight lang="bash">
chmod +x ~/myRScripts/list_ext.r
</syntaxhighlight>


Then an sbatch script that loads R and runs it:


<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --comment="List R extensions"
#SBATCH --time=0-0:10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=r_job

module load 2023
module load R-bundle-CRAN/2023.12-foss-2023a

~/myRScripts/list_ext.r
</syntaxhighlight>
 
Save the sbatch script as, for example, <code>r_job.sh</code> and submit it with <code>sbatch</code>:
 
<syntaxhighlight lang="bash">
sbatch r_job.sh
</syntaxhighlight>
 
See [[Batch Jobs]] for the full set of <code>#SBATCH</code> directives.
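
A job that calls the R script directly fails if the script is not executable or lacks the interpreter line. The sketch below checks both conditions before submission. The function name <code>check_r_script</code> is hypothetical, and the check only looks for <code>Rscript</code> in the first line. It does not verify that the module providing <code>Rscript</code> is loaded.

```shell
#!/bin/bash
# Hypothetical pre-submission check: confirm that an R script can be run
# directly as a program, as the sbatch script expects.
check_r_script() {
    local script="$1"
    local first_line=""

    if [ ! -f "$script" ]; then
        echo "not found: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable (run: chmod +x $script)" >&2
        return 1
    fi

    # The interpreter line must be the very first line of the file.
    IFS= read -r first_line < "$script" || true
    case "$first_line" in
        '#!'*Rscript*) return 0 ;;
        *) echo "missing Rscript shebang in $script" >&2; return 1 ;;
    esac
}
```

One way to use it is to chain it with the submission, for example <code>check_r_script ~/myRScripts/list_ext.r && sbatch r_job.sh</code>, so that nothing is queued when the check fails.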
 
== Parallel R ==
 
The popular parallel-R packages such as <code>doParallel</code> and <code>doSNOW</code> do not work reliably on Anunna, especially for array jobs spread across multiple nodes. For embarrassingly-parallel work, use the [https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html rslurm package] instead. It divides the computation across nodes, writes the SLURM submission scripts for you, and provides functions to collect and combine the results.
 
<syntaxhighlight lang="r">
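library(rslurm)
# test_func is your own function; pars is a data frame with one row per call
# and one column per argument of test_func.
# submit = FALSE only writes the job scripts; set submit = TRUE to queue them.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)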
</syntaxhighlight>
 
Each SLURM job takes a few seconds to start, so make sure each task does enough work to be worth it (roughly 60 seconds or more per task). Otherwise most of the wall time is spent starting and stopping jobs rather than computing.
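
To see why short tasks are a poor fit, compare the startup cost with the useful work per job. The sketch below assumes a startup time of 5 seconds, which is an illustrative figure and not a measured value for Anunna. The helper <code>overhead_percent</code> is likewise hypothetical.

```shell
#!/bin/bash
# Share of a job's wall time spent on startup, as a whole-number percentage.
# Both arguments are in seconds; the result is rounded down.
overhead_percent() {
    local startup="$1" work="$2"
    echo $(( 100 * startup / (startup + work) ))
}

# Assumed 5 s startup, purely for illustration.
overhead_percent 5 2     # 2 s of work per job: prints 71
overhead_percent 5 60    # 60 s of work per job: prints 7
```

Under this assumption, a 2-second task spends most of its wall time on startup, while a 60-second task keeps the overhead below a tenth.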
 
== See also ==
 
* [[Environment Modules]]
* [[Installing Personal Software]]
* [[RStudio]]
* [[Jupyter]]
* [[Python]]
* [[Batch Jobs]]


== External links ==


* RStudio in Open OnDemand
* [https://cran.r-project.org/ CRAN]
* [https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html rslurm package vignette]
* [https://en.wikipedia.org/wiki/R_(programming_language) R on Wikipedia]

Latest revision as of 14:08, 16 June 2026
