R: Difference between revisions

From HPCwiki
Phase 1 § 5 P1.5.6: merge Installing R packages locally + Control R environment using modules + Parallel R code on SLURM into R. Promoted R_LIBS_USER/.Renviron method, folded in Parallel R (rslurm), dropped stale pre-Lmod module-building content. (via update-page on MediaWiki MCP Server)
 
Latest revision as of 14:08, 16 June 2026

R is a language and environment for statistical computing and graphics, widely used across the life sciences. On Anunna you can use R from the command line in batch jobs, or interactively through RStudio in the Apps Portal. This page covers the provided R modules, installing your own packages into a local library, running R in a SLURM job, and parallelising R across nodes.

Modules

Anunna provides one R version per module bucket. Load the bucket first, then the R version for that year:

module load 2023
module load R/4.3.2

Bundle modules add a large set of commonly used packages on top of base R. The module files list the packages they contain, so you can search them with module key.

Finding a package

To find which module provides a particular package, load a bucket and search with module key:

module load 2023
module key terra

which prints:

The following modules match your search criteria: "terra"
-----------------------------------------------------------------------------------------------------------

  R-bundle-CRAN: R-bundle-CRAN/2023.12-foss-2023a
    Bundle of R packages from CRAN

-----------------------------------------------------------------------------------------------------------

So the terra package is in R-bundle-CRAN/2023.12-foss-2023a. Loading that bundle gives you the matching R version plus the bundled packages.
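Once a bundle is loaded, you may want to confirm from the shell that a package is actually visible to R. The following is a minimal sketch only: `has_r_package` is a hypothetical helper name, not something provided on Anunna, and it assumes `Rscript` is on the PATH after loading an R or R-bundle module.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the R package named in $1 can be found
# by the R currently on the PATH, 1 if it cannot, 2 if Rscript is missing.
has_r_package() {
    local pkg="$1"
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found: load an R or R-bundle module first" >&2
        return 2
    fi
    # requireNamespace() returns FALSE instead of stopping when the package
    # is absent, so the result can be turned into an exit status.
    Rscript -e 'args <- commandArgs(trailingOnly = TRUE); quit(save = "no", status = if (requireNamespace(args[1], quietly = TRUE)) 0 else 1)' "$pkg"
}
```

For example, after `module load R-bundle-CRAN/2023.12-foss-2023a`, running `has_r_package terra && echo "terra is available"` should print the message if the bundle provides the package.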

Local package library

If you need a package that is not in a bundle, install it into a personal library. The cleanest way is to point R at a library directory with the R_LIBS_USER variable in ~/.Renviron, so it is used automatically every time R starts.

Create the directory and register it once:

mkdir -p ~/R/library
echo 'R_LIBS_USER="~/R/library"' >> ~/.Renviron

From then on, packages you install go there. Start R and install from CRAN:

install.packages("ggplot2", repos = "https://cran.r-project.org", dependencies = TRUE)

Check which library paths R is using with:

.libPaths()

To add a library for the current session only (without editing ~/.Renviron), prepend it to the search path:

.libPaths(c("~/R/library", .libPaths()))
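The registration step above appends a line to ~/.Renviron every time it is run, so repeating it leaves duplicate entries. One possible way to make it repeatable is sketched below. `register_r_lib` is a hypothetical helper name, and the sketch assumes the Renviron file, if it exists, ends with a newline.

```shell
#!/bin/bash
# Hypothetical helper: add an R_LIBS_USER line to an Renviron file only if
# the file does not already define R_LIBS_USER.
# $1 = library directory, $2 = Renviron file.
register_r_lib() {
    local lib_dir="$1"
    local renviron="$2"
    mkdir -p "$lib_dir"      # create the library directory if needed
    touch "$renviron"        # make sure the file exists before grepping it
    if grep -q '^R_LIBS_USER=' "$renviron"; then
        echo "R_LIBS_USER already set in $renviron"
    else
        printf 'R_LIBS_USER="%s"\n' "$lib_dir" >> "$renviron"
        echo "registered $lib_dir in $renviron"
    fi
}
```

Calling `register_r_lib "$HOME/R/library" "$HOME/.Renviron"` twice leaves a single R_LIBS_USER line in the file.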

Submitting R jobs

A SLURM script is interpreted by bash, so it cannot run R code directly. To run R in a batch job you need two files: an R script, and an sbatch script that loads the R module and runs it.

Start with a minimal R script. Give it an Rscript shebang so that it can run as a program. The .r extension is just a label; what matters is the interpreter line:

#!/usr/bin/env Rscript

installed.packages()[, 1]

Save it as ~/myRScripts/list_ext.r and make it executable:

chmod +x ~/myRScripts/list_ext.r
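A job that calls the R script directly fails if the shebang line is missing or the file is not executable. The check below is an illustrative sketch; `check_r_script` is a hypothetical helper name, and it only accepts the exact shebang used in this example.

```shell
#!/bin/bash
# Hypothetical helper: verify that the file in $1 starts with the Rscript
# shebang used on this page and has its executable bit set.
check_r_script() {
    local script="$1"
    local first_line=""
    # Read only the first line; "|| true" tolerates a file without a newline.
    IFS= read -r first_line < "$script" || true
    if [ "$first_line" != '#!/usr/bin/env Rscript' ]; then
        echo "missing Rscript shebang in $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: run chmod +x $script"
        return 1
    fi
    echo "ok"
}
```

Running `check_r_script ~/myRScripts/list_ext.r` before submitting should print `ok` once both conditions are met.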

Then write an sbatch script, saved here as r_job.sh, that loads R and runs the R script:

#!/bin/bash
#SBATCH --comment="List R extensions"
#SBATCH --time=0-0:10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=r_job

module load 2023
module load R-bundle-CRAN/2023.12-foss-2023a

~/myRScripts/list_ext.r

Submit it with sbatch:

sbatch r_job.sh

See Batch Jobs for the full set of #SBATCH directives.
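When submitting from a script, it can help to syntax-check the job file first and keep the job ID for later. This is a sketch under assumptions: `submit_job` is a hypothetical helper name. It relies on `bash -n`, which parses a script without running it, and on the `--parsable` option of `sbatch`, which prints only the job ID (followed by `;cluster` on multi-cluster setups).

```shell
#!/bin/bash
# Hypothetical helper: syntax-check the sbatch script in $1, then submit it
# and print the job ID. Returns 1 on a syntax error, 2 if sbatch is not
# available, 3 if the submission itself fails.
submit_job() {
    local script="$1"
    if ! bash -n "$script"; then
        echo "syntax error in $script" >&2
        return 1
    fi
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not available on this machine" >&2
        return 2
    fi
    local job_id
    job_id="$(sbatch --parsable "$script")" || return 3
    # Strip an optional ";cluster" suffix from the parsable output.
    echo "submitted job ${job_id%%;*}"
}
```

With the directives shown above, the output of a job submitted through `submit_job r_job.sh` would end up in output_<jobid>.txt, and `squeue -j <jobid>` shows its state while it is queued or running.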

Parallel R

The popular parallel-R packages such as doParallel and doSNOW do not work reliably on Anunna, especially for array jobs spread across multiple nodes. For embarrassingly-parallel work, use the rslurm package instead. It divides the computation across nodes, writes the SLURM submission scripts for you, and provides functions to collect and combine the results.

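library(rslurm)
# test_func is the function to run; pars is a data frame with one row per
# call and one column per argument of test_func (both defined beforehand).
# submit = FALSE only writes the submission scripts without submitting them.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)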

Each SLURM job takes a few seconds to start, so make sure each task does enough work to be worth it: roughly 60 seconds or more per task. Otherwise most of the wall time is spent starting and stopping jobs rather than computing.
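The trade-off can be made concrete with a little arithmetic. The sketch below assumes a start-up cost of 5 seconds per job, which is only an illustrative stand-in for "a few seconds"; `startup_overhead_percent` is a hypothetical helper name.

```shell
#!/bin/bash
# Share of wall time lost to job start-up, as a whole-number percentage
# (integer division, so the result is rounded down).
# $1 = start-up seconds per job (assumed), $2 = compute seconds per task.
startup_overhead_percent() {
    local startup="$1"
    local work="$2"
    echo $(( 100 * startup / (startup + work) ))
}

startup_overhead_percent 5 5    # prints 50: half the wall time is start-up
startup_overhead_percent 5 60   # prints 7: start-up is a small fraction
```

Under that assumption, a 60-second task loses well under a tenth of its wall time to start-up, while a 5-second task loses half.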

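See also

* Environment Modules
* Installing Personal Software
* RStudio
* Jupyter
* Python
* Batch Jobs

External links

* RStudio in Open OnDemand
* CRAN: https://cran.r-project.org/
* rslurm package vignette: https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html
* R on Wikipedia: https://en.wikipedia.org/wiki/R_(programming_language)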