R: Difference between revisions
Phase 1 § 5 P1.5.6: merge Installing R packages locally + Control R environment using modules + Parallel R code on SLURM into R. Promoted R_LIBS_USER/.Renviron method, folded in Parallel R (rslurm), dropped stale pre-Lmod module-building content. (via update-page on MediaWiki MCP Server)
R is a language and environment for statistical computing and graphics, widely used across the life sciences. On Anunna you can use R from the command line in batch jobs, or interactively through [[RStudio]] in the Apps Portal. This page covers the provided R modules, installing your own packages into a local library, running R in a SLURM job, and parallelising R across nodes.
== Modules ==
Anunna provides one R version per [[Environment Modules | module bucket]]. Load the bucket first, then the R version for that year:
<syntaxhighlight lang="bash">
module load 2023
module load R/4.3.2
</syntaxhighlight>
Bundle modules add a large set of commonly used packages on top of base R. The module files list the packages they contain, so you can search them with <code>module key</code>.
=== Finding a package ===
To find which module provides a particular package, load a bucket and search with <code>module key</code>:
<syntaxhighlight lang="bash">
module load 2023
module key terra
</syntaxhighlight>
which prints:
<syntaxhighlight lang="text">
The following modules match your search criteria: "terra"
-----------------------------------------------------------------------------------------------------------
R-bundle-CRAN: R-bundle-CRAN/2023.12-foss-2023a
Bundle of R packages from CRAN
-----------------------------------------------------------------------------------------------------------
</syntaxhighlight>
So the <code>terra</code> package is in <code>R-bundle-CRAN/2023.12-foss-2023a</code>. Loading that bundle gives you the matching R version plus the bundled packages.
== Local package library ==
If you need a package that is not in a bundle, install it into a personal library. The cleanest way is to point R at a library directory with the <code>R_LIBS_USER</code> variable in <code>~/.Renviron</code>, so it is used automatically every time R starts.
Create the directory and register it once:
<syntaxhighlight lang="bash">
mkdir -p ~/R/library
echo 'R_LIBS_USER="~/R/library"' >> ~/.Renviron
</syntaxhighlight>
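Because <code>echo ... >></code> appends a new line every time it is run, repeating the registration step leaves duplicate entries in <code>~/.Renviron</code>. As a minimal sketch, the same step can be wrapped in a function that only appends when no <code>R_LIBS_USER</code> line exists yet. The function name <code>register_r_lib</code> and its optional second argument are hypothetical and are not part of R or Anunna.

```bash
#!/bin/bash
# Register a personal R library in an .Renviron file, but only once.
# register_r_lib is a hypothetical helper name, not an R or Anunna tool.
register_r_lib() {
    local lib_dir="$1"                      # e.g. "$HOME/R/library"
    local renviron="${2:-$HOME/.Renviron}"  # file R reads at startup

    mkdir -p "$lib_dir"

    # Skip the append when an R_LIBS_USER line is already present.
    if [ -f "$renviron" ] && grep -q '^R_LIBS_USER=' "$renviron"; then
        return 0
    fi
    printf 'R_LIBS_USER="%s"\n' "$lib_dir" >> "$renviron"
}
```

Calling <code>register_r_lib "$HOME/R/library"</code> more than once then leaves a single entry in the file.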
From then on, packages you install go there. Start R and install from CRAN:
<syntaxhighlight lang="r">
install.packages("ggplot2", repos = "https://cran.r-project.org", dependencies = TRUE)
</syntaxhighlight>
Check which library paths R is using with:
<syntaxhighlight lang="r">
.libPaths()
</syntaxhighlight>
To add a library for the current session only (without editing <code>~/.Renviron</code>), prepend it to the search path:
<syntaxhighlight lang="r">
.libPaths(c("~/R/library", .libPaths()))
</syntaxhighlight>
== Submitting R jobs ==
A SLURM script is interpreted by bash, so it cannot run R code directly. To run R in a [[Batch Jobs | batch job]] you need two files: an R script, and an sbatch script that loads the R module and runs it.
Start with a minimal R script. Give it an <code>Rscript</code> shebang so that it runs as a program once it is executable. The <code>.r</code> extension is just a label; what matters is the interpreter line:
<syntaxhighlight lang="r">
#!/usr/bin/env Rscript
installed.packages()[, 1]
</syntaxhighlight>
Save it as <code>~/myRScripts/list_ext.r</code> and make it executable:
<syntaxhighlight lang="bash">
chmod +x ~/myRScripts/list_ext.r
</syntaxhighlight>
Then write an sbatch script, saved here as <code>r_job.sh</code>, that loads R and runs the R script:
<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --comment="List R extensions"
#SBATCH --time=0-0:10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=r_job
module load 2023
module load R-bundle-CRAN/2023.12-foss-2023a
~/myRScripts/list_ext.r
</syntaxhighlight>
Submit it with <code>sbatch</code>:
<syntaxhighlight lang="bash">
sbatch r_job.sh
</syntaxhighlight>
See [[Batch Jobs]] for the full set of <code>#SBATCH</code> directives.
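The <code>%j</code> in the <code>--output</code> and <code>--error</code> patterns is replaced by the job ID, so the log file names are only known after submission. The sketch below, which assumes the patterns from the script above, captures the ID with <code>sbatch --parsable</code> and prints the two file names. The function name <code>submit_r_job</code> is hypothetical.

```bash
#!/bin/bash
# Submit a job script and print the log file names that the
# --output/--error patterns expand to (%j becomes the job ID).
# submit_r_job is a hypothetical helper name.
submit_r_job() {
    local script="$1" jobid

    # --parsable makes sbatch print only the job ID (followed by
    # ";cluster" on multi-cluster setups) instead of a sentence.
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}   # drop an optional ";cluster" suffix

    echo "output_${jobid}.txt"
    echo "error_output_${jobid}.txt"
}
```

Running <code>submit_r_job r_job.sh</code> submits the job and lists the files to inspect once it has finished.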
== Parallel R ==
The popular parallel-R packages such as <code>doParallel</code> and <code>doSNOW</code> do not work reliably on Anunna, especially for array jobs spread across multiple nodes. For embarrassingly-parallel work, use the [https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html rslurm package] instead. It divides the computation across nodes, writes the SLURM submission scripts for you, and provides functions to collect and combine the results.
<syntaxhighlight lang="r">
library(rslurm)
# test_func is the function to run; pars is a data frame with one row per
# call and column names matching test_func's arguments (define both first).
# submit = FALSE only writes the job scripts; set it to TRUE to submit them.
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)
</syntaxhighlight>
Each SLURM job takes a few seconds to start, so make sure each task does enough work to be worth it (roughly 60 seconds or more per task). Otherwise most of the wall time is spent starting and stopping jobs rather than computing.
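To see why short tasks waste wall time, compare the task run time with the start-up overhead. The sketch below computes the share of wall time spent computing as task time divided by task time plus overhead. The function name <code>job_efficiency</code> is hypothetical, and the 5-second overhead in the usage example is an assumed figure, not a measured Anunna value.

```bash
#!/bin/bash
# Share of wall time spent computing when each task runs for task_s
# seconds and pays startup_s seconds of job start-up overhead.
# job_efficiency is a hypothetical helper name.
job_efficiency() {
    local task_s="$1" startup_s="$2"
    awk -v t="$task_s" -v s="$startup_s" \
        'BEGIN { printf "%.0f%%\n", 100 * t / (t + s) }'
}
```

With an assumed 5-second overhead, <code>job_efficiency 60 5</code> prints <code>92%</code>, while <code>job_efficiency 2 5</code> prints <code>29%</code>.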
== See also ==
* [[Environment Modules]]
* [[Installing Personal Software]]
* [[RStudio]]
* [[Jupyter]]
* [[Python]]
* [[Batch Jobs]]
== External links ==
* [https://cran.r-project.org/ CRAN]
* [https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html rslurm package vignette]
* [https://en.wikipedia.org/wiki/R_(programming_language) R on Wikipedia]
Latest revision as of 14:08, 16 June 2026