R

From HPCwiki
Revision as of 14:08, 16 June 2026 by Haars0011 (talk | contribs) (Phase 1 § 5 P1.5.6: merge Installing R packages locally + Control R environment using modules + Parallel R code on SLURM into R. Promoted R_LIBS_USER/.Renviron method, folded in Parallel R (rslurm), dropped stale pre-Lmod module-building content. (via update-page on MediaWiki MCP Server))

R is a language and environment for statistical computing and graphics, widely used across the life sciences. On Anunna you can use R from the command line in batch jobs, or interactively through RStudio in the Apps Portal. This page covers the provided R modules, installing your own packages into a local library, running R in a SLURM job, and parallelising R across nodes.

Modules

Anunna provides one R version per module bucket, and buckets are named by year. Load the bucket first, then the R version that belongs to it:

module load 2023
module load R/4.3.2

Bundle modules add a large set of commonly used packages on top of base R. Each module file lists the packages it contains, so you can search for a package with module key.

Finding a package

To find which module provides a particular package, load a bucket and search with module key:

module load 2023
module key terra

which prints:

The following modules match your search criteria: "terra"
-----------------------------------------------------------------------------------------------------------

  R-bundle-CRAN: R-bundle-CRAN/2023.12-foss-2023a
    Bundle of R packages from CRAN

-----------------------------------------------------------------------------------------------------------

So the terra package is in R-bundle-CRAN/2023.12-foss-2023a. Loading that bundle gives you the matching R version plus the bundled packages.

Local package library

If you need a package that is not in a bundle, install it into a personal library. The cleanest way is to point R at a library directory with the R_LIBS_USER variable in ~/.Renviron, so it is used automatically every time R starts.

Create the directory and register it once:

mkdir -p ~/R/library
echo 'R_LIBS_USER="~/R/library"' >> ~/.Renviron

From then on, packages you install go there. Start R and install from CRAN:

install.packages("ggplot2", repos = "https://cran.r-project.org", dependencies = TRUE)

Check which library paths R is using with:

.libPaths()

To add a library for the current session only (without editing ~/.Renviron), prepend it to the search path:

.libPaths(c("~/R/library", .libPaths()))
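
The one-time registration step in this section appends with >>, so running it a second time leaves a duplicate line in ~/.Renviron. As a minimal sketch, assuming a hypothetical helper named register_r_lib (not something provided by R or Anunna), the same step can be made safe to repeat:

```shell
#!/bin/bash
# register_r_lib is a hypothetical helper name used for illustration only.
# It creates the personal library directory and adds the R_LIBS_USER line
# to ~/.Renviron only if that exact line is not already there.
register_r_lib() {
    local lib_dir="$HOME/R/library"
    local renviron="$HOME/.Renviron"
    local entry='R_LIBS_USER="~/R/library"'

    mkdir -p "$lib_dir"
    touch "$renviron"
    # -x matches the whole line, -F treats the entry as a fixed string,
    # -q suppresses output so only the exit status is used.
    if ! grep -qxF "$entry" "$renviron"; then
        echo "$entry" >> "$renviron"
    fi
}
```

Calling register_r_lib once or several times should leave exactly one R_LIBS_USER line in the file.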

Submitting R jobs

A SLURM script is interpreted by bash, so it cannot run R code directly. To run R in a batch job you need two files: an R script, and an sbatch script that loads the R module and runs it.

Start with a minimal R script. Give it an Rscript shebang and make it executable so that it runs as a program. The .r extension is just a label; what matters is the interpreter line:

#!/usr/bin/env Rscript

installed.packages()[, 1]

Save it as ~/myRScripts/list_ext.r and make it executable:

chmod +x ~/myRScripts/list_ext.r
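
Both requirements (the execute bit and the Rscript interpreter line) can be checked before the job is submitted, which is quicker than finding out from a failed job. The following is a sketch only; check_r_script is a hypothetical helper name, not an Anunna or SLURM tool:

```shell
#!/bin/bash
# check_r_script is a hypothetical helper used for illustration only.
# It returns 0 if the file is executable and its first line is an Rscript
# shebang; otherwise it prints the reason to stderr and returns 1.
check_r_script() {
    local script="$1"
    local first_line

    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script (run chmod +x)" >&2
        return 1
    fi
    first_line="$(head -n 1 "$script")"
    case "$first_line" in
        '#!'*Rscript*) return 0 ;;
        *)
            echo "first line is not an Rscript shebang: $script" >&2
            return 1
            ;;
    esac
}
```

For the script above, the call would be check_r_script ~/myRScripts/list_ext.r, and a zero exit status means both conditions hold.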

Then write an sbatch script, saved here as r_job.sh, that loads R and runs it:

#!/bin/bash
#SBATCH --comment="List R extensions"
#SBATCH --time=0-0:10:00
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=r_job

module load 2023
module load R-bundle-CRAN/2023.12-foss-2023a

~/myRScripts/list_ext.r

Submit it with sbatch:

sbatch r_job.sh

See Batch Jobs for the full set of #SBATCH directives.

Parallel R

The popular parallel-R packages such as doParallel and doSNOW do not work reliably on Anunna, especially for array jobs spread across multiple nodes. For embarrassingly parallel work, use the rslurm package instead. It divides the computation across nodes, writes the SLURM submission scripts for you, and provides functions such as get_slurm_out to collect and combine the results. In the example below, submit = FALSE makes slurm_apply write the scripts without submitting them, so you can inspect them first.

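library(rslurm)
# Illustrative inputs (not from Anunna): test_func is called once per row of pars
test_func <- function(par_mu, par_sd) mean(rnorm(1e6, par_mu, par_sd))
pars <- data.frame(par_mu = 1:10, par_sd = seq(0.1, 1, length.out = 10))
sjob <- slurm_apply(test_func, pars, jobname = 'test_apply',
                    nodes = 2, cpus_per_node = 2, submit = FALSE)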

Each SLURM job takes a few seconds to start, so make sure each task does enough work to be worth it: roughly 60 seconds or more per task. Otherwise most of the wall time is spent starting and stopping jobs rather than computing.
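
To put rough numbers on this, the share of wall time lost to startup is startup / (startup + compute). The sketch below assumes a startup cost of 5 seconds, which is a guess standing in for "a few seconds" and not a measured Anunna figure; overhead_percent is a hypothetical helper name:

```shell
#!/bin/bash
# overhead_percent is a hypothetical helper used for illustration only.
# Arguments: startup seconds per job, compute seconds per task.
# Prints the share of wall time spent on startup, rounded to a whole percent.
overhead_percent() {
    awk -v s="$1" -v t="$2" 'BEGIN { printf "%.0f\n", 100 * s / (s + t) }'
}

# Assumed 5 s startup; the real value on Anunna may differ.
overhead_percent 5 60   # 60 s of work per task: prints 8
overhead_percent 5 2    # 2 s of work per task: prints 71
```

Under that assumption a 60-second task loses under a tenth of its wall time to startup, while a 2-second task spends most of it waiting.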

See also