R

From HPCwiki
Revision as of 15:08, 5 December 2024 by Honfi001 (talk | contribs)

On Anunna, R can be used from the command line, with batch scripts submitted via Slurm, or through a web GUI (RStudio) via Open OnDemand.

Modules

One version of R is installed for every year. These versions are accessible through environment modules. To access a specific version of R, first load the year module, then load the R version available for that year.

Additionally, extension bundle modules for R are available. These modules list their installed extensions in their module files and are therefore searchable.

Searching for extensions

In order to search for a particular extension, use the module key command:

module key [extensionName]

For instance, when searching for the terra extension,

module load 2023
module key terra

The following output is then printed:

-----------------------------------------------------------------------------------------------------------
The following modules match your search criteria: "terra"
-----------------------------------------------------------------------------------------------------------

  R-bundle-CRAN: R-bundle-CRAN/2023.12-foss-2023a
    Bundle of R packages from CRAN

-----------------------------------------------------------------------------------------------------------

This indicates that the terra extension is contained in the R-bundle-CRAN/2023.12-foss-2023a module. Loading this bundle also loads the corresponding R version for that year and adds the extensions to it.

User Local Library

R allows users to create their own local library, where they can install their own packages. These extension libraries are stored in a local folder. It is handy to keep track of the R version each library was built with, so try to keep your folders organized, for example by including the R version in the folder name.

The first step is to load a version of R:

module load 2023
module load R/4.3.2
R


The commands above load the 2023 bucket, load the R/4.3.2 module and start the interactive R runtime. Once inside the runtime (or a Jupyter notebook running an R kernel), you can check your library paths with:

.libPaths()

Creating a new library

Set the path to your new library in the variable new_library:

new_library='/home/WUR/user001/.R_432_ext/'

Create the library folder:

dir.create(file.path(new_library), showWarnings = TRUE) 

If the folder already exists, R will display a warning.

Once created, you can add the new folder to the front of your library paths, so that R searches it first and installs new packages there by default:

.libPaths(c(new_library, .libPaths()) )

You can check whether the operation was successful by running:

.libPaths()
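
A change made with .libPaths() only lasts for the current R session. One possible way to make the library available in later sessions and in batch jobs is to set the R_LIBS_USER variable in ~/.Renviron, a file R reads at startup. This is not part of the procedure above; it is a minimal sketch that assumes the library lives in ~/.R_432_ext, matching the example path.

```shell
# Assumed library location, matching the example path used above
new_library="$HOME/.R_432_ext"

# Create the folder first; R skips library paths that do not exist
mkdir -p "$new_library"

# Append the setting to ~/.Renviron, which R reads at startup
echo "R_LIBS_USER=$new_library" >> "$HOME/.Renviron"
```

Run this in the shell, not inside R. Because the setting applies to every R version you start, it may need to be updated when you switch to a different R module.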

Installing New Extensions

To install a new package, use install.packages() as in the example below.

install.packages("ggplot2", repos="http://cran.r-project.org", lib="~/.R_432_ext", dependencies=TRUE)


In this example, we are installing the ggplot2 extension. The repos argument specifies the repository the package is downloaded from, which in this case is the r-project website. The lib argument specifies the destination library folder, and dependencies=TRUE also installs the packages that ggplot2 depends on.

Submitting Slurm jobs

Slurm job scripts use bash as their interpreter (note the #!/bin/bash on the first line), so they cannot execute R code directly. Their role is to allocate resources on the cluster, load modules and run any other bash commands your job needs.

Thus, in order to launch an R job via Slurm, one needs two scripts: an R script and a Slurm (bash) script.


Here is an example of a very simple R script that will list all installed extensions.

#!/usr/bin/env Rscript
 
installed.packages()[,1]

Let's call this script list_ext.r and place it in ~/myRScripts. Note that file extensions on Linux do not determine how a script is executed; the interpreter named on the first line does. We give the script the .r extension only as a label, so that R scripts are easier to search for and identify at a glance.

Note that the first line points to the R interpreter, Rscript. In order to access the Rscript interpreter, the R (or R-bundle) module must have been loaded beforehand. This is done in the sbatch script.

Important: Make sure that the R script, in this example list_ext.r, is executable.
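
As a minimal sketch, the R script can be created and made executable from the shell as follows. It assumes the ~/myRScripts location used in this example and overwrites any existing list_ext.r there.

```shell
# Create the script directory used in this example (no error if it already exists)
mkdir -p "$HOME/myRScripts"

# Write the R script shown above (this overwrites an existing file)
cat > "$HOME/myRScripts/list_ext.r" <<'EOF'
#!/usr/bin/env Rscript

installed.packages()[,1]
EOF

# Set the executable bit so the script can be called directly by its path
chmod u+x "$HOME/myRScripts/list_ext.r"

# Check the permissions; the owner field should now contain an "x"
ls -l "$HOME/myRScripts/list_ext.r"
```

With the script in place and executable, the sbatch script below can call it by its path.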

#!/bin/bash
#SBATCH --comment="List R extensions" 
#SBATCH --time=0-0:10:00 # 10 minutes
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=r_job.sh
 
module load 2023
module load R-bundle-CRAN/2023.12-foss-2023a
 
#execute the R-script
~/myRScripts/list_ext.r
 

Since this is just an illustrative job, we have only allocated 1 GB of RAM and a single CPU. We can store the text above under the name r_job.sh.

We can then submit the job with the command:

sbatch r_job.sh
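
The %j in the --output and --error lines of the job script is replaced by the job ID, so the output files can be located once the ID is known. The sketch below wraps the submission in a small helper that reports those file names. The function name submit_r_job is hypothetical, and the helper assumes the file name patterns from r_job.sh above.

```shell
# Hypothetical helper: submit a job script and report where its output will go.
# --parsable makes sbatch print only the job ID (plus ";cluster" on multi-cluster setups).
submit_r_job() {
    job_id=$(sbatch --parsable "$1") || return 1
    job_id=${job_id%%;*}    # drop an optional ";cluster" suffix
    echo "Submitted job $job_id"
    echo "stdout will be written to output_${job_id}.txt"
    echo "stderr will be written to error_output_${job_id}.txt"
}

# Usage on a login node (uncomment to run):
# submit_r_job r_job.sh
# squeue -u "$USER"    # list your pending and running jobs
```

Both files are written to the directory from which the job was submitted. Once the job has finished, output_<jobid>.txt should contain the list of installed extensions.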