Latest revision as of 11:33, 25 March 2026
Running Apptainer with GPUs
Apptainer can pass through GPU hardware from the host into a container, allowing you to run GPU-accelerated workloads (such as deep learning inference or training) inside a fully contained environment. This page covers how to use both NVIDIA and AMD GPUs on the Anunna cluster.
Important: Before you begin, make sure the following are in place:
- Your .sif image files should be stored on Lustre (e.g. in your scratch space), not in your home directory.
- Set your Apptainer cache to Lustre, since these images can get quite large:
export APPTAINER_CACHEDIR=$myScratch/Apptainer
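To make this setting survive new shell sessions, you can put it in your ~/.bashrc and create the directory up front. A minimal sketch; the /tmp fallback is only an assumption added so the snippet runs outside Anunna (on the cluster, $myScratch is already set for you):

```shell
# Assumption: on Anunna, $myScratch points at your Lustre scratch space.
# The fallback below is only so this sketch runs on any machine.
myScratch="${myScratch:-/tmp/${USER:-user}-scratch}"

export APPTAINER_CACHEDIR="$myScratch/Apptainer"
mkdir -p "$APPTAINER_CACHEDIR"   # create it up front so the cache definitely lands on Lustre
echo "cache dir: $APPTAINER_CACHEDIR"
```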
How GPU Passthrough Works
Apptainer does not include GPU drivers inside the container. Instead, it binds the GPU drivers and libraries from the host system into the container at runtime. This means:
- The container must include software built for the correct GPU framework (CUDA for NVIDIA, ROCm for AMD).
- The host must have the matching GPU drivers installed (which Anunna already has on the GPU nodes).
- You tell Apptainer to enable GPU access using a flag: --nv for NVIDIA or --rocm for AMD.
NVIDIA GPUs (gpu partition)
To use NVIDIA GPUs, you need to:
- Request a node on the gpu partition.
- Use the --nv flag when running your container.
The --nv flag tells Apptainer to:
- Make the /dev/nvidiaX device entries available inside the container.
- Locate and bind the CUDA libraries from the host into the container.
- Set LD_LIBRARY_PATH so the container uses the host's GPU libraries.
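You can check from the shell whether the host device entries that --nv binds are actually present. A small sketch with a hypothetical check_dev helper; on a GPU node both entries should be found, on a login node they will be missing:

```shell
# Report whether a host device entry exists.
# --nv can only bind devices that are actually present on the host.
check_dev() {
  if [ -e "$1" ]; then
    echo "found $1"
  else
    echo "missing $1 (not on a GPU node?)"
  fi
}

check_dev /dev/nvidiactl   # NVIDIA control device
check_dev /dev/nvidia0     # first GPU
```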
Example: Transcribing Audio with Whisper (NVIDIA)
OpenAI Whisper is a speech recognition model that benefits greatly from GPU acceleration. Let's build a container that runs Whisper on an NVIDIA GPU.
The Definition File
Create a file called whisper_nvidia.def:
Bootstrap: docker
From: pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper --device cuda "$@"

%help
    Whisper speech recognition container (NVIDIA GPU).
    Usage: apptainer run --nv whisper_nvidia.sif <audio_file> [options]
A quick walkthrough of what is happening here:
| Section | Purpose |
|---|---|
| Bootstrap / From | Uses the official PyTorch Docker image, which already includes Python, PyTorch, CUDA, and cuDNN. |
| %post | Installs ffmpeg (required by Whisper for audio decoding), git, and the Whisper package itself. |
| %environment | Sets the locale to avoid encoding warnings. |
| %runscript | Makes whisper (with --device cuda) the default command, passing through any arguments. |
| %help | Provides usage information (accessible via apptainer run-help). |
Building the Image
Request a compute node and build:
module reset
module load utilities Apptainer
apptainer build whisper_nvidia.sif whisper_nvidia.def
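If you prefer not to build interactively, the same steps can go into a batch script. A sketch; the job name and time limit are placeholders I chose, and no GPU is requested since building does not need one:

```shell
# Write a batch script that builds the image on a compute node.
# Job name and time limit below are illustrative placeholders.
cat > build_whisper.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=whisper-build
#SBATCH --time=00:30:00

module reset
module load utilities Apptainer
apptainer build whisper_nvidia.sif whisper_nvidia.def
EOF

# Submit with: sbatch build_whisper.sbatch
echo "wrote build_whisper.sbatch"
```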
Running Whisper
Once built, run Whisper on the sample audio file. First, request a GPU node:
srun --partition=gpu --gres=gpu:1 --pty bash
Then load Apptainer and run:
module reset
module load utilities Apptainer
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model large
The --nv flag is what makes the GPU visible to the container. Without it, PyTorch would not detect any CUDA devices and Whisper would fall back to CPU (much slower).
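For longer transcriptions you may not want to hold an interactive session open. The same run can be submitted as a batch job; a sketch using the partition and paths from this page (the time limit is a placeholder):

```shell
# Write a batch version of the interactive run above.
cat > run_whisper.sbatch <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

module reset
module load utilities Apptainer
apptainer run --nv whisper_nvidia.sif \
  /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
EOF

# Submit with: sbatch run_whisper.sbatch
echo "wrote run_whisper.sbatch"
```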
You can verify GPU access from inside the container with:
apptainer exec --nv whisper_nvidia.sif python -c "import torch; print(torch.cuda.is_available())"
This should print True.
AMD GPUs (gpu_amd partition)
To use AMD GPUs, you need to:
- Request a node on the gpu_amd partition.
- Use the --rocm flag when running your container.
The --rocm flag tells Apptainer to:
- Make the /dev/dri/ and /dev/kfd device entries available inside the container.
- Locate and bind the ROCm libraries from the host into the container.
- Set LD_LIBRARY_PATH so the container uses the host's ROCm libraries.
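As with the NVIDIA case, you can check for the host device entries that --rocm binds. A sketch with a hypothetical check_dev helper; on an AMD GPU node both should be found:

```shell
# Report whether a host device entry exists.
# /dev/kfd is the ROCm compute interface; /dev/dri holds the GPU render nodes.
check_dev() {
  if [ -e "$1" ]; then
    echo "found $1"
  else
    echo "missing $1 (not on an AMD GPU node?)"
  fi
}

check_dev /dev/kfd
check_dev /dev/dri
```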
Example: Transcribing Audio with Whisper (AMD)
The same Whisper workflow, but using an AMD GPU with the ROCm stack.
The Definition File
Create a file called whisper_amd.def:
Bootstrap: docker
From: rocm/pytorch:rocm7.2_ubuntu24.04_py3.13_pytorch_release_2.10.0

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper --device cuda "$@"

%help
    Whisper speech recognition container (AMD ROCm GPU).
    Usage: apptainer run --rocm whisper_amd.sif <audio_file> --model <model_size> [options]
The structure is identical to the NVIDIA version. The only difference is the base image: instead of pytorch/pytorch (which includes CUDA), we use rocm/pytorch (which includes ROCm). PyTorch's API is the same regardless of the backend: torch.cuda.is_available() returns True on ROCm as well, because the ROCm build exposes its GPUs through the same torch.cuda interface (HIP translates the CUDA-style calls to ROCm).
Building the Image
module reset
module load utilities Apptainer
apptainer build whisper_amd.sif whisper_amd.def
Running Whisper
Request an AMD GPU node:
srun --partition=gpu_amd --gres=gpu:1 --pty bash
Then load Apptainer and run:
module reset
module load utilities Apptainer
apptainer run --rocm whisper_amd.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
apptainer run --rocm whisper_amd.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model large
Note that even though we are on an AMD GPU, the --device cuda flag is correct. PyTorch's ROCm backend uses the same cuda device name for compatibility.
Verify GPU access:
apptainer exec --rocm whisper_amd.sif python -c "import torch; print(torch.cuda.is_available())"
Summary
| | NVIDIA | AMD |
|---|---|---|
| Partition | gpu | gpu_amd |
| Apptainer flag | --nv | --rocm |
| Base image | pytorch/pytorch (includes CUDA) | rocm/pytorch (includes ROCm) |
| PyTorch device | --device cuda | --device cuda (same API) |
| Host devices bound | /dev/nvidiaX | /dev/dri/, /dev/kfd |
The key takeaway: the only things that change between NVIDIA and AMD are the base container image, the Apptainer flag, and the Slurm partition. Your application code stays the same.
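One way to exploit that symmetry is a small wrapper that derives the flag from the Slurm partition. gpu_flag is a hypothetical helper, not part of Apptainer or Slurm; inside a running job, the standard Slurm variable $SLURM_JOB_PARTITION holds the partition name:

```shell
# Hypothetical helper: map a Slurm partition name to the matching Apptainer GPU flag.
gpu_flag() {
  case "$1" in
    gpu)     echo "--nv"   ;;  # NVIDIA nodes
    gpu_amd) echo "--rocm" ;;  # AMD nodes
    *)       echo ""       ;;  # CPU-only partition: no GPU flag
  esac
}

# Inside a job script you could then run, e.g.:
#   apptainer run $(gpu_flag "$SLURM_JOB_PARTITION") whisper.sif audio.mp3 --model base
gpu_flag gpu       # prints --nv
gpu_flag gpu_amd   # prints --rocm
```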