Running Apptainer with GPUs
Apptainer can pass through GPU hardware from the host into a container, allowing you to run GPU-accelerated workloads (such as deep learning inference or training) inside a fully contained environment. This page covers how to use both NVIDIA and AMD GPUs on the Anunna cluster.
Important: Before you begin, make sure the following are in place:
- Your `.sif` image files should be stored on Lustre (e.g. in your scratch space), not in your home directory.
- Set your Apptainer cache to Lustre, since these images can get quite large:
```shell
export APPTAINER_CACHEDIR=$myScratch/Apptainer
```
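To make this setting take effect, create the directory as well. A minimal sketch (on Anunna, `$myScratch` is set for you by the environment; the `$HOME` fallback below is only so the snippet runs anywhere for illustration):

```shell
# Point the Apptainer cache at scratch and make sure the directory exists.
# $myScratch is set by Anunna's environment; the $HOME fallback is illustrative.
export APPTAINER_CACHEDIR="${myScratch:-$HOME}/Apptainer"
mkdir -p "$APPTAINER_CACHEDIR"
echo "Apptainer cache: $APPTAINER_CACHEDIR"
```

Add the `export` line to your shell startup file if you want the setting to persist across sessions.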
How GPU Passthrough Works
Apptainer does not include GPU drivers inside the container. Instead, it binds the GPU drivers and libraries from the host system into the container at runtime. This means:
- The container must include software built for the correct GPU framework (CUDA for NVIDIA, ROCm for AMD).
- The host must have the matching GPU drivers installed (which Anunna already has on the GPU nodes).
- You tell Apptainer to enable GPU access using a flag: `--nv` for NVIDIA or `--rocm` for AMD.
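You can see the host side of this binding for yourself: the GPU device files live under `/dev` on the GPU nodes. A quick guarded check (safe to run on any node; the fallback messages are only for illustration):

```shell
# List the GPU device files that --nv / --rocm expose to containers.
# On a login node or CPU-only node these may simply not exist.
ls /dev/nvidia* 2>/dev/null || echo "no NVIDIA devices on this node"
ls -d /dev/kfd /dev/dri 2>/dev/null || echo "no AMD GPU devices on this node"
```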
NVIDIA GPUs (gpu partition)
To use NVIDIA GPUs, you need to:
- Request a node on the `gpu` partition.
- Use the `--nv` flag when running your container.
The `--nv` flag tells Apptainer to:
- Make the `/dev/nvidiaX` device entries available inside the container.
- Locate and bind the CUDA libraries from the host into the container.
- Set `LD_LIBRARY_PATH` so the container uses the host's GPU libraries.
Example: Transcribing Audio with Whisper (NVIDIA)
OpenAI Whisper is a speech recognition model that benefits greatly from GPU acceleration. Let's build a container that runs Whisper on an NVIDIA GPU.
The Definition File
Create a file called `whisper_nvidia.def`:
```
Bootstrap: docker
From: pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper --device cuda "$@"

%help
    Whisper speech recognition container (NVIDIA GPU).
    Usage: apptainer run --nv whisper_nvidia.sif <audio_file> [options]
```
A quick walkthrough of what is happening here:
| Section | Purpose |
|---|---|
| `Bootstrap` / `From` | Uses the official PyTorch Docker image, which already includes Python, PyTorch, CUDA, and cuDNN. |
| `%post` | Installs ffmpeg (required by Whisper for audio decoding), git, and the Whisper package itself. |
| `%environment` | Sets the locale to avoid encoding warnings. |
| `%runscript` | Makes `whisper` the default command, passing through any arguments. |
| `%help` | Provides usage information (accessible via `apptainer run-help`). |
Building the Image
Request a compute node and build:
```shell
module reset
module load utilities Apptainer
apptainer build whisper_nvidia.sif whisper_nvidia.def
```
Running Whisper
Once built, run Whisper on the sample audio file. First, request a GPU node:
```shell
srun --partition=gpu --gres=gpu:1 --pty bash
```
Then load Apptainer and run:
```shell
module reset
module load utilities Apptainer
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model large
```
The `--nv` flag is what makes the GPU visible to the container. Without it, PyTorch would not detect any CUDA devices and Whisper would fall back to the CPU (much slower).
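For longer transcriptions you may prefer a batch job over an interactive session. A sketch of such a job script (the time limit and log file name are illustrative; adjust them to your needs):

```shell
# Write a batch-job version of the interactive run above (illustrative values).
cat > whisper_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH --output=whisper_%j.log

module reset
module load utilities Apptainer
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
EOF
echo "Submit with: sbatch whisper_job.sh"
```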
You can verify GPU access from inside the container with:
```shell
apptainer exec --nv whisper_nvidia.sif python -c "import torch; print(torch.cuda.is_available())"
```
This should print `True`.
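If it prints `False` instead, first confirm you are actually on a GPU node: the host driver should be visible outside the container. A guarded sanity check (safe to run anywhere; the fallback branch is only for illustration):

```shell
# Quick sanity check: is the host NVIDIA driver present on this node?
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name --format=csv,noheader
else
    echo "nvidia-smi not found: not a GPU node, or host drivers are missing"
fi
```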
AMD GPUs (gpu_amd partition)
To use AMD GPUs, you need to:
- Request a node on the `gpu_amd` partition.
- Use the `--rocm` flag when running your container.
The `--rocm` flag tells Apptainer to:
- Make the `/dev/dri/` and `/dev/kfd` device entries available inside the container.
- Locate and bind the ROCm libraries from the host into the container.
- Set `LD_LIBRARY_PATH` so the container uses the host's ROCm libraries.
Example: Transcribing Audio with Whisper (AMD)
The same Whisper workflow, but using an AMD GPU with the ROCm stack.
The Definition File
Create a file called `whisper_amd.def`:
```
Bootstrap: docker
From: rocm/pytorch:rocm7.2_ubuntu24.04_py3.13_pytorch_release_2.10.0

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper --device cuda "$@"

%help
    Whisper speech recognition container (AMD ROCm GPU).
    Usage: apptainer run --rocm whisper_amd.sif <audio_file> [options] --model <model_size>
```
The structure is identical to the NVIDIA version. The only difference is the base image: instead of `pytorch/pytorch` (which includes CUDA), we use `rocm/pytorch` (which includes ROCm). PyTorch's API is the same regardless of the backend — the ROCm build is exposed through the same `torch.cuda` interface (via HIP), so `torch.cuda.is_available()` returns `True` on ROCm as well.
Building the Image
```shell
module reset
module load utilities Apptainer
apptainer build whisper_amd.sif whisper_amd.def
```
Running Whisper
Request an AMD GPU node:
```shell
srun --partition=gpu_amd --gres=gpu:1 --pty bash
```
Then load Apptainer and run:
```shell
module reset
module load utilities Apptainer
apptainer run --rocm whisper_amd.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base
apptainer run --rocm whisper_amd.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model large
```
Note that even though we are on an AMD GPU, the `--device cuda` flag is correct. PyTorch's ROCm backend uses the same `cuda` device name for compatibility.
Verify GPU access:
```shell
apptainer exec --rocm whisper_amd.sif python -c "import torch; print(torch.cuda.is_available())"
```
Summary
| | NVIDIA | AMD |
|---|---|---|
| Partition | `gpu` | `gpu_amd` |
| Apptainer flag | `--nv` | `--rocm` |
| Base image | `pytorch/pytorch` (includes CUDA) | `rocm/pytorch` (includes ROCm) |
| PyTorch device | `--device cuda` | `--device cuda` (same API) |
| Host devices bound | `/dev/nvidiaX` | `/dev/dri/`, `/dev/kfd` |
The key takeaway: the only things that change between NVIDIA and AMD are the base container image, the Apptainer flag, and the Slurm partition. Your application code stays the same.
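If you script both cases, the flag can even be derived from the partition name. A small sketch (`gpu_flag` is a hypothetical helper; the partition names are those used above):

```shell
# Map a Slurm partition name to the matching Apptainer GPU flag.
gpu_flag() {
    case "$1" in
        gpu)     echo "--nv" ;;
        gpu_amd) echo "--rocm" ;;
        *)       echo "" ;;
    esac
}

# Inside a job, Slurm sets SLURM_JOB_PARTITION for you:
FLAG=$(gpu_flag "${SLURM_JOB_PARTITION:-gpu}")
echo "apptainer run $FLAG whisper.sif <audio_file>"
```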