Tutorials/Apptainer-GPUs

From HPCwiki
Revision as of 10:42, 25 March 2026 by Honfi001

Running Apptainer with GPUs

Apptainer can pass through GPU hardware from the host into a container, allowing you to run GPU-accelerated workloads (such as deep learning inference or training) inside a fully contained environment. This page covers how to use both NVIDIA and AMD GPUs on the Anunna cluster.

Important: Before you begin, make sure the following are in place:

  • Your .sif image files should be stored on Lustre (e.g. in your scratch space), not in your home directory.
  • Set your Apptainer cache to Lustre:
export APPTAINER_CACHEDIR=$myScratch/Apptainer
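
In practice this means creating the directory and exporting the variable before you build or pull any images. A minimal sketch; $myScratch is assumed to be preset by the Anunna environment, and the $HOME fallback is only there so the snippet also runs elsewhere:

```shell
# Point the Apptainer cache at Lustre scratch instead of $HOME.
# $myScratch is assumed to be set by the cluster environment;
# the $HOME fallback only makes the snippet portable.
export APPTAINER_CACHEDIR="${myScratch:-$HOME}/Apptainer"
mkdir -p "$APPTAINER_CACHEDIR"
```

Add the export line to your ~/.bashrc if you want the setting to persist across sessions.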

How GPU Passthrough Works

Apptainer does not include GPU drivers inside the container. Instead, it binds the GPU drivers and libraries from the host system into the container at runtime. This means:

  • The container must include software built for the correct GPU framework (CUDA for NVIDIA, ROCm for AMD).
  • The host must have the matching GPU drivers installed (which Anunna already has on the GPU nodes).
  • You tell Apptainer to enable GPU access using a flag: --nv for NVIDIA or --rocm for AMD.
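
A quick way to confirm that passthrough works is to run the host's GPU monitoring tool from inside the container. A sketch, assuming some CUDA- or ROCm-capable image called image.sif and a GPU node; the guard only keeps the snippet from failing on machines without Apptainer:

```shell
# Run on a GPU node: the host's GPU tool executed *inside* the
# container should list the node's GPUs, proving that the driver
# and device files were bound in from the host.
if command -v apptainer >/dev/null 2>&1; then
    apptainer exec --nv image.sif nvidia-smi      # NVIDIA node
    # apptainer exec --rocm image.sif rocm-smi    # AMD node
else
    echo "apptainer not available on this machine"
fi
```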

NVIDIA GPUs (gpu partition)

To use NVIDIA GPUs, you need to:

  1. Request a node on the gpu partition.
  2. Use the --nv flag when running your container.

The --nv flag tells Apptainer to:

  • Make the /dev/nvidiaX device entries available inside the container.
  • Locate and bind the CUDA libraries from the host into the container.
  • Set LD_LIBRARY_PATH so the container uses the host's GPU libraries.
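
These effects are directly observable from inside the container. A sketch, again assuming an image called image.sif on a GPU node, with the same guard for machines without Apptainer:

```shell
# With --nv, the host's NVIDIA device files and the injected
# library path should be visible inside the container.
if command -v apptainer >/dev/null 2>&1; then
    apptainer exec --nv image.sif sh -c 'ls /dev/nvidia*; echo "$LD_LIBRARY_PATH"'
else
    echo "apptainer not available on this machine"
fi
```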

Example: Transcribing Audio with Whisper (NVIDIA)

OpenAI Whisper is a speech recognition model that benefits greatly from GPU acceleration. Let's build a container that runs Whisper on an NVIDIA GPU.

The Definition File

Create a file called whisper_nvidia.def:

Bootstrap: docker
From: pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper "$@"

%help
    Whisper speech recognition container (NVIDIA GPU).
    Usage: apptainer run --nv whisper_nvidia.sif <audio_file> [options]

A quick walkthrough of what is happening here:

  • Bootstrap / From: Uses the official PyTorch Docker image, which already includes Python, PyTorch, CUDA, and cuDNN.
  • %post: Installs ffmpeg (required by Whisper for audio decoding), git, and the Whisper package itself.
  • %environment: Sets the locale to avoid encoding warnings.
  • %runscript: Makes whisper the default command, passing through any arguments.
  • %help: Provides usage information (accessible via apptainer run-help).

Building the Image

Request a compute node and build:

module reset
module load utilities Apptainer
apptainer build whisper_nvidia.sif whisper_nvidia.def

Running Whisper

Once built, run Whisper on the sample audio file. First, request a GPU node:

srun --partition=gpu --gres=gpu:1 --pty bash

Then load Apptainer and run:

module reset
module load utilities Apptainer
apptainer run --nv whisper_nvidia.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base --device cuda

The --nv flag is what makes the GPU visible to the container. Without it, PyTorch would not detect any CUDA devices and Whisper would fall back to CPU (much slower).
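
For longer transcriptions you would typically submit the work as a batch job rather than hold an interactive session. A sketch of such a job script; the job name and time limit are placeholder values, not site recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=whisper-gpu
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

module reset
module load utilities Apptainer

# --nv exposes the node's NVIDIA GPU to the container
apptainer run --nv whisper_nvidia.sif \
    /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base --device cuda
```

Submit it with sbatch followed by the script name.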

You can verify GPU access from inside the container with:

apptainer exec --nv whisper_nvidia.sif python -c "import torch; print(torch.cuda.is_available())"

This should print True.

AMD GPUs (gpu_amd partition)

To use AMD GPUs, you need to:

  1. Request a node on the gpu_amd partition.
  2. Use the --rocm flag when running your container.

The --rocm flag tells Apptainer to:

  • Make the /dev/dri/ and /dev/kfd device entries available inside the container.
  • Locate and bind the ROCm libraries from the host into the container.
  • Set LD_LIBRARY_PATH so the container uses the host's ROCm libraries.

Example: Transcribing Audio with Whisper (AMD)

The same Whisper workflow, but using an AMD GPU with the ROCm stack.

The Definition File

Create a file called whisper_amd.def:

Bootstrap: docker
From: rocm/pytorch:rocm6.2.4_ubuntu22.04_py3.10_pytorch_release_2.5.0

%post
    apt-get update
    apt-get install -y ffmpeg git
    pip install openai-whisper
    apt-get clean

%environment
    export LC_ALL=C

%runscript
    exec whisper "$@"

%help
    Whisper speech recognition container (AMD ROCm GPU).
    Usage: apptainer run --rocm whisper_amd.sif <audio_file> [options]

The structure is identical to the NVIDIA version. The only difference is the base image: instead of pytorch/pytorch (which includes CUDA), we use rocm/pytorch (which includes ROCm). PyTorch's API is the same regardless of the backend: torch.cuda.is_available() returns True on ROCm as well, because PyTorch's ROCm build is implemented on HIP, which mirrors the CUDA API behind the same torch.cuda namespace.
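
This portability shows up in ordinary device-selection code, which is identical on both vendors. A small sketch; the try/except is only there so the snippet also runs on machines without PyTorch installed, while inside either container the import succeeds and the same line picks the GPU:

```python
# PyTorch exposes AMD (ROCm/HIP) GPUs through the same torch.cuda
# namespace as NVIDIA GPUs, so device selection is vendor-agnostic.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # torch not installed: fall back to a CPU label
    device = "cpu"
print(f"Selected device: {device}")
```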

Building the Image

module reset
module load utilities Apptainer
apptainer build whisper_amd.sif whisper_amd.def

Running Whisper

Request an AMD GPU node:

srun --partition=gpu_amd --gres=gpu:1 --pty bash

Then load Apptainer and run:

module reset
module load utilities Apptainer
apptainer run --rocm whisper_amd.sif /lustre/shared/hpcCourses/Whisper/audio.mp3 --model base --device cuda

Note that even though we are on an AMD GPU, the --device cuda flag is correct. PyTorch's ROCm backend uses the same cuda device name for compatibility.

Verify GPU access:

apptainer exec --rocm whisper_amd.sif python -c "import torch; print(torch.cuda.is_available())"

Summary

                        NVIDIA                           AMD
  Partition             gpu                              gpu_amd
  Apptainer flag        --nv                             --rocm
  Base image            pytorch/pytorch (includes CUDA)  rocm/pytorch (includes ROCm)
  PyTorch device        --device cuda                    --device cuda (same API)
  Host devices bound    /dev/nvidiaX                     /dev/dri/, /dev/kfd

The key takeaway: the only things that change between NVIDIA and AMD are the base container image, the Apptainer flag, and the Slurm partition. Your application code stays the same.