Choosing a node (constraints)

From HPCwiki
Revision as of 09:06, 16 June 2026 by Haars0011 (talk | contribs) (Phase 1 § 4 P1.4.3: split Choosing a node (constraints) out of Using Slurm § Defaults + § Using GPU (via create-page on MediaWiki MCP Server))

SLURM picks a node for your job based on the partition you choose and the constraints you set. If you don't tell SLURM what you need, it uses defaults; for anything that exceeds a default, you have to be explicit.

Defaults

If you submit a job without any directives, SLURM assumes:

  • Partition: main
  • Quality of Service: std
  • CPUs: 1
  • Wall time: 1 hour (the maximum you can request is 3 weeks)
  • Memory: 100 MB per node

For anything beyond these defaults, set the relevant #SBATCH directive in your sbatch script.
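
As a minimal sketch, a job that needs more than the defaults might override them as follows. The values (4 CPUs, 2 hours, 8 GB) and the job name are illustrative assumptions, not site recommendations; the directives themselves are standard sbatch options.

```shell
#!/bin/bash
#SBATCH --job-name=example        # hypothetical name, for illustration only
#SBATCH --partition=main          # same as the default, stated explicitly
#SBATCH --qos=std                 # same as the default, stated explicitly
#SBATCH --cpus-per-task=4         # default is 1 CPU
#SBATCH --time=02:00:00           # default is 1 hour; format is HH:MM:SS
#SBATCH --mem=8G                  # default is 100 MB per node

# SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
# fall back to 1 so the script also runs outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${threads} CPU thread(s)"
```

Saved as, say, job.sh (a hypothetical file name), the script would be submitted with sbatch job.sh. Stating the partition and QoS explicitly is optional here, but it makes the script easier to adapt later.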

Selecting GPU hardware

The cluster has nodes with NVIDIA GPUs (partition gpu) and nodes with AMD GPUs (partition gpu_amd). To run on a GPU node you need to do two things in your sbatch script: select the right partition, and request the GPU as a generic resource.

NVIDIA GPUs

Six nodes with NVIDIA GPUs are available in the gpu partition. Request one with:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

Replace 1 with the number of GPUs you need. The --gres line is mandatory: without it, SLURM does not allocate a GPU to the job, and your job will either fail or run on the CPU instead.
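
Inside a GPU job you can check that the allocation took effect. The sketch below assumes the cluster's SLURM configuration exports CUDA_VISIBLE_DEVICES for allocated NVIDIA GPUs, which is common but not stated on this page; count_visible_gpus is a hypothetical helper name.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2              # request two GPUs

# Hypothetical helper: prints how many device IDs are listed in
# CUDA_VISIBLE_DEVICES (assumed to be set by SLURM), or 0 if it is empty.
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        # One ID per line, then count the lines; strip padding from wc.
        echo "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | wc -l | tr -d ' '
    fi
}

n_gpus="$(count_visible_gpus)"
if [ "${n_gpus}" -eq 0 ]; then
    echo "No GPU visible: check the --partition and --gres lines" >&2
else
    echo "GPUs visible to this job: ${n_gpus}"
fi
```

Running the check at the top of the script surfaces a missing --gres line in the job log right away, rather than as a silent fallback to the CPU.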

If you don't ask for a particular GPU model, SLURM picks whatever is free. The scheduler is configured to prefer A100s first, then A6000s, then V100s. The price per GPU-hour is the same regardless of model.

To restrict the job to a specific model, add a --constraint directive. To see the constraint names available for GPU nodes, run:

scontrol show -o node | grep -o -e "NodeName=\w*" -e "ActiveFeatures=[[:alnum:][:punct:]]*" | paste - - | column -t | grep gpu

Then constrain the job to one of the listed models:

# Restrict to A100 GPUs
#SBATCH --constraint='nvidia&A100'
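
To confirm which model the scheduler actually assigned, a job can print the GPU name when it starts. This sketch assumes nvidia-smi is on the PATH of the GPU nodes and reuses the A100 constraint shown above; describe_gpus is a hypothetical helper name.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --constraint='nvidia&A100'

# Hypothetical helper: prints one line per allocated GPU with its
# model name and total memory, or a notice if nvidia-smi is unavailable.
describe_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found: not on an NVIDIA GPU node?"
    fi
}

describe_gpus
```

Logging the model this way is mainly useful when you do not set a constraint, since it records in the job output which card the run used.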

A rough rule of thumb: the A100/80 GB cards are about twice as fast as the A6000/48 GB or V100/16 GB. You only see that speedup if your workload can use the extra memory and saturate the GPU.

AMD GPUs

Two nodes with AMD GPUs are available in the gpu_amd partition. The mechanics are the same as for NVIDIA GPUs. Select the partition and request a GPU as a generic resource:

#SBATCH --partition=gpu_amd
#SBATCH --gres=gpu:1

See also