Performance Optimization/Multiple CPUs

From HPCwiki
Revision as of 13:03, 18 June 2026 by Haars0011

Many programs can use more than one CPU core at once — through threads, OpenMP, or built-in multiprocessing. To use several cores on a single node, request them from the scheduler and tell your program how many to use. To scale beyond one node, see MPI or job arrays.

Requesting cores

For a threaded program, ask SLURM for cores on one node with --cpus-per-task — for example, 8 cores:

#!/bin/bash
#SBATCH --job-name=threads
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

# many tools read this to decide how many threads to use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_threaded_program

Requesting more cores than your program can actually use wastes the allocation and makes your job wait longer in the queue, so match the request to what the program will use.
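One way to find out how many cores a program can actually use is a short scaling test: run it at several thread counts and see where the run time stops improving. The following is a minimal sketch, not a site-provided tool. The function name scaling_test and the CMD variable are made up for illustration, and it assumes the program honours OMP_NUM_THREADS. Run it inside a job that has been allocated at least as many cores as the largest count tested.

```shell
#!/bin/bash
# CMD is a placeholder so the sketch runs as-is; replace it with your own
# program, for example CMD="./my_threaded_program".
CMD=${CMD:-"sleep 0"}

# Run CMD once per thread count given as an argument and print the
# elapsed wall-clock time in whole seconds.
scaling_test() {
    local n start end
    for n in "$@"; do
        start=$(date +%s)
        # Set the thread count for this run only.
        OMP_NUM_THREADS=$n $CMD
        end=$(date +%s)
        echo "$n threads: $((end - start)) s"
    done
}

scaling_test 1 2 4 8
```

If the time barely changes between, say, 4 and 8 threads, requesting 4 cores is probably the better choice.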

Telling your program how many cores to use

Most programs do not pick up the allocated core count automatically. Left alone, a program may run on a single core or start more threads than you were allocated, so tell it explicitly. Common ways:

  • OpenMP programs read the OMP_NUM_THREADS environment variable — set it from $SLURM_CPUS_PER_TASK, as above.
  • Many tools take a flag, for example -t, --threads, or -p — check the program's documentation.
  • In Python, R, and similar, use the language's multiprocessing or parallel facilities and pass the core count explicitly.
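The options above can share one thread count taken from the allocation. The sketch below is an illustration under assumptions: resolve_threads is a made-up helper name, and my_tool with its --threads flag is hypothetical, so substitute whatever your program documents. It falls back to 1 when SLURM_CPUS_PER_TASK is unset, which happens when --cpus-per-task was not requested or when the script is run outside a job.

```shell
#!/bin/bash
# Print the thread count from the SLURM allocation, or 1 if the variable
# is unset or empty (e.g. when testing outside a job).
resolve_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

THREADS=$(resolve_threads)

# OpenMP programs read this environment variable.
export OMP_NUM_THREADS="$THREADS"

# Hypothetical tool and flag; check your program's documentation.
# ./my_tool --threads "$THREADS" input.dat

echo "Using $THREADS thread(s)"
```

Defining the count once and reusing it keeps the program's thread count in step with the #SBATCH request when you later change --cpus-per-task.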

See also