Performance Optimization/Multiple nodes (MPI)
MPI (the Message Passing Interface) lets a single program run across many cores, and even many nodes, at once by passing messages between its processes. Use it for software written to scale beyond one node. To use several cores within a single node, see Performance Optimization/Multiple CPUs; for many independent tasks, see Performance Optimization/Multiple nodes (arrayjobs).
Compiling an MPI program
Load a software bucket, then a compiler and an MPI library, through the module system. A bucket has to be loaded before its modules become visible (see Environment Modules). To avoid library conflicts, it is safest to start from a clean environment. Note that purging also removes the slurm module, so reload it:
module purge
module load 2024
module load gcc openmpi/gcc slurm
As a simple example, here is the classic MPI "Hello World" in C:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int size, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);                            /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);              /* this process's rank, 0..size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);              /* total number of processes */
    MPI_Get_processor_name(processor_name, &namelen);  /* processor (usually host) name */
    printf("Hello MPI! Process %d of %d on %s\n", rank, size, processor_name);
    MPI_Finalize();                                    /* shut down MPI before exiting */
    return 0;
}
Compile it with the MPI compiler wrapper:
mpicc hello_mpi.c -o hello_mpi
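In a real program, the rank and size obtained above usually decide which part of the work each process handles. The following is a minimal sketch of one common scheme, an even block split. The helper block_range and its parameters are hypothetical names chosen for illustration. They are not part of MPI, and the code is plain C that needs no MPI library to compile.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of MPI): split n_items work items as evenly
 * as possible across `size` ranks and report the half-open range
 * [*start, *start + *count) that belongs to `rank`.
 * The first (n_items % size) ranks each take one extra item.
 */
static void block_range(size_t n_items, int size, int rank,
                        size_t *start, size_t *count)
{
    assert(size > 0 && rank >= 0 && rank < size);

    size_t base  = n_items / (size_t)size;  /* items every rank receives */
    size_t extra = n_items % (size_t)size;  /* leftover items to hand out */
    size_t r     = (size_t)rank;

    /* Ranks below `extra` get one additional item. */
    *count = base + (r < extra ? 1 : 0);
    /* Skip the items owned by all lower ranks. */
    *start = r * base + (r < extra ? r : extra);
}
```

In the Hello World program, this could be called after MPI_Comm_rank and MPI_Comm_size, for instance as block_range(n_items, size, rank, &start, &count), where n_items is an assumed total amount of work. Each process would then loop over its own range only.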
Running an MPI program
Launch the MPI processes with srun, which spreads them across the nodes your job was allocated. For example, two nodes with four tasks each give eight processes in total:
srun --nodes=2 --ntasks-per-node=4 ./hello_mpi
In a batch job, request the nodes and tasks with #SBATCH directives and launch with srun. Inside the job, srun inherits the allocation, so it needs no extra options:
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:30:00
#SBATCH --output=mpi-%j.out
module purge
module load 2024
module load gcc openmpi/gcc slurm
srun ./hello_mpi