Performance Optimization/Multiple nodes (MPI): Difference between revisions

From HPCwiki
Fixed first part of markup
IA migration §8: rewrite — keep OpenMPI hello-world (+ bucket load), drop stale mvapich2/ib0 B4F example, TODO for current MPI/interconnect setup (via update-page on MediaWiki MCP Server)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
+ MPI (the Message Passing Interface) lets a single program run across many cores, and even many nodes, at once by passing messages between its processes. Use it for software written to scale beyond one node. To use several cores within a single node, see [[Performance Optimization/Multiple CPUs]]; for many independent tasks, see [[Performance Optimization/Multiple nodes (arrayjobs)]].

- == A simple 'Hello World' example ==
+ == Compiling an MPI program ==

- Consider the following simple MPI version, in C, of the 'Hello World' example:
-
- #include <stdio.h>
- #include <mpi.h>
- int main(int argc, char ** argv) {
-   int size,rank,namelen;
-   char processor_name[MPI_MAX_PROCESSOR_NAME];
-   MPI_Init(&argc, &argv);
-   MPI_Comm_rank(MPI_COMM_WORLD,&rank);
-   MPI_Comm_size(MPI_COMM_WORLD,&size);
-   MPI_Get_processor_name(processor_name, &namelen);
-   printf("Hello MPI! Process %d of %d on %s\n", rank, size, processor_name);
-   MPI_Finalize();
- }
-
- Before compiling, make sure that the required compilers are available.
- module list
-
- To avoid conflicts between libraries, the safest way is purging all modules:
- module purge
-
- Then load both gcc and openmpi libraries. If modules were purged, then slurm needs to be reloaded too.
- module load gcc openmpi/gcc slurm
+ Load a software bucket, then a compiler and an MPI library, through the module system. A bucket has to be loaded before its modules are visible (see [[Environment Modules]]). To avoid library conflicts it is safest to start from a clean environment. Purging also removes the <code>slurm</code> module, so reload it afterwards:
+ <syntaxhighlight lang="bash">
+ module purge
+ module load 2024
+ module load gcc openmpi/gcc slurm
+ </syntaxhighlight>
+
+ As a simple example, here is the classic MPI "Hello World" in C:
+ <syntaxhighlight lang="c">
+ #include <stdio.h>
+ #include <mpi.h>
+ int main(int argc, char **argv) {
+     int size, rank, namelen;
+     char processor_name[MPI_MAX_PROCESSOR_NAME];
+     MPI_Init(&argc, &argv);
+     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+     MPI_Comm_size(MPI_COMM_WORLD, &size);
+     MPI_Get_processor_name(processor_name, &namelen);
+     printf("Hello MPI! Process %d of %d on %s\n", rank, size, processor_name);
+     MPI_Finalize();
+ }
+ </syntaxhighlight>

- Compile the <code>hello_mpi.c</code> code.
- mpicc hello_mpi.c -o test_hello_world
+ Compile it with the MPI compiler wrapper:
+ <syntaxhighlight lang="bash">
+ mpicc hello_mpi.c -o hello_mpi
+ </syntaxhighlight>

- If desired, a list of libraries compiled into the executable can be viewed:
- ldd test_hello_world
-
- linux-vdso.so.1 (0x00007ffc6fb18000)
- libmpi.so.40 => /usr/lib/x86_64-linux-gnu/libmpi.so.40 (0x000014d19dfb2000)
- libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x000014d19df8f000)
- libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x000014d19dd9d000)
- libopen-rte.so.40 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.40 (0x000014d19dce3000)
- libopen-pal.so.40 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.40 (0x000014d19dc33000)
- libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x000014d19dae4000)
- libhwloc.so.15 => /usr/lib/x86_64-linux-gnu/libhwloc.so.15 (0x000014d19da93000)
- /lib64/ld-linux-x86-64.so.2 (0x000014d19e0d9000)
- libz.so.1 => /usr/lib/x86_64-linux-gnu/libz.so.1 (0x000014d19da77000)
- libevent-2.1.so.7 => /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7 (0x000014d19da21000)
- libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x000014d19da1b000)
- libutil.so.1 => /usr/lib/x86_64-linux-gnu/libutil.so.1 (0x000014d19da14000)
- libevent_pthreads-2.1.so.7 => /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x000014d19da0f000)
- libudev.so.1 => /usr/lib/x86_64-linux-gnu/libudev.so.1 (0x000014d19d9e3000)
- libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x000014d19d9d8000)
+ == Running an MPI program ==

- Running the executable on two nodes, with four tasks per node, can be done like this:
- srun --nodes=2 --ntasks-per-node=4 --mpi=openmpi ./test_hello_world
+ Launch the MPI processes with <code>srun</code>, which spreads them across the nodes your job was allocated. For example, two nodes with four tasks each:
+ <syntaxhighlight lang="bash">
+ srun --nodes=2 --ntasks-per-node=4 ./hello_mpi
+ </syntaxhighlight>

- This will result in the following output:
-   Hello MPI! Process 4 of 8 on node011
-   Hello MPI! Process 1 of 8 on node010
-   Hello MPI! Process 7 of 8 on node011
-   Hello MPI! Process 6 of 8 on node011
-   Hello MPI! Process 5 of 8 on node011
-   Hello MPI! Process 2 of 8 on node010
-   Hello MPI! Process 0 of 8 on node010
-   Hello MPI! Process 3 of 8 on node010

- == A mvapich2 sbatch example ==
- An MPI job using mvapich2 on 32 cores, using the normal compute nodes and the fast InfiniBand interconnect for RDMA traffic.
- <source lang='bash'>
- $ module load mvapich2/gcc
- $ vim batch.sh
- #!/bin/sh
- #SBATCH --comment=projectx
- #SBATCH --time=30-0
- #SBATCH  -n 32
- #SBATCH --constraint=4gpercpu
- #SBATCH --output=output_%j.txt
- #SBATCH --error=error_output_%j.txt
- #SBATCH --job-name=MPItest
- #SBATCH --mail-type=ALL
- #SBATCH --mail-user=user@wur.nl
- echo "Starting at `date`"
- echo "Running on hosts: $SLURM_NODELIST"
- echo "Running on $SLURM_NNODES nodes."
- echo "Running on $SLURM_NPROCS processors."
- echo "Current working directory is `pwd`"
- # echo "Env var MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE is $MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE"
- # export MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE=ib0
-
- mpirun -iface ib0 -np 32 ./tmf_par.out -NX 480 -NY 240 -alpha  11 -chi 1.3 -psi_b 5e--beta  0.0 -zeta 3.5 -kT 0.10
-
- echo "Program finished with exit code $? at: `date`"
-
- $ sbatch batch.sh
-
- </source>
+ In a batch job, request the nodes and tasks with <code>#SBATCH</code> and launch with <code>srun</code>:
+ <syntaxhighlight lang="bash">
+ #!/bin/bash
+ #SBATCH --job-name=mpi-test
+ #SBATCH --nodes=2
+ #SBATCH --ntasks-per-node=4
+ #SBATCH --time=00:30:00
+ #SBATCH --output=mpi-%j.out
+
+ module purge
+ module load 2024
+ module load gcc openmpi/gcc slurm
+
+ srun ./hello_mpi
+ </syntaxhighlight>
+
+ <!-- TODO: confirm the recommended MPI library and module names for the current cluster, any required `srun --mpi=...` plugin flag, and whether specific interconnect/fabric tuning is needed. The previous page documented an mvapich2 + InfiniBand (ib0) setup from the Breed4Food cluster, which may no longer match Anunna's hardware. -->
+
+ == See also ==
+ * [[Performance Optimization/Multiple CPUs]]
+ * [[Performance Optimization/Multiple nodes (arrayjobs)]]
+ * [[Environment Modules]]
+ * [[Batch Jobs]]
+ * [[Scheduler Overview (Slurm)]]
+ * [[Cluster Architecture Overview]]

Latest revision as of 13:03, 18 June 2026

MPI (the Message Passing Interface) lets a single program run across many cores, and even many nodes, at once by passing messages between its processes. Use it for software written to scale beyond one node. To use several cores within a single node, see Performance Optimization/Multiple CPUs; for many independent tasks, see Performance Optimization/Multiple nodes (arrayjobs).

Compiling an MPI program

Load a software bucket, then a compiler and an MPI library, through the module system. A bucket has to be loaded before its modules are visible (see Environment Modules). To avoid library conflicts it is safest to start from a clean environment. Purging also removes the slurm module, so reload it afterwards:

module purge
module load 2024
module load gcc openmpi/gcc slurm

As a simple example, here is the classic MPI "Hello World" in C:

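#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int size, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    /* Start the MPI runtime before making other MPI calls. */
    MPI_Init(&argc, &argv);
    /* rank: this process's ID; size: total number of processes. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Name of the node this process is running on. */
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Hello MPI! Process %d of %d on %s\n", rank, size, processor_name);

    /* Shut down the MPI runtime before exiting. */
    MPI_Finalize();
    return 0;
}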
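
Once each process knows its rank and the total number of processes, a common next step is to split a loop over some number of items so that every rank handles its own contiguous slice. The sketch below shows one way to compute such a slice in plain C, without calling MPI. The helper name `block_range` and its parameters are hypothetical; they are not part of MPI or of the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI function): compute the half-open range
 * [*begin, *end) of items owned by `rank` when n_items are split as evenly
 * as possible over `size` ranks. The first (n_items % size) ranks each get
 * one extra item. */
void block_range(size_t n_items, int rank, int size,
                 size_t *begin, size_t *end)
{
    assert(size > 0 && rank >= 0 && rank < size);

    size_t base  = n_items / (size_t)size;  /* items every rank gets */
    size_t extra = n_items % (size_t)size;  /* leftover items to hand out */
    size_t r     = (size_t)rank;

    /* Ranks below `extra` own base + 1 items; the rest own base items. */
    *begin = r * base + (r < extra ? r : extra);
    *end   = *begin + base + (r < extra ? 1 : 0);
}
```

In the Hello World program this could be called as `block_range(n, rank, size, &begin, &end)` after `MPI_Comm_size`, with each process then looping from `begin` up to, but not including, `end`.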

Compile it with the MPI compiler wrapper:

mpicc hello_mpi.c -o hello_mpi

Running an MPI program

Launch the MPI processes with srun, which spreads them across the nodes your job was allocated. For example, two nodes with four tasks each:

srun --nodes=2 --ntasks-per-node=4 ./hello_mpi
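
How srun places ranks on nodes depends on the task distribution in effect. As a rough mental model, assuming a block distribution with the same number of tasks on every node (an assumption, not something this page states; the real layout is decided by Slurm and its configuration), consecutive ranks fill one node before moving on to the next. The helper name `node_index_of_rank` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: under an ASSUMED block distribution with
 * tasks_per_node ranks on every node, return the zero-based index of the
 * allocated node that hosts `rank`. */
int node_index_of_rank(int rank, int tasks_per_node)
{
    assert(rank >= 0 && tasks_per_node > 0);
    return rank / tasks_per_node;  /* integer division groups ranks by node */
}
```

With `--nodes=2 --ntasks-per-node=4`, this model puts ranks 0 to 3 on the first node and ranks 4 to 7 on the second. The hostname printed by each process shows the placement that was actually used.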

In a batch job, request the nodes and tasks with #SBATCH and launch with srun:

#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:30:00
#SBATCH --output=mpi-%j.out

module purge
module load 2024
module load gcc openmpi/gcc slurm

srun ./hello_mpi


See also