Performance Optimization/Multiple nodes (arrayjobs)

From HPCwiki

When you have many independent tasks to run — the same program over many inputs, parameter values, or samples — a job array is usually the best way to scale out. Instead of one big parallel program, you submit one array of many small jobs, and the scheduler spreads them across the cluster as resources free up.

This is "embarrassingly parallel" work: the tasks do not need to talk to each other (unlike MPI). It is often the simplest and most effective way to use many nodes at once.

When to use a job array

  • You run the same analysis over many files or samples.
  • You sweep a parameter over many values.
  • Each task is independent and can run on its own.

How it works

A job array is a single submission containing many tasks. Slurm gives each task its own value of the environment variable $SLURM_ARRAY_TASK_ID, and your job script uses that value to select which input the task processes. The scheduler runs as many tasks at once as there is room for and queues the rest. Because each task is scheduled as a separate job, an array naturally spreads across many nodes without any parallel programming. If some tasks fail, you can rerun just those.
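As a minimal sketch of the mechanism described above, the batch script below maps each task ID to one line of an input list. The file names (sample_a.txt and so on), the helper pick_input, and the resource settings are hypothetical and chosen for illustration only. Your cluster may also require options such as a partition or account, which are not shown here.

```shell
#!/bin/bash
#SBATCH --job-name=array-sketch
#SBATCH --array=1-3                 # three tasks with IDs 1, 2 and 3
#SBATCH --time=00:10:00
#SBATCH --output=array_%A_%a.out    # %A = array job ID, %a = task ID

# pick_input LIST_FILE TASK_ID
# Print line number TASK_ID of LIST_FILE (hypothetical helper).
pick_input() {
    sed -n "${2}p" "$1"
}

# Hypothetical input list: one input per line, line N belongs to task N.
# It is created here only so that the sketch is self-contained; in real
# use the list would already exist next to your data.
list=$(mktemp)
printf '%s\n' sample_a.txt sample_b.txt sample_c.txt > "$list"

# Slurm sets SLURM_ARRAY_TASK_ID for every task of the array.
# Fall back to 1 so the script can also be tried outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

input=$(pick_input "$list" "$task_id")
rm -f "$list"

# Placeholder for the real work: replace echo with your own program.
echo "task ${task_id} processes ${input}"
```

Assuming the script is saved as array_sketch.sh, `sbatch array_sketch.sh` submits all three tasks. Options given on the sbatch command line take precedence over the #SBATCH lines in the script, so `sbatch --array=2 array_sketch.sh` would rerun only task 2, for example after that task failed.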

For the syntax and worked examples, see Array Jobs.

See also