Monitoring job execution

Output stream redirection

The primary way to monitor job execution is through the stdout and stderr streams. These are redirected to the text files specified in the SLURM script.
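
As a rough illustration, the redirection could be set up in the SLURM script along the following lines. This is only a sketch: the job name and the two file names are assumptions, chosen so that the `%j` pattern (which sbatch replaces with the job ID) yields names like the `output_987654.txt` used on this page.

```shell
#!/bin/bash
#SBATCH --job-name=calc_pi          # hypothetical job name
#SBATCH --output=output_%j.txt      # stdout file; %j expands to the job ID
#SBATCH --error=error_%j.txt        # stderr file (assumed name)

# SLURM_JOB_ID is set by SLURM inside a job; fall back to N/A elsewhere.
start_msg="Job ${SLURM_JOB_ID:-N/A} started"
echo "$start_msg"                   # written to the --output file
echo "example warning" >&2          # written to the --error file

# The actual workload would follow here, for example:
# srun python3 calc_pi.py
```

If `--error` is left out, SLURM by default writes both streams to the same file.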

For this purpose the `tail` command is particularly useful. To continuously follow the output written to a text file use:

tail -f output_987654.txt

To obtain the last X lines of a text file use:

tail -n X output_987654.txt

Replace X with the desired number of lines.
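
To try `tail -n` without a running job, the sketch below builds a stand-in output file and reads its last three lines. The temporary file and its "step N" contents are made up purely for illustration.

```shell
# Create a temporary stand-in for a job's output file (illustrative content).
demo_file=$(mktemp)
i=1
while [ "$i" -le 100 ]; do
    echo "step $i"
    i=$((i + 1))
done > "$demo_file"

# Keep only the last 3 lines: "step 98", "step 99" and "step 100".
last_lines=$(tail -n 3 "$demo_file")
echo "$last_lines"

# Remove the temporary file again.
rm -f "$demo_file"
```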

If the output file gets too long and you wish to read it from the beginning, you may combine the commands cat and less:

cat output_987654.txt | less

Use the Q key to exit less.

Monitoring resource usage

While the output streams may suffice in most cases, certain programmes might not provide much feedback. This could be the case with a programme that relies on modules that are not verbose. In such situations it is best to monitor resource usage to gauge job execution. Two possible options are described below.

Using sstat

sstat is a SLURM tool that can be used to obtain instantaneous resource usage information for a running job.

srun python3 calc_pi.py

sstat --format=AveCPU,AveRSS,MaxRSS -P -j 7466208
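
Because sstat only reports the state at the moment it is called, one option is to sample it repeatedly and keep the results in a log. The function below is a minimal sketch of that idea. `log_sstat` is a hypothetical helper, not part of SLURM, and its arguments are assumptions. The sstat options are the ones shown above.

```shell
# log_sstat JOBID INTERVAL COUNT LOGFILE   (hypothetical helper, not part of SLURM)
# Appends COUNT time-stamped sstat samples for JOBID to LOGFILE,
# pausing INTERVAL seconds after each sample.
# The body runs in a subshell, so its variables do not leak to the caller.
log_sstat() (
    jobid=$1; interval=$2; count=$3; logfile=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            sstat --format=AveCPU,AveRSS,MaxRSS -P -j "$jobid"
        } >> "$logfile"
        sleep "$interval"
        i=$((i + 1))
    done
)
```

For example, `log_sstat 7466208 60 10 sstat_7466208.log` would take ten samples one minute apart. The log can then be followed with `tail -f` like any other output file.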
