Monitoring job execution
Revision as of 10:13, 23 August 2018
Output stream redirection
The primary way to monitor job execution is through the stdout and stderr streams. These are redirected to text files specified in the SLURM script.
For this purpose the `tail` command is particularly useful. To continuously follow the output written to a text file, use:
tail -f output_987654.txt
To obtain the last X lines of a text file, use:
tail -n X output_987654.txt
Replace X with the desired number of lines.
If the output file gets too long and you wish to read it from the beginning, you can combine the commands cat and less:
cat output_987654.txt | less
Use the Q key to exit less.
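The tail usage above can be tried safely on a throwaway file before applying it to real job output. The following is a minimal sketch, assuming a bash shell with mktemp and seq available; the temporary file and its "step N" contents are made up purely for illustration.

```shell
#!/bin/bash
# Create a temporary stand-in for a job output file (illustrative content only).
demo_file=$(mktemp)
seq 1 100 | sed 's/^/step /' > "$demo_file"

# Fetch the last 5 lines, as with "tail -n X" where X is 5.
last_lines=$(tail -n 5 "$demo_file")
echo "$last_lines"

# Remove the temporary file again.
rm -f "$demo_file"
```

On a real job you would run tail directly against the output file named in your SLURM script instead of the temporary file used here.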
Monitoring resource usage
While the output streams may suffice in most cases, certain programmes might not provide much feedback. This could be the case with a programme that relies on modules that are not verbose. In such situations it is best to monitor resource usage to gauge job execution. Two possible options are described below.
Using sstat
sstat is a SLURM tool that can be used to obtain instantaneous information on resource usage, such as CPU load and memory. It reports on job steps, so to use it, start by changing your SLURM script so that your programme is launched with the srun command (this should be the last line in the script):
srun python3 calc_pi.py
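For orientation, a complete job script along these lines might look like the sketch below. It is not the cluster's official template: the file name calc_pi_job.sh, the job name, the time limit and the error file name are all assumptions chosen for illustration. The block writes the script to disk and, only if sbatch is available, submits it with the --parsable option so that the job number is printed on its own.

```shell
#!/bin/bash
# Write a minimal job script. All names and resource values are illustrative.
cat > calc_pi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=calc_pi
# stdout goes to output_<job number>.txt; %j expands to the job number.
#SBATCH --output=output_%j.txt
# stderr goes to a separate file.
#SBATCH --error=error_output_%j.txt
#SBATCH --time=00:10:00

# Launch the programme as a job step so that sstat can report on it.
srun python3 calc_pi.py
EOF

# Submit only where SLURM is installed. --parsable prints the job number,
# optionally followed by ";cluster", which is stripped here.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable calc_pi_job.sh)
    echo "Submitted job ${jobid%%;*}"
fi
```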
Note down the job number. During execution you can then use sstat, passing the job number with the -j flag:
sstat --format=AveCPU,AveRSS,MaxRSS -P -j 987654
sstat can provide information on many other variables; for more details see the manual (https://slurm.schedmd.com/sstat.html).
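Because sstat only gives a snapshot, it can be convenient to sample it repeatedly while the job runs. The sketch below is one possible way to do that, not a SLURM feature: watch_job, its arguments and the default log file name are hypothetical. It relies on squeue with the -h (no header), -j (job) and -t (state) options to detect whether the job is still running.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): append sstat snapshots to a log
# file at a fixed interval for as long as squeue lists the job as RUNNING.
watch_job() {
    local jobid="$1"
    local interval="${2:-60}"                  # seconds between samples
    local logfile="${3:-sstat_${jobid}.log}"   # illustrative default name

    while [ -n "$(squeue -h -j "$jobid" -t RUNNING 2>/dev/null)" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            sstat --format=AveCPU,AveRSS,MaxRSS -P -j "$jobid"
        } >> "$logfile"
        sleep "$interval"
    done
}

# Example call (commented out; needs a running job on a SLURM cluster):
# watch_job 987654 30
```

The loop exits immediately if the job has not started yet or has already finished, so it should be launched once the job is running.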