== Output stream redirection ==

The primary way to monitor job execution is through the stdout and stderr streams. These are redirected to [https://wiki.hpcagrogenomics.wur.nl/index.php/Creating_sbatch_script#output_.28stderr.2Cstdout.29_directed_to_file text files specified in the SLURM script].

The `tail` command is particularly useful for this purpose. To continuously follow the output as it is written to a text file, use:

 tail -f output_987654.txt

To obtain the last X lines of a text file, use:

 tail -n X output_987654.txt

Replace X with the desired number of lines.

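As a small self-contained illustration of `tail -n`, the sketch below writes a throwaway file and prints its last three lines. The temporary file and its contents are placeholders, not real job output.

```shell
# Create a throwaway file with ten numbered lines (a stand-in for real job output).
demo_file="$(mktemp)"
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "step $i finished"
done > "$demo_file"

# Keep only the last 3 lines, as with: tail -n X output_987654.txt
last_lines="$(tail -n 3 "$demo_file")"
echo "$last_lines"

# Clean up the placeholder file.
rm -f "$demo_file"
```

With a real job, replace the temporary file with the output file named in the SLURM script.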
If the output file gets too long and you wish to read it from the beginning, you can combine the commands cat and less:

 cat output_987654.txt | less

Use the Q key to exit less.

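For a non-interactive look at the start of a file, `head` can be used instead of a pager. The sketch below is only an illustration on a placeholder file; it also notes, as a comment, that `less` can open a file directly without `cat`.

```shell
# Create a throwaway file with 100 numbered lines (a stand-in for real job output).
demo_file="$(mktemp)"
i=1
while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$demo_file"

# less can open a file directly (interactive, so left commented out here):
# less "$demo_file"

# Non-interactive alternative: print only the first 5 lines.
first_lines="$(head -n 5 "$demo_file")"
echo "$first_lines"

# Clean up the placeholder file.
rm -f "$demo_file"
```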
== Monitoring resource usage ==

While the output streams suffice in most cases, certain programmes might not provide much feedback. This could be the case with a programme that relies on modules that are not verbose. In such situations it is best to monitor resource usage to gauge job execution. Two possible options are described below.

=== Using sstat ===

sstat is a SLURM tool that can be used to obtain instantaneous resource usage information for a running job.

 srun python3 calc_pi.py

 sstat --format=AveCPU,AveRSS,MaxRSS -P -j 7466208

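The `-P` option makes sstat print fields separated by `|`, which is convenient for extracting a single value. The sketch below parses such output with `awk`. The sample text and its numbers are hypothetical and only mimic the expected layout (a header row followed by one row per job step); they are not taken from a real job.

```shell
# Hypothetical sample of the '|'-separated output of:
#   sstat --format=AveCPU,AveRSS,MaxRSS -P -j <jobid>
# The values below are invented for illustration.
sample_output='AveCPU|AveRSS|MaxRSS
00:12:34|1523412K|1608220K'

# Extract the MaxRSS column (third field), skipping the header row.
max_rss="$(printf '%s\n' "$sample_output" | awk -F'|' 'NR > 1 { print $3 }')"
echo "MaxRSS: $max_rss"
```

On the cluster, the same `awk` filter could be applied by piping the real sstat command into it instead of using the sample text.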
=== Logging the output of top ===