Monitoring job execution: Difference between revisions

From HPCwiki
Duque004 (talk | contribs)
Created page with " == Output stream redirection == The primary way to monitor job execution is through the stdout and stderr streams. These are redirected to https://wiki.hpcagrogenomics.wur..."
 
Phase 1 § 4 redirect: content merged into Monitoring Jobs (P1.4.7) (via update-page on MediaWiki MCP Server)
Tag: New redirect
 
(9 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
#REDIRECT [[Monitoring Jobs]]
== Output stream redirection ==
 
The primary way to monitor job execution is through the stdout and stderr streams. These are redirected to [https://wiki.hpcagrogenomics.wur.nl/index.php/Creating_sbatch_script#output_.28stderr.2Cstdout.29_directed_to_file text files specified in the SLURM script].
 
For this purpose the `tail` command is particularly useful. To continuously follow the output written to a text file use:
 
tail -f output_987654.txt
 
To obtain the last X lines of a text file use:
 
tail -n X output_987654.txt
 
Replace X with the desired number of lines.
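
As a quick illustration of `tail -n`, the sketch below builds a throwaway 20-line file and reads from it. The temporary file and the variable names are stand-ins invented for this example; a real output file would be the one named in the SLURM script.

```shell
#!/bin/bash
# Stand-in for a job output file (hypothetical; a real one is written by SLURM).
demo_file=$(mktemp)
seq 1 20 > "$demo_file"

# Last 3 lines of the file.
last_three=$(tail -n 3 "$demo_file")
echo "$last_three"

# With a leading "+", tail counts from the start: print from line 18 onwards.
from_line_18=$(tail -n +18 "$demo_file")
echo "$from_line_18"

rm -f "$demo_file"
```

Both commands print the lines 18, 19 and 20 here, since line 18 is also the third line from the end.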
 
If the output file gets too long and you wish to read from the beginning, you may combine the commands cat and less:
 
cat output_987654.txt | less
 
Use the Q key to exit less.
 
== Monitoring resource usage ==
 
While the output streams may suffice in most cases, certain programmes might not provide much feedback. This could be the case with a programme that relies on modules that are not verbose. In such situations it is best to monitor resource usage to gauge job execution. Two possible options are described below.
 
=== Using sstat ===
 
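sstat is a SLURM tool that can be used to obtain instantaneous resource usage information for the steps of a running job. Since sstat reports on job steps, the programme in this example is launched with srun inside the SLURM script:

srun python3 calc_pi.py

While the job is running, its statistics can be queried by job ID (7466208 in this example):

sstat --format=AveCPU,AveRSS,MaxRSS -P -j 7466208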
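
Because sstat only gives a snapshot, it can be useful to sample it repeatedly and keep the results. The following is a minimal sketch of one way to do that, not a site-provided tool: the function name `log_sstat`, its arguments and the default log file name are assumptions made for this example. It relies on `squeue -h -j` to tell whether the job is still listed and on the `-n` option of sstat to suppress the header.

```shell
#!/bin/bash
# log_sstat JOBID [INTERVAL_SECONDS] [LOGFILE]
# Hypothetical helper: appends one timestamped sstat record per interval
# for as long as squeue still lists the job.
log_sstat() {
    local jobid=$1
    local interval=${2:-30}
    local logfile=${3:-sstat_${jobid}.log}

    # squeue -h prints no header, so any output means the job is still listed.
    while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
        # -n drops the sstat header; -P keeps the fields separated by "|".
        printf '%s|%s\n' "$(date '+%F %T')" \
            "$(sstat --format=AveCPU,AveRSS,MaxRSS -P -n -j "$jobid")" >> "$logfile"
        sleep "$interval"
    done
}
```

For instance, `log_sstat 7466208 60` would add a line to sstat_7466208.log every minute until the job leaves the queue.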
 
=== Logging the output of top ===
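
One possible way to log the output of top is to run it in batch mode and append its snapshots to a file. The sketch below assumes the procps version of top found on most Linux systems; the function name `log_top`, its arguments and their defaults are invented for this example and are not taken from this page.

```shell
#!/bin/bash
# log_top LOGFILE [INTERVAL_SECONDS] [SAMPLES]
# Hypothetical helper: appends SAMPLES snapshots of top to LOGFILE,
# one every INTERVAL_SECONDS, limited to the current user's processes.
log_top() {
    local logfile=$1
    local interval=${2:-60}
    local samples=${3:-10}

    # -b: batch (non-interactive) mode, -d: delay between snapshots,
    # -n: number of snapshots, -u: only show processes of this user.
    top -b -d "$interval" -n "$samples" -u "$(id -un)" >> "$logfile"
}
```

Each snapshot starts with a header line containing the current time, so the log can later be matched against the job's progress. To capture the job's own processes the function has to run on the node where the job executes, for example started in the background from the SLURM script before the main programme with `log_top "top_${SLURM_JOB_ID}.log" 60 100 &`.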

Latest revision as of 09:54, 16 June 2026

Redirect to: