Scheduler Overview (Slurm): Difference between revisions

From HPCwiki
Phase 1 § 4 P1.4.1: trim to overview only — content split into Partitions / Queues, Choosing a node (constraints), Batch Jobs, Interactive Jobs, Cancelling Jobs, Monitoring Jobs (separate pages). This page is now the entry point with QoS overview and topic index. (via update-page on MediaWiki MCP Server)
 
(121 intermediate revisions by 9 users not shown)
The resource allocation and scheduling software on Anunna is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: '''S'''imple '''L'''inux '''U'''tility for '''R'''esource '''M'''anagement. This page is the entry point — most topics have their own page; below is a short summary plus links.


== What's on which page ==

* [[Partitions / Queues]] — list of partitions (<code>main</code>, <code>gpu</code>, <code>gpu_amd</code>) and how to choose one.
* [[Choosing a node (constraints)]] — defaults, hardware constraints, GPU selection.
* [[Batch Jobs]] — writing sbatch scripts and submitting them, including multi-job submissions and dependencies.
* [[Interactive Jobs]] — <code>sinteractive</code> and <code>salloc</code> for live shell sessions on a compute node.
* [[Array jobs]] — running the same script many times with a varying parameter.
* [[Monitoring Jobs]] — <code>squeue</code>, <code>scontrol</code>, <code>sstat</code>, <code>sacct</code>, <code>node_usage_graph</code>.
* [[Cancelling Jobs]] — <code>scancel</code>.
* [[Reservations]] — booking nodes in advance for events.


== Quality of Service ==


When submitting a job, you may optionally set its Quality of Service (QoS) with the <code>--qos</code> option, for example in an sbatch script:


<syntaxhighlight lang="bash">
#SBATCH --qos=std
</syntaxhighlight>
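
As a minimal sketch of where the <code>--qos</code> directive sits in a complete batch script: the job name, time limit, and echoed message below are illustrative assumptions, not site defaults, and <code>main</code> is one of the partitions listed on [[Partitions / Queues]]. See [[Batch Jobs]] for the full details.

```shell
#!/bin/bash
#SBATCH --job-name=qos_demo        # illustrative job name (assumption)
#SBATCH --partition=main
#SBATCH --qos=low                  # one of the QoS values configured on Anunna
#SBATCH --time=02:00:00            # illustrative limit, within the 8 hours allowed for low
#SBATCH --ntasks=1
#SBATCH --output=output_%j.txt     # %j expands to the job ID

# Slurm sets SLURM_JOB_ID inside a job; fall back to "none" for a dry run outside Slurm.
job_id="${SLURM_JOB_ID:-none}"
echo "job ${job_id} running on $(hostname)"
```

Options given on the <code>sbatch</code> command line take precedence over <code>#SBATCH</code> lines in the script, so <code>sbatch --qos=high job.sh</code> (with <code>job.sh</code> a placeholder script name) would submit the same script at a different QoS without editing it.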


The QoS values configured on Anunna:


* '''std''' (priority 10) — the default. Use this unless you have a specific reason to pick another.
* '''low''' (priority 1) — reduced priority, but limited to 8 hours per job so a flood of low-priority jobs cannot lock up the cluster.
* '''high''' (priority 20) — higher priority than <code>std</code>. More expensive — see [[Tariffs]].
* '''interactive''' (priority 100) — the highest priority, exclusively for immediate-running interactive jobs. You may not submit many or large jobs at this QoS.
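
The 8-hour limit on <code>low</code> can be checked before submitting. The following is only a sketch: <code>fits_low_qos</code> is a hypothetical helper (not a Slurm command), the walltime is assumed to be written as <code>HH:MM:SS</code>, and <code>job.sh</code> is a placeholder script name. Falling back to <code>std</code> is just one possible choice.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): succeeds when a walltime given as
# HH:MM:SS fits within the 8-hour per-job limit of the low QoS.
fits_low_qos() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    local total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    (( total <= 8 * 3600 ))
}

walltime="06:30:00"   # assumed requested run time
if fits_low_qos "$walltime"; then
    qos=low
else
    qos=std
fi

# Print the command rather than running it; job.sh is a placeholder script name.
echo "sbatch --qos=${qos} --time=${walltime} job.sh"
```

The script only prints the <code>sbatch</code> command it would use, so it can be tried on a login node without submitting anything.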


In principle, Slurm can preempt a running job (stop it, requeue it, and restart it later) when a higher-priority job needs the resources. At the time of writing, however, preemption is not configured on Anunna.
 
== Running MPI jobs ==
 
For multi-node MPI workloads see [[MPI on B4F cluster | MPI on Anunna]].
 
== See also ==
 
* [[Partitions / Queues]]
* [[Choosing a node (constraints)]]
* [[Batch Jobs]]
* [[Interactive Jobs]]
* [[Array jobs]]
* [[Monitoring Jobs]]
* [[Cancelling Jobs]]
* [[Reservations]]
* [[Tariffs | Costs associated with resource usage]]
 
== External links ==
 
* [http://slurm.schedmd.com Slurm official documentation]
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]

Latest revision as of 09:48, 16 June 2026
