Scheduler Overview (Slurm)
Latest revision as of 09:48, 16 June 2026
The resource allocation and scheduling software on Anunna is SLURM (Simple Linux Utility for Resource Management). This page is the entry point: most topics have their own page, and below is a short summary plus links.
What's on which page
- Partitions / Queues: list of partitions (main, gpu, gpu_amd) and how to choose one.
- Choosing a node (constraints): defaults, hardware constraints, GPU selection.
- Batch Jobs: writing sbatch scripts and submitting them, including multi-job submissions and dependencies.
- Interactive Jobs: sinteractive and salloc for live shell sessions on a compute node.
- Array jobs: running the same script many times with a varying parameter.
- Monitoring Jobs: squeue, scontrol, sstat, sacct, node_usage_graph.
- Cancelling Jobs: scancel.
- Reservations: booking nodes in advance for events.
Quality of Service
When submitting a job, you can optionally set its Quality of Service (QoS) with the --qos option:
#SBATCH --qos=std
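A minimal sketch of a complete job script, assuming bash: the --qos directive sits in the #SBATCH header next to the other resource requests, and sbatch stops reading directives at the first non-comment line. This example requests low and keeps --time within the 8-hour per-job cap that Anunna applies to that QoS. The job name and the script body are hypothetical placeholders.

```shell
#!/bin/bash
# Minimal sketch of a batch script that requests the "low" QoS.
# Job name and body are hypothetical placeholders; the 8-hour cap on
# "low" is the Anunna limit described on this page.
#SBATCH --job-name=qos_example
#SBATCH --qos=low
#SBATCH --time=08:00:00
#SBATCH --ntasks=1

# Inside a job Slurm sets SLURM_JOB_ID and SLURM_JOB_QOS; the fallbacks
# let the script also be tried with plain bash outside the scheduler.
summary="job ${SLURM_JOB_ID:-none} running at QoS ${SLURM_JOB_QOS:-unset}"
echo "$summary"
```

Submit it with sbatch followed by the file name. Options given on the sbatch command line take precedence over #SBATCH directives, so something like sbatch --qos=std qos_example.sh (hypothetical file name) runs the same script at the default QoS without editing it.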
The QoS values configured on Anunna:
- std (priority 10): the default. Use this unless you have a specific reason to pick another.
- low (priority 1): reduced priority, and limited to 8 hours per job so that a flood of low-priority jobs cannot lock up the cluster.
- high (priority 20): higher priority than std, but more expensive; see Tariffs.
- interactive (priority 100): the highest priority, reserved for interactive jobs that need to start immediately. You may not submit many jobs, or large jobs, at this QoS.
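To check the values currently configured rather than relying on the list, the Slurm accounting tool sacctmgr can print every QoS. The sketch below wraps that in a small function; the name show_qos is hypothetical, and it assumes the Slurm client commands are available, presumably on a login node.

```shell
#!/bin/bash
# show_qos (hypothetical helper name): list each QoS with its priority
# and per-job wall-time limit from the Slurm accounting database.
show_qos() {
    # Fail with a clear message where the Slurm client tools are missing.
    if ! command -v sacctmgr >/dev/null 2>&1; then
        echo "show_qos: sacctmgr not found; run this on a cluster login node" >&2
        return 1
    fi
    # -P prints '|'-separated columns; MaxWall is the per-job time limit.
    sacctmgr -P show qos format=Name,Priority,MaxWall
}
```

Calling show_qos prints a header line followed by one row per QoS. If the 8-hour cap on low is implemented as a QoS wall-time limit, it shows up in the MaxWall column.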
In principle, Slurm can preempt a running job, restarting and rescheduling it, when a higher-priority job needs its resources; at the time of writing, however, preemption is not configured on Anunna.
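Whether preemption is enabled can be read from the scheduler's running configuration. A minimal sketch, assuming the Slurm client commands are available: scontrol show config prints every setting as "Name = value", and keeping only the Preempt* lines shows PreemptType and PreemptMode. The function name show_preemption is hypothetical.

```shell
#!/bin/bash
# show_preemption (hypothetical helper name): print the scheduler's
# preemption settings (PreemptType, PreemptMode, ...) from the live config.
show_preemption() {
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "show_preemption: scontrol not found; run this on a cluster login node" >&2
        return 1
    fi
    # scontrol prints "Name = value" lines; keep only the Preempt* settings.
    scontrol show config | grep -i '^preempt'
}
```

A PreemptType of preempt/none, which is Slurm's default, indicates that preemption is disabled.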
Running MPI jobs
For multi-node MPI workloads, see MPI on Anunna.
See also
- Partitions / Queues
- Choosing a node (constraints)
- Batch Jobs
- Interactive Jobs
- Array jobs
- Monitoring Jobs
- Cancelling Jobs
- Reservations
- Costs associated with resource usage