<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.anunna.wur.nl/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Bohme001</id>
	<title>HPCwiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.anunna.wur.nl/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Bohme001"/>
	<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php/Special:Contributions/Bohme001"/>
	<updated>2026-04-18T01:52:57Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1348</id>
		<title>User:Bohme001</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1348"/>
		<updated>2014-08-28T12:11:01Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Former Linux system administrator at FB-IT Infrastructure&amp;lt;br&amp;gt;&lt;br /&gt;
Our team maintains more than 200 Linux servers, running mainly on Red Hat Enterprise Linux with some special cases on Ubuntu Server.&amp;lt;br&amp;gt;&lt;br /&gt;
All HPC AgroGenomics hosts are running Scientific Linux version 6.&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1347</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1347"/>
		<updated>2014-08-12T15:03:29Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has three queues (called partitions in Slurm): a high, a standard and a low priority queue.&amp;lt;br&amp;gt;&lt;br /&gt;
The high queue gives jobs the highest priority (20), followed by the standard queue (10). In the low priority queue (0),&amp;lt;br&amp;gt;&lt;br /&gt;
jobs will be resubmitted if a job with higher priority needs cluster resources that are occupied by low queue jobs.&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_High      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_High      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_High      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Std       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Std       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Std       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Low       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Low       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Low       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 100MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script, which prints 10 million decimal places of an approximation of Pi (computed with the Bailey-Borwein-Plouffe series):&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
Before this script can run, Python 3 (which is not the default Python version on the cluster) needs to be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that Python 3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC_Std&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
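If needed, the account of a job that has already been submitted can be changed with scontrol; a sketch (the job id 12345 is hypothetical):&lt;br /&gt;

```bash
# Hypothetical job id; charge an already-submitted job to another account
scontrol update jobid=12345 account=773320000
```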
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
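As a sanity check, the 1200 minutes from the example can be rewritten in the other accepted formats with plain shell arithmetic (nothing SLURM-specific is assumed here):&lt;br /&gt;

```bash
# Convert a limit given in minutes into two of the other formats sbatch accepts
minutes=1200
printf '%d:%02d:00\n' $((minutes / 60)) $((minutes % 60))      # hours:minutes:seconds -> 20:00:00
printf '%d-%d\n' $((minutes / 1440)) $((minutes % 1440 / 60))  # days-hours -> 0-20
```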
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 100 MB per node. If your job uses more than that, it will fail with the error &amp;quot;Exceeded job memory limit&amp;quot;. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
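For example, converting a hypothetical MaxRSS of 1536000 KB into a --mem value with roughly 10% headroom:&lt;br /&gt;

```bash
# MaxRSS is reported in KB; divide by 1024 for MB, then add some headroom
maxrss_kb=1536000               # hypothetical value reported by sacct
mem_mb=$((maxrss_kb / 1024))
echo "$mem_mb"                  # 1500 MB actually used
echo $((mem_mb + mem_mb / 10))  # request about 1650 MB with --mem
```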
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it acts as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
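For instance, a sketch of requesting eight tasks spread evenly over exactly two nodes (four per node):&lt;br /&gt;

```bash
#SBATCH --ntasks=8
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4   # force an even distribution over the two nodes
```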
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; instead, the job will be scheduled specifically to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC_Std&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in $(seq 1 10); do echo $i; sbatch runscript_$i.sh; done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) of a job&#039;s nodes with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the example submitted with the &#039;sbatch&#039; command above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
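squeue can also filter the listing; for instance, to show only your own jobs, or only your jobs that are still pending (these commands of course require access to the cluster):&lt;br /&gt;

```bash
squeue -u $USER              # only your own jobs
squeue -u $USER -t PENDING   # only your jobs still waiting for resources
```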
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be checked with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Check on a pending job ===&lt;br /&gt;
A submitted job can end up in a pending state when there are not enough resources available for it.&lt;br /&gt;
In this example I submit a job, check its status, and after finding out it is &#039;&#039;&#039;pending&#039;&#039;&#039;, check when it will probably start.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[@nfs01 jobs]$ sbatch hpl_student.job&lt;br /&gt;
 Submitted batch job 740338&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue -l -j 740338&lt;br /&gt;
 Fri Feb 21 15:32:31 2014&lt;br /&gt;
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PENDING       0:00 1-00:00:00      1 (ReqNodeNotAvail)&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue --start -j 740338&lt;br /&gt;
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PD  2014-02-22T15:31:48      1 (ReqNodeNotAvail)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So it seems this job will probably start the next day, but that is no guarantee that it will.&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
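scancel can also match jobs by attributes instead of a single job id; for instance (user name hypothetical):&lt;br /&gt;

```bash
# Cancel all pending jobs of one user in a single command
scancel --user=bohme999 --state=PENDING
```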
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
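A minimal sketch of an interactive allocation with salloc (partition name taken from the sbatch example above):&lt;br /&gt;

```bash
# Request 4 tasks for 30 minutes; salloc starts a shell inside the allocation
salloc --ntasks=4 --time=30 --partition=ABGC_Std
srun hostname   # job steps started in that shell run on the allocated node(s)
exit            # leaving the shell releases the allocation
```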
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be either in the RUNNING or the PENDING state. However, here is a breakdown of all the states your job can be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation. Not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1345</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1345"/>
		<updated>2014-06-26T13:21:27Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs in the field of genetics and genomics research by creating a joint facility that will generate benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud or others usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state of the art Parallel File System (PFS), head nodes, compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and expanding the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each with its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6. SL therefore follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration, which means that if one crashes, the other takes over seamlessly. Various other nodes exist to support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker or compute nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, which each have 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, which each have 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname		State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  node001..node002	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	custom&lt;br /&gt;
  mds01, mds02		UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
  storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
  nfs01			UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
  fat001 fat002		UP	1.0 TiB		64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration, which also will share some applications and databases with machines in the cluster, for which the parallel file system is not the ideal solution.&lt;br /&gt;
* NFS server: a Dell PowerEdge R720XD. This node also acts as the login node, where users log in, compile applications, and submit jobs; it exports each home directory via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and open source; it is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: total I/O is designed to reach up to 15GB/s. With a very large number of compute nodes, and very high volumes of data, the high read/write speeds that the PFS provides are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in each user&#039;s $HOME directory.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user has their own home directory. The path of the home directory is: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.), so it is not meant to store large data volumes that require high throughput or low latency. Compared to the Lustre PFS (600TB), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space per user is limited to a 200GB soft and 210GB hard quota; personal quota and current use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
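The numbers reported by quota can be checked against the limits with a small script. A minimal sketch, using invented sample output shaped like that of `quota -s` (the filesystem label and figures are assumptions, not real cluster output):&lt;br /&gt;

```shell
# Sample output in the shape produced by `quota -s` (values invented;
# the real limits on /home are 200G soft / 210G hard)
printf '%s\n' \
  'Disk quotas for user jdoe (uid 1234):' \
  '     Filesystem   space   quota   limit   grace   files   quota   limit   grace' \
  '  nfs01:/home      150G    200G    210G              12k       0       0' \
  > quota_sample.txt

# Report how much of the soft quota is used; awk evaluates "150G" + 0
# as 150, which conveniently strips the unit suffix
awk '$1 == "nfs01:/home" {
    printf "%.0f%% of soft quota used\n", 100 * ($2 + 0) / ($3 + 0)
}' quota_sample.txt
# prints: 75% of soft quota used
```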
&lt;br /&gt;
The NFS is supported through the NFS server (nfs01) that also serves as access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-IT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it once served as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency backup power systems and automated fire extinguishers. Many of the server facilities provided by FB-IT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR,FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR,FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT_AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It was involved in the design of the HPC and the selection of the supplier. It supports the technical management of the HPC and shares experiences to ensure that the HPC meets the needs of its users. The IT Workgroup advises the Steering Group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it comprises the users for whom the infrastructure was built. In addition, successful use of the cluster relies on an active community of users willing to share knowledge and best practices, including maintaining and expanding this wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (for non-WUR accounts, by creation of an account on the cluster by FB-IT). Use of resources is limited by the scheduler: priority to the system&#039;s resources is regulated by the queues (&#039;partitions&#039;) granted to a user. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request to access the cluster needs to be directed to one of the following persons (please refer to appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are done by [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
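Frequent users can save typing by adding a host entry to their SSH client configuration. A sketch of a ~/.ssh/config entry; the hostname and username below are placeholders, not actual cluster addresses - use the details provided when your account is created:&lt;br /&gt;

```
# Entry for ~/.ssh/config -- both values below are placeholders
Host b4f
    HostName LOGIN-NODE-ADDRESS-HERE
    User YOUR-USERNAME-HERE
```

With such an entry in place, `ssh b4f` opens a session and `scp data.txt b4f:` copies a file to your home directory.&lt;br /&gt;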
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
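Jobs are handed to Slurm as batch scripts whose `#SBATCH` comment lines declare the requested resources. A minimal sketch of such a script; the resource values are illustrative and should be adapted to the partitions and limits that apply to your account:&lt;br /&gt;

```shell
# Write a minimal Slurm job script (resource values are examples only)
printf '%s\n' \
  '#!/bin/bash' \
  '#SBATCH --job-name=example' \
  '#SBATCH --ntasks=1' \
  '#SBATCH --cpus-per-task=4' \
  '#SBATCH --mem=8G' \
  '#SBATCH --time=01:00:00' \
  '#SBATCH --output=example_%j.out' \
  '' \
  'echo "running on $(hostname) with $SLURM_CPUS_PER_TASK cores"' \
  > myjob.sh

# On the cluster, submit with:   sbatch myjob.sh
# and monitor with:              squeue -u $USER
```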
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
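Several of the pages above involve redirecting where software writes its temporary files. A minimal sketch of how the TMPDIR variable behaves; the directory used here is just a local example, and on the cluster you would point it at a suitable scratch location:&lt;br /&gt;

```shell
# Create a private temporary directory and point TMPDIR at it
mkdir -p "$PWD/mytmp"
export TMPDIR="$PWD/mytmp"

# Programs that honour TMPDIR, such as mktemp and sort, now use it
tmpfile=$(mktemp)
echo "$tmpfile"   # the path now starts with .../mytmp
```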
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
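The cost of a job is tied to the resources it held. A sketch of turning accounting records into core-hours; the pipe-delimited sample below is invented for illustration, in the shape of `sacct -X -P --format=JobID,JobName,AllocCPUS,ElapsedRaw` output:&lt;br /&gt;

```shell
# Invented sample shaped like pipe-delimited sacct accounting output
printf '%s\n' \
  'JobID|JobName|AllocCPUS|ElapsedRaw' \
  '1001|align|16|3600' \
  '1002|sort|4|1800' \
  > sacct_sample.txt

# Core-hours = sum over jobs of (allocated CPUs x elapsed seconds) / 3600
awk -F'|' 'NR > 1 { s += $3 * $4 }
           END    { printf "total core-hours: %.1f\n", s / 3600 }' sacct_sample.txt
# prints: total core-hours: 18.0
```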
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/Our-facilities/Show/High-Performance-Computing-Cluster-HPC.htm CATAgroFood offers an HPC facility]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1344</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1344"/>
		<updated>2014-06-26T13:14:37Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates economies of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and enlarging the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker (compute) nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname		State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  node001..node002	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	custom&lt;br /&gt;
  mds01, mds02		UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
  storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
  nfs01			UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
  fat001 fat002		UP	1.0 TiB		64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 Dell PowerEdge R720 servers in a failover configuration. They also serve some applications and databases to machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* NFS server: a Dell PowerEdge R720XD. This node also acts as the login node, where users log in, compile applications, and submit jobs; it exports each home directory via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and open source; it is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: total I/O is designed to reach up to 15GB/s. With a very large number of compute nodes, and very high volumes of data, the high read/write speeds that the PFS provides are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in each user&#039;s $HOME directory.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user has their own home directory. The path of the home directory is: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.), so it is not meant to store large data volumes that require high throughput or low latency. Compared to the Lustre PFS (600TB), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space per user is limited to a 200GB soft and 210GB hard quota; personal quota and current use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
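The numbers reported by quota can be checked against the limits with a small script. A minimal sketch, using invented sample output shaped like that of `quota -s` (the filesystem label and figures are assumptions, not real cluster output):&lt;br /&gt;

```shell
# Sample output in the shape produced by `quota -s` (values invented;
# the real limits on /home are 200G soft / 210G hard)
printf '%s\n' \
  'Disk quotas for user jdoe (uid 1234):' \
  '     Filesystem   space   quota   limit   grace   files   quota   limit   grace' \
  '  nfs01:/home      150G    200G    210G              12k       0       0' \
  > quota_sample.txt

# Report how much of the soft quota is used; awk evaluates "150G" + 0
# as 150, which conveniently strips the unit suffix
awk '$1 == "nfs01:/home" {
    printf "%.0f%% of soft quota used\n", 100 * ($2 + 0) / ($3 + 0)
}' quota_sample.txt
# prints: 75% of soft quota used
```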
&lt;br /&gt;
The NFS is supported through the NFS server (nfs01) that also serves as access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it once served as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency backup power systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR,FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR,FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT_AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It was involved in the design of the HPC and the selection of the supplier. It supports the technical management of the HPC and shares experiences to ensure that the HPC meets the needs of its users. The IT Workgroup advises the Steering Group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it comprises the users for whom the infrastructure was built. In addition, successful use of the cluster relies on an active community of users willing to share knowledge and best practices, including maintaining and expanding this wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (by creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: priority to the system&#039;s resources is regulated by the queues (&#039;partitions&#039;) granted to a user. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request to access the cluster needs to be directed to one of the following persons (please refer to appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are done by [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
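Frequent users can save typing by adding a host entry to their SSH client configuration. A sketch of a ~/.ssh/config entry; the hostname and username below are placeholders, not actual cluster addresses - use the details provided when your account is created:&lt;br /&gt;

```
# Entry for ~/.ssh/config -- both values below are placeholders
Host b4f
    HostName LOGIN-NODE-ADDRESS-HERE
    User YOUR-USERNAME-HERE
```

With such an entry in place, `ssh b4f` opens a session and `scp data.txt b4f:` copies a file to your home directory.&lt;br /&gt;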
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
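Jobs are handed to Slurm as batch scripts whose `#SBATCH` comment lines declare the requested resources. A minimal sketch of such a script; the resource values are illustrative and should be adapted to the partitions and limits that apply to your account:&lt;br /&gt;

```shell
# Write a minimal Slurm job script (resource values are examples only)
printf '%s\n' \
  '#!/bin/bash' \
  '#SBATCH --job-name=example' \
  '#SBATCH --ntasks=1' \
  '#SBATCH --cpus-per-task=4' \
  '#SBATCH --mem=8G' \
  '#SBATCH --time=01:00:00' \
  '#SBATCH --output=example_%j.out' \
  '' \
  'echo "running on $(hostname) with $SLURM_CPUS_PER_TASK cores"' \
  > myjob.sh

# On the cluster, submit with:   sbatch myjob.sh
# and monitor with:              squeue -u $USER
```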
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
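Several of the pages above involve redirecting where software writes its temporary files. A minimal sketch of how the TMPDIR variable behaves; the directory used here is just a local example, and on the cluster you would point it at a suitable scratch location:&lt;br /&gt;

```shell
# Create a private temporary directory and point TMPDIR at it
mkdir -p "$PWD/mytmp"
export TMPDIR="$PWD/mytmp"

# Programs that honour TMPDIR, such as mktemp and sort, now use it
tmpfile=$(mktemp)
echo "$tmpfile"   # the path now starts with .../mytmp
```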
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
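The cost of a job is tied to the resources it held. A sketch of turning accounting records into core-hours; the pipe-delimited sample below is invented for illustration, in the shape of `sacct -X -P --format=JobID,JobName,AllocCPUS,ElapsedRaw` output:&lt;br /&gt;

```shell
# Invented sample shaped like pipe-delimited sacct accounting output
printf '%s\n' \
  'JobID|JobName|AllocCPUS|ElapsedRaw' \
  '1001|align|16|3600' \
  '1002|sort|4|1800' \
  > sacct_sample.txt

# Core-hours = sum over jobs of (allocated CPUs x elapsed seconds) / 3600
awk -F'|' 'NR > 1 { s += $3 * $4 }
           END    { printf "total core-hours: %.1f\n", s / 3600 }' sacct_sample.txt
# prints: total core-hours: 18.0
```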
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1343</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1343"/>
		<updated>2014-06-26T13:14:04Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows vital and often confidential data sources to be retained in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes of 16 cores and 64GB RAM each, and 2 fat nodes of 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and enlarging the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker (compute) nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname		State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  node001..node002	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	custom&lt;br /&gt;
  mds01, mds02		UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
  storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
  nfs01			UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
  fat001 fat002		UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration. These also share some applications and databases with machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node also acts as a login node, where users log in, compile applications, and submit jobs; it shares the home directories via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, deemed very stable, and Open Source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: the total I/O should be up to 15GB/s by design. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre filesystem is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in the $HOME directory of each user.&lt;br /&gt;
&lt;br /&gt;
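Lustre-aware tools can show usage and file layout on the PFS. A minimal sketch using the standard &#039;lfs&#039; utility that ships with Lustre clients (the file path is a hypothetical example, not an actual cluster path):&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  # Show free space per Lustre target in human-readable form&lt;br /&gt;
  lfs df -h&lt;br /&gt;
  # Inspect how a file is striped across the object storage targets&lt;br /&gt;
  lfs getstripe /lustre/shared/example/datafile&lt;br /&gt;
  &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;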
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.) than the PFS. This means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space that can be allocated is limited per user (200GB soft and 210GB hard limit). Personal quota and current use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The NFS is supported through the NFS server (nfs01) that also serves as access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as potato storage), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates enough revenue and meets the needs of the users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users. The IT Workgroup will advise the steering group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (by creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: priority to the system&#039;s resources is regulated by the queues (&#039;partitions&#039;) granted to a user. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request to access the cluster needs to be directed to one of the following persons (please refer to appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are done by [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
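As an illustration, logging in and transferring a file could look like this (the hostname and username are placeholders, not the actual cluster address; see the page above for the real connection details):&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  # Log in to the login node (replace host and user with your own)&lt;br /&gt;
  ssh username@login.example.org&lt;br /&gt;
  # Copy a local file to your home directory on the cluster&lt;br /&gt;
  scp mydata.txt username@login.example.org:~/&lt;br /&gt;
  &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;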
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
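For orientation, a minimal Slurm batch script might look as follows (the resource values and program name are hypothetical examples; consult the pages above for the cluster&#039;s actual partitions and limits):&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  #!/bin/bash&lt;br /&gt;
  #SBATCH --job-name=example&lt;br /&gt;
  #SBATCH --ntasks=1&lt;br /&gt;
  #SBATCH --cpus-per-task=4&lt;br /&gt;
  #SBATCH --mem=8G&lt;br /&gt;
  #SBATCH --time=01:00:00&lt;br /&gt;
  # Run the (hypothetical) analysis program on the allocated cores&lt;br /&gt;
  srun my_analysis --threads 4&lt;br /&gt;
  &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the script with &#039;sbatch script.sh&#039; and monitor it with &#039;squeue -u $USER&#039;.&lt;br /&gt;
&lt;br /&gt;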
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
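A brief sketch of typical environment-modules usage (the module name is a hypothetical example; the page above lists the modules actually installed):&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  # List the modules available on the cluster&lt;br /&gt;
  module avail&lt;br /&gt;
  # Load a module and verify which modules are active&lt;br /&gt;
  module load example-tool/1.0&lt;br /&gt;
  module list&lt;br /&gt;
  &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;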
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | Using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1342</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1342"/>
		<updated>2014-06-26T13:12:59Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows vital and often confidential data sources to be retained in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes of 16 cores and 64GB RAM each, and 2 fat nodes of 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and enlarging the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker (compute) nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname	State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  node001..node002	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
  master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	custom&lt;br /&gt;
  mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
  storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
  nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
  fat001 fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration. These also share some applications and databases with machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node also acts as a login node, where users log in, compile applications, and submit jobs; it shares the home directories via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, deemed very stable, and Open Source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: the total I/O should be up to 15GB/s by design. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre filesystem is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in the $HOME directory of each user.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.) than the PFS. This means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space that can be allocated is limited per user (200GB soft and 210GB hard limit). Personal quota and current use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The NFS is supported through the NFS server (nfs01) that also serves as access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as potato storage), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication.&lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users. The IT Workgroup will also advise the steering group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. Successful use of the cluster will also rely on an active community of users willing to share knowledge and best practices, including by maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
The access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted explicitly (by creation of an account on the cluster by FB-ICT). Use of resources is controlled by the scheduler: the queues (&#039;partitions&#039;) granted to a user determine that user&#039;s priority to the system&#039;s resources.&lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster needs to be directed to one of the following persons (please refer to the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are handled by [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
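As a quick sketch of the usual ssh workflow (the login node address and user name below are placeholders, not the real values; the actual host name is provided by FB-IT when your account is created):

```shell
# Hypothetical login node address and user name; substitute your own.
LOGIN_NODE="b4f.wur.nl"    # placeholder, not the real host name
USER_NAME="yourusername"   # placeholder
TARGET="$USER_NAME@$LOGIN_NODE"

# Interactive login:
#   ssh "$TARGET"
# Upload a file to your NFS home directory:
#   scp mydata.tar.gz "$TARGET":~
echo "$TARGET"
```

See the page linked above for the cluster-specific details.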
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
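For illustration, a minimal Slurm batch script might look like the sketch below; the partition name is an assumption (list the real partitions with `sinfo` on the cluster), and the resource values are examples only.

```shell
# Write a minimal Slurm job script; the partition name "research" is
# hypothetical - check available partitions with `sinfo`.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=research
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
echo "Running on $(hostname)"
EOF
# Submit with:       sbatch myjob.sh
# Check status with: squeue -u "$USER"
```

The pages linked above cover submission and monitoring in detail.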
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1341</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1341"/>
		<updated>2014-06-26T13:12:10Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and expanding the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which currently is at version 6; SL follows the versioning scheme of RHEL.&lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker (compute) nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration. They also share with machines in the cluster some applications and databases for which the parallel file system is not the ideal solution.&lt;br /&gt;
* NFS server: a PowerEdge R720XD. The NFS node also acts as the login node, where users log in, compile applications, and submit jobs; it shares the home directories via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, deemed very stable, and Open Source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: total I/O is designed to reach up to 15GB/s. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre filesystem is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in each user&#039;s $HOME directory.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have their own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and offers far lower I/O performance (read/write speed, latency, etc.) than the PFS. This means it is not meant to store large data volumes that require high throughput or low latency. Compared to the Lustre PFS (600TB in size), the NFS is small: only 20TB. The /home partition will be backed up daily. The amount of space per user is limited (200GB soft and 210GB hard limit); personal quota and current usage can be found using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The NFS is served by the NFS server (nfs01), which also serves as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and, most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it formerly served as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication.&lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users. The IT Workgroup will also advise the steering group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. Successful use of the cluster will also rely on an active community of users willing to share knowledge and best practices, including by maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
The access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted explicitly (by creation of an account on the cluster by FB-ICT). Use of resources is controlled by the scheduler: the queues (&#039;partitions&#039;) granted to a user determine that user&#039;s priority to the system&#039;s resources.&lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster needs to be directed to one of the following persons (please refer to the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are handled by [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
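As a quick sketch of the usual ssh workflow (the login node address and user name below are placeholders, not the real values; the actual host name is provided by FB-IT when your account is created):

```shell
# Hypothetical login node address and user name; substitute your own.
LOGIN_NODE="b4f.wur.nl"    # placeholder, not the real host name
USER_NAME="yourusername"   # placeholder
TARGET="$USER_NAME@$LOGIN_NODE"

# Interactive login:
#   ssh "$TARGET"
# Upload a file to your NFS home directory:
#   scp mydata.tar.gz "$TARGET":~
echo "$TARGET"
```

See the page linked above for the cluster-specific details.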
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
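For illustration, a minimal Slurm batch script might look like the sketch below; the partition name is an assumption (list the real partitions with `sinfo` on the cluster), and the resource values are examples only.

```shell
# Write a minimal Slurm job script; the partition name "research" is
# hypothetical - check available partitions with `sinfo`.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=research
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
echo "Running on $(hostname)"
EOF
# Submit with:       sbatch myjob.sh
# Check status with: squeue -u "$USER"
```

The pages linked above cover submission and monitoring in detail.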
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1340</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1340"/>
		<updated>2014-06-26T13:10:46Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and expanding the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which currently is at version 6; SL follows the versioning scheme of RHEL.&lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker (compute) nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname		State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
node001..node002	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&amp;lt;br&amp;gt;&lt;br /&gt;
node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&amp;lt;br&amp;gt;&lt;br /&gt;
master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1&amp;lt;br&amp;gt;	&lt;br /&gt;
mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&amp;lt;br&amp;gt;&lt;br /&gt;
storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&amp;lt;br&amp;gt;&lt;br /&gt;
nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&amp;lt;br&amp;gt;&lt;br /&gt;
fat001 fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration, which will also share some applications and databases with machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node will also act as a login node, where users log in, compile applications, and submit jobs; it also shares the home directories via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and Open Source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: the total I/O is designed to reach up to 15GB/s. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in the $HOME directory of each user.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speeds, latency, etc.) than the PFS. This means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB in size), the NFS is small: only 20TB. The /home partition will be backed up daily. The amount of space that can be allocated is limited per user (200GB soft limit, 210GB hard limit). Personal quota and total use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
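Besides &#039;&#039;quota&#039;&#039;, standard tools can show what is actually taking up space in a home directory. A minimal sketch using &#039;&#039;du&#039;&#039; (the use of $HOME here is illustrative; any path can be substituted):&lt;br /&gt;

```shell
#!/bin/bash
# Check how much space the home directory uses.
total=$(du -sh "$HOME" | awk '{print $1}')
echo "home directory usage: $total"
# List the largest first-level subdirectories, biggest first:
du -h --max-depth=1 "$HOME" 2>/dev/null | sort -rh | head -n 10
```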
&lt;br /&gt;
The NFS is served by the NFS server (nfs01), which also acts as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and, most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as a potato storage facility), but inside it is a modern server centre that includes, among other things, emergency backup power systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR,FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR,FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT_AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users. The IT Workgroup will also advise the Steering Group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including maintenance and expansion of this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
The access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (by creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: priority to the system&#039;s resources is regulated by the queues (&#039;partitions&#039;) granted to a user. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster should be directed to one of the following persons (please contact the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are handled via [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
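As a quick illustration of the Slurm workflow covered by the pages above, work is typically wrapped in a small batch script and handed to the scheduler with &#039;&#039;sbatch&#039;&#039;. A minimal sketch; the job name and resource values below are placeholders, not site defaults:&lt;br /&gt;

```shell
#!/bin/bash
# Write a minimal Slurm batch script (all values are illustrative placeholders).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
srun hostname
EOF
# On the cluster this would be submitted with:  sbatch myjob.sh
# and monitored with:                           squeue -u $USER
```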
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
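For example, pointing &#039;&#039;TMPDIR&#039;&#039; at a custom scratch location (the directory name below is an arbitrary example) makes standard tools such as &#039;&#039;mktemp&#039;&#039; create their temporary files there:&lt;br /&gt;

```shell
#!/bin/bash
# Redirect temporary files to a custom scratch directory (path is an example).
export TMPDIR="$HOME/scratch_tmp"
mkdir -p "$TMPDIR"
tmpfile=$(mktemp)   # mktemp honours $TMPDIR for its default template
echo "temporary file created at: $tmpfile"
```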
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1339</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1339"/>
		<updated>2014-06-26T13:10:27Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows vital and often confidential data sources to be retained in a controlled environment, something that cloud services such as Amazon Cloud or others usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and enlarging the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which currently is at version 6. SL therefore follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration, which means that if one crashes, the other will take over seamlessly. Various other nodes exist to support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker nodes or compute nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (called &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on June 26, 2014:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname		State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration, which will also share some applications and databases with machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node will also act as a login node, where users log in, compile applications, and submit jobs; it also shares the home directories via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and Open Source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: the total I/O is designed to reach up to 15GB/s. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in the $HOME directory of each user.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speeds, latency, etc.) than the PFS. This means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB in size), the NFS is small: only 20TB. The /home partition will be backed up daily. The amount of space that can be allocated is limited per user (200GB soft limit, 210GB hard limit). Personal quota and total use can be checked with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
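Besides &#039;&#039;quota&#039;&#039;, standard tools can show what is actually taking up space in a home directory. A minimal sketch using &#039;&#039;du&#039;&#039; (the use of $HOME here is illustrative; any path can be substituted):&lt;br /&gt;

```shell
#!/bin/bash
# Check how much space the home directory uses.
total=$(du -sh "$HOME" | awk '{print $1}')
echo "home directory usage: $total"
# List the largest first-level subdirectories, biggest first:
du -h --max-depth=1 "$HOME" 2>/dev/null | sort -rh | head -n 10
```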
&lt;br /&gt;
The NFS is served by the NFS server (nfs01), which also acts as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and, most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as a potato storage facility), but inside it is a modern server centre that includes, among other things, emergency backup power systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used daily by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR,FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR,FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT_AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users. The IT Workgroup will also advise the Steering Group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including maintenance and expansion of this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
The access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (by creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: priority to the system&#039;s resources is regulated by the queues (&#039;partitions&#039;) granted to a user. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster should be directed to one of the following persons (please contact the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are handled via [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
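As a quick illustration of the Slurm workflow covered by the pages above, work is typically wrapped in a small batch script and handed to the scheduler with &#039;&#039;sbatch&#039;&#039;. A minimal sketch; the job name and resource values below are placeholders, not site defaults:&lt;br /&gt;

```shell
#!/bin/bash
# Write a minimal Slurm batch script (all values are illustrative placeholders).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
srun hostname
EOF
# On the cluster this would be submitted with:  sbatch myjob.sh
# and monitored with:                           squeue -u $USER
```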
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
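For example, pointing &#039;&#039;TMPDIR&#039;&#039; at a custom scratch location (the directory name below is an arbitrary example) makes standard tools such as &#039;&#039;mktemp&#039;&#039; create their temporary files there:&lt;br /&gt;

```shell
#!/bin/bash
# Redirect temporary files to a custom scratch directory (path is an example).
export TMPDIR="$HOME/scratch_tmp"
mkdir -p "$TMPDIR"
tmpfile=$(mktemp)   # mktemp honours $TMPDIR for its default template
echo "temporary file created at: $tmpfile"
```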
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1338</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1338"/>
		<updated>2014-06-26T13:05:52Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Computing] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates economies of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at the Wageningen University campus, allows vital and often confidential data sources to be retained in a controlled environment, something that cloud services such as Amazon Cloud or others usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes of 16 cores and 64GB RAM each, and 2 fat nodes of 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and extending the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of a number of separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL therefore follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker nodes, or compute nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, which each have 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, which each have 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on November 23, 2013:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname	State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
node001, node004, node006..node008, node010, node012..node015, node017, node020, node022, node027..node035, node037, node039, node050, node051, node053	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
node002, node003, node005, node009, node016, node018, node019, node021, node023..node026, node036, node038, node040..node042, node049, node052, node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		3	1	default&lt;br /&gt;
node011	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	&lt;br /&gt;
mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
fat001 fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration; these also serve some applications and databases to machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node also acts as the login node, where users log in, compile applications, and submit jobs; it shares each home directory via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is called [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and open source. Lustre is nowadays the default PFS option in clusters sold by Dell as well as by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: total I/O should reach up to 15GB/s by design. With a very large number of compute nodes - and very high volumes of data - the high read/write speeds that the PFS can provide are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in each user&#039;s $HOME directory.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.) than the PFS, which means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB in size), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space that can be allocated is limited per user (200GB soft and 210GB hard limit). Personal quota and total use per user can be found using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
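The home-directory layout described above can be sketched as follows; the partner and user names below are made-up examples, not real accounts.&lt;br /&gt;

```shell
# Build a path following the /home/[name partner]/[username] pattern
# described above; "ABGC" and "jdoe001" are hypothetical examples.
partner="ABGC"
username="jdoe001"
homedir="/home/${partner}/${username}"
echo "$homedir"
```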
&lt;br /&gt;
The NFS is served by the NFS server (nfs01), which also acts as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used on a daily basis by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is evidently highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
The Steering Group ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication. &lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users, and it will advise the Steering Group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including by maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access needs to be granted actively (by creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: depending on which queues (&#039;partitions&#039;) a user has been granted, priority of access to the system&#039;s resources is regulated. &lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster should be directed to one of the following persons (please refer to the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are handled via [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
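As a minimal sketch of the Slurm workflow covered in the pages above, a batch script might look like this; the job name and resource requests are hypothetical examples, not cluster defaults.&lt;br /&gt;

```shell
# Create a minimal Slurm batch script; the job name and resource requests
# below are hypothetical examples, not cluster defaults.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
echo "Running on $(hostname)"
EOF
# Submit with:  sbatch example_job.sh
# Inspect with: squeue -u $USER
```

sbatch and squeue are standard Slurm commands; partition names and limits on this cluster are documented on the Slurm pages linked above.&lt;br /&gt;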
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
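As an illustration of the environment-variable pages listed above, a custom temporary directory can be set along these lines; the location under $HOME is an illustrative choice, not the documented recommendation.&lt;br /&gt;

```shell
# Point TMPDIR at a user-owned directory; the location under $HOME is an
# illustrative choice (see the Setting_TMPDIR page for the recommended one).
export TMPDIR="$HOME/tmp"
mkdir -p "$TMPDIR"
echo "TMPDIR is set to $TMPDIR"
```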
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1337</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1337"/>
		<updated>2014-06-26T13:01:51Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Computing] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates economies of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at the Wageningen University campus, allows vital and often confidential data sources to be retained in a controlled environment, something that cloud services such as Amazon Cloud or others usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by superfast network connections (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes of 16 cores and 64GB RAM each, and 2 fat nodes of 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and extending the PFS. The cluster management software is designed to facilitate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of a number of separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL therefore follows the versioning scheme of RHEL. &lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration: if one crashes, the other takes over seamlessly. Various other nodes support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker nodes, or compute nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, which each have 16 cores and 64GB of RAM (named &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, which each have 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on November 23, 2013:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname	State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
node001, node004, node006..node008, node010, node012..node015, node017, node020, node022, node027..node035, node037, node039, node050, node051, node053	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
node002, node003, node005, node009, node016, node018, node019, node021, node023..node026, node036, node038, node040..node042, node049, node052, node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		3	1	default&lt;br /&gt;
node011	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		3	1	default&lt;br /&gt;
master1 master2	UP	67.5 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	&lt;br /&gt;
mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		5	1	mds&lt;br /&gt;
storage01..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2399 MHz		7	1	login&lt;br /&gt;
fat001 fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2300 MHz		5	1	fat&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration; these also serve some applications and databases to machines in the cluster for which the parallel file system is not the ideal solution.&lt;br /&gt;
* The NFS server is a PowerEdge R720XD. The NFS node also acts as the login node, where users log in, compile applications, and submit jobs; it shares each home directory via NFS.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is called [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and open source. Lustre is nowadays the default PFS option in clusters sold by Dell as well as by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: total I/O should reach up to 15GB/s by design. With a very large number of compute nodes - and very high volumes of data - the high read/write speeds that the PFS can provide are necessary. The Lustre file system is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in each user&#039;s $HOME directory.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and is far more limited in I/O (read/write speed, latency, etc.) than the PFS, which means it is not meant to store large data volumes that require high transfer rates or low latency. Compared to the Lustre PFS (600TB in size), the NFS is small: only 20TB. The /home partition is backed up daily. The amount of space that can be allocated is limited per user (200GB soft and 210GB hard limit). Personal quota and total use per user can be found using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota -s&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The NFS is served by the NFS server (nfs01), which also acts as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head nodes, worker nodes, and most importantly, the Lustre PFS - are all interconnected by an ultra-high-speed network called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] topology.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at one of the two main server centres of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that are used on a daily basis by WUR personnel and students are located there, as is the B4F Cluster. Access to Theia is evidently highly restricted and can only be granted in the presence of a representative of FB-IT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
Ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication.&lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users, and it will advise the steering group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access must be granted explicitly (through creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: the queues (&#039;partitions&#039;) granted to a user determine that user&#039;s priority to the system&#039;s resources.&lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster needs to be directed to one of the following persons (please contact the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are both done via [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
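Both logging in and transferring files go over SSH, so a host entry in ~/.ssh/config saves retyping the connection details. A sketch (the alias, hostname, and username below are placeholders, not actual cluster addresses; the wiki notes that the NFS node nfs01 acts as the login node, so use the address provided with your account):

```text
# ~/.ssh/config -- HostName and User are placeholders
Host b4f
    HostName login.example.wur.nl
    User your_username
```

With this entry in place, ssh b4f opens a session and scp results.txt b4f:~/ copies a file, without repeating the full address each time.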
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | Using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mailinglist&amp;diff=1336</id>
		<title>Mailinglist</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mailinglist&amp;diff=1336"/>
		<updated>2014-06-05T11:40:20Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Available mailing lists ==&lt;br /&gt;
To improve collaboration between HPC users and to provide an easy-to-use platform for exchanging ideas, the HPC offers a public mailing list service.&amp;lt;br&amp;gt;&lt;br /&gt;
More information about the underlying software, Mailman, can be found on its [http://www.list.org/ website].&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about which lists are hosted on the HPC is available [https://master1.hpcagrogenomics.wur.nl/mailman/listinfo here].&amp;lt;br&amp;gt;&lt;br /&gt;
We recommend that HPC users subscribe to the [https://master1.hpcagrogenomics.wur.nl/mailman/listinfo/hpcag hpcag] list.&amp;lt;br&amp;gt;&lt;br /&gt;
The subscription process can be started via that website.&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Mailinglist&amp;diff=1335</id>
		<title>Mailinglist</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Mailinglist&amp;diff=1335"/>
		<updated>2014-06-05T11:38:50Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: Created page with &amp;quot;== Available mailling lists == In order to improve collaboration between HPC users and to provide an easy to use platform in order to exchange ideas, the HPC has public access...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Available mailing lists ==&lt;br /&gt;
To improve collaboration between HPC users and to provide an easy-to-use platform for exchanging ideas, the HPC has a publicly accessible mailing list service available. More information about the underlying software, Mailman, can be found on its [http://www.list.org/ website].&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about which lists are hosted on the HPC can be found [https://master1.hpcagrogenomics.wur.nl/mailman/listinfo here].&amp;lt;br&amp;gt;&lt;br /&gt;
We recommend that HPC users subscribe to the [https://master1.hpcagrogenomics.wur.nl/mailman/listinfo/hpcag hpcag] list. The subscription process can be started via that website.&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1334</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Main_Page&amp;diff=1334"/>
		<updated>2014-06-05T11:24:16Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* See also */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [http://www.breed4food.com/en/breed4food.htm Breed4Food] (B4F) cluster is a joint [http://en.wikipedia.org/wiki/High-performance_computing High Performance Compute] (HPC) infrastructure of the [[About_ABGC | Animal Breeding and Genomics Centre]] (WU-Animal Breeding and Genomics and Wageningen Livestock Research) and four major breeding companies: [http://www.cobb-vantress.com Cobb-Vantress], [https://www.crv4all.nl CRV], [http://www.hendrix-genetics.com Hendrix Genetics], and [http://www.topigs.com TOPIGS]. &lt;br /&gt;
&lt;br /&gt;
== Rationale and Requirements for a new cluster ==&lt;br /&gt;
[[File:Breed4food-logo.jpg|thumb|right|200px|The Breed4Food logo]]&lt;br /&gt;
The B4F Cluster is, in a way, the 7th pillar of the [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to facilitate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers - academic and applied - can benefit from each other&#039;s know-how. Lastly, the joint cluster, housed at Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Process of acquisition and financing ==&lt;br /&gt;
&lt;br /&gt;
[[File:Signing_CatAgro.png|thumb|left|300px|Petra Caessens, manager operations of CAT-AgroFood, signs the contract of the supplier on August 1st, 2013. Next to her Johan van Arendonk on behalf of Breed4Food.]]&lt;br /&gt;
The B4F cluster was financed by [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood]. The [[B4F_cluster#IT_Workgroup | IT-Workgroup]] formulated a set of requirements that in the end were best met by an offer from [http://www.dell.com/learn/nl/nl/rc1078544/hpcc Dell]. [http://www.clustervision.com ClusterVision] was responsible for installing the cluster at the Theia server centre of FB-ICT.&lt;br /&gt;
&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Architecture of the cluster ==&lt;br /&gt;
&lt;br /&gt;
[[File:Cluster_scheme.png|thumb|right|600px|Schematic overview of the cluster.]]&lt;br /&gt;
The new B4F HPC has a classic cluster architecture: a state-of-the-art Parallel File System (PFS), head nodes, and compute nodes (of varying &#039;size&#039;), all connected by an ultrafast network (InfiniBand). Implementation of the cluster will be done in stages. The initial stage includes a 600TB PFS, 48 slim nodes with 16 cores and 64GB RAM each, and 2 fat nodes with 64 cores and 1TB RAM each. The overall architecture, which includes two head nodes in a failover configuration and an InfiniBand network backbone, can easily be expanded by adding nodes and enlarging the PFS. The cluster management software is designed to accommodate a heterogeneous and evolving cluster.&lt;br /&gt;
{{-}}&lt;br /&gt;
=== Nodes ===&lt;br /&gt;
The cluster consists of many separate machines, each running its own operating system. The default operating system throughout the cluster is [https://www.scientificlinux.org Scientific Linux] version 6. Scientific Linux (SL) is based on [http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux Red Hat Enterprise Linux (RHEL)], which is currently at version 6; SL follows the versioning scheme of RHEL.&lt;br /&gt;
&lt;br /&gt;
The cluster has two master nodes in a redundant configuration, which means that if one crashes, the other takes over seamlessly. Various other nodes exist to support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker nodes or compute nodes. The cluster is configured in a heterogeneous fashion: it consists of 48 so-called &#039;slim nodes&#039;, each with 16 cores and 64GB of RAM (called &#039;node001&#039; through &#039;node060&#039;; note that not all node names map to physical nodes), and two so-called &#039;fat nodes&#039;, each with 64 cores and 1TB of RAM (&#039;fat001&#039; and &#039;fat002&#039;).&lt;br /&gt;
&lt;br /&gt;
Information from the Cluster Management Portal, as it appeared on November 23, 2013:&lt;br /&gt;
  &amp;lt;code&amp;gt;DEVICE INFORMATION&lt;br /&gt;
  Hostname	State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category&lt;br /&gt;
  master1, master2	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	&lt;br /&gt;
  node001..node042, node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	1200 MHz		3	1	default&lt;br /&gt;
  node043..node048, node055..node060	DOWN	N/A	N/A	N/A	N/A	N/A	N/A	N/A	default&lt;br /&gt;
  mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2400 MHz		5	1	mds&lt;br /&gt;
  storage01	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss&lt;br /&gt;
  storage02..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	oss&lt;br /&gt;
  nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2400 MHz		7	1	login&lt;br /&gt;
  fat001, fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2299 MHz		5	1	fat &amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Main cluster node configuration:&lt;br /&gt;
* Master nodes: 2 PowerEdge R720 master nodes in a failover configuration&lt;br /&gt;
* The NFS server is a PowerEdge R720XD, which shares with the machines in the cluster some applications and databases for which the parallel file system is not the ideal solution. The NFS node also acts as the login node, where users log in, compile applications, and submit jobs.&lt;br /&gt;
* 50 compute nodes&lt;br /&gt;
** 12x Dell PowerEdge C6000 enclosures, each containing four nodes&lt;br /&gt;
** 48x Dell PowerEdge C6220; 16 Intel Xeon cores, 64GB RAM each&lt;br /&gt;
** 2x Dell R815; 64 AMD Opteron cores, 1TB RAM each&lt;br /&gt;
Hyperthreading is disabled in compute nodes.&lt;br /&gt;
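Once logged in, the advertised core counts are easy to verify with standard Linux tools; a small sketch (it simply reports whatever machine executes it):

```shell
# Number of logical CPUs visible to the operating system.
nproc

# The same count straight from the kernel; with hyperthreading disabled
# this equals the number of physical cores (16 on slim nodes, 64 on fat nodes).
grep -c ^processor /proc/cpuinfo
```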
&lt;br /&gt;
=== Filesystems ===&lt;br /&gt;
&lt;br /&gt;
[[File:Storage_pic.png|thumb|right|300px|Schematic overview of storage components of the B4F cluster.]]&lt;br /&gt;
The B4F Cluster has two primary file systems, each with different properties and purposes.&lt;br /&gt;
==== Parallel File System: Lustre ====&lt;br /&gt;
At the base of the cluster is an ultrafast file system, a so-called [http://en.wikipedia.org/wiki/Parallel_file_system Parallel File System] (PFS). The current size of the PFS is around 600TB. The PFS implemented in the B4F Cluster is called [http://en.wikipedia.org/wiki/Lustre_(file_system) Lustre]. Lustre has become very popular in recent years because it is feature-rich, considered very stable, and open source. Lustre is nowadays the default PFS option in Dell clusters as well as clusters sold by other vendors. The PFS is mounted on all head nodes and worker nodes of the cluster, providing seamless integration between compute and data infrastructure. The strength of a PFS is speed: by design, total I/O should reach up to 15GB/s. With a very large number of compute nodes - and very high volumes of data - the high read-write speeds that the PFS provides are necessary. The Lustre filesystem is divided into [[Lustre_PFS_layout | several partitions]], each differing in persistence and backup features. The Lustre PFS is meant to store (shared) data that is likely to be used for analysis in the near future. Personal analysis scripts, software, or additional small data files can be stored in the $HOME directory of each user.&lt;br /&gt;
&lt;br /&gt;
The hardware components of the PFS:&lt;br /&gt;
* 2x Dell PowerEdge R720&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
* 6x Dell PowerEdge R620&lt;br /&gt;
* 6x Dell PowerVault MD3260&lt;br /&gt;
&lt;br /&gt;
==== Network File System (NFS): $HOME dirs ====&lt;br /&gt;
Each user will have his/her own home directory. The path of the home directory will be: &lt;br /&gt;
&lt;br /&gt;
  /home/[name partner]/[username]&lt;br /&gt;
&lt;br /&gt;
/home lives on a so-called [http://en.wikipedia.org/wiki/Network_File_System Network File System], or NFS. The NFS is separate from the PFS and far more limited in I/O (read/write speed, latency, etc.) than the PFS. It is therefore not meant to store large data volumes that require high throughput or low latency. At 20TB, the NFS is also small compared to the 600TB Lustre PFS. The /home partition is backed up daily, and the amount of space that can be allocated is limited per user. Personal quota and total use per user can be found using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
quota&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The NFS is served by the NFS server, which also acts as the access point to the cluster.&lt;br /&gt;
&lt;br /&gt;
Hardware components of the NFS:&lt;br /&gt;
* 1x Dell PowerEdge R720XD&lt;br /&gt;
* 1x Dell PowerVault MD3220&lt;br /&gt;
&lt;br /&gt;
=== Network ===&lt;br /&gt;
The various components - head-nodes, worker nodes, and most importantly, the Lustre PFS - are all interconnected by an ultra-high speed network connection called [http://en.wikipedia.org/wiki/Infiniband InfiniBand]. A total of 7 InfiniBand switches are configured in a [http://en.wikipedia.org/wiki/Fat_tree fat tree] configuration.&lt;br /&gt;
&lt;br /&gt;
== Housing at Theia ==&lt;br /&gt;
[[File:Map_Theia.png|thumb|left|200px|Location of Theia, just outside of Wageningen campus]]&lt;br /&gt;
The B4F Cluster is housed at the main server centre of WUR-FB-ICT, near Wageningen Campus. The building (Theia) may not look like much from the outside (it used to serve as a potato storage facility), but inside is a modern server centre that includes, among other things, emergency power backup systems and automated fire extinguishers. Many of the server facilities provided by FB-ICT that WUR personnel and students use on a daily basis are located there, as is the B4F Cluster. Access to Theia is highly restricted and can only be granted in the presence of a representative of FB-ICT.&lt;br /&gt;
{{-}}&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;10%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
[[File:Cluster2_pic.png|thumb|left|220px|Some components of the cluster after unpacking.]]&lt;br /&gt;
| width=&amp;quot;70%&amp;quot; |&lt;br /&gt;
[[File:Cluster_pic.png|thumb|right|400px|The final configuration after installation.]]&lt;br /&gt;
|}&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
== Management ==&lt;br /&gt;
&lt;br /&gt;
=== Project Leader ===&lt;br /&gt;
* Stephen Janssen (Wageningen UR, FB-IT, Service Management)&lt;br /&gt;
&lt;br /&gt;
=== Daily Project Management ===&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* Andre ten Böhmer (Wageningen UR, FB-ICT, Infrastructure)&lt;br /&gt;
&lt;br /&gt;
[[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
&lt;br /&gt;
=== Steering Group ===&lt;br /&gt;
Ensures that the HPC generates sufficient revenue and meets the needs of its users. This includes setting fees, developing contracts, attracting new users, deciding on investments in the HPC, and communication.&lt;br /&gt;
* Frido Hamoen (CRV, on behalf of Breed4Food industrial partners, replaced Alfred de Vries in August)&lt;br /&gt;
* Petra Caessens (CAT-AgroFood)&lt;br /&gt;
* Wojtek Sablik (Wageningen UR, FB-IT, Infrastructure)&lt;br /&gt;
* Edda Neuteboom (CAT-AgroFood, secretariat)&lt;br /&gt;
* Johan van Arendonk (Wageningen UR, chair).&lt;br /&gt;
&lt;br /&gt;
=== IT Workgroup ===&lt;br /&gt;
[[File:Image_(1).jpeg|thumb|right|380px|(part of) the IT working group in front of the B4F Cluster]]&lt;br /&gt;
The IT Workgroup is responsible for the technical performance of the HPC. It has been involved in the design of the HPC and the selection of the supplier. It will support the technical management of the HPC and share experiences to ensure that the HPC meets the needs of its users, and it will advise the steering group on investments in software and hardware.&lt;br /&gt;
* [[User:Janss115 | Stephen Janssen (Wageningen UR, FB-IT, Service Management)]]&lt;br /&gt;
* [[User:pollm001 | Koen Pollmann (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Bohme001 | Andre ten Böhmer (Wageningen UR, FB-IT, Infrastructure)]]&lt;br /&gt;
* [[User:Barris01 | Wes Barris (Cobb)]]&lt;br /&gt;
* [[User:Vereij01 | Addie Vereijken (Hendrix Genetics)]]&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen (Topigs)]]&lt;br /&gt;
* Harry Dijkstra (CRV)&lt;br /&gt;
* [[User:Calus001 | Mario Calus (ABGC-WLR)]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens (ABGC-ABG)]]&lt;br /&gt;
{{-}}&lt;br /&gt;
&lt;br /&gt;
=== User Group ===&lt;br /&gt;
The User Group is ultimately the most important of all groups, because it encompasses the users for whom the infrastructure was built. In addition, successful use of the cluster will rely on an active community of users willing to share knowledge and best practices, including maintaining and expanding this Wiki. Regular User Group meetings will be held in the future [frequency to be determined] to facilitate this process.&lt;br /&gt;
&lt;br /&gt;
* [[List_of_users | List of users (alphabetical order)]]&lt;br /&gt;
&lt;br /&gt;
== Access Policy ==&lt;br /&gt;
Access policy is still a work in progress. In principle, all staff and students of the five main partners will have access to the cluster. Access must be granted explicitly (through creation of an account on the cluster by FB-ICT). Use of resources is limited by the scheduler: the queues (&#039;partitions&#039;) granted to a user determine that user&#039;s priority to the system&#039;s resources.&lt;br /&gt;
&lt;br /&gt;
=== Contact Persons ===&lt;br /&gt;
A request for access to the cluster needs to be directed to one of the following persons (please contact the appropriate partner):&lt;br /&gt;
&lt;br /&gt;
==== Cobb-Vantress ====&lt;br /&gt;
* Wes Barris&lt;br /&gt;
* Jun Chen&lt;br /&gt;
&lt;br /&gt;
==== ABGC ====&lt;br /&gt;
===== Animal Breeding and Genetics =====&lt;br /&gt;
* [[User:Hulze001 |Alex Hulzebosch]]&lt;br /&gt;
* [[User:Megen002 | Hendrik-Jan Megens]]&lt;br /&gt;
&lt;br /&gt;
===== Wageningen Livestock Research =====&lt;br /&gt;
* Mario Calus&lt;br /&gt;
* Ina Hulsegge&lt;br /&gt;
==== CRV ====&lt;br /&gt;
* Frido Hamoen&lt;br /&gt;
* Chris Schrooten&lt;br /&gt;
==== Hendrix Genetics ==== &lt;br /&gt;
* Ton Dings&lt;br /&gt;
* Abe Huisman&lt;br /&gt;
* Addie Vereijken&lt;br /&gt;
==== Topigs ====&lt;br /&gt;
* [[User:dongen01 | Henk van Dongen]]&lt;br /&gt;
* Egiel Hanenbarg&lt;br /&gt;
* Naomi Duijvensteijn&lt;br /&gt;
&lt;br /&gt;
== Using the B4F Cluster ==&lt;br /&gt;
=== Gaining access to the B4F Cluster ===&lt;br /&gt;
Access to the cluster and file transfer are both done via [http://en.wikipedia.org/wiki/Secure_Shell ssh-based protocols].&lt;br /&gt;
* [[log_in_to_B4F_cluster | Logging into cluster using ssh and file transfer]]&lt;br /&gt;
&lt;br /&gt;
=== Cluster Management Software and Scheduler ===&lt;br /&gt;
The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as job scheduler.&lt;br /&gt;
* [[BCM_on_B4F_cluster | Monitor cluster status with BCM]]&lt;br /&gt;
* [[SLURM_on_B4F_cluster | Submit jobs with Slurm]]&lt;br /&gt;
* [[SLURM_Compare | Rosetta Stone of Workload Managers]]&lt;br /&gt;
&lt;br /&gt;
=== Installation of software by users ===&lt;br /&gt;
&lt;br /&gt;
* [[Domain_specific_software_on_B4Fcluster_installation_by_users | Installing domain specific software: installation by users]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Installed software ===&lt;br /&gt;
&lt;br /&gt;
* [[Globally_installed_software | Globally installed software]]&lt;br /&gt;
* [[ABGC_modules | ABGC specific modules]]&lt;br /&gt;
&lt;br /&gt;
=== Being in control of Environment parameters ===&lt;br /&gt;
&lt;br /&gt;
* [[Using_environment_modules | Using environment modules]]&lt;br /&gt;
* [[Setting local variables]]&lt;br /&gt;
* [[Setting_TMPDIR | Set a custom temporary directory location]]&lt;br /&gt;
* [[Installing_R_packages_locally | Installing R packages locally]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
=== Controlling costs ===&lt;br /&gt;
&lt;br /&gt;
* [[SACCT | Using SACCT to see your costs]]&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
* [[Bioinformatics_tips_tricks_workflows | Bioinformatics tips, tricks, and workflows]]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Maintenance_and_Management | Maintenance and Management]]&lt;br /&gt;
* [[Mailinglist | Electronic mail discussion lists]]&lt;br /&gt;
* [[About_ABGC | About ABGC]]&lt;br /&gt;
* [[Computer_cluster | High Performance Computing @ABGC]]&lt;br /&gt;
* [[Lustre_PFS_layout | Lustre Parallel File System layout]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
{| width=&amp;quot;90%&amp;quot;&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://www.breed4food.com/en/show/Breed4Food-initiative-reinforces-the-Netherlands-position-as-an-innovative-country-in-animal-breeding-and-genomics.htm Breed4Food programme]&lt;br /&gt;
* [http://www.wageningenur.nl/en/Expertise-Services/Facilities/CATAgroFood-3/CATAgroFood-3/News-and-agenda/Show/CATAgroFood-invests-in-a-High-Performance-Computing-cluster.htm CATAgroFood invests in HPC]&lt;br /&gt;
* [http://www.cobb-vantress.com Cobb-Vantress homepage]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [https://www.crv4all.nl CRV homepage]&lt;br /&gt;
* [http://www.hendrix-genetics.com Hendrix Genetics homepage]&lt;br /&gt;
* [http://www.topigs.com TOPIGS homepage]&lt;br /&gt;
| width=&amp;quot;30%&amp;quot; |&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Scientific_Linux Scientific Linux]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Help:Cheatsheet Help with editing Wiki pages]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Maintenance_and_Management&amp;diff=1333</id>
		<title>Maintenance and Management</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Maintenance_and_Management&amp;diff=1333"/>
		<updated>2014-06-05T11:21:31Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Maintenance and Management */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Maintenance and Management ==&lt;br /&gt;
&lt;br /&gt;
As of April 2014, questions should be directed to the Service Desk IT.&lt;br /&gt;
&lt;br /&gt;
This can be done by e-mail: servicedesk.it@wur.nl &lt;br /&gt;
or by telephone: +31 317 488888&lt;br /&gt;
Please give your name and phone number, mention that your mail/call is about the HPC for Agrogenomics, and state which company you work for.&lt;br /&gt;
When you call the service desk, please also give your e-mail address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Maintenance June 11th 2014 ==&lt;br /&gt;
There will be mainly firmware maintenance between 8:00 and 13:00 CET. Because network controller and storage controller firmware will be upgraded, all servers need to be rebooted and will experience network hiccups. To prevent job and/or data corruption, the HPC will be shut down during this maintenance window. Running jobs will be killed!&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1320</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1320"/>
		<updated>2014-04-03T14:07:55Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Batch script */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 3 queues (in Slurm called partitions): a high, a standard and a low priority queue.&amp;lt;br&amp;gt;&lt;br /&gt;
The High queue gives jobs the highest priority (20), followed by the standard queue (10) and the Low queue (0).&amp;lt;br&amp;gt;&lt;br /&gt;
Jobs in the Low queue will be requeued if a job with higher priority needs cluster resources that are occupied by Low queue jobs.&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_High      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_High      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_High      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Std       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Std       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Std       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Low       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Low       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Low       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script, a compute-intensive example that evaluates the Bailey-Borwein-Plouffe series for Pi at very high precision:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
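The sum above is the Bailey-Borwein-Plouffe (BBP) series for Pi. As a quick sanity check, separate from the batch script itself, the same series at modest precision already reproduces the familiar leading digits:

```python
from decimal import Decimal as D, getcontext

# Same BBP series as the batch script, but at modest precision:
# 30 significant digits and only 20 terms of the sum.
getcontext().prec = 30
p = sum(
    D(1) / 16**k * (D(4) / (8*k + 1) - D(2) / (8*k + 4)
                    - D(1) / (8*k + 5) - D(1) / (8*k + 6))
    for k in range(20)
)
print(str(p)[:16])  # prints 3.14159265358979
```

Each extra term of the series contributes roughly 1.2 more correct decimal digits, which is why the full script needs both a large precision and many terms.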
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, the first thing needed is to load Python 3, which is not the default Python version on the cluster, into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC_Std&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string and may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
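For instance, the following directives all request the same 20-hour (1200-minute) limit; this is a sketch of the accepted formats, and only one such line belongs in a script:

```shell
#SBATCH --time=1200        # minutes
#SBATCH --time=20:00:00    # hours:minutes:seconds
#SBATCH --time=0-20        # days-hours
```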
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. The default is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
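A minimal sketch of that conversion, with a hypothetical MaxRSS value, using plain shell arithmetic to turn sacct's KB figure into an --mem value in MB with about 20% headroom:

```shell
# Hypothetical MaxRSS as reported by 'sacct -o MaxRSS -j JOBID', in KB
maxrss_kb=1536000

# Convert KB to MB, then add roughly 20% headroom,
# since --mem defines a hard upper limit
mem_mb=$(( maxrss_kb / 1024 ))
mem_mb=$(( mem_mb + mem_mb / 5 ))

echo "#SBATCH --mem=${mem_mb}"   # prints #SBATCH --mem=1800
```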
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be split across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done by using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC_Std&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in $(seq 1 10); do echo $i; sbatch runscript_$i.sh; done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
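When the individual jobs differ only in an index, a Slurm job array is an alternative that needs a single script and a single submission. This is a sketch; it assumes the runscript_N.sh files from above are executable shell scripts:

```shell
#!/bin/bash
#SBATCH --array=1-10
# Slurm starts ten array tasks; each one sees its own index
# in the SLURM_ARRAY_TASK_ID environment variable.
./runscript_"$SLURM_ARRAY_TASK_ID".sh
```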
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) nodes of a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that moment; for the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all details of a currently active (i.e. not yet completed) job.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Check on a pending job ===&lt;br /&gt;
A submitted job can end up in a pending state when there are not enough resources available for it.&lt;br /&gt;
In this example I submit a job, check its status and, after finding that it is &#039;&#039;&#039;pending&#039;&#039;&#039;, check when it will probably start.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[@nfs01 jobs]$ sbatch hpl_student.job&lt;br /&gt;
 Submitted batch job 740338&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue -l -j 740338&lt;br /&gt;
 Fri Feb 21 15:32:31 2014&lt;br /&gt;
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PENDING       0:00 1-00:00:00      1 (ReqNodeNotAvail)&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue --start -j 740338&lt;br /&gt;
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PD  2014-02-22T15:31:48      1 (ReqNodeNotAvail)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So it seems this job will probably start the next day, but that is no guarantee that it actually will.&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
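A minimal sketch of interactive allocation with the salloc command; the partition name and limits are illustrative, so substitute ones you are authorized for:

```shell
# Request one task for 30 minutes; salloc opens a shell
# inside the allocation once the resources are granted
salloc --partition=ABGC_Std --ntasks=1 --time=30:00

# Inside that shell, run commands on the allocated node(s) via srun
srun hostname

# Leaving the shell ends the allocation
exit
```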
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. However, here is a breakdown of all the states your job can be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the Student Partition available, while researchers at ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1319</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1319"/>
		<updated>2014-04-03T14:06:33Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Queues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 3 queues (in Slurm called partitions): a high, a standard and a low priority queue.&amp;lt;br&amp;gt;&lt;br /&gt;
The High queue gives jobs the highest priority (20), followed by the standard queue (10) and the Low queue (0).&amp;lt;br&amp;gt;&lt;br /&gt;
Jobs in the Low queue will be requeued if a job with higher priority needs cluster resources that are occupied by Low queue jobs.&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_High      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_High      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_High      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Std       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Std       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Std       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Low       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Low       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Low       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script, a compute-intensive example that evaluates the Bailey-Borwein-Plouffe series for Pi at very high precision:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
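The sum above is the Bailey-Borwein-Plouffe (BBP) series for Pi. As a quick sanity check, separate from the batch script itself, the same series at modest precision already reproduces the familiar leading digits:

```python
from decimal import Decimal as D, getcontext

# Same BBP series as the batch script, but at modest precision:
# 30 significant digits and only 20 terms of the sum.
getcontext().prec = 30
p = sum(
    D(1) / 16**k * (D(4) / (8*k + 1) - D(2) / (8*k + 4)
                    - D(1) / (8*k + 5) - D(1) / (8*k + 6))
    for k in range(20)
)
print(str(p)[:16])  # prints 3.14159265358979
```

Each extra term of the series contributes roughly 1.2 more correct decimal digits, which is why the full script needs both a large precision and many terms.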
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, the first thing needed is to load Python 3, which is not the default Python version on the cluster, into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string and may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
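For instance, the following directives all request the same 20-hour (1200-minute) limit; this is a sketch of the accepted formats, and only one such line belongs in a script:

```shell
#SBATCH --time=1200        # minutes
#SBATCH --time=20:00:00    # hours:minutes:seconds
#SBATCH --time=0-20        # days-hours
```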
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. The default is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
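A minimal sketch of that conversion, with a hypothetical MaxRSS value, using plain shell arithmetic to turn sacct's KB figure into an --mem value in MB with about 20% headroom:

```shell
# Hypothetical MaxRSS as reported by 'sacct -o MaxRSS -j JOBID', in KB
maxrss_kb=1536000

# Convert KB to MB, then add roughly 20% headroom,
# since --mem defines a hard upper limit
mem_mb=$(( maxrss_kb / 1024 ))
mem_mb=$(( mem_mb + mem_mb / 5 ))

echo "#SBATCH --mem=${mem_mb}"   # prints #SBATCH --mem=1800
```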
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be split across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done with the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. Using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; instead will schedule the job specifically to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The email address to send notifications to.&lt;br /&gt;
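Putting the directives above together, a minimal job script might look like this (a sketch only; the memory value, email address, and script name are placeholders):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=2048
#SBATCH --constraint=normalmem
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=calc_pi.py
#SBATCH --partition=research
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@org.nl

# The actual work of the job goes here
python calc_pi.py
```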
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the &#039;sbatch&#039; submission example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
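Output like the above can be filtered with standard text tools. For example, counting how many of the listed jobs are in the running state (the two sample lines below are illustrative stand-ins for real squeue output):

```shell
# Count jobs in the RUNNING state (column 5) from squeue-style output.
printf '%s\n' \
  "3396 ABGC BOV-WUR- megen002 R 27:26 1 node004" \
  "3385 research BOV-WUR- megen002 R 44:38 1 node049" |
  awk '$5 == "R" { n++ } END { print n }'
# prints 2
```

In practice you would pipe the output of squeue itself into awk instead of the sample lines.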
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour. Estimated run times need to be specified when submitting jobs. The time limit set for a certain job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Check on a pending job ===&lt;br /&gt;
A submitted job will end up in a pending state when there are not enough resources available for it.&lt;br /&gt;
In this example I submit a job, check its status, and after finding out it is &#039;&#039;&#039;pending&#039;&#039;&#039; I check when it will probably start.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[@nfs01 jobs]$ sbatch hpl_student.job&lt;br /&gt;
 Submitted batch job 740338&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue -l -j 740338&lt;br /&gt;
 Fri Feb 21 15:32:31 2014&lt;br /&gt;
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PENDING       0:00 1-00:00:00      1 (ReqNodeNotAvail)&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue --start -j 740338&lt;br /&gt;
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PD  2014-02-22T15:31:48      1 (ReqNodeNotAvail)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So it seems this job will probably start the next day, but that is no guarantee that it actually will.&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. However, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation. Not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1318</id>
		<title>User:Bohme001</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1318"/>
		<updated>2014-04-02T11:07:50Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Linux system administrator at FB-IT Infrastructure&amp;lt;br&amp;gt;&lt;br /&gt;
Our team is maintaining +200 Linux servers running mainly on RedHat Enterprise Server and some specials on Ubuntu Server.&amp;lt;br&amp;gt;&lt;br /&gt;
All HPC AgroGenomics hosts are running Scientific Linux version 6&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1317</id>
		<title>User:Bohme001</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=User:Bohme001&amp;diff=1317"/>
		<updated>2014-04-02T11:05:27Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: Created page with &amp;quot;Linux system administrator at FB-IT Infrastructure Our team is maintaining +200 Linux servers running mainly on RedHat Enterprise Server and some specials on Ubuntu Server. Al...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Linux system administrator at FB-IT Infrastructure&lt;br /&gt;
Our team is maintaining +200 Linux servers running mainly on RedHat Enterprise Server and some specials on Ubuntu Server.&lt;br /&gt;
All HPC AgroGenomics hosts are running Scientific Linux version 6&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1305</id>
		<title>MPI on B4F cluster</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1305"/>
		<updated>2014-03-28T13:10:45Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* A mvapich2 sbatch example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== A simple &#039;Hello World&#039; example ==&lt;br /&gt;
Consider the following simple MPI version, in C, of the &#039;Hello World&#039; example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;cpp&#039;&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;mpi.h&amp;gt;&lt;br /&gt;
int main(int argc, char ** argv) {&lt;br /&gt;
  int size,rank,namelen;&lt;br /&gt;
  char processor_name[MPI_MAX_PROCESSOR_NAME];&lt;br /&gt;
  MPI_Init(&amp;amp;argc, &amp;amp;argv);&lt;br /&gt;
  MPI_Comm_rank(MPI_COMM_WORLD,&amp;amp;rank);&lt;br /&gt;
  MPI_Comm_size(MPI_COMM_WORLD,&amp;amp;size);&lt;br /&gt;
  MPI_Get_processor_name(processor_name, &amp;amp;namelen);&lt;br /&gt;
  printf(&amp;quot;Hello MPI! Process %d of %d on %s\n&amp;quot;, rank, size, processor_name);&lt;br /&gt;
  MPI_Finalize();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before compiling, make sure that the required compilers are available.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To avoid conflicts between libraries, the safest approach is to purge all modules first:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then load both the gcc and openmpi modules. If modules were purged, slurm needs to be reloaded too.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module load gcc/4.8.1 openmpi/gcc/64/1.6.5 slurm/2.5.7&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compile the &amp;lt;code&amp;gt;hello_mpi.c&amp;lt;/code&amp;gt; code.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mpicc hello_mpi.c -o test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If desired, a list of libraries compiled into the executable can be viewed:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
ldd test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  linux-vdso.so.1 =&amp;gt;  (0x00002aaaaaacb000)&lt;br /&gt;
  libmpi.so.1 =&amp;gt; /cm/shared/apps/openmpi/gcc/64/1.6.5/lib64/libmpi.so.1 (0x00002aaaaaccd000)&lt;br /&gt;
  libdl.so.2 =&amp;gt; /lib64/libdl.so.2 (0x00002aaaab080000)&lt;br /&gt;
  libm.so.6 =&amp;gt; /lib64/libm.so.6 (0x00002aaaab284000)&lt;br /&gt;
  libnuma.so.1 =&amp;gt; /usr/lib64/libnuma.so.1 (0x0000003e29400000)&lt;br /&gt;
  librt.so.1 =&amp;gt; /lib64/librt.so.1 (0x00002aaaab509000)&lt;br /&gt;
  libnsl.so.1 =&amp;gt; /lib64/libnsl.so.1 (0x00002aaaab711000)&lt;br /&gt;
  libutil.so.1 =&amp;gt; /lib64/libutil.so.1 (0x00002aaaab92a000)&lt;br /&gt;
  libpthread.so.0 =&amp;gt; /lib64/libpthread.so.0 (0x00002aaaabb2e000)&lt;br /&gt;
  libc.so.6 =&amp;gt; /lib64/libc.so.6 (0x00002aaaabd4b000)&lt;br /&gt;
  /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)&lt;br /&gt;
&lt;br /&gt;
Running the executable on two nodes, with four tasks per node, can be done like this:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
srun --nodes=2 --ntasks-per-node=4 --partition=ABGC --mpi=openmpi ./test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will result in the following output:&lt;br /&gt;
  Hello MPI! Process 4 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 1 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 7 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 6 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 5 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 2 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 0 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 3 of 8 on node010&lt;br /&gt;
&lt;br /&gt;
== A mvapich2 sbatch example ==&lt;br /&gt;
An MPI job using mvapich2 on 32 cores, using the normal compute nodes and the fast InfiniBand interconnect for RDMA traffic.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ module load mvapich2/gcc&lt;br /&gt;
$ vim batch.sh&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 #SBATCH --account=projectx&lt;br /&gt;
 #SBATCH --time=0&lt;br /&gt;
 #SBATCH  -n 32&lt;br /&gt;
 #SBATCH --constraint=normalmem&lt;br /&gt;
 #SBATCH --output=output_%j.txt&lt;br /&gt;
 #SBATCH --error=error_output_%j.txt&lt;br /&gt;
 #SBATCH --job-name=MPItest&lt;br /&gt;
 #SBATCH --partition=ABGC_Production&lt;br /&gt;
 #SBATCH --mail-type=ALL&lt;br /&gt;
 #SBATCH --mail-user=user@wur.nl&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;Starting at `date`&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on hosts: $SLURM_NODELIST&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NNODES nodes.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NPROCS processors.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Current working directory is `pwd`&amp;quot;&lt;br /&gt;
 # echo &amp;quot;Env var MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE is $MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE&amp;quot;&lt;br /&gt;
 # export MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE=ib0&lt;br /&gt;
&lt;br /&gt;
 mpirun -iface ib0 -np 32 ./tmf_par.out -NX 480 -NY 240 -alpha  11 -chi 1.3 -psi_b 5e-2  -beta  0.0 -zeta 3.5 -kT 0.10 &lt;br /&gt;
&lt;br /&gt;
 echo &amp;quot;Program finished with exit code $? at: `date`&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$ sbatch batch.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1304</id>
		<title>MPI on B4F cluster</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1304"/>
		<updated>2014-03-28T13:10:18Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* A mvapich2 sbatch example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== A simple &#039;Hello World&#039; example ==&lt;br /&gt;
Consider the following simple MPI version, in C, of the &#039;Hello World&#039; example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;cpp&#039;&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;mpi.h&amp;gt;&lt;br /&gt;
int main(int argc, char ** argv) {&lt;br /&gt;
  int size,rank,namelen;&lt;br /&gt;
  char processor_name[MPI_MAX_PROCESSOR_NAME];&lt;br /&gt;
  MPI_Init(&amp;amp;argc, &amp;amp;argv);&lt;br /&gt;
  MPI_Comm_rank(MPI_COMM_WORLD,&amp;amp;rank);&lt;br /&gt;
  MPI_Comm_size(MPI_COMM_WORLD,&amp;amp;size);&lt;br /&gt;
  MPI_Get_processor_name(processor_name, &amp;amp;namelen);&lt;br /&gt;
  printf(&amp;quot;Hello MPI! Process %d of %d on %s\n&amp;quot;, rank, size, processor_name);&lt;br /&gt;
  MPI_Finalize();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before compiling, make sure that the required compilers are available.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To avoid conflicts between libraries, the safest approach is to purge all modules first:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then load both the gcc and openmpi modules. If modules were purged, slurm needs to be reloaded too.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module load gcc/4.8.1 openmpi/gcc/64/1.6.5 slurm/2.5.7&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compile the &amp;lt;code&amp;gt;hello_mpi.c&amp;lt;/code&amp;gt; code.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mpicc hello_mpi.c -o test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If desired, a list of libraries compiled into the executable can be viewed:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
ldd test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  linux-vdso.so.1 =&amp;gt;  (0x00002aaaaaacb000)&lt;br /&gt;
  libmpi.so.1 =&amp;gt; /cm/shared/apps/openmpi/gcc/64/1.6.5/lib64/libmpi.so.1 (0x00002aaaaaccd000)&lt;br /&gt;
  libdl.so.2 =&amp;gt; /lib64/libdl.so.2 (0x00002aaaab080000)&lt;br /&gt;
  libm.so.6 =&amp;gt; /lib64/libm.so.6 (0x00002aaaab284000)&lt;br /&gt;
  libnuma.so.1 =&amp;gt; /usr/lib64/libnuma.so.1 (0x0000003e29400000)&lt;br /&gt;
  librt.so.1 =&amp;gt; /lib64/librt.so.1 (0x00002aaaab509000)&lt;br /&gt;
  libnsl.so.1 =&amp;gt; /lib64/libnsl.so.1 (0x00002aaaab711000)&lt;br /&gt;
  libutil.so.1 =&amp;gt; /lib64/libutil.so.1 (0x00002aaaab92a000)&lt;br /&gt;
  libpthread.so.0 =&amp;gt; /lib64/libpthread.so.0 (0x00002aaaabb2e000)&lt;br /&gt;
  libc.so.6 =&amp;gt; /lib64/libc.so.6 (0x00002aaaabd4b000)&lt;br /&gt;
  /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)&lt;br /&gt;
&lt;br /&gt;
Running the executable on two nodes, with four tasks per node, can be done like this:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
srun --nodes=2 --ntasks-per-node=4 --partition=ABGC --mpi=openmpi ./test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will result in the following output:&lt;br /&gt;
  Hello MPI! Process 4 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 1 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 7 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 6 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 5 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 2 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 0 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 3 of 8 on node010&lt;br /&gt;
&lt;br /&gt;
== A mvapich2 sbatch example ==&lt;br /&gt;
An MPI job using mvapich2 on 32 cores, using the normal compute nodes and the fast InfiniBand interconnect for RDMA traffic.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ module load mvapich2/gcc&lt;br /&gt;
$ vim batch.sh&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 #SBATCH --account=projectx&lt;br /&gt;
 #SBATCH --time=0&lt;br /&gt;
 #SBATCH  -n 32&lt;br /&gt;
 #SBATCH --constraint=normalmem&lt;br /&gt;
 #SBATCH --output=output_%j.txt&lt;br /&gt;
 #SBATCH --error=error_output_%j.txt&lt;br /&gt;
 #SBATCH --job-name=MPItest&lt;br /&gt;
 #SBATCH --partition=ABGC_Production&lt;br /&gt;
 #SBATCH --mail-type=ALL&lt;br /&gt;
 #SBATCH --mail-user=user@wur.nl&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;Starting at `date`&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on hosts: $SLURM_NODELIST&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NNODES nodes.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NPROCS processors.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Current working directory is `pwd`&amp;quot;&lt;br /&gt;
 echo &amp;quot;Env var MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE is $MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE&amp;quot;&lt;br /&gt;
&lt;br /&gt;
 # export MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE=ib0&lt;br /&gt;
&lt;br /&gt;
 mpirun -iface ib0 -np 32 ./tmf_par.out -NX 480 -NY 240 -alpha  11 -chi 1.3 -psi_b 5e-2  -beta  0.0 -zeta 3.5 -kT 0.10 &lt;br /&gt;
&lt;br /&gt;
 echo &amp;quot;Program finished with exit code $? at: `date`&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$ sbatch batch.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1303</id>
		<title>MPI on B4F cluster</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=MPI_on_B4F_cluster&amp;diff=1303"/>
		<updated>2014-03-28T13:08:02Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* A simple &amp;#039;Hello World&amp;#039; example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== A simple &#039;Hello World&#039; example ==&lt;br /&gt;
Consider the following simple MPI version, in C, of the &#039;Hello World&#039; example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;cpp&#039;&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;mpi.h&amp;gt;&lt;br /&gt;
int main(int argc, char ** argv) {&lt;br /&gt;
  int size,rank,namelen;&lt;br /&gt;
  char processor_name[MPI_MAX_PROCESSOR_NAME];&lt;br /&gt;
  MPI_Init(&amp;amp;argc, &amp;amp;argv);&lt;br /&gt;
  MPI_Comm_rank(MPI_COMM_WORLD,&amp;amp;rank);&lt;br /&gt;
  MPI_Comm_size(MPI_COMM_WORLD,&amp;amp;size);&lt;br /&gt;
  MPI_Get_processor_name(processor_name, &amp;amp;namelen);&lt;br /&gt;
  printf(&amp;quot;Hello MPI! Process %d of %d on %s\n&amp;quot;, rank, size, processor_name);&lt;br /&gt;
  MPI_Finalize();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before compiling, make sure that the required compilers are available.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To avoid conflicts between libraries, the safest approach is to purge all modules first:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module purge&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then load both the gcc and openmpi modules. If modules were purged, slurm needs to be reloaded too.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
module load gcc/4.8.1 openmpi/gcc/64/1.6.5 slurm/2.5.7&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compile the &amp;lt;code&amp;gt;hello_mpi.c&amp;lt;/code&amp;gt; code.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mpicc hello_mpi.c -o test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If desired, the list of shared libraries the executable is dynamically linked against can be viewed:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
ldd test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  linux-vdso.so.1 =&amp;gt;  (0x00002aaaaaacb000)&lt;br /&gt;
  libmpi.so.1 =&amp;gt; /cm/shared/apps/openmpi/gcc/64/1.6.5/lib64/libmpi.so.1 (0x00002aaaaaccd000)&lt;br /&gt;
  libdl.so.2 =&amp;gt; /lib64/libdl.so.2 (0x00002aaaab080000)&lt;br /&gt;
  libm.so.6 =&amp;gt; /lib64/libm.so.6 (0x00002aaaab284000)&lt;br /&gt;
  libnuma.so.1 =&amp;gt; /usr/lib64/libnuma.so.1 (0x0000003e29400000)&lt;br /&gt;
  librt.so.1 =&amp;gt; /lib64/librt.so.1 (0x00002aaaab509000)&lt;br /&gt;
  libnsl.so.1 =&amp;gt; /lib64/libnsl.so.1 (0x00002aaaab711000)&lt;br /&gt;
  libutil.so.1 =&amp;gt; /lib64/libutil.so.1 (0x00002aaaab92a000)&lt;br /&gt;
  libpthread.so.0 =&amp;gt; /lib64/libpthread.so.0 (0x00002aaaabb2e000)&lt;br /&gt;
  libc.so.6 =&amp;gt; /lib64/libc.so.6 (0x00002aaaabd4b000)&lt;br /&gt;
  /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)&lt;br /&gt;
&lt;br /&gt;
Running the executable on two nodes, with four tasks per node, can be done like this:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
srun --nodes=2 --ntasks-per-node=4 --partition=ABGC --mpi=openmpi ./test_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will result in output similar to the following (the order of the lines may vary between runs):&lt;br /&gt;
  Hello MPI! Process 4 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 1 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 7 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 6 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 5 of 8 on node011&lt;br /&gt;
  Hello MPI! Process 2 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 0 of 8 on node010&lt;br /&gt;
  Hello MPI! Process 3 of 8 on node010&lt;br /&gt;
&lt;br /&gt;
== A mvapich2 sbatch example ==&lt;br /&gt;
An MPI job using mvapich2 on 32 cores, using the fast InfiniBand interconnect for RDMA traffic.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ module load mvapich2/gcc&lt;br /&gt;
$ vim batch.sh&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 #SBATCH --account=projectx&lt;br /&gt;
 #SBATCH --time=0&lt;br /&gt;
 #SBATCH  -n 32&lt;br /&gt;
 #SBATCH --constraint=normalmem&lt;br /&gt;
 #SBATCH --output=output_%j.txt&lt;br /&gt;
 #SBATCH --error=error_output_%j.txt&lt;br /&gt;
 #SBATCH --job-name=MPItest&lt;br /&gt;
 #SBATCH --partition=ABGC_Production&lt;br /&gt;
 #SBATCH --mail-type=ALL&lt;br /&gt;
 #SBATCH --mail-user=user@wur.nl&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;Starting at `date`&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on hosts: $SLURM_NODELIST&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NNODES nodes.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Running on $SLURM_NPROCS processors.&amp;quot;&lt;br /&gt;
 echo &amp;quot;Current working directory is `pwd`&amp;quot;&lt;br /&gt;
 echo &amp;quot;Env var MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE is $MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE&amp;quot;&lt;br /&gt;
&lt;br /&gt;
 # export MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE=ib0&lt;br /&gt;
&lt;br /&gt;
 mpirun -iface ib0 -np 32 ./tmf_par.out -NX 480 -NY 240 -alpha  11 -chi 1.3 -psi_b 5e-2  -beta  0.0 -zeta 3.5 -kT 0.10 &lt;br /&gt;
&lt;br /&gt;
 echo &amp;quot;Program finished with exit code $? at: `date`&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1137</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1137"/>
		<updated>2014-02-21T14:47:39Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Check on a pending job */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in Slurm called partitions): a production and a research queue. The production queue provides a higher priority to jobs (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
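The series in this script is the Bailey-Borwein-Plouffe (BBP) formula; each extra term contributes roughly 1.2 decimal digits. A quick way to sanity-check the series at small precision (a sketch in plain Python 3, not part of the job script):&lt;br /&gt;

```python
# Sanity check of the BBP series at small precision (sketch only).
from decimal import Decimal, getcontext

D = Decimal
getcontext().prec = 50  # digits of working precision

# Same series as the job script above, truncated to 30 terms.
p = sum(D(1) / 16**k * (D(4)/(8*k+1) - D(2)/(8*k+4)
                        - D(1)/(8*k+5) - D(1)/(8*k+6))
        for k in range(30))

print(str(p)[:12])  # prints 3.1415926535
```
Note that the job script's sum is truncated at 411 terms, so only roughly the first 500 digits of its much longer output are actually converged.&lt;br /&gt;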
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python 3, which is not the default Python version on the cluster, first needs to be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available, and it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
&lt;br /&gt;
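Because raw minute counts are easy to misread for long jobs, the day-hour forms can be safer. A small helper (illustrative only, not part of Slurm) that converts minutes into the days-hours:minutes:seconds form:&lt;br /&gt;

```python
# Convert a minute count into Slurm's "days-hours:minutes:seconds" form.
# Illustrative helper; Slurm itself also accepts plain minutes directly.
def to_slurm_time(minutes):
    days, rem = divmod(minutes, 24 * 60)   # whole days
    hours, mins = divmod(rem, 60)          # remaining hours and minutes
    return f"{days}-{hours:02d}:{mins:02d}:00"

print(to_slurm_time(1200))  # the example above: prints 0-20:00:00
```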
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default, it is deliberately relatively small — 1024 MB per node. If your job uses more than that, you’ll get an error that your job Exceeded job memory limit. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
&lt;br /&gt;
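The KB-to-MB arithmetic described above can be sketched as follows; the 20% headroom is an illustrative choice, not a site rule:&lt;br /&gt;

```python
# Turn a MaxRSS value from sacct (reported in KB) into a --mem request
# (in MB), rounding up and adding headroom since --mem is a hard limit.
def suggest_mem_mb(maxrss_kb, headroom_pct=20):
    used_mb = maxrss_kb // 1024                      # KB -> whole MB
    return used_mb * (100 + headroom_pct) // 100 + 1  # headroom, round up

print(suggest_mem_mb(1536000))  # 1500 MB used -> prints 1801
```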
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most the specified number of tasks, and provides sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the minimum number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it will be both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done by using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs that are running at that time on the cluster. For the example submitted with the &#039;sbatch&#039; command above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is set at one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be checked using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all the details of a currently active (not completed) job.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Check on a pending job ===&lt;br /&gt;
A submitted job can end up in a pending state when not enough resources are available for it.&lt;br /&gt;
In this example I submit a job, check the status, and after finding out it is &#039;&#039;&#039;pending&#039;&#039;&#039;, I check when it will probably start.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[@nfs01 jobs]$ sbatch hpl_student.job&lt;br /&gt;
 Submitted batch job 740338&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue -l -j 740338&lt;br /&gt;
 Fri Feb 21 15:32:31 2014&lt;br /&gt;
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PENDING       0:00 1-00:00:00      1 (ReqNodeNotAvail)&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue --start -j 740338&lt;br /&gt;
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PD  2014-02-22T15:31:48      1 (ReqNodeNotAvail)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
So it seems this job will probably start the next day, but that&#039;s no guarantee it will.&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be either in the RUNNING or the PENDING state. However, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|}&lt;br /&gt;
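When post-processing squeue or sacct output in scripts, the short codes above can be mapped back to full state names with a simple lookup table (a sketch):&lt;br /&gt;

```python
# Map Slurm short state codes (as shown by squeue) to full state names,
# matching the table above.
STATE_CODES = {
    "CA": "CANCELLED", "CD": "COMPLETED", "CF": "CONFIGURING",
    "CG": "COMPLETING", "F": "FAILED", "NF": "NODE_FAIL",
    "PD": "PENDING", "R": "RUNNING", "S": "SUSPENDED", "TO": "TIMEOUT",
}

print(STATE_CODES["PD"])  # prints PENDING
```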
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation. Not all defined Partitions will be available to any given person. E.g., Master students will only have the Student Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1136</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1136"/>
		<updated>2014-02-21T14:45:22Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Query a specific active job: scontrol */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in Slurm called partitions): a production and a research queue. The production queue provides a higher priority to jobs (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python 3, which is not the default Python version on the cluster, first needs to be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available, and it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string, and the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A bare number is interpreted as minutes, so in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
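The days-hours:minutes:seconds form is often more readable than a plain minutes value. A minimal shell sketch of the conversion (the helper name is illustrative, not part of Slurm):

```shell
#!/bin/bash
# Convert a plain minutes value into Slurm's days-hours:minutes:seconds form.
# (minutes_to_slurm is an illustrative helper name, not a Slurm command.)
minutes_to_slurm() {
  local total=$1
  local days=$(( total / 1440 ))           # 1440 minutes per day
  local hours=$(( (total % 1440) / 60 ))
  local mins=$(( total % 60 ))
  printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

minutes_to_slurm 1200   # the example above -> 0-20:00:00
```

So `--time=1200` and `--time=0-20:00:00` request the same limit.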
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it somewhat larger than that, since you’re defining a hard upper limit). If your job completed long ago you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
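The KB-to-MB conversion with some headroom can be sketched in shell; this is a minimal example (the helper name is illustrative), assuming a MaxRSS value as sacct prints it, possibly with a trailing 'K':

```shell
#!/bin/bash
# Turn a MaxRSS value from sacct (KB, often with a trailing 'K') into a
# --mem value in MB with roughly 20% headroom.
# (maxrss_to_mem is an illustrative helper name, not a Slurm command.)
maxrss_to_mem() {
  local kb=${1%K}                        # strip a trailing 'K' if present
  local mb=$(( (kb + 1023) / 1024 ))     # KB -> MB, rounded up
  echo $(( mb * 12 / 10 ))               # add ~20% safety margin
}

maxrss_to_mem 1846532K   # -> 2164, i.e. use #SBATCH --mem=2164
```

Integer arithmetic is good enough here, since --mem only takes whole MB anyway.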
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be spread across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide a single number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
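In scripts it is handy to capture the job id that sbatch reports, so it can be passed to squeue or scancel later. A minimal sketch, with sbatch's &quot;Submitted batch job ...&quot; output mocked since it is only produced on the cluster:

```shell
#!/bin/bash
# Capture the job id from sbatch's "Submitted batch job <id>" message.
# The output is mocked here for illustration; on the cluster you would do:
#   out=$(sbatch run_calc_pi.sh)
out="Submitted batch job 740338"
jobid=${out##* }      # keep the last whitespace-separated field
echo "$jobid"         # -> 740338

# The id can then be used for monitoring or cancellation, e.g.:
#   squeue -l -j "$jobid"
#   scancel "$jobid"
```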
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted with the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a certain job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all details of a currently active job (not of a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Check on a pending job ===&lt;br /&gt;
A submitted job will end up in a pending state when there are not enough resources available for it.&lt;br /&gt;
In this example a job is submitted, its status is checked, and since it turns out to be pending, its estimated start time is queried.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[@nfs01 jobs]$ sbatch hpl_student.job&lt;br /&gt;
 Submitted batch job 740338&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue -l -j 740338&lt;br /&gt;
 Fri Feb 21 15:32:31 2014&lt;br /&gt;
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PENDING       0:00 1-00:00:00      1 (ReqNodeNotAvail)&lt;br /&gt;
&lt;br /&gt;
[@nfs01 jobs]$ squeue --start -j 740338&lt;br /&gt;
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
 740338 ABGC_Stud HPLstude bohme999  PD  2014-02-22T15:31:48      1 (ReqNodeNotAvail)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. Here is a breakdown of all the states your job can be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but are waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|}&lt;br /&gt;
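In monitoring scripts, the two-letter ST codes that squeue prints can be expanded into the full state names from the table above with a simple lookup; a minimal sketch (the helper name is illustrative):

```shell
#!/bin/bash
# Expand squeue's two-letter ST codes into full state names,
# covering the codes listed in the table above.
# (state_name is an illustrative helper name, not a Slurm command.)
state_name() {
  case $1 in
    CA) echo CANCELLED ;;   CD) echo COMPLETED ;;
    CF) echo CONFIGURING ;; CG) echo COMPLETING ;;
    F)  echo FAILED ;;      NF) echo NODE_FAIL ;;
    PD) echo PENDING ;;     R)  echo RUNNING ;;
    S)  echo SUSPENDED ;;   TO) echo TIMEOUT ;;
    *)  echo UNKNOWN ;;
  esac
}

state_name PD   # -> PENDING
state_name R    # -> RUNNING
```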
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting via Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. E.g., Master students will only have the Student Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1135</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1135"/>
		<updated>2014-02-21T14:41:01Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Monitoring submitted and quering jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in slurm called partitions) : a production and a research queue. The production queue provides a higher priority to jobs (20) then the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that should calculate Pi to 1 million digits:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available, and it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string, and the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A bare number is interpreted as minutes, so in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it somewhat larger than that, since you’re defining a hard upper limit). If your job completed long ago you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be spread across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide a single number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs currently running on the cluster. For the &#039;sbatch&#039; submission example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all the details of a currently active job (not a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from the queue: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
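This section is still to be written; as a minimal sketch (assuming the standard &#039;salloc&#039; command and an example &#039;research&#039; partition name), an interactive allocation could look like this:&lt;br /&gt;

```shell
# Sketch only: request an interactive allocation of 4 CPUs for 30 minutes
# on the 'research' partition (the partition name is an example), then open
# a shell inside it; typing 'exit' releases the allocation.
cmd="salloc --ntasks=1 --cpus-per-task=4 --time=30 --partition=research"
if command -v salloc >/dev/null 2>&1; then
    $cmd bash
else
    # Not on a Slurm cluster: just show the command that would run.
    echo "would run: $cmd bash"
fi
```

Unlike sbatch, salloc blocks until the resources are granted, so it is only suitable for interactive work.&lt;br /&gt;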
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING (R) or PENDING (PD) state. However, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1134</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1134"/>
		<updated>2014-02-21T14:40:33Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Query a specific active job: scontrol */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in Slurm called partitions): a production and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python3 script that calculates Pi to a large number of digits:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
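Before submitting, the series can be sanity-checked locally at much lower precision. This is a sketch, not part of the wiki&#039;s script:&lt;br /&gt;

```python
# Sketch: the same BBP-style series as above, truncated to 50 terms at
# 60 digits of working precision; enough to verify the leading digits of Pi.
from decimal import Decimal, getcontext

D = Decimal
getcontext().prec = 60

p = sum(D(1)/16**k * (D(4)/(8*k+1) - D(2)/(8*k+4) - D(1)/(8*k+5) - D(1)/(8*k+6))
        for k in range(50))

print(str(p)[:15])  # → 3.1415926535897
```

Each term of this series contributes roughly 1.2 correct decimal digits, which is why the full script needs a large range to justify its precision setting.&lt;br /&gt;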
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, Python3, which is not the default Python version on the cluster, first needs to be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
In the list you should note that python3 is indeed available to be loaded, which then can be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge the resources used by this job to the specified account. The account is an arbitrary string, and the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job allocation. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
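To illustrate these formats (a sketch using plain shell arithmetic, not Slurm itself), the same 1200-minute limit can be written in several equivalent ways:&lt;br /&gt;

```shell
# Sketch: express the 1200-minute limit in sbatch's other accepted formats.
minutes=1200
hours=$(( minutes / 60 ))        # 20
days=$(( hours / 24 ))           # 0
rem_hours=$(( hours % 24 ))      # 20
echo "--time=${minutes}"                  # minutes        -> --time=1200
echo "--time=${hours}:00:00"              # hours:min:sec  -> --time=20:00:00
echo "--time=${days}-${rem_hours}:00"     # days-hours:min -> --time=0-20:00
```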
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that&#039;s much larger than needed for most jobs) and then use sacct to look at how much memory your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you&#039;re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you&#039;re defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you&#039;re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
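For example (a sketch using a made-up MaxRSS value of 3500000 KB), the conversion and headroom could be computed as:&lt;br /&gt;

```shell
# Sketch: convert a MaxRSS value reported by sacct (in KB) into an
# approximate --mem request in MB, adding ~20% headroom.
# 3500000 KB is an example value, not real job data.
maxrss_kb=3500000
used_mb=$(( maxrss_kb / 1024 ))          # ~3417 MB actually used
request_mb=$(( used_mb + used_mb / 5 ))  # ~20% headroom -> 4100
echo "#SBATCH --mem=${request_mb}"
```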
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the minimum number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it is treated as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; instead, the job will specifically be scheduled to one of the fat nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring and querying submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs currently running on the cluster. For the &#039;sbatch&#039; submission example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
=== Query a specific active job: scontrol ===&lt;br /&gt;
Show all the details of a currently active job (not a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from the queue: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
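This section is still to be written; as a minimal sketch (assuming the standard &#039;salloc&#039; command and an example &#039;research&#039; partition name), an interactive allocation could look like this:&lt;br /&gt;

```shell
# Sketch only: request an interactive allocation of 4 CPUs for 30 minutes
# on the 'research' partition (the partition name is an example), then open
# a shell inside it; typing 'exit' releases the allocation.
cmd="salloc --ntasks=1 --cpus-per-task=4 --time=30 --partition=research"
if command -v salloc >/dev/null 2>&1; then
    $cmd bash
else
    # Not on a Slurm cluster: just show the command that would run.
    echo "would run: $cmd bash"
fi
```

Unlike sbatch, salloc blocks until the resources are granted, so it is only suitable for interactive work.&lt;br /&gt;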
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING (R) or PENDING (PD) state. However, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1133</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1133"/>
		<updated>2014-02-21T14:40:02Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* monitoring submitted jobs: squeue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that should calculate Pi to 1 million digits:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
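Before submitting anything to the cluster, the series can be sanity-checked locally at a much smaller precision; this sketch uses the same formula with only 20 terms (the precision value below is chosen purely for this quick check):&lt;br /&gt;

```python
from decimal import Decimal as D, getcontext

# Small precision: enough for a quick local check of the series.
getcontext().prec = 30
p = sum(D(1)/16**k * (D(4)/(8*k+1) - D(2)/(8*k+4)
                      - D(1)/(8*k+5) - D(1)/(8*k+6))
        for k in range(20))
print(str(p)[:10])  # prints 3.14159265
```

Each term of this series contributes roughly one extra correct decimal digit, so 20 terms already reproduce the leading digits of Pi.&lt;br /&gt;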
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number would be advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
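As an illustration of these formats, the same 20-hour (1200-minute) limit can be written in several equivalent ways; only one such line should appear in a real script:&lt;br /&gt;

```bash
#SBATCH --time=1200          # minutes
#SBATCH --time=20:00:00      # hours:minutes:seconds
#SBATCH --time=0-20          # days-hours
```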
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an error stating that your job &#039;Exceeded job memory limit&#039;. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that is much larger than most jobs need) and then use sacct to look at how much memory your job is actually using or has used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you are interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you are defining a hard upper limit). If your job completed long ago, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you are not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
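A minimal sketch of that calculation (the MaxRSS value below is hypothetical, not taken from a real job):&lt;br /&gt;

```bash
# Suppose sacct reported a MaxRSS of 1843200 (KB) for the job.
maxrss_kb=1843200
mem_mb=$(( maxrss_kb / 1024 ))        # 1800 MB actually used
suggested=$(( mem_mb + mem_mb / 5 ))  # add ~20% headroom for the hard limit
echo "--mem=${suggested}"             # prints --mem=2160
```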
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it is treated as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done by using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. Using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as the option instead will schedule the job specifically to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
Assuming the script was named &#039;run_calc_pi.sh&#039;, it can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
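sbatch prints a line of the form &#039;Submitted batch job &amp;lt;id&amp;gt;&#039;. If you want to feed that id to squeue or scancel later, it can be extracted as sketched below (the output line is simulated here, since this example does not actually submit anything):&lt;br /&gt;

```bash
# Simulated sbatch output; a real call would be: sbatch run_calc_pi.sh
submit_output="Submitted batch job 4321"
jobid=$(echo "$submit_output" | awk '{print $4}')
echo "$jobid"   # prints 4321
```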
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted with the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
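If the installed Slurm version supports job arrays (an assumption; check the sbatch man page on the cluster), the same ten runs can also be expressed as a single array job, where SLURM_ARRAY_TASK_ID takes the values 1 through 10 inside the script:&lt;br /&gt;

```bash
#SBATCH --array=1-10
# Inside the batch script, select the per-task input, e.g.:
# ./myprogram input_${SLURM_ARRAY_TASK_ID}.dat
```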
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring and querying submitted jobs ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a certain job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. However, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1132</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1132"/>
		<updated>2014-02-21T14:37:49Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Queues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that should calculate Pi to 1 million digits:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
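Before submitting anything to the cluster, the series can be sanity-checked locally at a much smaller precision; this sketch uses the same formula with only 20 terms (the precision value below is chosen purely for this quick check):&lt;br /&gt;

```python
from decimal import Decimal as D, getcontext

# Small precision: enough for a quick local check of the series.
getcontext().prec = 30
p = sum(D(1)/16**k * (D(4)/(8*k+1) - D(2)/(8*k+4)
                      - D(1)/(8*k+5) - D(1)/(8*k+6))
        for k in range(20))
print(str(p)[:10])  # prints 3.14159265
```

Each term of this series contributes roughly one extra correct decimal digit, so 20 terms already reproduce the leading digits of Pi.&lt;br /&gt;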
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number would be advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
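As an illustration of these formats, the same 20-hour (1200-minute) limit can be written in several equivalent ways; only one such line should appear in a real script:&lt;br /&gt;

```bash
#SBATCH --time=1200          # minutes
#SBATCH --time=20:00:00      # hours:minutes:seconds
#SBATCH --time=0-20          # days-hours
```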
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an error stating that your job &#039;Exceeded job memory limit&#039;. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you are interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you are defining a hard upper limit). If your job completed some time ago, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you are not enforcing an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
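To make the conversion concrete, here is a small sketch; the MaxRSS value is hypothetical, and the 20% headroom is an arbitrary safety margin:

```shell
# Hypothetical MaxRSS value as reported by sacct, in KB
maxrss_kb=1536000
# Convert to MB, then add ~20% headroom since --mem is a hard upper limit
mem_mb=$(( maxrss_kb / 1024 ))
mem_request_mb=$(( mem_mb + mem_mb / 5 ))
echo "--mem=${mem_request_mb}"    # prints --mem=1800
```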
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and allocates sufficient resources accordingly. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
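As a sketch of what this means in practice (assuming a standard Slurm setup), a command launched with srun inside the batch script is started once per requested task:

```bash
#SBATCH --ntasks=4

srun hostname    # started 4 times, once per allocated task
```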
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be spread across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide a single number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
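If a range of nodes is acceptable, the same flag also takes a minimum-maximum pair:

```bash
#SBATCH --nodes=2-4    # at least 2 and at most 4 nodes
```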
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as an option, the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
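If the jobs must run one after another instead of concurrently, the --dependency option of sbatch can chain them. This is only a sketch; it assumes a Slurm version whose sbatch supports the --parsable flag for extracting the job id:

```bash
# Submit the first job, then make each subsequent job wait for
# the previous one to complete successfully
jobid=$(sbatch --parsable runscript_1.sh)
for i in `seq 2 10`; do
  jobid=$(sbatch --parsable --dependency=afterok:${jobid} runscript_$i.sh)
done
```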
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first node (or even all nodes) of a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring submitted jobs: squeue ===&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
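squeue also accepts filters to narrow the listing; for example (flags from the standard squeue interface):

```bash
squeue -u $USER        # only your own jobs
squeue -p research     # only jobs in the research partition
squeue -t PD           # only pending jobs
```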
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs that are running on the cluster at that moment. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either queued or already running, you can remove it using the &#039;scancel&#039; command, which takes the job id as a parameter. For the example above, this would be done as follows:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
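scancel can also act on groups of jobs; for example (standard scancel flags):

```bash
scancel -u $USER                 # cancel all of your own jobs
scancel -t PENDING -u $USER      # cancel only your pending jobs
```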
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. However, here is a breakdown of all the states your job can be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to any given person. E.g., Master students will only have the Student Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1131</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1131"/>
		<updated>2014-02-21T14:37:27Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Queues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in SLURM): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account is authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that calculates many digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python 3, which is not the default Python version on the cluster, first needs to be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string and may be changed after job submission using the scontrol command. For WUR users, a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A bare number is interpreted as minutes, so in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you are interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you are defining a hard upper limit). If your job completed some time ago, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you are not enforcing an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and allocates sufficient resources accordingly. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be spread across multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide a single number, it is used as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as an option, the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first node (or even all nodes) of a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Monitoring submitted jobs: squeue ===&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs that are running on the cluster at that moment. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active (not completed) job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
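A minimal sketch of an interactive allocation with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; (the partition name below is only an example):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# request 4 tasks for 30 minutes and start a shell in the allocation&lt;br /&gt;
salloc --ntasks=4 --time=30 --partition=research&lt;br /&gt;
# run commands inside the allocation with srun, then exit to release it&lt;br /&gt;
srun hostname&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;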
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the Running or PenDing state. However, here is a breakdown of all the states that your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. For example, Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager web page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1130</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1130"/>
		<updated>2014-02-21T14:34:35Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* monitoring submitted jobs: squeue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (in Slurm called partitions): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string and may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
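For instance, the following are equivalent ways of writing the same 1200-minute (20-hour) limit; a script would contain only one of them:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200        # minutes&lt;br /&gt;
#SBATCH --time=20:00:00    # hours:minutes:seconds&lt;br /&gt;
#SBATCH --time=0-20        # days-hours&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;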
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with an &#039;Exceeded job memory limit&#039; error. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you&#039;re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you&#039;re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you&#039;re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
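The KB-to-MB conversion and headroom step described above can be sketched in plain bash (the MaxRSS figure below is a made-up example, not real sacct output):&lt;br /&gt;

```shell
# Suppose 'sacct -o MaxRSS -j JOBID' reported 3276800K (hypothetical value).
maxrss_kb=3276800
# Convert KB to MB, since --mem is specified in MB.
mem_mb=$(( maxrss_kb / 1024 ))
# Add roughly 20% headroom, because --mem defines a hard upper limit.
mem_request=$(( mem_mb + mem_mb / 5 ))
echo "$mem_request"   # value to use with: #SBATCH --mem=...
```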
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and ensures sufficient resources are provided. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it is treated as both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted with the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
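An alternative to such a submit loop is a Slurm job array, where one script is submitted once and &amp;lt;code&amp;gt;$SLURM_ARRAY_TASK_ID&amp;lt;/code&amp;gt; distinguishes the tasks (the script and file names below are only examples):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# submit 10 array tasks, numbered 1..10&lt;br /&gt;
sbatch --array=1-10 runscript.sh&lt;br /&gt;
# inside runscript.sh, select input by task id, e.g.:&lt;br /&gt;
# input=data_${SLURM_ARRAY_TASK_ID}.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;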
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) nodes of a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive one-hour session running HPL on eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== monitoring submitted jobs: squeue ===&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs that are currently running on the cluster. For the example on submitting with the &#039;sbatch&#039; command, the output may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active (not completed) job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the Running or PenDing state. However, here is a breakdown of all the states that your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. For example, Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager web page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1129</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1129"/>
		<updated>2014-02-21T13:55:45Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* = Interactive X11/GUI jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in Slurm called partitions): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that computes a high-precision approximation of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
In the list you should see that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A plain number is interpreted as minutes, so in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
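For instance, the following specifications are equivalent ways of requesting a 20-hour limit, using the formats listed above:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200        # minutes&lt;br /&gt;
#SBATCH --time=20:00:00    # hours:minutes:seconds&lt;br /&gt;
#SBATCH --time=0-20        # days-hours&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;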
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default, it is deliberately relatively small — 1024 MB per node. If your job uses more than that, you’ll get an error that your job Exceeded job memory limit. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
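For example, to check the memory use of job 3385 (a job id that also appears in the sacct examples further on), including a start time in case the job finished a while ago:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j 3385 -S 2014-01-01&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
If MaxRSS reports e.g. 2097152 (KB), that is 2048 MB, so setting --mem a little above 2048 would be a reasonable choice.&lt;br /&gt;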
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it will be both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done by using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; as option the job will specifically be scheduled to one of the fat nodes. &lt;br /&gt;
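Correspondingly, a job targeted at the fat nodes would carry this directive:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=largemem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;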
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session for 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that moment. For the job submitted with the &#039;sbatch&#039; command in the example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be checked with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
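scancel can also select jobs by attribute instead of by job id. For instance, all of your own jobs (pending and running) can be cancelled at once with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel -u $USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;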
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
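As a minimal sketch of standard Slurm usage (not specific to this cluster): &#039;salloc&#039; obtains a resource allocation and starts a shell within it; job steps can then be launched on the allocated resources with &#039;srun&#039;, and the allocation is released when the shell is exited.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
salloc --ntasks=4 --time=1:00:00&lt;br /&gt;
srun hostname&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;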
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be either in the Running (R) or PenDing (PD) state. However, here is a breakdown of all the states that your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but are waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1128</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1128"/>
		<updated>2014-02-21T13:55:20Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Submitting multiple jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in Slurm called partitions): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources and those resources are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that computes a high-precision approximation of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
In the list you should see that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A plain number is interpreted as minutes, so in this example the job will run for a maximum of 1200 minutes.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default, it is deliberately relatively small — 1024 MB per node. If your job uses more than that, you’ll get an error that your job Exceeded job memory limit. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&lt;br /&gt;
When requesting multiple tasks, you may or may not want the job to be partitioned among multiple nodes. You can specify the number of nodes using the &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; flag. If you provide only one number, it will be both the minimum and the maximum. For instance:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should force your job to be scheduled to a single node.&lt;br /&gt;
&lt;br /&gt;
Because the cluster has a hybrid configuration, i.e. normal and fat nodes, it may be prudent to schedule your job specifically for one or the other node type, depending for instance on memory requirements. This can be done by using the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--constraint&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --constraint=normalmem&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
The example above will result in jobs being scheduled to the regular compute nodes. By using &amp;lt;code&amp;gt;largemem&amp;lt;/code&amp;gt; instead, the job will specifically be scheduled to one of the fat nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Interactive X11/GUI jobs ===&lt;br /&gt;
Slurm will forward your X11 credentials to the first (or even all) node for a job with the (undocumented) --x11 option.&lt;br /&gt;
For example, an interactive session of 1 hour with HPL using eight cores can be started with:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;module load hpl/2.1&lt;br /&gt;
srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first xhpl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, its status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of jobs, such as the time limit.&lt;br /&gt;
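Two commonly useful filters, for instance (using squeue&#039;s standard -u/--user and -t/--states options):

```shell
squeue -u $USER             # show only your own jobs
squeue -u $USER -t RUNNING  # show only your running jobs
```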
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a particular job can be checked with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either queued or already running, you can remove it using the &#039;scancel&#039; command, which takes the job id as a parameter. For the example above, this would be done as follows:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
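Besides a single job id, scancel can also match jobs by attribute; for example, using its standard --user and --name options:

```shell
scancel -u $USER           # cancel all of your own jobs
scancel --name=calc_pi.py  # cancel jobs with a given job name
```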
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
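As a minimal sketch of the interactive workflow (resource values are arbitrary): salloc obtains an allocation and opens a shell in it, srun launches tasks inside the allocation, and exiting the shell releases the resources.

```shell
salloc --ntasks=1 --time=1:00:00  # request an allocation; opens a subshell
srun hostname                     # run a command on the allocated node
exit                              # leave the subshell and release the allocation
```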
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state. Here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the Student Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1126</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1126"/>
		<updated>2014-01-16T08:25:03Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Get overview of past and current jobs: sacct */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (in SLURM called partitions): a production queue and a research queue. The production queue provides a higher priority to jobs (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be resubmitted if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
For this script to run, Python3, which is not the default Python version on the cluster, first needs to be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
In the list you should see that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string; the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
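The equivalence between the plain-minutes form and the hours:minutes:seconds form can be checked with a little shell arithmetic:

```shell
# 1200 minutes written as hours:minutes:seconds
minutes=1200
hms=$(printf '%d:%02d:00' $(( minutes / 60 )) $(( minutes % 60 )))
echo "--time=${hms}"  # the same limit as --time=1200
```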
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default, it is deliberately relatively small: 1024 MB per node. If your job uses more than that, you will get an &amp;quot;Exceeded job memory limit&amp;quot; error. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that is much larger than most jobs need) and then use sacct to look at how much memory your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you are interested in. The number is reported in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you are defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you are not enforcing an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
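The arithmetic above can be sketched as a small shell snippet; the MaxRSS value is a hypothetical sacct reading, not taken from this cluster:

```shell
# Hypothetical MaxRSS reported by sacct, in KB
maxrss_kb=3565248
# Convert to MB (integer division)
mem_mb=$(( maxrss_kb / 1024 ))
# Add roughly 20% headroom, since --mem is a hard upper limit
mem_req=$(( mem_mb + mem_mb / 5 ))
echo "MaxRSS is about ${mem_mb} MB; request --mem=${mem_req}"
```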
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources accordingly. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, its status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of jobs, such as the time limit.&lt;br /&gt;
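Two commonly useful filters, for instance (using squeue&#039;s standard -u/--user and -t/--states options):

```shell
squeue -u $USER             # show only your own jobs
squeue -u $USER -t RUNNING  # show only your running jobs
```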
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that time. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times should be specified when submitting jobs. The time limit set for a particular job can be checked with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
[nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either queued or already running, you can remove it using the &#039;scancel&#039; command, which takes the job id as a parameter. For the example above, this would be done as follows:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
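Besides a single job id, scancel can also match jobs by attribute; for example, using its standard --user and --name options:

```shell
scancel -u $USER           # cancel all of your own jobs
scancel --name=calc_pi.py  # cancel jobs with a given job name
```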
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
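As a minimal sketch of the interactive workflow (resource values are arbitrary): salloc obtains an allocation and opens a shell in it, srun launches tasks inside the allocation, and exiting the shell releases the resources.

```shell
salloc --ntasks=1 --time=1:00:00  # request an allocation; opens a subshell
srun hostname                     # run a command on the allocated node
exit                              # leave the subshell and release the allocation
```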
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state; however, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA	||CANCELLED||	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||	COMPLETED||	Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||	CONFIGURING||	Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||	COMPLETING||	Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||	FAILED||	Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||	NODE_FAIL||	Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||	PENDING||	Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||	RUNNING||	Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||	SUSPENDED||	Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||	TIMEOUT||	Job terminated upon reaching its time limit.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; Partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1125</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1125"/>
		<updated>2014-01-16T08:21:46Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Get overview of past and current jobs: sacct */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue does (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are currently occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
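Because there is no default queue and both the run-time and memory defaults are deliberately small, it is usually worth setting all three explicitly when submitting. A minimal sketch; the partition name is taken from the sinfo listing above, so replace it with a queue your account is authorized for, and treat the program name as a placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical queue name from the sinfo example above;
# replace with a partition your account is authorized for.
#SBATCH --partition=ABGC_Research
# Override the 1-hour default run time (value in minutes)
#SBATCH --time=240
# Override the 1024 MB per-node default memory limit (value in MB)
#SBATCH --mem=4096

./my_program   # placeholder for the actual work
```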
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple python3 script, which computes a large number of digits of Pi using the Bailey-Borwein-Plouffe formula:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python3, which is not the default Python version on the cluster, must first be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
In the list you should see that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;; a time limit of zero requests that no time limit be imposed. In this example the job will therefore run for a maximum of 1200 minutes.&lt;br /&gt;
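As an illustration of the accepted formats, the following directives all request the same 20-hour limit; a real script would contain only one of them:&lt;br /&gt;

```shell
# Equivalent ways to request a 20-hour time limit
#SBATCH --time=1200          # minutes
#SBATCH --time=20:00:00      # hours:minutes:seconds
#SBATCH --time=0-20          # days-hours
```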
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, you’ll get an error stating that your job &#039;Exceeded job memory limit&#039;. To set a larger limit, add to your job submission: &lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, which is much more than most jobs need) and then use sacct to look at how much memory your job is actually using or has used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
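The KB-to-MB conversion and head-room step can be sketched in plain shell; the MaxRSS value below is a made-up example standing in for a real sacct reading:&lt;br /&gt;

```shell
# Hypothetical MaxRSS reading from sacct, in KB
maxrss_kb=3000000

# sacct reports KB; --mem expects MB, so divide by 1024
maxrss_mb=$(( maxrss_kb / 1024 ))

# Add roughly 20% head-room, since --mem defines a hard upper limit
mem_request=$(( maxrss_mb + maxrss_mb / 5 ))

echo "#SBATCH --mem=${mem_request}"
```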
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and ensures that sufficient resources are provided. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in $(seq 1 10); do echo $i; sbatch runscript_$i.sh; done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
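As an alternative, assuming the installed Slurm version supports job arrays (available from Slurm 2.6 onwards), the ten submissions can be collapsed into a single one. This is a hypothetical sketch of one combined batch script that selects its work by array task id:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --array=1-10
#SBATCH --job-name=runscript

# Slurm sets SLURM_ARRAY_TASK_ID to 1..10, one value per array task
echo "Task ${SLURM_ARRAY_TASK_ID}: processing input set ${SLURM_ARRAY_TASK_ID}"
```

The script is submitted once with a plain &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; call, and Slurm schedules the ten tasks independently.&lt;br /&gt;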
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs running on the cluster at that moment. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a certain job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either waiting in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the job id as a parameter. For the example above, this would be done as follows:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Job Status Codes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Typically your job will be in either the RUNNING or the PENDING state; however, here is a breakdown of all the states your job could be in.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Code!!State!!Description&lt;br /&gt;
|-&lt;br /&gt;
|CA||CANCELLED||Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.&lt;br /&gt;
|-&lt;br /&gt;
|CD||COMPLETED||Job has terminated all processes on all nodes.&lt;br /&gt;
|-&lt;br /&gt;
|CF||CONFIGURING||Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).&lt;br /&gt;
|-&lt;br /&gt;
|CG||COMPLETING||Job is in the process of completing. Some processes on some nodes may still be active.&lt;br /&gt;
|-&lt;br /&gt;
|F||FAILED||Job terminated with non-zero exit code or other failure condition.&lt;br /&gt;
|-&lt;br /&gt;
|NF||NODE_FAIL||Job terminated due to failure of one or more allocated nodes.&lt;br /&gt;
|-&lt;br /&gt;
|PD||PENDING||Job is awaiting resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|R||RUNNING||Job currently has an allocation.&lt;br /&gt;
|-&lt;br /&gt;
|S||SUSPENDED||Job has an allocation, but execution has been suspended.&lt;br /&gt;
|-&lt;br /&gt;
|TO||TIMEOUT||Job terminated upon reaching its time limit.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; Partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=1080</id>
		<title>Setting TMPDIR</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=1080"/>
		<updated>2013-12-29T11:03:24Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Note: all nodes are partitioned with a large /tmp (approximately 400 GB).&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Many programs require writing intermediary or temporary information. A process may even require writing to a temporary location without a user knowing it. For instance, the command &amp;lt;code&amp;gt;sort&amp;lt;/code&amp;gt;, part of the &amp;lt;code&amp;gt;bash&amp;lt;/code&amp;gt; toolkit, requires a lot of temporary file space when sorting large volumes of data. Often programs take the system default, which is usually &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; on Linux systems. The &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; partition is however often too limited in size and can get filled up. When this happens all users that require write access to &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; will experience problems in running jobs, which can range from unexpected quitting of processes, to erroneous output. &lt;br /&gt;
&lt;br /&gt;
== Overriding the system default - works for many applications (but not all) ==&lt;br /&gt;
&lt;br /&gt;
Users can set a custom temporary folder location by setting the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. A custom temporary folder location is best placed in the &#039;scratch&#039; space on the /lustre filesystem, because this will ensure periodic tidying up of the custom temporary directory thereby reducing the opportunity for very large temporary files that have gone unnoticed to remain on the filesystem. In addition, this is in line with protocols for the default system temporary directory &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; as this directory is also periodically purged of older files.&lt;br /&gt;
&lt;br /&gt;
First, create a temporary directory in your user-designated space on the &#039;scratch&#039; partition:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mkdir /lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Set the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; variable to point at the custom directory in either &amp;lt;code&amp;gt;~/.bash_profile&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; using a command-line editor of choice (vi, emacs, nano, etc), adding the following line of code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export TMPDIR=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The temporary directory will be set to the custom location once you log in. For the current open terminal to set the new location immediately, without need to log out and in again:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
source ~/.bash_profile&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
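To check that a program honours the new setting, one quick sanity test uses mktemp, which builds its default template from $TMPDIR; the path below is a hypothetical stand-in for the scratch location above:&lt;br /&gt;

```shell
# Hypothetical custom location; on the cluster this would live under
# the /lustre scratch space described above
export TMPDIR=/tmp/tmpdir-demo
mkdir -p "$TMPDIR"

# GNU mktemp consults $TMPDIR for its default template
f=$(mktemp)

# The temporary file should now live under the custom directory
case "$f" in
  "$TMPDIR"/*) echo "TMPDIR respected" ;;
  *)           echo "TMPDIR ignored" ;;
esac

rm -f "$f"
```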
&lt;br /&gt;
Note that certain (compiled) applications may in fact ignore the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. These can include Java applications and C/C++ applications compiled with gcc/g++.&lt;br /&gt;
&lt;br /&gt;
== Setting custom temporary directory for Java applications ==&lt;br /&gt;
&lt;br /&gt;
Java applications often do not respond properly to re-setting of the system default by manipulating the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. Java behaviour can be modified by setting the &amp;lt;code&amp;gt;_JAVA_OPTIONS&amp;lt;/code&amp;gt; environment variable:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export _JAVA_OPTIONS=-Djava.io.tmpdir=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that this does not, strictly speaking, manipulate a temporary-directory-specific environment variable; rather, it sets a Java-specific environment variable that allows options to be passed to Java at runtime.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
[[Main_Page#Being_in_control_of_Environment_parameters | Controlling the environment on the Agrogenomics cluster]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=1079</id>
		<title>Setting TMPDIR</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=1079"/>
		<updated>2013-12-29T11:03:01Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Note: all nodes are partitioned with a large /tmp (approximately 400 GB); /local is much smaller!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Many programs require writing intermediary or temporary information. A process may even require writing to a temporary location without a user knowing it. For instance, the command &amp;lt;code&amp;gt;sort&amp;lt;/code&amp;gt;, part of the &amp;lt;code&amp;gt;bash&amp;lt;/code&amp;gt; toolkit, requires a lot of temporary file space when sorting large volumes of data. Often programs take the system default, which is usually &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; on Linux systems. The &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; partition is however often too limited in size and can get filled up. When this happens all users that require write access to &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; will experience problems in running jobs, which can range from unexpected quitting of processes, to erroneous output. &lt;br /&gt;
&lt;br /&gt;
== Overriding the system default - works for many applications (but not all) ==&lt;br /&gt;
&lt;br /&gt;
Users can set a custom temporary folder location by setting the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. A custom temporary folder location is best placed in the &#039;scratch&#039; space on the /lustre filesystem, because this will ensure periodic tidying up of the custom temporary directory thereby reducing the opportunity for very large temporary files that have gone unnoticed to remain on the filesystem. In addition, this is in line with protocols for the default system temporary directory &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; as this directory is also periodically purged of older files.&lt;br /&gt;
&lt;br /&gt;
First, create a temporary directory in user-designated space on the &#039;scratch&#039; partition:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mkdir /lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Set the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; variable to point at the custom directory in either &amp;lt;code&amp;gt;~/.bash_profile&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; using a command-line editor of choice (vi, emacs, nano, etc.), adding the following line:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export TMPDIR=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The temporary directory will be set to the custom location once you log in. To apply the new location immediately in the currently open terminal, without logging out and back in:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
source ~/.bash_profile&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
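&lt;br /&gt;
To verify that the new location is active in the current shell:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
echo $TMPDIR&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;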
&lt;br /&gt;
Note that certain (compiled) applications may in fact ignore the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. These include some Java applications and some compiled C/C++ applications.&lt;br /&gt;
&lt;br /&gt;
== Setting custom temporary directory for Java applications ==&lt;br /&gt;
&lt;br /&gt;
Java applications often do not respond properly to re-setting of the system default by manipulating the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. Java behaviour can be modified by setting the &amp;lt;code&amp;gt;_JAVA_OPTIONS&amp;lt;/code&amp;gt; environment variable:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export _JAVA_OPTIONS=-Djava.io.tmpdir=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note that this does not, strictly speaking, set a temporary-directory-specific environment variable; rather, it sets a Java-specific environment variable that passes options to the Java runtime.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
[[Main_Page#Being_in_control_of_Environment_parameters | Controlling the environment on the Agrogenomics cluster]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1047</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1047"/>
		<updated>2013-12-27T14:56:50Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type &amp;lt;code&amp;gt;sinfo&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
To run this script, Python 3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that Python 3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users, a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A bare number is interpreted as minutes, so in this example the job may run for at most 1200 minutes (20 hours).&lt;br /&gt;
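&lt;br /&gt;
The same 20-hour limit can equivalently be written in any of the other accepted formats, e.g.:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=20:00:00     # hours:minutes:seconds&lt;br /&gt;
#SBATCH --time=0-20         # days-hours&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;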
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with the error &amp;quot;Exceeded job memory limit&amp;quot;. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be; but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that is much larger than most jobs need) and then use sacct to look at how much memory your job actually used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you&#039;re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you&#039;re defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you&#039;re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
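&lt;br /&gt;
As a worked example (with a made-up MaxRSS value): if sacct reports a MaxRSS of 1536000 KB, that is 1536000 / 1024 = 1500 MB, so a request with a little headroom above that would look like:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;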
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources accordingly. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, they can all be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh; done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs running on the cluster at that time. For the sbatch example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
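&lt;br /&gt;
To restrict the listing to your own jobs, squeue accepts a -u/--user option:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -u $USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;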
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour; estimated run times should be specified when submitting jobs. The time limit set for a specific job can be inspected with the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (not a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either queued or already running, you can remove it using the &#039;scancel&#039; command, which takes the job id as a parameter. For the example above, this would be done as follows:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
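&lt;br /&gt;
scancel can also cancel several jobs at once; for example, all of your own jobs via the -u/--user option:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel -u $USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;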
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to any given person: Master students will only have the Student Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1046</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1046"/>
		<updated>2013-12-27T14:55:12Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Defaults */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type &amp;lt;code&amp;gt;sinfo&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
To run this script, Python 3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that Python 3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users, a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. A bare number is interpreted as minutes, so in this example the job may run for at most 1200 minutes (20 hours).&lt;br /&gt;
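&lt;br /&gt;
The same 20-hour limit can equivalently be written in any of the other accepted formats, e.g.:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=20:00:00     # hours:minutes:seconds&lt;br /&gt;
#SBATCH --time=0-20         # days-hours&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;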
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with the error &amp;quot;Exceeded job memory limit&amp;quot;. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be; but the smaller the number, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that is much larger than most jobs need) and then use sacct to look at how much memory your job actually used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you&#039;re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you&#039;re defining a hard upper limit). If your job completed long in the past, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you&#039;re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
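&lt;br /&gt;
As a worked example (with a made-up MaxRSS value): if sacct reports a MaxRSS of 1536000 KB, that is 1536000 / 1024 = 1500 MB, so a request with a little headroom above that would look like:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;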
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources accordingly. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
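If the scripts differ only in that index, a SLURM job array is an alternative: a single script is submitted once, and each array task reads its index from the SLURM_ARRAY_TASK_ID environment variable. A minimal sketch, assuming a hypothetical combined script &#039;runscript.sh&#039;:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# submit 10 array tasks; inside runscript.sh, use $SLURM_ARRAY_TASK_ID&lt;br /&gt;
sbatch --array=1-10 runscript.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;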
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs running at that time on the cluster. For the earlier &#039;sbatch&#039; submission example, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
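&lt;br /&gt;
To narrow the list down, &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; accepts filters; for example, to show only the jobs of a single user (the username is illustrative):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -u megen002&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;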
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be inspected using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all details of a currently active job (not a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
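&amp;lt;code&amp;gt;scancel&amp;lt;/code&amp;gt; can also select jobs by attribute; for example, to cancel all of your own pending and running jobs at once (the username is illustrative):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel -u megen002&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;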
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
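A minimal sketch of an interactive allocation (resource values are illustrative): &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; obtains an allocation, runs the given command inside it, and releases the allocation when that command exits.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# request 1 task and 2048 MB for 60 minutes, then start a shell inside the allocation&lt;br /&gt;
salloc --ntasks=1 --mem=2048 --time=60 bash&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;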
&lt;br /&gt;
== Getting an overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. For example, Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; Partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1045</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1045"/>
		<updated>2013-12-27T14:54:53Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Defaults */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (called partitions in SLURM): a production and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&#039;&#039;&#039; &amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Default memory limit is 1024MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
In order for this script to run, Python3, which is not the default Python version on the cluster, first needs to be loaded into your environment. Availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string, and the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job; a time limit of zero requests that no limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
&lt;br /&gt;
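For example, the following forms all request the same 20-hour limit (use only one per script):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200        # minutes&lt;br /&gt;
#SBATCH --time=20:00:00    # hours:minutes:seconds&lt;br /&gt;
#SBATCH --time=0-20:00     # days-hours:minutes&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;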
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with the error &amp;quot;Exceeded job memory limit&amp;quot;. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use sacct to look at how much your job is actually using or used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the one you’re interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you’re defining a hard upper limit). If your job completed long in the past you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times.&lt;br /&gt;
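For example, to include a job that finished well in the past, add a start time to the same query (the job id and date are illustrative):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct -o MaxRSS -j 4220 -S 2013-11-01&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;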
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and provides sufficient resources for them. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=research&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all these scripts can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
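If the scripts differ only in that index, a SLURM job array is an alternative: a single script is submitted once, and each array task reads its index from the SLURM_ARRAY_TASK_ID environment variable. A minimal sketch, assuming a hypothetical combined script &#039;runscript.sh&#039;:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# submit 10 array tasks; inside runscript.sh, use $SLURM_ARRAY_TASK_ID&lt;br /&gt;
sbatch --array=1-10 runscript.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;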
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs running at that time on the cluster. For the earlier &#039;sbatch&#039; submission example, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
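&lt;br /&gt;
To narrow the list down, &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; accepts filters; for example, to show only the jobs of a single user (the username is illustrative):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -u megen002&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;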
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be inspected using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all details of a currently active job (not a completed job).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
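&amp;lt;code&amp;gt;scancel&amp;lt;/code&amp;gt; can also select jobs by attribute; for example, to cancel all of your own pending and running jobs at once (the username is illustrative):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel -u megen002&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;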
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
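A minimal sketch of an interactive allocation (resource values are illustrative): &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; obtains an allocation, runs the given command inside it, and releases the allocation when that command exits.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# request 1 task and 2048 MB for 60 minutes, then start a shell inside the allocation&lt;br /&gt;
salloc --ntasks=1 --mem=2048 --time=60 bash&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;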
&lt;br /&gt;
== Getting an overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; when submitting with the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions will be available to any given person. For example, Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; Partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1044</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=1044"/>
		<updated>2013-12-27T14:54:15Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has 2 queues (called partitions in SLURM): a production and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&lt;br /&gt;
&#039;&#039;&#039;The default run time for a job is 1 hour!&lt;br /&gt;
The default memory limit is 1024 MB per node!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
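This series is the Bailey-Borwein-Plouffe formula; each term contributes roughly 1.2 decimal digits, so 411 terms yield only a few hundred correct digits regardless of the precision setting. A small self-contained check (not part of the original page, scaled down so it runs quickly) illustrates this:

```python
from decimal import Decimal, getcontext

# Same series as the cluster example, at a smaller, testable scale.
getcontext().prec = 60
D = Decimal
pi = sum(
    D(1) / 16**k * (D(4) / (8*k + 1) - D(2) / (8*k + 4)
                    - D(1) / (8*k + 5) - D(1) / (8*k + 6))
    for k in range(40)  # ~40 * 1.2 = ~48 correct decimal digits
)
print(str(pi)[:32])  # 3.141592653589793238462643383279
```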
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
Before this script can run, Python 3, which is not the default Python version on the cluster, must be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that Python 3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
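As an illustration of these formats (the helper below is not part of Slurm; its name is made up for this example), the accepted strings can be converted to minutes like so:

```python
def slurm_time_to_minutes(s: str) -> float:
    """Convert a Slurm time string to minutes.

    Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
    """
    days = 0
    if "-" in s:
        d, s = s.split("-", 1)
        days = int(d)
        parts = [int(p) for p in s.split(":")]
        # After a day part, the remaining fields are H[:M[:S]].
        h, m, sec = (parts + [0, 0, 0])[:3]
    else:
        parts = [int(p) for p in s.split(":")]
        if len(parts) == 1:      # minutes
            h, m, sec = 0, parts[0], 0
        elif len(parts) == 2:    # minutes:seconds
            h, (m, sec) = 0, parts
        else:                    # hours:minutes:seconds
            h, m, sec = parts
    return days * 24 * 60 + h * 60 + m + sec / 60

print(slurm_time_to_minutes("1200"))        # 1200.0
print(slurm_time_to_minutes("3-08:00:00"))  # 4800.0
```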
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem=2048&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
SLURM imposes a memory limit on each job. By default it is deliberately small: 1024 MB per node. If your job uses more than that, it will fail with the error &amp;quot;Exceeded job memory limit&amp;quot;. To set a larger limit, add to your job submission:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mem X&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where X is the maximum amount of memory your job will use per node, in MB. The larger your working data set, the larger this needs to be, but the smaller the number the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots have on average about 4000 MB per core, which is much more than most jobs need) and then use sacct to see how much memory your job actually used:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
$ sacct -o MaxRSS -j JOBID&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
where JOBID is the job you are interested in. The number is in KB, so divide by 1024 to get a rough idea of what to use with --mem (set it to something a little larger than that, since you are defining a hard upper limit). If your job completed long ago, you may have to tell sacct to look further back in time by adding a start time with -S YYYY-MM-DD. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you are not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.&lt;br /&gt;
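The KB-to-MB conversion with headroom can be sketched as a small helper (hypothetical, for illustration only; the name and the 20% headroom are choices for this example):

```python
import math

def suggest_mem_mb(max_rss_kb: int, headroom: float = 1.2) -> int:
    """Turn sacct's MaxRSS (KB) into a suggested --mem value (MB).

    Adds some headroom and rounds up, since --mem is a hard limit.
    """
    return math.ceil(max_rss_kb / 1024 * headroom)

print(suggest_mem_mb(1048576))  # a 1 GB MaxRSS suggests --mem=1229
```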
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch a maximum of that number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs running on the cluster at that moment. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
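Because squeue prints plain text, its output is easy to post-process. A small sketch (not part of the original page) counts running jobs per partition, using a few of the sample lines above:

```python
from collections import Counter

# A few lines of squeue output, as shown above (header stripped).
sample = """\
 3396      ABGC BOV-WUR- megen002   R      27:26      1 node004
 3385  research BOV-WUR- megen002   R      44:38      1 node049
 3386  research BOV-WUR- megen002   R      44:38      1 node050"""

# The partition is the second whitespace-separated column.
per_partition = Counter(line.split()[1] for line in sample.splitlines())
print(per_partition["research"], per_partition["ABGC"])  # 2 1
```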
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be checked using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
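The Key=Value pairs that scontrol prints are straightforward to turn into a dictionary; a minimal sketch (using a shortened, single-line sample of the output above; note that some real values contain spaces, such as &quot;domain users&quot;, so this naive split only works for simple fields):

```python
# A shortened, single-line sample of 'scontrol show jobid' output.
sample = ("JobId=4241 Name=WB20F06 JobState=RUNNING "
          "TimeLimit=3-08:00:00 Partition=research NodeList=node023")

# Split on whitespace, then split each pair on the first '='.
info = dict(pair.split("=", 1) for pair in sample.split())
print(info["JobState"], info["Partition"])  # RUNNING research
```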
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
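sacct can also produce machine-readable output (e.g. with its --parsable2 option, which prints pipe-separated fields); parsing that is then trivial. A sketch with a made-up sample in that style:

```python
# Made-up sample of pipe-separated sacct output (--parsable2 style).
sample = """JobID|JobName|Partition|State|ExitCode
4220|PreProces+|research|COMPLETED|0:0
4220.batch|batch||COMPLETED|0:0"""

# First line is the header; zip it with each data row into a dict.
header, *rows = (line.split("|") for line in sample.splitlines())
records = [dict(zip(header, row)) for row in rows]
print(records[0]["State"], records[1]["JobID"])  # COMPLETED 4220.batch
```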
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. For example, Master students will only have the Student Partition available, while researchers at the ABGC have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=HPCwiki:About&amp;diff=901</id>
		<title>HPCwiki:About</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=HPCwiki:About&amp;diff=901"/>
		<updated>2013-12-20T13:25:54Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This Wiki is intended for members and affiliates of the HPC Agrogenomics of Wageningen UR only.&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=887</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=887"/>
		<updated>2013-12-16T14:40:41Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (in Slurm called partitions): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
=== Defaults ===&lt;br /&gt;
There is no default queue, so you need to specify which queue to use when submitting a job.&lt;br /&gt;
The default run time for a job is 1 hour!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python 3 script that calculates digits of Pi:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
Before this script can run, Python 3, which is not the default Python version on the cluster, must be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that Python 3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string. The account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job. A time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. So in this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch a maximum of that number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;for i in `seq 1 10`; do echo $i; sbatch runscript_$i.sh;done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of the jobs running on the cluster at that moment. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. The time limit set for a specific job can be checked using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all the details of a currently active job (this does not work for completed jobs).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from a list: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the &#039;scancel&#039; command. The &#039;scancel&#039; command takes the jobid as a parameter. For the example above, this would be done using the following code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=886</id>
		<title>Using Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Using_Slurm&amp;diff=886"/>
		<updated>2013-12-16T14:28:24Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The resource allocation / scheduling software on the B4F Cluster is [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management SLURM]: &#039;&#039;&#039;S&#039;&#039;&#039;imple &#039;&#039;&#039;L&#039;&#039;&#039;inux &#039;&#039;&#039;U&#039;&#039;&#039;tility for &#039;&#039;&#039;R&#039;&#039;&#039;esource &#039;&#039;&#039;M&#039;&#039;&#039;anagement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Queues and defaults ==&lt;br /&gt;
&lt;br /&gt;
=== Queues ===&lt;br /&gt;
Every organization has two queues (called partitions in Slurm): a production queue and a research queue. The production queue gives jobs a higher priority (20) than the research queue (10).&lt;br /&gt;
To find out which queues your account has been authorized for, type sinfo:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
PARTITION       AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
ABGC_Production    up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Production    up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Production    up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Research      up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Research      up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Research      up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
ABGC_Student       up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
ABGC_Student       up   infinite      6    mix fat[001-002],node[002-005]&lt;br /&gt;
ABGC_Student       up   infinite     44   idle node[001,006-042,049-054]&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
WUR organizations also have a Student queue. Jobs in this queue will be requeued if a job with higher priority needs cluster resources that are occupied by Student queue jobs.&lt;br /&gt;
&lt;br /&gt;
== Submitting jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
Consider this simple Python3 script that calculates Pi to high precision:&lt;br /&gt;
&amp;lt;source lang=&#039;python&#039;&amp;gt;&lt;br /&gt;
from decimal import *&lt;br /&gt;
D=Decimal&lt;br /&gt;
getcontext().prec=10000000&lt;br /&gt;
p=sum(D(1)/16**k*(D(4)/(8*k+1)-D(2)/(8*k+4)-D(1)/(8*k+5)-D(1)/(8*k+6))for k in range(411))&lt;br /&gt;
print(str(p)[:10000002])&lt;br /&gt;
&amp;lt;/source&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== Loading modules ===&lt;br /&gt;
Before this script can run, Python3, which is not the default Python version on the cluster, needs to be loaded into your environment. The availability of (different versions of) software can be checked with the following command:&lt;br /&gt;
  module avail&lt;br /&gt;
&lt;br /&gt;
The list should show that python3 is indeed available; it can then be loaded with the following command:&lt;br /&gt;
  module load python/3.3.3&lt;br /&gt;
&lt;br /&gt;
=== Batch script ===&lt;br /&gt;
The following shell/slurm script can then be used to schedule the job using the sbatch command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
time python3 calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Explanation of used SBATCH parameters:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --account=773320000&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Charge resources used by this job to the specified account. The account is an arbitrary string; the account name may be changed after job submission using the scontrol command. For WUR users a project number or KTP number is advisable.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --time=1200&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Set a limit on the total run time of the job; a time limit of zero requests that no time limit be imposed. Acceptable time formats include &amp;quot;minutes&amp;quot;, &amp;quot;minutes:seconds&amp;quot;, &amp;quot;hours:minutes:seconds&amp;quot;, &amp;quot;days-hours&amp;quot;, &amp;quot;days-hours:minutes&amp;quot; and &amp;quot;days-hours:minutes:seconds&amp;quot;. In this example the job will run for a maximum of 1200 minutes (20 hours).&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch at most this number of tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --output=output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard output directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --error=error_output_%j.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Instruct SLURM to connect the batch script&#039;s standard error directly to the file name specified in the &amp;quot;filename pattern&amp;quot;. By default both standard output and standard error are directed to a file of the name &amp;quot;slurm-%j.out&amp;quot;, where the &amp;quot;%j&amp;quot; is replaced with the job allocation number. See the --input option for filename specification options.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --job-name=calc_pi.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just &amp;quot;sbatch&amp;quot; if the script is read on sbatch&#039;s standard input.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --partition=ABGC&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Request a specific partition for the resource allocation. It is preferred to use your organization&#039;s partition.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#SBATCH --mail-user=email@org.nl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Email address to use.&lt;br /&gt;
&lt;br /&gt;
=== Submitting ===&lt;br /&gt;
The script, assuming it was named &#039;run_calc_pi.sh&#039;, can then be submitted using the following command:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sbatch run_calc_pi.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Submitting multiple jobs ===&lt;br /&gt;
Assuming there are 10 job scripts, named runscript_1.sh through runscript_10.sh, all of them can be submitted using the following line of shell code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
for i in $(seq 1 10); do echo $i; sbatch runscript_$i.sh; done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
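More recent Slurm releases (2.6 and later) also support job arrays, which achieve the same with a single submission; a sketch, assuming a single script &#039;runscript.sh&#039; that selects its input via the array index:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# Submit 10 instances of one script; each instance sees its&lt;br /&gt;
# own index in the $SLURM_ARRAY_TASK_ID environment variable&lt;br /&gt;
sbatch --array=1-10 runscript.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;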
&lt;br /&gt;
== Monitoring submitted jobs: squeue ==&lt;br /&gt;
Once a job is submitted, the status can be monitored using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command. The &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command has a number of parameters for monitoring specific properties of the jobs such as time limit.&lt;br /&gt;
&lt;br /&gt;
=== Generic monitoring of all running jobs ===&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
  squeue&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You should then get a list of jobs currently running on the cluster. For the &#039;sbatch&#039; example above, it may look like this:&lt;br /&gt;
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
   3396      ABGC BOV-WUR- megen002   R      27:26      1 node004&lt;br /&gt;
   3397      ABGC BOV-WUR- megen002   R      27:26      1 node005&lt;br /&gt;
   3398      ABGC BOV-WUR- megen002   R      27:26      1 node006&lt;br /&gt;
   3399      ABGC BOV-WUR- megen002   R      27:26      1 node007&lt;br /&gt;
   3400      ABGC BOV-WUR- megen002   R      27:26      1 node008&lt;br /&gt;
   3401      ABGC BOV-WUR- megen002   R      27:26      1 node009&lt;br /&gt;
   3385  research BOV-WUR- megen002   R      44:38      1 node049&lt;br /&gt;
   3386  research BOV-WUR- megen002   R      44:38      1 node050&lt;br /&gt;
   3387  research BOV-WUR- megen002   R      44:38      1 node051&lt;br /&gt;
   3388  research BOV-WUR- megen002   R      44:38      1 node052&lt;br /&gt;
   3389  research BOV-WUR- megen002   R      44:38      1 node053&lt;br /&gt;
   3390  research BOV-WUR- megen002   R      44:38      1 node054&lt;br /&gt;
   3391  research BOV-WUR- megen002   R      44:38      3 node[049-051]&lt;br /&gt;
   3392  research BOV-WUR- megen002   R      44:38      3 node[052-054]&lt;br /&gt;
   3393  research BOV-WUR- megen002   R      44:38      1 node001&lt;br /&gt;
   3394  research BOV-WUR- megen002   R      44:38      1 node002&lt;br /&gt;
   3395  research BOV-WUR- megen002   R      44:38      1 node003&lt;br /&gt;
&lt;br /&gt;
=== Monitoring time limit set for a specific job ===&lt;br /&gt;
The default time limit is one hour, so estimated run times need to be specified when submitting jobs. To see the time limit set for a certain job, use the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
squeue -l -j 3532&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Information similar to the following should appear:&lt;br /&gt;
  Fri Nov 29 15:41:00 2013&lt;br /&gt;
   JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)&lt;br /&gt;
   3532      ABGC BOV-WUR- megen002  RUNNING    2:47:03 3-08:00:00      1 node054&lt;br /&gt;
&lt;br /&gt;
== Query a specific active job: scontrol ==&lt;br /&gt;
Show all details of a currently active job (not a completed job):&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
nfs01 ~]$ scontrol show jobid 4241&lt;br /&gt;
JobId=4241 Name=WB20F06&lt;br /&gt;
   UserId=megen002(16795409) GroupId=domain users(16777729)&lt;br /&gt;
   Priority=1 Account=(null) QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0&lt;br /&gt;
   RunTime=02:55:25 TimeLimit=3-08:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2013-12-09T13:37:29 EligibleTime=2013-12-09T13:37:29&lt;br /&gt;
   StartTime=2013-12-09T13:37:29 EndTime=2013-12-12T21:37:29&lt;br /&gt;
   PreemptTime=None SuspendTime=None SecsPreSuspend=0&lt;br /&gt;
   Partition=research AllocNode:Sid=nfs01:21799&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=node023&lt;br /&gt;
   BatchHost=node023&lt;br /&gt;
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) Gres=(null) Reservation=(null)&lt;br /&gt;
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
   WorkDir=/lustre/scratch/WUR/ABGC/...&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Removing jobs from the queue: scancel ==&lt;br /&gt;
If for some reason you want to delete a job that is either queued or already running, you can remove it with the &#039;scancel&#039; command, which takes the jobid as a parameter. For the example above, this would be:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
scancel 3401&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allocating resources interactively: salloc ==&lt;br /&gt;
&amp;lt; text here&amp;gt;&lt;br /&gt;
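A minimal sketch (the option values are illustrative, not site defaults): the &#039;salloc&#039; command obtains an interactive resource allocation and starts a shell, after which job steps can be launched inside the allocation with &#039;srun&#039;:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
# Request 4 tasks for 30 minutes (illustrative values)&lt;br /&gt;
salloc --ntasks=4 --time=30&lt;br /&gt;
# Inside the allocation, run a command as a job step&lt;br /&gt;
srun hostname&lt;br /&gt;
# Leave the shell to release the allocation&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;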
&lt;br /&gt;
== Get overview of past and current jobs: sacct ==&lt;br /&gt;
To do some accounting on past and present jobs, and to see whether they ran to completion, you can do:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information similar to the following:&lt;br /&gt;
&lt;br /&gt;
         JobID    JobName  Partition    Account  AllocCPUS      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- ---------- ---------- -------- &lt;br /&gt;
  3385         BOV-WUR-58   research                    12  COMPLETED      0:0 &lt;br /&gt;
  3385.batch        batch                                1  COMPLETED      0:0 &lt;br /&gt;
  3386         BOV-WUR-59   research                    12 CANCELLED+      0:0 &lt;br /&gt;
  3386.batch        batch                                1  CANCELLED     0:15 &lt;br /&gt;
  3528         BOV-WUR-59       ABGC                    16    RUNNING      0:0 &lt;br /&gt;
  3529         BOV-WUR-60       ABGC                    16    RUNNING      0:0&lt;br /&gt;
&lt;br /&gt;
Or in more detail for a specific job:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 4220&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This should provide information about job id 4220:&lt;br /&gt;
&lt;br /&gt;
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode &lt;br /&gt;
  ------------ ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- &lt;br /&gt;
  4220         PreProces+              research                   3   00:30:52  COMPLETED      0:0 &lt;br /&gt;
  4220.batch        batch                              1          1   00:30:52  COMPLETED      0:0&lt;br /&gt;
&lt;br /&gt;
== Running MPI jobs on B4F cluster ==&lt;br /&gt;
&lt;br /&gt;
[[MPI_on_B4F_cluster | Main article: MPI on B4F Cluster]]&lt;br /&gt;
&amp;lt; text here &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Understanding which resources are available to you: sinfo ==&lt;br /&gt;
By using the &#039;sinfo&#039; command you can retrieve information on which &#039;Partitions&#039; are available to you. A &#039;Partition&#039; in SLURM is similar to a &#039;queue&#039; in the Sun Grid Engine (&#039;qsub&#039;). The different Partitions grant different levels of resource allocation, and not all defined Partitions are available to every user. E.g., Master students will only have the &#039;student&#039; Partition available, while researchers at the ABGC will have the &#039;student&#039;, &#039;research&#039;, and &#039;ABGC&#039; Partitions available. The higher the level of resource allocation, though, the higher the cost per compute-hour. The default Partition is the &#039;student&#039; partition. A full list of Partitions can be found on the Bright Cluster Manager webpage.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
sinfo&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST&lt;br /&gt;
  student*     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  research     up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  research     up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
  ABGC         up   infinite     12  down* node[043-048,055-060]&lt;br /&gt;
  ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[B4F_cluster | B4F Cluster]]&lt;br /&gt;
* [[BCM_on_B4F_cluster | BCM on B4F cluster]]&lt;br /&gt;
* [[SLURM_Compare | SLURM compared to other common schedulers]]&lt;br /&gt;
* [[Setting_up_Python_virtualenv | Setting up and using a virtual environment for Python3 ]]&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* [http://slurm.schedmd.com Slurm official documentation]&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management Slurm on Wikipedia]&lt;br /&gt;
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on Youtube]&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=File:BCM.png&amp;diff=885</id>
		<title>File:BCM.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=File:BCM.png&amp;diff=885"/>
		<updated>2013-12-16T13:52:19Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: Bohme001 uploaded a new version of &amp;amp;quot;File:BCM.png&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BCM portal&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=884</id>
		<title>Setting TMPDIR</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=884"/>
		<updated>2013-12-13T14:51:35Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Note: all nodes are partitioned with a large /local (approx. 400 GB); /tmp is much smaller!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Many programs need to write intermediary or temporary data. A process may even write to a temporary location without the user knowing it. For instance, the command &amp;lt;code&amp;gt;sort&amp;lt;/code&amp;gt;, part of the GNU coreutils, needs a lot of temporary file space when sorting large volumes of data. Programs often use the system default, which is usually &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; on Linux systems. The &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; partition, however, is often limited in size and can fill up. When this happens, all users that need write access to &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; will experience problems with their jobs, ranging from processes quitting unexpectedly to erroneous output.&lt;br /&gt;
&lt;br /&gt;
Users can set a custom temporary folder location through the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. A custom temporary folder is best placed in the &#039;scratch&#039; space on the /lustre filesystem, because scratch is tidied up periodically, which prevents large temporary files from lingering unnoticed on the filesystem. This also matches the policy for the default system temporary directory &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt;, which is likewise periodically purged of older files.&lt;br /&gt;
&lt;br /&gt;
First, create a temporary directory in your designated space on the &#039;scratch&#039; partition:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mkdir /lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Set the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; variable to point at the custom directory in either &amp;lt;code&amp;gt;~/.bash_profile&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; using a command-line editor of choice (vi, emacs, nano, etc), adding the following line of code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export TMPDIR=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The temporary directory will be set to the custom location the next time you log in. To apply the new location immediately in the current terminal, without logging out and back in:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
source ~/.bash_profile&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=883</id>
		<title>Setting TMPDIR</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=Setting_TMPDIR&amp;diff=883"/>
		<updated>2013-12-13T14:51:05Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Many programs need to write intermediary or temporary data. A process may even write to a temporary location without the user knowing it. For instance, the command &amp;lt;code&amp;gt;sort&amp;lt;/code&amp;gt;, part of the GNU coreutils, needs a lot of temporary file space when sorting large volumes of data. Programs often use the system default, which is usually &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; on Linux systems. The &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; partition, however, is often limited in size and can fill up. When this happens, all users that need write access to &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; will experience problems with their jobs, ranging from processes quitting unexpectedly to erroneous output.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note: all nodes are partitioned with a large /local (approx. 400 GB); /tmp is much smaller!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Users can set a custom temporary folder location through the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; environment variable. A custom temporary folder is best placed in the &#039;scratch&#039; space on the /lustre filesystem, because scratch is tidied up periodically, which prevents large temporary files from lingering unnoticed on the filesystem. This also matches the policy for the default system temporary directory &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt;, which is likewise periodically purged of older files.&lt;br /&gt;
&lt;br /&gt;
First, create a temporary directory in your designated space on the &#039;scratch&#039; partition:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
mkdir /lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Set the &amp;lt;code&amp;gt;TMPDIR&amp;lt;/code&amp;gt; variable to point at the custom directory in either &amp;lt;code&amp;gt;~/.bash_profile&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; using a command-line editor of choice (vi, emacs, nano, etc), adding the following line of code:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
export TMPDIR=/lustre/scratch/WUR/ABGC/[user]/tmp&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The temporary directory will be set to the custom location the next time you log in. To apply the new location immediately in the current terminal, without logging out and back in:&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
source ~/.bash_profile&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=869</id>
		<title>SLURM Compare</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=869"/>
		<updated>2013-12-10T11:56:57Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Rosetta Stone of Workload Managers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Rosetta Stone of Workload Managers ===&lt;br /&gt;
&lt;br /&gt;
PBS/Torque, Slurm, LSF, SGE and LoadLeveler [http://slurm.schedmd.com/rosetta.html Rosetta Stone]&lt;br /&gt;
&lt;br /&gt;
This table lists the most common commands, environment variables, and job specification options used by the major workload management systems: PBS/Torque, Slurm, LSF, SGE and LoadLeveler. Each of these workload managers has unique features, but the most commonly used functionality is available in all of them, as listed in the table. This should be considered a work in progress; contributions to improve the document are welcome.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!User Commands!!PBS/Torque!!Slurm!!LSF!!SGE!!LoadLeveler&lt;br /&gt;
|-&lt;br /&gt;
||Job submission|| qsub [script_file]|| sbatch [script_file]|| bsub [script_file]|| qsub [script_file]|| llsubmit [script_file] &lt;br /&gt;
|-&lt;br /&gt;
||Job deletion ||qdel [job_id]|| scancel [job_id]|| bkill [job_id]|| qdel [job_id]|| llcancel [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by job)|| qstat [job_id]|| squeue [job_id]|| bjobs [job_id]|| qstat -u \* [-j job_id]|| llq -u [username] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by user)|| qstat -u [user_name]|| squeue -u [user_name]|| bjobs -u [user_name]|| qstat [-u user_name]|| llq -u [user_name] &lt;br /&gt;
|-&lt;br /&gt;
||Job hold ||qhold [job_id]|| scontrol hold [job_id]|| bstop [job_id]|| qhold [job_id]|| llhold -r [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job release|| qrls [job_id]|| scontrol release [job_id]|| bresume [job_id]|| qrls [job_id]|| llhold -r [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Queue list|| qstat -Q|| squeue ||bqueues ||qconf -sql|| llclass &lt;br /&gt;
|-&lt;br /&gt;
||Node list ||pbsnodes -l|| sinfo -N OR scontrol show nodes|| bhosts|| qhost|| llstatus -L machine &lt;br /&gt;
|-&lt;br /&gt;
||Cluster status|| qstat -a|| sinfo|| bqueues|| qhost -q|| llstatus -L cluster &lt;br /&gt;
|-&lt;br /&gt;
||GUI|| xpbsmon|| sview|| xlsf OR xlsbatch|| qmon|| xload &lt;br /&gt;
|-&lt;br /&gt;
|-&lt;br /&gt;
||&#039;&#039;&#039;Environment&#039;&#039;&#039;||&#039;&#039;&#039;PBS/Torque&#039;&#039;&#039;||&#039;&#039;&#039;Slurm&#039;&#039;&#039;||&#039;&#039;&#039;LSF&#039;&#039;&#039;||&#039;&#039;&#039;SGE&#039;&#039;&#039;||&#039;&#039;&#039;LoadLeveler&#039;&#039;&#039; &lt;br /&gt;
|-&lt;br /&gt;
||Job ID|| $PBS_JOBID|| $SLURM_JOBID|| $LSB_JOBID|| $JOB_ID|| $LOAD_STEP_ID &lt;br /&gt;
|-&lt;br /&gt;
||Submit Directory|| $PBS_O_WORKDIR|| $SLURM_SUBMIT_DIR|| $LSB_SUBCWD|| $SGE_O_WORKDIR|| $LOADL_STEP_INITDIR &lt;br /&gt;
|-&lt;br /&gt;
||Submit Host|| $PBS_O_HOST|| $SLURM_SUBMIT_HOST|| $LSB_SUB_HOST|| $SGE_O_HOST ||&lt;br /&gt;
|-&lt;br /&gt;
||Node List|| $PBS_NODEFILE|| $SLURM_JOB_NODELIST|| $LSB_HOSTS/$LSB_MCPU_HOSTS|| $PE_HOSTFILE|| $LOADL_PROCESSOR_LIST &lt;br /&gt;
|-&lt;br /&gt;
||Job Array Index|| $PBS_ARRAYID|| $SLURM_ARRAY_TASK_ID|| $LSB_JOBINDEX|| $SGE_TASK_ID ||&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=868</id>
		<title>SLURM Compare</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=868"/>
		<updated>2013-12-10T11:55:44Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Rosetta Stone of Workload Managers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Rosetta Stone of Workload Managers ===&lt;br /&gt;
&lt;br /&gt;
PBS/Torque, Slurm, LSF, SGE and LoadLeveler [http://slurm.schedmd.com/rosetta.html Rosetta Stone]&lt;br /&gt;
&lt;br /&gt;
This table lists the most common commands, environment variables, and job specification options used by the major workload management systems: PBS/Torque, Slurm, LSF, SGE, and LoadLeveler. Each of these workload managers has unique features, but the functionality listed in the table is available in all of them. This should be considered a work in progress; contributions to improve the document are welcome.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!User Commands!!PBS/Torque!!Slurm!!LSF!!SGE!!LoadLeveler&lt;br /&gt;
|-&lt;br /&gt;
||Job submission|| qsub [script_file]|| sbatch [script_file]|| bsub [script_file]|| qsub [script_file]|| llsubmit [script_file] &lt;br /&gt;
|-&lt;br /&gt;
||Job deletion ||qdel [job_id]|| scancel [job_id]|| bkill [job_id]|| qdel [job_id]|| llcancel [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by job)|| qstat [job_id]|| squeue -j [job_id]|| bjobs [job_id]|| qstat -u \* [-j job_id]|| llq [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by user)|| qstat -u [user_name]|| squeue -u [user_name]|| bjobs -u [user_name]|| qstat [-u user_name]|| llq -u [user_name] &lt;br /&gt;
|-&lt;br /&gt;
||Job hold ||qhold [job_id]|| scontrol hold [job_id]|| bstop [job_id]|| qhold [job_id]|| llhold [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job release|| qrls [job_id]|| scontrol release [job_id]|| bresume [job_id]|| qrls [job_id]|| llhold -r [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Queue list|| qstat -Q|| squeue ||bqueues ||qconf -sql|| llclass &lt;br /&gt;
|-&lt;br /&gt;
||Node list ||pbsnodes -l|| sinfo -N OR scontrol show nodes|| bhosts|| qhost|| llstatus -L machine &lt;br /&gt;
|-&lt;br /&gt;
||Cluster status|| qstat -a|| sinfo|| bqueues|| qhost -q|| llstatus -L cluster &lt;br /&gt;
|-&lt;br /&gt;
||GUI|| xpbsmon|| sview|| xlsf OR xlsbatch|| qmon|| xloadl &lt;br /&gt;
|-&lt;br /&gt;
||&#039;&#039;&#039;Environment&#039;&#039;&#039;||&#039;&#039;&#039;PBS/Torque&#039;&#039;&#039;||&#039;&#039;&#039;Slurm&#039;&#039;&#039;||&#039;&#039;&#039;LSF&#039;&#039;&#039;||&#039;&#039;&#039;SGE&#039;&#039;&#039;||&#039;&#039;&#039;LoadLeveler&#039;&#039;&#039; &lt;br /&gt;
|-&lt;br /&gt;
||Job ID|| $PBS_JOBID|| $SLURM_JOBID|| $LSB_JOBID|| $JOB_ID|| $LOADL_STEP_ID &lt;br /&gt;
|-&lt;br /&gt;
||Submit Directory|| $PBS_O_WORKDIR|| $SLURM_SUBMIT_DIR|| $LSB_SUBCWD|| $SGE_O_WORKDIR|| $LOADL_STEP_INITDIR &lt;br /&gt;
|-&lt;br /&gt;
||Submit Host|| $PBS_O_HOST|| $SLURM_SUBMIT_HOST|| $LSB_SUB_HOST|| $SGE_O_HOST ||&lt;br /&gt;
|-&lt;br /&gt;
||Node List|| $PBS_NODEFILE|| $SLURM_JOB_NODELIST|| $LSB_HOSTS/$LSB_MCPU_HOSTS|| $PE_HOSTFILE|| $LOADL_PROCESSOR_LIST &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=867</id>
		<title>SLURM Compare</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=867"/>
		<updated>2013-12-10T11:47:37Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Rosetta Stone of Workload Managers ===&lt;br /&gt;
&lt;br /&gt;
PBS/Torque, Slurm, LSF, SGE and LoadLeveler [http://slurm.schedmd.com/rosetta.html Rosetta Stone]&lt;br /&gt;
&lt;br /&gt;
This table lists the most common commands, environment variables, and job specification options used by the major workload management systems: PBS/Torque, Slurm, LSF, SGE, and LoadLeveler. Each of these workload managers has unique features, but the functionality listed in the table is available in all of them. This should be considered a work in progress; contributions to improve the document are welcome.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!User Commands!!PBS/Torque!!Slurm!!LSF!!SGE!!LoadLeveler&lt;br /&gt;
|-&lt;br /&gt;
||Job submission|| qsub [script_file]|| sbatch [script_file]|| bsub [script_file]|| qsub [script_file]|| llsubmit [script_file] &lt;br /&gt;
|-&lt;br /&gt;
||Job deletion ||qdel [job_id]|| scancel [job_id]|| bkill [job_id]|| qdel [job_id]|| llcancel [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by job)|| qstat [job_id]|| squeue -j [job_id]|| bjobs [job_id]|| qstat -u \* [-j job_id]|| llq [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by user)|| qstat -u [user_name]|| squeue -u [user_name]|| bjobs -u [user_name]|| qstat [-u user_name]|| llq -u [user_name] &lt;br /&gt;
|-&lt;br /&gt;
||Job hold ||qhold [job_id]|| scontrol hold [job_id]|| bstop [job_id]|| qhold [job_id]|| llhold [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job release|| qrls [job_id]|| scontrol release [job_id]|| bresume [job_id]|| qrls [job_id]|| llhold -r [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Queue list|| qstat -Q|| squeue ||bqueues ||qconf -sql|| llclass &lt;br /&gt;
|-&lt;br /&gt;
||Node list ||pbsnodes -l|| sinfo -N OR scontrol show nodes|| bhosts|| qhost|| llstatus -L machine &lt;br /&gt;
|-&lt;br /&gt;
||Cluster status|| qstat -a|| sinfo|| bqueues|| qhost -q|| llstatus -L cluster &lt;br /&gt;
|-&lt;br /&gt;
||GUI|| xpbsmon|| sview|| xlsf OR xlsbatch|| qmon|| xloadl &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
	<entry>
		<id>https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=866</id>
		<title>SLURM Compare</title>
		<link rel="alternate" type="text/html" href="https://wiki.anunna.wur.nl/index.php?title=SLURM_Compare&amp;diff=866"/>
		<updated>2013-12-10T11:41:07Z</updated>

		<summary type="html">&lt;p&gt;Bohme001: /* Rosetta Stone of Workload Managers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Rosetta Stone of Workload Managers ===&lt;br /&gt;
&lt;br /&gt;
PBS/Torque, Slurm, LSF, SGE and LoadLeveler&lt;br /&gt;
&lt;br /&gt;
This table lists the most common commands, environment variables, and job specification options used by the major workload management systems: PBS/Torque, Slurm, LSF, SGE, and LoadLeveler. Each of these workload managers has unique features, but the functionality listed in the table is available in all of them. This should be considered a work in progress; contributions to improve the document are welcome.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!User Commands!!PBS/Torque!!Slurm!!LSF!!SGE!!LoadLeveler&lt;br /&gt;
|-&lt;br /&gt;
||Job submission|| qsub [script_file]|| sbatch [script_file]|| bsub [script_file]|| qsub [script_file]|| llsubmit [script_file] &lt;br /&gt;
|-&lt;br /&gt;
||Job deletion ||qdel [job_id]|| scancel [job_id]|| bkill [job_id]|| qdel [job_id]|| llcancel [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by job)|| qstat [job_id]|| squeue -j [job_id]|| bjobs [job_id]|| qstat -u \* [-j job_id]|| llq [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job status (by user)|| qstat -u [user_name]|| squeue -u [user_name]|| bjobs -u [user_name]|| qstat [-u user_name]|| llq -u [user_name] &lt;br /&gt;
|-&lt;br /&gt;
||Job hold ||qhold [job_id]|| scontrol hold [job_id]|| bstop [job_id]|| qhold [job_id]|| llhold [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Job release|| qrls [job_id]|| scontrol release [job_id]|| bresume [job_id]|| qrls [job_id]|| llhold -r [job_id] &lt;br /&gt;
|-&lt;br /&gt;
||Queue list|| qstat -Q|| squeue ||bqueues ||qconf -sql|| llclass &lt;br /&gt;
|-&lt;br /&gt;
||Node list ||pbsnodes -l|| sinfo -N OR scontrol show nodes|| bhosts|| qhost|| llstatus -L machine &lt;br /&gt;
|-&lt;br /&gt;
||Cluster status|| qstat -a|| sinfo|| bqueues|| qhost -q|| llstatus -L cluster &lt;br /&gt;
|-&lt;br /&gt;
||GUI|| xpbsmon|| sview|| xlsf OR xlsbatch|| qmon|| xloadl &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Bohme001</name></author>
	</entry>
</feed>