Using Slurm

From HPCwiki
Revision as of 11:09, 23 November 2013

submitting jobs: sbatch

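Consider this simple Python 3 script (saved as 'calc_pi.py'), which approximates Pi by summing 411 terms of the Bailey-Borwein-Plouffe series at a working precision of 10 million decimal digits. Because only 411 terms are summed, only about the first 500 digits of the output are correct digits of Pi:
<source lang='python'>
from decimal import Decimal as D, getcontext

# Working precision, in significant decimal digits.
getcontext().prec = 10000000

# Sum the first 411 terms of the Bailey-Borwein-Plouffe series.
p = sum(D(1) / 16**k * (D(4) / (8*k + 1) - D(2) / (8*k + 4)
                        - D(1) / (8*k + 5) - D(1) / (8*k + 6))
        for k in range(411))

# Print "3." followed by up to 10 million digits.
print(str(p)[:10000002])
</source>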
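The series summed in the script above is the Bailey-Borwein-Plouffe formula, in which every extra term adds a little over one correct decimal digit. The scaled-down sketch below illustrates this; the function names `bbp_pi` and `matching_prefix` are illustrative and not part of the original script, and the precision of 80 digits is chosen only to keep the run short.

```python
from decimal import Decimal, localcontext


def bbp_pi(terms, precision):
    """Sum the first `terms` terms of the BBP series at `precision` digits."""
    with localcontext() as ctx:
        ctx.prec = precision
        total = Decimal(0)
        for k in range(terms):
            total += Decimal(1) / 16**k * (
                Decimal(4) / (8 * k + 1)
                - Decimal(2) / (8 * k + 4)
                - Decimal(1) / (8 * k + 5)
                - Decimal(1) / (8 * k + 6)
            )
        return total


def matching_prefix(a, b):
    """Return the number of leading characters shared by strings a and b."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Use a run with many terms as the reference value.
reference = str(bbp_pi(60, 80))
for terms in (5, 10, 20):
    approx = str(bbp_pi(terms, 80))
    # Subtract 2 for the leading "3." to count decimals only.
    print(terms, "terms:", matching_prefix(approx, reference) - 2, "matching decimals")
```

Running this shows the number of matching decimals growing with the number of terms, which is why the precision setting alone does not determine how many digits of the full script's output are correct.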

For this script to run, Python 3, which is not the default Python version on the cluster, must first be loaded into your environment. The availability of (different versions of) software can be checked with the following command:

 module avail

The list should show that Python 3 is indeed available. It can then be loaded with the following command:

 module load python/3.3.3

The following shell/Slurm script can then be used to schedule the job using the 'sbatch' command:
<source lang='bash'>
#!/bin/bash
# Time limit; a bare number is interpreted as minutes.
#SBATCH --time=100
#SBATCH --ntasks=1
# %j in the file names is replaced by the job ID.
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=calc_pi.py
#SBATCH --partition=research

time python3 calc_pi.py
</source>
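When several similar jobs have to be submitted, the batch script can be generated instead of written by hand. The following is a minimal sketch, assuming a hypothetical helper named `render_batch_script`; the directives it emits simply mirror the ones used in the script above.

```python
def render_batch_script(command, job_name, partition, minutes, ntasks=1):
    """Return the text of a batch script; every directive starts with '#SBATCH'."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --time={}".format(minutes),
        "#SBATCH --ntasks={}".format(ntasks),
        "#SBATCH --output=output_%j.txt",
        "#SBATCH --error=error_output_%j.txt",
        "#SBATCH --job-name={}".format(job_name),
        "#SBATCH --partition={}".format(partition),
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Reproduce the calc_pi job from this page.
script = render_batch_script("time python3 calc_pi.py", "calc_pi.py", "research", 100)
print(script)
```

The returned text can be written to a file such as 'run_calc_pi.sh' before it is handed to 'sbatch'.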

The script, assuming it was named 'run_calc_pi.sh', can then be submitted using the following command:

<source lang='bash'> sbatch run_calc_pi.sh </source>
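On success, 'sbatch' prints a line of the form "Submitted batch job" followed by the job ID. When submitting from a script it can be handy to capture that ID for later use with 'squeue' or 'scancel'. The sketch below assumes this output format; the function names `parse_job_id` and `submit` are illustrative.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from the text printed by sbatch."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in: {!r}".format(sbatch_output))
    return int(match.group(1))


def submit(script_path):
    """Submit a batch script with sbatch and return its job ID.

    Only works on a machine where the sbatch command is available.
    """
    output = subprocess.check_output(["sbatch", script_path], universal_newlines=True)
    return parse_job_id(output)


# Parsing can be tried without a cluster:
print(parse_job_id("Submitted batch job 3347\n"))
```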

monitoring submitted jobs: squeue

Once a job is submitted, the status can be monitored using the 'squeue' command:

 squeue

You should then get a list of the jobs that are queued or running on the cluster at that time. For the job submitted with 'sbatch' in the example above, it may look like this:

 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  3347  research calc_pi. megen002   R       0:03      1 node049
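The tabular output shown above can be split into fields when it needs to be processed by a script. The following is a rough sketch that assumes the default column layout shown above and that no field contains spaces; `parse_squeue` is an illustrative name.

```python
def parse_squeue(text):
    """Turn squeue's default table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Split on whitespace, keeping anything extra in the last column.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


sample = """\
 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  3347  research calc_pi. megen002   R       0:03      1 node049
"""

jobs = parse_squeue(sample)
print(jobs[0]["JOBID"], jobs[0]["ST"], jobs[0]["NODELIST(REASON)"])
```

Note that the job name is truncated in the default output ('calc_pi.' instead of 'calc_pi.py'), so the NAME column should not be relied on to identify a job; the JOBID column is the safer key.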

removing jobs from a list: scancel

If for some reason you want to delete a job that is either in the queue or already running, you can remove it using the 'scancel' command. The 'scancel' command takes the job ID as a parameter. For the example above, this would be done using the following command: <source lang='bash'> scancel 3347 </source>
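When jobs are cancelled from a script, it may be worth validating the job ID before passing it to 'scancel'. The sketch below assumes plain numeric job IDs like the one in the example; the helper names `scancel_command` and `cancel` are hypothetical.

```python
import subprocess


def scancel_command(job_id):
    """Build the argument list for cancelling one job by its numeric ID."""
    job_id = int(job_id)
    if job_id <= 0:
        raise ValueError("job ID must be a positive integer")
    return ["scancel", str(job_id)]


def cancel(job_id):
    """Cancel a job; only works where the scancel command is available."""
    subprocess.check_call(scancel_command(job_id))


# Building the command can be tried without a cluster:
print(scancel_command(3347))
```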

allocating resources interactively: salloc

running MPI jobs on B4F cluster

Understanding which resources are available to you: sinfo

The 'sinfo' command shows which partitions are available to you. A partition in Slurm is similar to a queue in the Sun Grid Engine (where jobs are submitted with 'qsub'). Different partitions grant different levels of resource allocation, and not every partition is available to every user. For example, Master students only have access to the 'student' partition, while researchers at the ABGC have access to the 'student', 'research', and 'ABGC' partitions. The higher the level of resource allocation, the higher the cost per compute-hour. The default partition is 'student'; it is marked with an asterisk in the 'sinfo' output. A full list of partitions can be found on the Bright Cluster Manager webpage.

<source lang='bash'> sinfo </source>

 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 student*     up   infinite     12  down* node[043-048,055-060]
 student*     up   infinite     50   idle fat[001-002],node[001-042,049-054]
 research     up   infinite     12  down* node[043-048,055-060]
 research     up   infinite     50   idle fat[001-002],node[001-042,049-054]
 ABGC         up   infinite     12  down* node[043-048,055-060]
 ABGC         up   infinite     50   idle fat[001-002],node[001-042,049-054]
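The NODELIST column uses a compressed notation in which bracketed ranges stand for several host names. Slurm can expand such a list itself with 'scontrol show hostnames'; the sketch below shows the idea in Python for the simple single-bracket form that appears in the output above. The function names are illustrative, and more complex node list syntax is not handled.

```python
import re


def split_top_level(nodelist):
    """Split a node list on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in nodelist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_nodelist(nodelist):
    """Expand e.g. 'node[001-003,007]' into individual host names."""
    hosts = []
    for part in split_top_level(nodelist):
        match = re.match(r"^([^\[\]]+)\[([^\[\]]+)\]$", part)
        if match is None:
            hosts.append(part)  # plain host name without a range
            continue
        prefix, ranges = match.groups()
        for item in ranges.split(","):
            low, _, high = item.partition("-")
            high = high or low
            width = len(low)  # keep the zero padding of the original
            for number in range(int(low), int(high) + 1):
                hosts.append(prefix + str(number).zfill(width))
    return hosts


idle = expand_nodelist("fat[001-002],node[001-042,049-054]")
down = expand_nodelist("node[043-048,055-060]")
print(len(idle), len(down))
```

The two counts can be compared with the NODES column of the 'sinfo' output above as a sanity check.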

See also

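External links
* [http://slurm.schedmd.com Slurm official documentation]
* [http://www.youtube.com/watch?v=axWffyrk3aY Slurm Tutorial on YouTube]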