| (3 intermediate revisions by 3 users not shown) |
| The graphing tool node_usage_graph (located at /cm/shared/apps/accounting/node_usage_graph) uses data taken directly from sacct to display the current cluster usage.
| | #REDIRECT [[Monitoring Jobs]] |
| | |
| Example:
| |
| <pre>
| |
| [user@login0 ~]# module load anunna
| [user@login0 ~]# usage_graph
| |
| node: |0% 100%|
| |
| fat001: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| fat002: CCCCCCCCC
| |
| MMMMMmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
| |
| node001:
| |
|
| |
| node002:cccccccccc
| |
| MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmmmmmmmmmm
| |
| node003:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| MM
| |
| node004:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| M
| |
| node005:CCCCCCCCCC
| |
|
| |
| node006:CCCCCCCCCC
| |
|
| |
| node007:CCCCCCCCCC
| |
|
| |
| node008:CCCCCCCCCCccccc
| |
| MMMMMMMMMMMMMMMMMMMMM
| |
| node009:cccccccccc
| |
| MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
| |
| node010:
| |
|
| |
| node011:
| |
|
| |
| node012:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| M
| |
| node013:
| |
|
| |
| node014:
| |
|
| |
| node015:CCCCC
| |
| MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
| |
| node016:CCCCCCCCCCCCCCCCCCCCC
| |
| MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
| |
| node017:
| |
|
| |
| node018:
| |
|
| |
| node019:CCCCC
| |
| MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
| |
| node020:
| |
|
| |
| node021:
| |
|
| |
| node022:
| |
|
| |
| node023:
| |
|
| |
| node024:CCCCCCCCCCCCCCC
| |
|
| |
| node025:CCCCCCCCCCCCCCCCCCCCC
| |
|
| |
| node026:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
|
| |
| node027:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
|
| |
| node028:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| MMM
| |
| node029:
| |
|
| |
| node030:
| |
|
| |
| node031:
| |
|
| |
| node032:
| |
|
| |
| node033:
| |
|
| |
| node034:
| |
|
| |
| node035:
| |
|
| |
| node036:
| |
|
| |
| node037:
| |
|
| |
| node038:
| |
|
| |
| node039:
| |
|
| |
| node040:DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| node041:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCcccccc
| |
| MMMMMMMMMMMMMMMMMMMMMM
| |
| node042:RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
| |
| RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
| |
| node049:DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| node050:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| M
| |
| node051:
| |
|
| |
| node052:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| MMMMMMmmmmmmmmmmmmmmm
| |
| node053:CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
| |
| M
| |
| node054:DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
| |
| </pre>
| |
| | |
| This gives an overview of the current per-node resource usage. There are six types of letter:
| |
| * M: Memory reserved and in use
| |
| * m: Memory reserved and not in use
| |
| * C: CPU reserved and in use
| |
| * c: CPU reserved and not in use
| |
| * D: Drained node (not available for submission for some administrative reason)
| |
| * R: Reserved node
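
The legend above can be turned into a small tally if you want a number instead of a bar. The following is only a sketch: the function names are hypothetical and not part of the tool, and it assumes that every letter in a bar stands for an equal share of the node's resource, which the 0% to 100% header suggests but the page does not state.

```python
from collections import Counter

# Legend copied from the list above.
LEGEND = {
    "M": "memory reserved and in use",
    "m": "memory reserved and not in use",
    "C": "CPU reserved and in use",
    "c": "CPU reserved and not in use",
    "D": "drained node",
    "R": "reserved node",
}


def summarize_bar(bar):
    """Count the legend letters in one bar, ignoring any other character."""
    return dict(Counter(ch for ch in bar if ch in LEGEND))


def in_use_fraction(bar, used, idle):
    """Return the share of the reserved resource that is actually in use.

    `used` and `idle` are the two letters for one resource, for example
    "C" and "c". Returns None when nothing is reserved.
    Assumption: each letter represents an equal share of the resource.
    """
    counts = summarize_bar(bar)
    reserved = counts.get(used, 0) + counts.get(idle, 0)
    if reserved == 0:
        return None
    return counts.get(used, 0) / reserved


if __name__ == "__main__":
    # A CPU bar with ten "C" and five "c", built explicitly for illustration.
    cpu_bar = "C" * 10 + "c" * 5
    print(summarize_bar(cpu_bar))
    print(in_use_fraction(cpu_bar, "C", "c"))
```

A low fraction means that much of the reservation on that node is sitting idle, which is the pattern the lowercase letters are meant to highlight.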
| |
| | |
| It cannot, however, tell you how long the queue currently is for any node. For that, squeue is a better resource.
| |