Anunna

The Breed4Food (B4F) cluster is a joint High Performance Computing (HPC) infrastructure of the Animal Breeding and Genomics Centre and four major breeding companies: Cobb Vantress, CRV, Hendrix Genetics, and TOPIGS.

Rationale and Requirements for a new cluster

The B4F Cluster is, in a way, the 7th pillar of the Breed4Food programme. While the other six pillars revolve around specific research themes, the Cluster represents a joint infrastructure. The rationale behind the cluster is to meet the increasing computational needs in the field of genetics and genomics research by creating a joint facility that generates benefits of scale, thereby reducing cost. In addition, the joint infrastructure is intended to generate cross-organisational knowledge transfer. In that capacity, the B4F Cluster acts as a joint (virtual) laboratory where researchers across the aisle (academic and applied) can benefit from each other's know-how. Lastly, the joint cluster, housed at the Wageningen University campus, allows retaining vital and often confidential data sources in a controlled environment, something that cloud services such as Amazon Cloud usually cannot guarantee.

Process of Acquisition and financing

The B4F cluster was financed through CAT-AGRO <insert further details>

Architecture of the cluster

overview of nodes, fs, etc.

nodes

The cluster consists of a number of separate machines that each have their own operating system. The default operating system throughout the cluster is RHEL6. The cluster has two master nodes in a redundant configuration, which means that if one crashes, the other takes over seamlessly. Various other nodes exist to support the two main file systems (the Lustre parallel file system and the NFS file system). The actual computations are done on the worker nodes. The cluster is configured in a heterogeneous fashion: it consists of 60 so-called 'slim nodes' that each have 16 cores and 64GB of RAM ('node001' through 'node060'), and two so-called 'fat nodes' that each have 64 cores and 1TB of RAM ('fat001' and 'fat002').

Information from the Cluster Management Portal, as it appeared on November 23, 2013:

 DEVICE INFORMATION
 Hostname	State	Memory	Cores	CPU	Speed	GPU	NICs	IB	Category
 master1, master2	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	
 node001..node042, node049..node054	UP	67.6 GiB	16	Intel(R) Xeon(R) CPU E5-2660 0+	1200 MHz		3	1	default
 node043..node048, node055..node060	DOWN	N/A	N/A	N/A	N/A	N/A	N/A	N/A	default
 mds01, mds02	UP	16.8 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2400 MHz		5	1	mds
 storage01	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2200 MHz		5	1	oss
 storage02..storage06	UP	67.6 GiB	32	Intel(R) Xeon(R) CPU E5-2660 0+	2199 MHz		5	1	oss
 nfs01	UP	67.6 GiB	8	Intel(R) Xeon(R) CPU E5-2609 0+	2400 MHz		7	1	login
 fat001, fat002	UP	1.0 TiB	64	AMD Opteron(tm) Processor 6376	2299 MHz		5	1	fat
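A similar node overview can also be pulled from the scheduler once logged in. The short Python sketch below is only an illustration: it assumes Slurm's sinfo command is available on the login node, and the chosen output columns (node name, state, CPUs, memory) merely mirror the portal table above.

 #!/usr/bin/env python
 # Illustrative sketch: list nodes as reported by Slurm's sinfo.
 # Assumes sinfo is on the PATH of the login node.
 import subprocess
 
 def list_nodes():
     # %N = node name, %t = state, %c = CPUs, %m = memory in MB
     out = subprocess.check_output(["sinfo", "-N", "-h", "-o", "%N %t %c %m"])
     for line in out.decode().splitlines():
         name, state, cpus, mem_mb = line.split()
         print("%-10s state=%-6s cpus=%-3s mem=%.1f GiB"
               % (name, state, cpus, float(mem_mb) / 1024.0))
 
 if __name__ == "__main__":
     list_nodes()

The exact columns are a matter of taste; sinfo's -o/--format option accepts many more field specifiers.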

filesystem

network

Housing at Theia

include some pictures

Access Policy

etc

Cluster Management Software and Scheduler

The B4F cluster uses Bright Cluster Manager software for overall cluster management, and Slurm as the job scheduler.
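As a minimal, hedged example of what job submission through Slurm looks like, the batch script below requests resources that fit on a single slim node (16 cores, below 64GB of RAM). The #SBATCH lines are directives read by sbatch; the concrete values and the absence of an explicit partition are assumptions for illustration, not the cluster's actual policy.

 #!/usr/bin/env python
 #SBATCH --job-name=example
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=16   # a full slim node has 16 cores
 #SBATCH --mem=60G            # stays below the 64GB of a slim node
 #SBATCH --time=01:00:00
 # Hypothetical job body: report where the job landed. A fat-node job
 # would instead ask for up to 64 cores and close to 1TB of memory,
 # possibly via a dedicated partition (e.g. -p fat, name assumed).
 import os
 import socket
 
 print("Running on %s with %s CPUs allocated"
       % (socket.gethostname(), os.environ.get("SLURM_CPUS_PER_TASK")))

Such a script would be submitted with 'sbatch <scriptname>'; sbatch parses the #SBATCH comment lines for the resource request and runs the script body on an allocated worker node.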