Lx6 and Lx7 compute nodes

From HPCwiki
Revision as of 17:53, 23 November 2013 by Hjmegens (talk | contribs)

HPC infrastructure

Schematic of bioinformatics infrastructure at ABGC. Lx5 is running Red Hat Enterprise Linux (RHEL) 6. Lx6 and Lx7 are running Ubuntu LTS 12.04. Jobs can be submitted with the Sun Grid Engine (SGE). A user account is needed to work on the machines.

[Figure: Bioinformatics-infra2.gif]


  • scomp1038 aka lx5 aka abgc.asg.wur.nl: a VM hosted by FB-ICT.
  • scomp1090 aka lx6: HP ProLiant server with 48 cores and 512GB RAM.
  • scomp1095 aka lx7: HP ProLiant server with 48 cores and 192GB RAM.

The Lx5-7 naming scheme was inspired by earlier Linux servers at the ABGC. Lx3 was discontinued in 2010, Lx4 in 2011.

For more information see the general ABGC bioinformatics page.

Access

Access the cluster over the SSH protocol:
<source lang='bash'>
ssh username@scompXXXX.wurnet.nl
</source>
If you need graphical output (e.g. R graphs), enable X11 forwarding:
<source lang='bash'>
ssh -X username@scompXXXX.wurnet.nl
</source>

Basic Bash programming

For basic bash programming please refer to:

    http://en.wikibooks.org/wiki/Bash_Shell_Scripting

Submitting jobs

Jobs on the cluster must be submitted through the SGE (Sun Grid Engine), which manages them. It assigns priorities to jobs and distributes them across the available cores.

Using the SGE through qsub:

qsub has many options; a few crucial ones are described here.

  -l h_vmem=XG

This option sets in advance how much memory is allocated to your job. Note that the SGE will kill your job if you underestimate the amount of memory it needs. The default is 1G.

  -cwd 

Run the job from the current working directory. This allows the SGE to work with relative paths (e.g. ../my_data/).

  -q all.q 

Send your job to the all.q queue.

  -S /path/to/interpreter 

Sometimes the SGE has trouble finding an interpreter. With -S you tell the SGE where to find it (e.g. -S /bin/sh).

  -b y 

Tells the SGE that you are running a binary program rather than a script; you then give the path to the binary on the command line. This is particularly useful for programs that you compiled yourself and keep in your own bin directory.

  -v DISPLAY 

Exports the DISPLAY environment variable to the job. This is needed to run ASReml on the SGE from a batch file.


Example qsub commands:
<source lang='bash'>
qsub -l h_vmem=10G -q all.q -cwd -S /usr/bin/perl myscript.pl
qsub -l h_vmem=10G -q all.q -cwd -b y ~/bin/asreml
</source>
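The same options can also be kept inside the job script itself, since qsub reads lines that start with #$ as if they had been given on the command line. The following is a minimal sketch, not a site-specific template: the job name given with -N, the 4G memory request, and the file name myjob.sh are assumptions for illustration. JOB_ID is a variable that the SGE sets inside a running job.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as command-line options.
#$ -S /bin/bash
#$ -cwd
#$ -q all.q
#$ -l h_vmem=4G
#$ -N example_job

# JOB_ID is set by the SGE inside a running job; fall back to "local"
# so the script can also be tried outside the SGE.
job_label="${JOB_ID:-local}"
summary="job ${job_label} started in $(pwd)"
echo "$summary"
```

Saved as myjob.sh (a hypothetical name), it would be submitted with qsub myjob.sh, with no further options needed on the command line.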

Installing programs

There are two ways to install a program on the clusters:

1) If the program is going to be used by a wide range of users, it is better to ask one of the administrators to install it.

2) If you will be the only user of the program, you can install it in your home directory. Create your own ~/bin directory, compile the program, and copy the executable into ~/bin/. The next step is to add the path of your new bin to your .bashrc. To do so, open the file:

<source lang='bash'>
vim ~/.bashrc
</source>
Then add the following line to this file:
<source lang='bash'>
export PATH=$PATH:~/bin/
</source>
This puts all executables in ~/bin on your PATH, so that they can be invoked simply by typing their name on the command line. The change applies to new shells; run source ~/.bashrc to apply it to the current one.
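If you prefer not to edit the file by hand, the line can be appended from the shell. The sketch below is one possible way to do it, and the helper name add_path_line is made up for this example. It only appends the line when it is not already present, and it is demonstrated on a scratch file rather than on your real ~/.bashrc.

```shell
# Hypothetical helper: append the PATH line to a file unless it is already there.
add_path_line() {
  rc_file="$1"
  # Single quotes keep $PATH and ~ literal, so they expand when the file is sourced.
  line='export PATH=$PATH:~/bin/'
  touch "$rc_file"
  # -x matches whole lines only, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Demonstration on a scratch file; the second call adds nothing.
demo_rc="$(mktemp)"
add_path_line "$demo_rc"
add_path_line "$demo_rc"
cat "$demo_rc"
```

To apply it for real, you would call add_path_line ~/.bashrc once.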

When to use the computer clusters

Guidelines for deciding whether to run your job on the HPC infrastructure:

  • can you parallelize your job, or does your job consist of many different small jobs?
  • can your job make use of multithreading?
  • do you require more memory per process than your own computer has available?
  • does the volume of data used make it undesirable or impossible to use your own computer?

Answering 'yes' to any of these questions indicates a valid reason to use the HPC infrastructure.
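For the first case in the list (many small jobs), an SGE array job may be a good fit. The sketch below assumes qsub's -t option and the SGE_TASK_ID variable, neither of which is described elsewhere on this page. The input file names and the nth_item helper are hypothetical.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q all.q
#$ -l h_vmem=2G
# Request three tasks, numbered 1 to 3.
#$ -t 1-3

# Hypothetical helper: print the Nth of the remaining arguments.
nth_item() {
  n="$1"
  shift
  shift $((n - 1))
  printf '%s\n' "$1"
}

# SGE_TASK_ID is set by the SGE for each task of an array job.
task_id="${SGE_TASK_ID:-1}"
case "$task_id" in
  ''|*[!0-9]*) task_id=1 ;;   # not a number (e.g. outside an array job): use 1
esac

# Each task picks its own input file (hypothetical names).
input="$(nth_item "$task_id" sample_a.txt sample_b.txt sample_c.txt)"
echo "task ${task_id} would process ${input}"
```

Each task runs the same script with a different SGE_TASK_ID, so a single qsub call covers all inputs.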

In addition, you may simply need to have access to a Linux environment, although this in itself is a poor reason to use the infrastructure.

Please note that per thread the machines currently in service are not that fast - if you have a desktop PC that was purchased in 2010 or later, it is likely to be faster (per thread).

Out of University access

To access the cluster from outside the intranet, you can use the access point abgc.asg.wur.nl. SSH clients are normally installed on Linux and MacOS machines. For Windows, you can use the program PuTTY.


Another useful feature of ssh is that you can redirect web traffic through a specified port to abgc.asg.wur.nl. This lets you consult journals directly from home without having to log in to the University network. It also gives more secure access to sensitive data on a public network, for example when checking your bank account from a cafe. Here is a short tutorial for Linux + Firefox.

1) Connect to abgc.asg.wur.nl:
<source lang='bash'>
ssh -D 9999 username@abgc.asg.wur.nl
</source>
2) Change the proxy settings in Firefox: go to Edit > Preferences > Advanced > Network, click Settings, and select Manual proxy configuration. In the SOCKS Host field, enter localhost with port 9999.
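Before changing browser settings, it can help to check from a second terminal that the tunnel is working. This is a minimal sketch that assumes curl is installed on your own machine and uses http://example.org/ purely as a placeholder test URL. It reports "down" if curl is missing or if nothing is listening on the port.

```shell
# Port given to "ssh -D" in step 1.
proxy_port=9999

# Try to fetch a page through the SOCKS proxy, giving up after 5 seconds.
if command -v curl >/dev/null 2>&1 &&
   curl --silent --max-time 5 \
        --socks5-hostname "localhost:${proxy_port}" \
        --output /dev/null http://example.org/; then
  tunnel_status="up"
else
  tunnel_status="down"
fi
echo "tunnel is ${tunnel_status}"
```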

For directions on using PuTTY, see this link:

  http://www.hacktabs.com/how-to-setup-ssh-tunneling-in-firefox-and-surf-anonymously/
