Filesystems

The AgroGenomics HPC currently has multiple filesystem mounts that are available cluster-wide:

* /home - This mount uses NFS to mount the home directories directly from nfs01. Each user has a 200G quota on this filesystem; it is regularly backed up to tape and can reliably be restored from up to a week's history.
* /cm/shared - This mount provides a consistent set of binaries for the entire cluster.
* /lustre - This large mount uses the Lustre filesystem to provide files from multiple redundant servers. Access is provided per group, thus:

  /lustre/[level]/[partner]/[unit]

e.g.

  /lustre/backup/WUR/ABGC/

It comprises three major parts (and some minor ones):

* /lustre/backup - In case of disaster, this data is stored a second time on a separate machine. Whilst this backup is purely for cases of complete tragedy (such as some immense filesystem error, or multiple component failure), it can potentially be used to revert mistakes if you are very quick about reporting them. There is, however, no guarantee of this service.
* /lustre/nobackup - This is the 'normal' filesystem for Lustre: no backups, just data stored on the filesystem. Because no backup is kept, data here costs less than under /lustre/backup, but in case of disaster it cannot be recovered.
* /lustre/scratch - Files here may be removed after some time (typically 30 days) if the filesystem gets too full. You should tidy up this data yourself once work is complete.
* /lustre/shared - Same as /lustre/backup, except publicly available. This is where truly shared data lives that isn't assigned to a specific group.

== Fast Scratch ==
On the Lustre PFS, scratch space is organised per partner. Users can only create directories and files in the folders of the organisation they belong to. The Fast Scratch is meant for temporary files and folders, which should be removed once jobs are finished; files and folders older than one month will be removed automatically. Since the Fast Scratch is an integrated part of the compute infrastructure, no additional cost is incurred based on use, in either throughput or volume stored.

  /lustre/scratch/[partner]/[unit]

e.g.

  /lustre/scratch/WUR/ABGC/
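Since Fast Scratch is cleaned automatically after one month, it is worth checking your own area regularly. The snippet below is only a rough sketch (the directory shown is just the example path from above; point it at your own unit directory): it lists files older than 30 days so you can review and remove them yourself before the automatic clean-up does.

 #!/usr/bin/env python
 """List scratch files older than 30 days so they can be tidied up by hand."""
 import os
 import time
 
 SCRATCH_DIR = "/lustre/scratch/WUR/ABGC"  # example path; use your own unit directory
 MAX_AGE_DAYS = 30                         # matches the one-month clean-up policy
 cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600
 
 for root, dirs, files in os.walk(SCRATCH_DIR):
     for name in files:
         path = os.path.join(root, name)
         try:
             if os.path.getmtime(path) < cutoff:
                 print(path)               # review first; add os.remove(path) to actually delete
         except OSError:
             pass                          # the file may already have been cleaned up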
 
== Fast Protected ==
Data that needs to remain on the Lustre PFS '''and''' needs to be backed up as well (i.e. requires redundancy in case the PFS experiences a fatal failure) can be placed in the Fast Protected area.

  /lustre/backup/[partner]

e.g.

  /lustre/backup/WUR/ABGC/

Note that this folder is not backed up yet; it is planned that daily syncing will commence on 1-1-2014.
 
== Fast Unprotected ==
The Fast Unprotected area is meant for data that needs to remain on the PFS, but that is otherwise not so valuable that it requires regular backup. This could for instance include results computationally derived from primary data sources that are themselves backed up, and which it is computationally feasible to reconstitute in case of a filesystem failure. The advantage of using the Fast Unprotected over the Fast Protected is cost: without the backup function, storage generally costs about half as much.

  /lustre/nobackup/[partner]/
 
e.g.  


  /lustre/nobackup/WUR/ABGC/


== Fast Shared ==
Projects that require sharing of data between partners (e.g. between ABGC and TOPIGS) can use the Fast Shared area. The policies of the Fast Shared are the same as those of the Fast Unprotected, i.e. the data is persistent but not backed up.

  /lustre/shared/

Specific to certain machines are some other filesystems that are available to you:

* /archive - An archive mount only accessible from nfs01. Files here are sent to the Isilon for deeper storage. The cost of storing data here is much less than on the Lustre, but it cannot be used for compute work. This is only available to WUR users, as the Isilon is unable to resolve local groups (without additional work).
* /tmp - On each worker node there is a (large) /tmp mount that can be used for temporary local caching. Be advised that you should clean this up, lest your files become accessible to other users (see the sketch after this list).
* /dev/shm - On each worker node you can also write to /dev/shm, a virtual filesystem held directly in memory, for extremely fast data access. Be advised that this counts against the memory used for your job, but it is also the fastest filesystem available if needed (see the sketch after this list).
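As a minimal sketch of using these node-local areas (the file name is made up and the fallback logic is only illustrative), the following Python snippet stages intermediate data in /dev/shm, or in /tmp if /dev/shm is unavailable, and guarantees everything is cleaned up afterwards:

 #!/usr/bin/env python
 """Keep intermediate data on node-local fast storage and clean it up afterwards."""
 import os
 import tempfile
 
 # Prefer the in-memory filesystem; remember that data written to /dev/shm
 # counts against the memory available to your job.
 base = "/dev/shm" if os.path.isdir("/dev/shm") else "/tmp"
 
 # TemporaryDirectory removes the directory and its contents on exit,
 # so nothing is left behind on the worker node.
 with tempfile.TemporaryDirectory(dir=base) as workdir:
     intermediate = os.path.join(workdir, "intermediate.dat")  # made-up file name
     with open(intermediate, "wb") as fh:
         fh.write(b"intermediate results go here")
     # ... use 'intermediate' for the rest of the job step ...
 # Leaving the 'with' block deletes the whole working directory.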
== Compute nodes scratch ==
A local scratch area, accessible to everybody, is mounted on /local of every compute node. Files and folders on /local will be removed when the compute node is restarted!

  /local
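As a sketch of the intended workflow (all paths below are hypothetical, and the job-id variable assumes a SLURM-style scheduler; adjust both to your situation), a job can stage its input onto /local, compute there, and copy the results back to Lustre before the node-local copy disappears:

 #!/usr/bin/env python
 """Stage input to /local, work there, then copy results back to Lustre."""
 import os
 import shutil
 
 # Illustrative locations only; substitute your own paths.
 INPUT = "/lustre/nobackup/WUR/ABGC/myproject/input.dat"
 RESULTS_DIR = "/lustre/nobackup/WUR/ABGC/myproject/results"
 job_id = os.environ.get("SLURM_JOB_ID", "manual")  # assumes a SLURM-style job id
 workdir = os.path.join("/local", "myuser_%s" % job_id)
 
 os.makedirs(workdir)
 try:
     local_input = shutil.copy(INPUT, workdir)        # stage input onto the local disk
     local_output = os.path.join(workdir, "output.dat")
     # ... run the actual computation, reading local_input and writing local_output ...
     open(local_output, "wb").close()                 # placeholder for the real output
     if not os.path.isdir(RESULTS_DIR):
         os.makedirs(RESULTS_DIR)
     shutil.copy(local_output, RESULTS_DIR)           # copy results back to Lustre
 finally:
     shutil.rmtree(workdir)                           # /local is wiped on reboot, but tidy up anyway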


== See also ==


== External links ==