[Bioclusters] RE: Linux cluster storage question (SAN/NAS/GPFS)

Anand S Bisen bioclusters@bioinformatics.org
Thu, 19 Aug 2004 13:03:11 -0500


-----Original Message-----
From: Guy Coates [mailto:gmpc@sanger.ac.uk] 
Sent: Thursday, August 19, 2004 12:40 PM
To: bioclusters@bioinformatics.org; vmlinuz@abisen.com
Subject: Re: Linux cluster storage question (SAN/NAS/GPFS)

Quick Plug; I gave a presentation at the Bioclusters workshop on cluster
filesystems; it should give you a few starters.


The different cluster filesystems tend to support different topologies. You
can have all the machines on the SAN (which is expensive) or you can do away
with the SAN altogether and use locally attached disks, although not all
filesystems support this.

You can also go halfway, with some machines with SAN hardware and the rest
of the machines communicating over the network (Lustre, GPFS and GFS support
this model).

                 /             \
Disk-hardware -SAN---Server2-----Network----Clients
                 \             /

Unfortunately, small file IO is probably the worst possible case for any
clustered filesystem, as the lock manager becomes your IO bottleneck. Most
cluster filesystems shine with streaming IO. GPFS, in particular, isn't
great at small file access.
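
One rough way to see the small-file penalty on a given mount is to time the
same volume of data written as many small files and as one streamed file.
The following is only a minimal Python sketch: the function names, the file
count, the file size and the use of a temporary directory are all
illustrative assumptions, and a real test should point at the cluster
filesystem and mimic the application's actual access pattern.

```python
import os
import tempfile
import time


def write_small_files(directory, count, size):
    """Write `count` files of `size` bytes each; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"small_{i:06d}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push each file to storage
    return time.perf_counter() - start


def write_streaming_file(directory, total_bytes, chunk=1 << 20):
    """Write `total_bytes` into a single file in chunks; return elapsed seconds."""
    payload = b"x" * chunk
    path = os.path.join(directory, "stream.dat")
    start = time.perf_counter()
    with open(path, "wb") as fh:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            fh.write(payload[:n])
            remaining -= n
        fh.flush()
        os.fsync(fh.fileno())  # one sync for the whole stream
    return time.perf_counter() - start


def compare(directory, count=1000, size=4096):
    """Time both patterns for the same total number of bytes."""
    total = count * size
    return {
        "small_files_s": write_small_files(directory, count, size),
        "streaming_s": write_streaming_file(directory, total),
        "total_bytes": total,
    }


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the mount under test.
    with tempfile.TemporaryDirectory() as tmp:
        print(compare(tmp))
```

Run from a single client this only measures per-file overhead; running it
from many clients at once against the same directory is closer to the case
where lock traffic starts to matter.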

GFS, back when it was a Sistina product, did have some tuning options for
filesystems with small files; I don't know if this is still in the redhat

Lustre might be worth a look too, as it has some features which are handy for
small files.

I'd be wary about the NAS devices; as has already been pointed out, small
file access over NFS has a lot of overhead. Having said that, 48 clients
isn't a stupidly large number, so you might be lucky. It is also much
simpler than dealing with SANs and cluster filesystems.

As always, the only way to see if it works with your application is to try



Dr. Guy Coates,  Informatics System Group The Wellcome Trust Sanger
Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199