[Bioclusters] Re: Linux cluster storage question (SAN/NAS/GPFS)

Guy Coates bioclusters@bioinformatics.org
Thu, 19 Aug 2004 18:39:39 +0100 (BST)


Quick plug: I gave a presentation at the Bioclusters workshop on cluster
filesystems; it should give you a few starting points.

http://www.sanger.ac.uk/Users/gmpc/presentations/bioclusters-talk.pdf


The different cluster filesystems tend to support different topologies. You
can have all the machines on the SAN (which is expensive), or you can do
away with the SAN altogether and use locally attached disks, although not
all filesystems support this.

You can also go halfway, with some machines attached to the SAN acting as
servers and the rest of the machines reaching them over the network
(Lustre, GPFS and GFS support this model).


                   --Server1--
                 /             \
Disk-hardware -SAN---Server2-----Network----Clients
                 \             /
                   --Server3--


Unfortunately, small file IO is probably the worst possible case for any
clustered filesystem, as the lock manager becomes your IO bottleneck. Most
cluster filesystems shine with streaming IO. GPFS, in particular, isn't
great at small file access.

GFS, back when it was a Sistina product, did have some tuning options for
filesystems with small files; I don't know if these are still in the Red
Hat codebase.

Lustre might be worth a look too, as it has some features which are handy
for small files.

I'd be wary about the NAS devices; as has already been pointed out, small
file access over NFS has a lot of overhead. Having said that, 48 clients
isn't a stupidly large number, so you might be lucky. It is also much
simpler than dealing with SANs and cluster filesystems.

As always, the only way to see if it works with your application is to
try it.
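
If you want a rough first number before running the real application, a
minimal sketch along these lines may help. It is only an illustration: the
function name compare_io, the file names and the default sizes are all
made up here, and it simply times many small files against one streamed
file of the same total size in a directory you point it at. Reads will
often be served from the client's page cache, so treat the read timings
as optimistic, and run it from many clients at once to see lock manager
or NFS contention.

```python
import os
import sys
import tempfile
import time


def compare_io(directory, count=1000, size=4096):
    """Time `count` small files against one streamed file of equal total size.

    All names and defaults are illustrative assumptions, not taken from
    any particular filesystem's tooling. Returns a dict of elapsed seconds.
    """
    payload = b"x" * size
    paths = [os.path.join(directory, "small_%06d.dat" % i) for i in range(count)]
    big = os.path.join(directory, "stream.dat")
    result = {}

    # Small-file case: one create/open/close per file, so metadata and
    # locking costs dominate rather than raw bandwidth.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
    result["small_write_s"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            fh.read()
    result["small_read_s"] = time.perf_counter() - start

    # Streaming case: the same number of bytes through a single file.
    start = time.perf_counter()
    with open(big, "wb") as fh:
        for _ in range(count):
            fh.write(payload)
    result["stream_write_s"] = time.perf_counter() - start

    start = time.perf_counter()
    with open(big, "rb") as fh:
        while fh.read(1 << 20):
            pass
    result["stream_read_s"] = time.perf_counter() - start

    # Clean up so repeated runs start from an empty directory.
    for path in paths:
        os.remove(path)
    os.remove(big)
    return result


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Directory on the filesystem under test, given on the command line.
        timings = compare_io(sys.argv[1])
    else:
        # No argument: fall back to a local temporary directory.
        with tempfile.TemporaryDirectory() as scratch:
            timings = compare_io(scratch)
    for name, seconds in sorted(timings.items()):
        print("%-16s %.3f s" % (name, seconds))
```

Pass a directory on the candidate filesystem as the only argument. A large
gap between the small-file and streaming timings suggests per-file
overhead will matter for your workload; it is no substitute for testing
the application itself.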

Cheers,

Guy

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199