-----Original Message-----
From: Guy Coates [mailto:gmpc@sanger.ac.uk]
Sent: Thursday, August 19, 2004 12:40 PM
To: bioclusters@bioinformatics.org; vmlinuz@abisen.com
Subject: Re: Linux cluster storage question (SAN/NAS/GPFS)

Quick plug: I gave a presentation at the Bioclusters workshop on cluster
filesystems; it should give you a few starters.

http://www.sanger.ac.uk/Users/gmpc/presentations/bioclusters-talk.pdf

The different cluster filesystems tend to support different topologies.
You can have all the machines on the SAN (which is expensive), or you can
do away with the SAN altogether and use locally attached disks, although
not all filesystems support this. You can also go halfway, with some
machines attached to the SAN hardware and the rest of the machines
communicating with them over the network (Lustre, GPFS and GFS support
this model).

                   --Server1--
                  /           \
Disk-hardware -SAN---Server2-----Network----Clients
                  \           /
                   --Server3--

Unfortunately, small file IO is probably the worst possible case for any
clustered filesystem, as the lock manager becomes your IO bottleneck. Most
cluster filesystems shine with streaming IO.

GPFS, in particular, isn't great at small file access. GFS, back when it
was a Sistina product, did have some tuning options for filesystems with
small files; I don't know if these are still in the Red Hat codebase.
Lustre might be worth a look too, as it has some features which are handy
for small files.

I'd be wary about the NAS devices; as has already been pointed out, small
file access over NFS has a lot of overhead. Having said that, 48 clients
isn't a stupidly large number, so you might be lucky. A NAS is also much
simpler than dealing with SANs and cluster filesystems.

As always, the only way to see if it works with your application is to
try it.

Cheers,

Guy

-- 
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199
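
P.S. For the "try it" step, something along these lines gives a rough feel
for small file performance on a candidate mount. It is only a minimal
sketch: the function name, the file count and the file size are made up
for illustration and are not taken from any particular benchmark tool. It
exercises a single client, so the numbers say little about lock manager
contention unless you run it from many nodes at once against the same
filesystem. The read phase will mostly hit the local page cache, since the
files have just been written from the same machine.

```python
import os
import tempfile
import time


def small_file_benchmark(directory, n_files=1000, size_bytes=4096):
    """Write, read back and delete n_files small files under directory.

    Hypothetical helper for illustration. Returns a dict with the elapsed
    seconds for each phase and the total number of bytes read back.
    """
    payload = b"x" * size_bytes
    # Work in a private scratch directory so nothing else is touched.
    workdir = tempfile.mkdtemp(prefix="smallfile-", dir=directory)
    paths = [os.path.join(workdir, "f%06d" % i) for i in range(n_files)]
    timings = {}

    # Write phase: fsync each file so the data really leaves the client.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    timings["write"] = time.perf_counter() - start

    # Read phase: open every file and read it in full.
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as fh:
            total += len(fh.read())
    timings["read"] = time.perf_counter() - start

    # Delete phase: unlink is a pure metadata operation.
    start = time.perf_counter()
    for path in paths:
        os.unlink(path)
    os.rmdir(workdir)
    timings["delete"] = time.perf_counter() - start

    timings["bytes_read"] = total
    return timings
```

Call it with the mount point you want to test, e.g.
small_file_benchmark("/path/to/mount"), and compare the per-phase times
between the NAS and whichever cluster filesystem you are evaluating.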