[Bioclusters] Lustre and other cluster file systems

Guy Coates gmpc at sanger.ac.uk
Sun Jul 16 07:53:54 EDT 2006


Jinal Jhaveri wrote:
> Hi All,
> 
> Has anybody tried to use the Lustre file system (or any other cluster file
> systems) in the bioinformatics arena? Can you share your experience on
> performance gain, management overhead, etc.?

We are currently using Lustre (the HP SFS version). I'd categorize our
experiences as "cautiously optimistic".

We have had a 50-node Lustre instance running as a proof-of-concept for
the past 6 months. We've been using it to host BLAST databases (large
files, streaming memory-mapped reads) and some user scratch/work
directories (a mixed read/write workload).

We also have a 140-node Lustre instance in pre-production, which is
currently hosting BLAST databases.

We are in the process of merging both systems to create a single Lustre
file system across our entire cluster (560 nodes/1500 cores), hosting
all of our scratch/work space and BLAST databases.

We are reasonably confident that this will work but, as with any
scale-up, you never know until you try.


Our experiences to date:

Performance, especially for BLAST-type workloads, is excellent. Our
limiting factor is how much network capacity we can install between the
clients and servers. A single client can easily saturate a gigabit
link, so we have had to put quite a bit of thought into how we build
the network to ensure there is enough bandwidth between the clients and
servers.
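A back-of-envelope way to reason about this is to compare worst-case client demand with the aggregate bandwidth on the server side. The sketch below assumes, as stated above, that each client can fill one 1 Gbit/s link; the 560-client figure comes from the cluster size mentioned earlier, while the server count and links per server are hypothetical placeholders, not our actual configuration.

```python
# Back-of-envelope check of client demand vs. server-side network bandwidth.
# Assumption: every client can saturate one 1 Gbit/s link (as stated in the post).
# The server-side numbers used below are HYPOTHETICAL placeholders.


def oversubscription(n_clients: int, client_gbit: float,
                     n_servers: int, server_links: int, link_gbit: float) -> float:
    """Ratio of peak client demand to aggregate server-side bandwidth."""
    if n_servers <= 0 or server_links <= 0 or link_gbit <= 0:
        raise ValueError("server-side bandwidth must be positive")
    demand = n_clients * client_gbit                  # worst case: all clients streaming at once
    supply = n_servers * server_links * link_gbit     # total bandwidth into the servers
    return demand / supply


if __name__ == "__main__":
    # 560 clients from the post; 16 servers with 2 x 1 Gbit/s links each are hypothetical.
    ratio = oversubscription(n_clients=560, client_gbit=1.0,
                             n_servers=16, server_links=2, link_gbit=1.0)
    print(f"oversubscription: {ratio:.1f}:1")
```

A ratio well above 1 only matters if many clients stream at the same time; for a farm running BLAST jobs in parallel against the same databases, that worst case is plausible, which is why the server-side network deserves as much planning as the storage itself.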

Stability is good. We had issues with earlier code versions, but the
current releases are solid. The system recovers well from network
failures and from servers going away. We do still run into the odd
client node that goes catatonic, but:

a) These bugs are allegedly fixed "in the next release".
b) The frequency is much lower than the hang rate we see with NFS on
the existing system, so that is a win as far as I'm concerned.

We have not had any server-side crashes which would impact the whole
cluster.


Manageability is a mixed bag. Having a single file system across the
cluster is a big win for usability, and managing the Lustre file system
itself is reasonably simple, especially as cluster file systems go.
Obviously there is more to look after than with a single file server,
but it isn't excessive.

The main shortcoming of Lustre is the lack of LVM-type operations
(expanding file systems on the fly, etc.). You have to right-size the
file systems the first time around. If you want to add more storage, you
have to create new file systems rather than extend existing ones.
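Because the file system cannot be grown in place, the sizing decision has to cover its whole service life up front. The sketch below is one hypothetical way to do that arithmetic: it assumes compound monthly growth in usage and a fraction of free space to keep at end of life. The growth model, the function name and its parameters are illustrative assumptions, not anything from our own planning.

```python
# HYPOTHETICAL capacity-planning helper. Since the file system cannot be
# extended later, size it up front for its whole planned service life.


def required_capacity_tb(used_tb: float, monthly_growth: float,
                         months: int, headroom: float = 0.2) -> float:
    """Capacity needed after `months` of compound growth, plus a safety margin.

    monthly_growth is a fraction (0.05 means 5% per month).
    headroom is the fraction of the file system to keep free at end of life.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    projected = used_tb * (1.0 + monthly_growth) ** months  # usage at end of life
    return projected / (1.0 - headroom)                     # add the free-space margin


if __name__ == "__main__":
    # Illustrative inputs only: 20 TB in use, 3% growth per month, 24-month life.
    print(f"{required_capacity_tb(20.0, 0.03, 24):.1f} TB")
```

Compounding growth makes this sensitive to the growth estimate over long lifetimes, so erring on the generous side is cheaper than having to stand up a second file system later.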

Cheers,

Guy

-- 
Dr Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802

