[Bioclusters] software raid / ISMB rollcall / Platform question(s)

Donald Becker bioclusters@bioinformatics.org
Wed, 24 Jul 2002 15:46:49 -0400 (EDT)

On Wed, 24 Jul 2002, Chris Dagdigian wrote:

> (1) Anyone have any good methodologies to share regarding maximizing 
> linux-on-intel file I/O with cheap ATA drives and linux software RAID? 
> We've been seeing some amazing numbers on a couple of prototype cluster 
> compute nodes using pairs of 80gig ATA drives controlled by a Promise 
> ATA card running reiserfs on top of software RAID0.

That's an excellent configuration.

We've deployed similar hardware for customers with a demonstrated
90MB/sec long-term sustained write performance per node, with the input
data coming in over gigabit Ethernet.  Yes, that's actually per node,
not aggregate over the cluster.
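
For anyone who wants to check a sustained-write figure like this on their
own nodes, here is a minimal sketch of a sequential write timer. It is not
the benchmark behind the number above. The function name, default sizes,
and the choice to fsync before stopping the clock are all assumptions made
for illustration, and "MB" here means 2**20 bytes.

```python
import os
import time


def sustained_write_mb_s(path, total_mb=1024, chunk_mb=1):
    """Write about total_mb MiB to path sequentially and return MiB/sec.

    Hypothetical helper: the file is flushed and fsync'ed before the clock
    stops, so the result is not just the speed of the page cache.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the device
    elapsed = time.perf_counter() - start
    return (n_chunks * chunk_mb) / elapsed
```

To be meaningful for "long-term sustained" rates, total_mb should be
several times the size of RAM, and path should sit on the RAID0 volume
under test.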

ReiserFS is not needed to achieve this performance, but you do need a
chipset that has isolated/switched (not merely bridged) PCI buses.
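
The bus requirement follows from simple arithmetic, sketched below. The
sketch assumes conventional 32-bit, 33 MHz PCI, which is not stated above,
and it assumes the network card and the disk controller would otherwise
share one bus, so every byte crosses that bus twice (once inbound from the
NIC, once outbound to the disks).

```python
# Assumption: conventional PCI, 32 bits wide at 33.33 MHz.
PCI_WIDTH_BYTES = 4
PCI_CLOCK_HZ = 33.33e6
pci_peak_mb_s = PCI_WIDTH_BYTES * PCI_CLOCK_HZ / 1e6  # theoretical peak, about 133 MB/s

stream_mb_s = 90.0  # sustained per-node write rate quoted above

# Shared bus: data crosses PCI twice (NIC to memory, memory to disk controller).
shared_bus_load = 2 * stream_mb_s

# Isolated buses: each bus carries the stream only once.
isolated_bus_load = stream_mb_s

print(f"PCI peak:          {pci_peak_mb_s:.0f} MB/s")
print(f"shared bus load:   {shared_bus_load:.0f} MB/s")
print(f"isolated bus load: {isolated_bus_load:.0f} MB/s")
```

Under these assumptions a single shared bus would need more than its
theoretical peak, while each isolated bus stays below it.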

> The tests are not complete yet but the numbers appear to be better
> than what we can squeeze from (a) scsi drives, (b) a direct SAN
> connection and (c) a single 100Tx connection to a Netapp F840 NAS
> filer.

You won't come close with Fast Ethernet, and SCSI doesn't provide any
advantage in this type of use.  There is little reason to use SCSI or
Fibre Channel (the only common SAN).
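
The Fast Ethernet point can be checked with raw line rates. This is a
back-of-the-envelope sketch: the helper name is made up for illustration,
and the figures ignore protocol overhead, so real throughput is lower
still.

```python
def line_rate_mb_s(megabits_per_s):
    """Raw link rate in MB/s (decimal megabytes), ignoring protocol overhead."""
    return megabits_per_s / 8.0


fast_ethernet = line_rate_mb_s(100)      # 100Tx
gigabit_ethernet = line_rate_mb_s(1000)  # gigE
local_raid0 = 90.0                       # sustained MB/s per node, from above

print(f"Fast Ethernet:    {fast_ethernet} MB/s")
print(f"gigabit Ethernet: {gigabit_ethernet} MB/s")
print(f"local RAID0 / Fast Ethernet: {local_raid0 / fast_ethernet:.1f}x")
```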

> We'll eventually post what we find to be our 'best config' once our
> tests are done. We also need to get numbers from the Netapp when we
> have a gigE link.

You likely won't see a performance advantage from a NetApp, and a NetApp
certainly doesn't come close on price/performance.  The NetApp
advantage is the ability to do filesystem snapshots for backups and
checkpointing.  To get similar features on Linux you must use the
Sistina filesystem and a Fibre Channel SAN.

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf System
Annapolis MD 21403			410-990-9993