[Bioclusters] cluster hardware question

Jeff Layton bioclusters@bioinformatics.org
Thu, 16 Jan 2003 11:00:30 -0500

Chris Dagdigian wrote:

> Duzlevski, Ognen wrote:
> > Hi Joe,
> > thanks for sharing your knowledge with me.
> >>Often overlooked in clusters until too late is Disk and IO in general.
> >>Chris Dagdigian at BioTeam.net is a good person to speak to about this.
> > When you say "Disk and IO", do you mean storage over fiber, local node
> > hard-drives...? What would be good choices for your typical bioinformatics
> > shop - I have seen options between local nodes having the latest SCSI
> > drives and nodes having the regular 5400 rpm ide drives. Does it pay to go
> > with compute node SCSI 10000 rpm or is a 7200 rpm ide good enough?
> The biggest performance bottleneck in 'bioclusters' is usually disk I/O
> throughput. Bio people tend to do lots of things that involve streaming
> massive text and binary files through the CPU and RAM (think running a
> blast search). The speed of your storage becomes the rate-limiting
> performance bottleneck. Often there will be terabytes of this sort of
> data lying around, so the "/data" volume is usually an NFS mount.

   Any comments on the size of "typical" databases? (pick whatever
you want for "typical"). This shows my ignorance of Bio codes.
   However, I've been looking at using the extra memory on the latest
Xeon motherboards as a RAM disk. For instance, some of the Supermicro
boards can handle up to 16 Gig for a dual CPU (32 Gig for a quad).
If you assume that you are running one instance of your app per
CPU and that each instance can only address 3.5 Gig of memory, then
on a dual board that leaves you with around 8 Gigs to play with
(16 Gig, minus 2 x 3.5 Gig for the apps, minus a generous 1 Gig for
the OS). While RAM is expensive compared to disk, a RAM disk is also
much faster than disk. Would 8 Gigs be enough for some, many, lots of
people?
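
   As a rough sketch of the arithmetic above (not a sizing tool), the
leftover memory can be worked out as below. The function name and its
parameters are made up for illustration. The 16 and 32 Gig board limits,
the 3.5 Gig per-app limit and the 1 Gig OS allowance are the figures
from the paragraph. The quad result is only what the same assumptions
would give, not a number stated in the post.

```python
# Illustrative only: ramdisk_budget_gb is a hypothetical helper, not a real tool.
def ramdisk_budget_gb(total_gb, cpus, per_app_gb=3.5, os_gb=1.0):
    """Memory left for a RAM disk with one app instance per CPU."""
    return total_gb - cpus * per_app_gb - os_gb

dual = ramdisk_budget_gb(16, 2)  # dual-CPU board with 16 Gig installed
quad = ramdisk_budget_gb(32, 4)  # quad board with 32 Gig, same assumptions

print(f"dual: {dual} Gig free for RAM disk")
print(f"quad: {quad} Gig free for RAM disk")
```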




Jeff Layton
Senior Engineer
Lockheed-Martin Aeronautical Company - Marietta
Aerodynamics & CFD

