[Bioclusters] Bottom line question: which system to buy?

chris dagdigian bioclusters@bioinformatics.org
Tue, 18 Jun 2002 18:31:12 -0400

Hi Rick,

Without knowing more details I can't say for sure if you are going to be 
well served spending $40K on a biocluster. Where you spend your money is 
largely dependent on your priorities and what you want to do. For 
instance you may get good results by taking some of that budget money 
and piling lots more RAM into your existing Sun boxes. So take this 
reply with the grain of salt it deserves...

$40K will get you a nice, flexible Linux-on-commodity-hardware 
'biocluster'. Actual CPU count will largely depend on your packaging: 
whitebox mini-tower cases will get you great price/performance but they 
take up tons of space and can be a hassle to wire and maintain. You will 
pay more for bladed and rackmount systems but will gain floorspace 
and manageability. When I do this stuff professionally I tell people that 
a good budgetary guideline for cluster building blocks is roughly $1000 
per CPU without a high-speed interconnect subsystem (i.e. a dual-CPU 1U 
rackmount server will generally cost about $2000 - $2800 depending on 
how it is kitted out). Nodes will cost more if from 'name' vendors like 
IBM or HP. You will need to budget extra money for 'head nodes' and 
switches/fileserver/cables/disks etc. if necessary.
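
To make that guideline concrete, here is a rough sketch of how node 
pricing changes the CPU count you can afford. The $10,000 set aside for 
head node, switch, rack and cabling is an assumption for illustration, 
and the function name is made up; plug in your own quotes.

```python
def nodes_for_budget(budget, overhead, node_cost, cpus_per_node=2):
    """Return (node_count, cpu_count) affordable after fixed overhead."""
    if node_cost <= 0:
        raise ValueError("node_cost must be positive")
    # Money left for compute nodes once the non-compute gear is paid for.
    available = max(budget - overhead, 0)
    nodes = available // node_cost
    return nodes, nodes * cpus_per_node


# ASSUMPTION: $10,000 of overhead (head node, switch, rack, cables).
OVERHEAD = 10_000

# Low, middle and high ends of the dual-CPU 1U price range quoted above.
for cost in (2000, 2400, 2800):
    nodes, cpus = nodes_for_budget(40_000, OVERHEAD, cost)
    print(f"${cost}/node -> {nodes} nodes, {cpus} CPUs")
```

The spread between the cheap and the well-kitted node is large enough 
that it is worth deciding up front how much RAM and disk each node 
really needs.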

If you are willing to take on some software work within your group you 
would get the most flexibility by purchasing just a bare bones cluster 
or compute farm configuration from one of the many companies who 
specialize in integrated cluster systems. Few of them really 
specialize in the life sciences (or they will charge you lots of $$ to 
ship a ready-to-rock biocluster), so you will likely be better off 
getting just the system plus a hardware support contract from the vendor 
and then installing the load management layer and Blast/HMMER/etc. on 
your own. 
That way you can spend your budget on getting the most hardware you can 
afford.  Many of the existing cluster hardware companies have been 
selling into the life science market for a while so you have a good 
chance of finding salesfolks who will actually understand what you mean 
when you say 'biocluster' or 'blast farm'.

Since you have a limited budget I'd recommend the freely available Sun 
Grid Engine (SGE) suite for handling load management, batch scheduling 
and remote job execution. There are several hardcore SGE users on this 
list by now and the SGE-users mailing list is active and a great place 
to get support.
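
For a feel of what a batch job looks like under SGE, here is a minimal 
sketch that writes out a job script for one Blast search. The job name, 
file names and database path are hypothetical, and your site may want 
different options; the '#$' lines are standard qsub directives and the 
blastall flags are the usual -p/-d/-i/-o set.

```python
def render_sge_blast_job(job_name, query, database, output, program="blastn"):
    """Return the text of an SGE batch script that runs one blastall search."""
    lines = [
        "#!/bin/sh",
        "#$ -S /bin/sh",       # shell SGE should use to run the job
        f"#$ -N {job_name}",   # job name shown in qstat
        "#$ -cwd",             # run in the directory the job was submitted from
        "#$ -j y",             # merge stderr into the stdout file
        f"blastall -p {program} -d {database} -i {query} -o {output}",
    ]
    return "\n".join(lines) + "\n"


# HYPOTHETICAL names and paths, for illustration only.
script = render_sge_blast_job(
    "blast_chunk01",
    "chunk01.fasta",
    "/data/blastdb/custom_nt",
    "chunk01.out",
)
print(script)
```

Save the output to a file and hand it to qsub; splitting a big query 
set into chunks and submitting one job per chunk is the usual way to 
keep all the nodes busy.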

There are 2 cluster vendors that I personally like and can recommend -- 
Microway in Massachusetts (www.microway.com) and Rackable Systems in 
California (www.rackable.com). If you talk to either of them tell them 
that Chris from bioteam.net says hello :) I'm also using Dell hardware 
for a current project at a local university and have had good experiences 
with the PowerEdge 1550, 1650 and 6450 servers as well as their 
PowerConnect line of switches, which are incredible (I can get a Dell 
switch for $900 that has more functionality than what I used to pay 
Cisco $3500 for...just amazing). You can probably get a nice full 
Dell-branded setup (servers + switches + racks + service contract) from 
Dell at your current academic pricing as long as you are willing to do a 
bit of rack-and-stack work onsite. Some universities prefer to go that route 
both for pricing and IT support reasons. You can justify spending more 
money on your hardware if it means not upsetting your IT group and 
ensuring that they will take responsibility for the care and feeding of 
your systems.

RLX systems are nice, although my hands-on knowledge of them is almost a 
year and a half out of date by now. Keep in mind that any bladed system 
that uses laptop drives mounted on the blade is going to give you slower 
disk I/O, which will generally not be optimal for things like sequence 
similarity searching.

Off the top of my head this is one way to spend $40K to get a 30-CPU 
compute farm -- this is a rough budget guide only and may not be 
accurate since prices change daily for the commodity stuff:

o 15x dual-CPU 1U Pentium III rackmounts @ $2000 each: $30,000
o 1 beefy "head" or "portal" node to run the cluster and provide NFS 
  services: $6000
o 24- or 48-port network switch with at least 2 gigabit ports (portal 
  will use 1): $1000
o Cluster rack: ~$1000
o Misc cables, GBIC modules, power distribution, KVM, etc.: $2000

Total: $40,000
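
If you want to play with the numbers, the line items above can be 
tallied in a few lines of Python. This is only a sketch: the figures 
are the rough ones from this message, and the variable and function 
names are made up for illustration.

```python
# (description, quantity, unit cost in USD), taken from the rough budget above.
LINE_ITEMS = [
    ("Dual-CPU 1U Pentium III rackmount", 15, 2000),
    ("Head/portal node (cluster control + NFS)", 1, 6000),
    ("24- or 48-port switch with gigabit ports", 1, 1000),
    ("Cluster rack", 1, 1000),
    ("Cables, GBICs, power distribution, KVM, etc.", 1, 2000),
]

COMPUTE_NODES = 15   # the rackmount compute nodes only
CPUS_PER_NODE = 2


def budget_total(items):
    """Sum quantity * unit cost over all line items."""
    return sum(qty * unit for _, qty, unit in items)


total = budget_total(LINE_ITEMS)
compute_cpus = COMPUTE_NODES * CPUS_PER_NODE
# Fully loaded cost per compute CPU, i.e. including head node and plumbing.
cost_per_cpu = total / compute_cpus

for name, qty, unit in LINE_ITEMS:
    print(f"{qty:>3} x {name:<45} ${qty * unit:>7,}")
print(f"Total: ${total:,} for {compute_cpus} compute CPUs "
      f"(${cost_per_cpu:,.0f} per CPU all-in)")
```

The all-in figure per CPU comes out higher than the bare $1000 
guideline because the head node, switch, rack and cabling are spread 
across the compute CPUs.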


Rick Westerman wrote:

>    I've been reading the biocluster list for some time but since we 
> have been, mainly, satisfied with our setup I have not jumped in.  Now 
> I have a question.
>    Background: We have $40K of "end of year" money needing to be spent 
> soon; a single 3700 sequencer pumping out ~200 sequences a day; and a 
> pair of Sun E-450 (4 GB memory, 4 processor) servers providing 
> GCG/Emboss, database and local Blast support when needed.  Many of our 
> current Blast searches are batched to NCBI but occasionally we run 
> searches against non-standard datasets.  Such processing can take over 
> the Sun computers for a couple of days.
>    What off-the-shelf biocluster would you recommend?  I would prefer 
> not to build a system from scratch and would also prefer not to spend 
> too much time installing and maintaining Linux and Blast itself 
> although "rolling our own" on the software end is more feasible than 
> the hardware end.  We would also want to run HMMer and perhaps some 
> other data-intensive software on the cluster.
>     I have looked at RLX, RackSaver, and the Paracel offerings.  Any 
> other leads?
> Thank you for any advice,
> -- Rick

Chris Dagdigian, <dag@sonsorol.org>
Life Science IT & Research Computing 
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Web: http://bioteam.net PGP KeyID: 83D4310E  Yahoo IM: craffi