[Bioclusters] Bottom line question: which system to buy?
chris dagdigian
bioclusters@bioinformatics.org
Tue, 18 Jun 2002 18:31:12 -0400
Hi Rick,
Without knowing more details I can't say for sure if you are going to be
well served spending $40K on a biocluster. Where you spend your money is
largely dependent on your priorities and what you want to do. For
instance, you may get good results by taking some of that budget money
and piling lots more RAM into your existing Sun boxes. So take this
reply with the grain of salt it deserves...
$40K will get you a nice, flexible Linux-on-commodity-hardware
'biocluster'. Actual CPU count will largely depend on your packaging:
whitebox mini-tower cases will get you great price/performance but they
take up tons of space and can be a hassle to wire and maintain. You will
pay more for bladed and rackmount systems but will gain floorspace
and manageability. When I do this stuff professionally I tell people that
a good budgetary guideline for cluster building blocks is roughly $1000
per CPU without a high-speed interconnect subsystem (i.e. a dual-CPU 1U
rackmount server will generally cost about $2000 - $2800 depending on
how it is kitted out). Nodes will cost more if they come from 'name'
vendors like IBM or HP. You will need to add extra money for 'head nodes'
and switches/fileserver/cables/disks etc. if necessary.
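To make the rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The function name, its parameters and the $10,000 overhead figure are illustrative assumptions, not vendor quotes; only the roughly $1000-per-CPU guideline and the dual-CPU node come from the text above.

```python
def estimate_cpus(budget, overhead, cost_per_cpu=1000, cpus_per_node=2):
    """Estimate how many whole nodes and CPUs a budget buys.

    cost_per_cpu=1000 is the rough guideline quoted above (no high-speed
    interconnect). overhead is the money set aside for head nodes,
    switches, cables, etc.; it is a hypothetical input, not a quote.
    """
    if overhead > budget:
        raise ValueError("overhead exceeds budget")
    node_cost = cost_per_cpu * cpus_per_node
    # Only whole nodes can be bought, so use integer division.
    nodes = (budget - overhead) // node_cost
    return nodes, nodes * cpus_per_node


# Assumed example: $40K budget with $10K held back for non-compute items.
nodes, cpus = estimate_cpus(budget=40_000, overhead=10_000)
print(f"{nodes} nodes, {cpus} CPUs")
```

Raising cost_per_cpu to 1400 (the $2800 end of the range for a dual-CPU node) shows how quickly better-kitted nodes eat into the CPU count.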
If you are willing to take on some software work within your group you
would get the most flexibility by purchasing just a bare-bones cluster
or compute farm configuration from one of the many companies who
specialize in integrated cluster systems. Few of them really
specialize in the life sciences (or they will charge you lots of $$ to
ship a ready-to-rock biocluster), so you will likely be better off getting
just the system plus a hardware support contract from the vendor and then
installing the load management layer and Blast/HMMER/etc. on your own.
That way you can spend your budget on getting the most hardware you can
afford. Many of the existing cluster hardware companies have been
selling into the life science market for a while so you have a good
chance of finding salesfolks who will actually understand what you mean
when you say 'biocluster' or 'blast farm'.
Since you have a limited budget I'd recommend the freely available Sun
GridEngine suite for handling load, batch scheduling and remote job
execution. There are several hardcore SGE users on this list by now and
the SGE-users mailing list is active and a great place to get support.
There are 2 cluster vendors that I personally like and can recommend --
Microway in Massachusetts (www.microway.com) and Rackable Systems in
California (www.rackable.com). If you talk to either of them tell them
that Chris from bioteam.net says hello :) I'm also using Dell hardware
for a current project at a local university and have had good experiences
with the PowerEdge 1550, 1650 and 6450 servers as well as their
PowerConnect line of switches, which are incredible (I can get a Dell
switch for $900 that has more functionality than the ones I used to pay
Cisco $3500 for...just amazing). You can probably get a nice full
Dell-branded setup (servers + switches + racks + service contract) from
Dell at your current academic pricing as long as you are willing to do a
bit of rack and stack work onsite. Some universities prefer to go that route
both for pricing and IT support reasons. You can justify spending more
money on your hardware if it means not upsetting your IT group and
ensuring that they will take responsibility for the care and feeding of
your systems.
RLX systems are nice, although my hands-on knowledge of them is almost a
year and a half out of date by now. You need to understand that any
bladed system that uses laptop drives mounted on the blade is going to
give you slower IO performance, which will generally not be optimal for
things like sequence similarity searching.
Off the top of my head this is one way to spend $40K to get a 30-CPU
compute farm -- this is a rough budget guide only and may not be
accurate since prices change daily for the commodity stuff:
  o 15x dual-CPU 1U Pentium III rackmounts @ $2000 each (Total: $30,000)
  o 1 beefy "head" or "portal" node to run the cluster and provide NFS
    services: $6000
  o 24 or 48 port network switch with at least 2 gigabit ports (portal
    will use 1): $1000
  o Cluster rack: ~$1000
  o Misc cables, GBIC modules, power distribution, KVM, etc.: $2000
  =====
  Total: $40,000
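
The tally above can be checked, and re-run when prices change, with a short Python sketch. The variable and function names are illustrative assumptions; the quantities and prices are the rough figures from the list.

```python
# Line items from the rough budget above: (description, quantity, unit cost in USD).
line_items = [
    ("dual-CPU 1U Pentium III rackmount", 15, 2000),
    ("head/portal node", 1, 6000),
    ("network switch", 1, 1000),
    ("cluster rack", 1, 1000),
    ("misc cables, GBICs, power, KVM", 1, 2000),
]


def budget_total(items):
    """Sum quantity * unit cost over all line items."""
    return sum(qty * unit for _, qty, unit in items)


total = budget_total(line_items)
compute_cpus = 15 * 2  # 15 dual-CPU compute nodes; the head node is not counted

print(f"Total: ${total:,}")
print(f"Compute CPUs: {compute_cpus}")
print(f"All-in cost per compute CPU: ${total / compute_cpus:,.0f}")
```

Once the head node, switch, rack and cabling are included, the all-in cost per compute CPU comes out noticeably higher than the bare $1000-per-CPU building-block figure, which is why the extra padding matters when sizing a farm.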
Regards,
Chris
Rick Westerman wrote:
> I've been reading the biocluster list for some time but since we
> have been, mainly, satisfied with our setup I have not jumped in. Now
> I have a question.
>
> Background: We have $40K of "end of year" money needing to be spent
> soon; a single 3700 sequencer pumping out ~200 sequences a day; and a
> pair of Sun E-450 (4 GB memory, 4 processor) servers providing
> GCG/Emboss, database and local Blast support when needed. Many of our
> current Blast searches are batched to NCBI but occasionally we run
> searches against non-standard datasets. Such processing can take over
> the Sun computers for a couple of days.
>
> What off-the-shelf biocluster would you recommend? I would prefer
> not to build a system from scratch and would also prefer not to spend
> too much time installing and maintaining Linux and Blast itself
> although "rolling our own" on the software end is more feasible than
> the hardware end. We would also want to run HMMer and perhaps some
> other data-intensive software on the cluster.
>
> I have looked at RLX, RackSaver, and the Paracel offerings. Any
> other leads?
>
> Thank you for any advice,
>
> -- Rick
>
>
--
Chris Dagdigian, <dag@sonsorol.org>
Life Science IT & Research Computing
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Web: http://bioteam.net PGP KeyID: 83D4310E Yahoo IM: craffi