[Bioclusters] Recommendations for building a cluster

Chris Dagdigian dag at sonsorol.org
Fri Oct 19 14:02:45 EDT 2007


Can you be more specific about your application mix?

In a general sense, "genome-wide analyses" means that you'll likely be
using some very memory-hungry applications. In that scenario it may
make sense to purchase fewer machines but load them up with 8-12GB of
physical memory. You will need at least a rough idea of your
application mix before you can figure out where best to spend your
budget.

The current general-duty sweet spot for life science cluster nodes, in
the absence of application benchmarks, seems to be something like this:

  - 1U rackmount
  - pair of dual-core 64-bit CPUs
  - 4-8GB RAM
  - a few hundred gigs of SATA or cheap IDE disk for use as local
    scratch space

You'll tweak the above system based on your application mix.

For instance, now that Intel is shipping quad-core CPUs, you may find
it better to own fewer quad-core systems but load them up with as much
memory as you can afford. This is where knowing your application
requirements becomes very helpful.
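To make the memory-versus-node-count tradeoff concrete, here is a rough
back-of-the-envelope sketch in Python. It assumes every job is
single-threaded with a fixed memory footprint, and it ignores RAM the
OS itself needs; the node configurations in the example are made-up
illustrations, not quotes or benchmarks.

```python
# Hypothetical sizing sketch: how many jobs can run at once when each
# job needs a fixed amount of RAM. All numbers are illustrative.

def concurrent_jobs(cores: int, ram_gb: float, job_ram_gb: float) -> int:
    """Jobs one node can run at once, limited by both cores and RAM.

    Assumes each job is single-threaded and needs job_ram_gb of memory.
    """
    if job_ram_gb <= 0:
        raise ValueError("job_ram_gb must be positive")
    jobs_that_fit_in_ram = int(ram_gb // job_ram_gb)
    return min(cores, jobs_that_fit_in_ram)


def cluster_slots(nodes: int, cores: int, ram_gb: float, job_ram_gb: float) -> int:
    """Total concurrent jobs across a set of identical nodes."""
    return nodes * concurrent_jobs(cores, ram_gb, job_ram_gb)


if __name__ == "__main__":
    # Hypothetical comparison for a job needing 3 GB of RAM:
    # many small nodes vs. fewer big-memory quad-core nodes.
    print(cluster_slots(nodes=24, cores=4, ram_gb=4, job_ram_gb=3))   # 24
    print(cluster_slots(nodes=12, cores=8, ram_gb=16, job_ram_gb=3))  # 60
```

With these made-up numbers the fewer, bigger-memory nodes run more than
twice as many jobs at once, because the small nodes run out of RAM long
before they run out of cores. Plug in memory footprints measured from
your own applications; if your jobs are small, the comparison can
easily flip.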

Keep in mind that you'll need a big and fast storage solution. The  
shared storage will likely be the most expensive thing you buy.

The choice of operating system is largely personal. Most bioclusters
run Linux because of the availability of tools and applications; most
codes are developed and supported on Linux these days. Which Linux
distribution you run is also a matter of preference. People on this
list use Debian, Gentoo, Red Hat Enterprise Linux, Fedora Core, CentOS,
Scientific Linux, SUSE, etc.

My default Linux flavor right now is CentOS (www.centos.org) because
my cluster building is biased towards minimizing the amount of work
needed to keep the system running. With CentOS I get a Red Hat clone
whose supported OS life span (patches and security updates) exceeds
the production life of my cluster. If I chose Fedora I'd likely have
to push a whole new OS into the cluster within a year or so.

Anyway, if you have more questions on which Linux OS to choose for
clustering, you may want to take a peek at the beowulf at beowulf.org
mailing list archives for the past few weeks. They've been debating it
off and on for a long time now, and there are many great and nuanced
answers to be found in the replies.

My $.02 of course


On Oct 19, 2007, at 10:16 AM, Ahmed Moustafa wrote:

> Hello!
> What would be your recommendations for building a cluster of ~20-30  
> machines with a budget of about $50K?
> I think more medium power machines would provide higher throughput  
> than fewer super power machines, especially, while performing  
> genome-wide analyses.
> What would you recommend for hardware brands and specs and OS?
> Thanks in advance!
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
