[Bioclusters] Related question (was: a dedicated cluster to mpiblast the nr database)

Tim Cutts bioclusters@bioinformatics.org
Fri, 5 Dec 2003 14:22:07 +0000

On 05-Dec-03, Joe Landman wrote:
> What speed advantage would be worth such a price penalty?  Or, better 
> put, for every $1000USD over in price per unit from the base, how much 
> additional performance would be needed to justify the cost?   Is the 
> idea to keep the price performance the same (e.g. for 10x the price, you 
> get 10x the performance), or should it be a different function?

I don't think it has to be parity like that (although that would be
nice), because the large boxes can have TCO benefits: smaller numbers of
system images, fewer systems to manage in general, fewer issues with
shifting data around between boxes, and so on.

But the current ratio for BLAST is 1.4x the performance for at least ten
times the price, frequently more, and that's very hard to justify,
especially with the management tools for commodity clusters that are now

> Another similar question is, if these larger boxen were 1/10 their 
> current price, would people buy 10x more?  Less than that?  More than 
> that?  That is, is this the analysis bottleneck?

I think if it were a matter of buying 10 large boxes or 100 little ones,
I'd be happy with the 100 little ones.  Once the cluster gets much
larger than that, the bottlenecks move and it's harder to predict
which will be better.  The analysis bottlenecks become job management
and result collation, for example.

> Quite true.  I do see big clusters being used for a number of apps which 
> map reasonably well onto them.  You need the heavy metal for some tasks 
> (big memory/big I/O).  Some things simply cannot be done well on 
> distributed shared-nothing machines.

Precisely - Sanger's model has been to use both kinds of machines, and
this looks set to continue.


Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK