[Bioclusters] topbiocluster.org

James Cuff jcuff at broad.mit.edu
Tue Jun 28 15:15:03 EDT 2005


So I kept going a little here, just to see what falls out.

http://topbiocluster.org/firststab.html

has the results for 7 platforms, ranging from osx to tru64 etc.  I
basically applied the same recipe we talked about the other day, and
wrote a: 

"platform independent parallel infrastructure agnostic protein
processing pipeline suite" 

(also known as a bunch of really dodgy shell scripts, and a big old
compile the whole darn bag from scratch approach:-)).  

You would never guess, but certain machines perform differently to
others.  This first stab is in no way an attempt to classify anything:
some of our hardware is showing its age, and I did not tweak a single
parameter.  This data does have a // (parallel) aspect, and also includes
LSF submission to a farm, but it is very small scale, just to test the ideas.

The point is _not_ to actually use the scripts (they suck), but rather
to do whatever you can to get the fastest possible result for a given set
of target ids that are presented against two sets of data files.  

I think the real magic here will be to use whatever hardware, raid
storage, mpiblast, hardware accelerator, maspar, SAN, GPFS, palm pilot
based cluster you have to get the answer in the fastest possible
time.  

If we manage to do this, we as a community will be able to identify
the 'topbiocluster'.  I think we will also learn a lot about how people
have things connected and built.  We know it's more than the component
parts: the topcluster will be a union of all the parts, with smart folk
making them work.

To reiterate, the results in this first stab were not optimized in
any way, and are just simple; the scripts are there if you want to see
what I did.  However, even being naive, they do start to show the possible
variation.  They are also not ranked; there is no 'winner' from these
results.  

With luck, the community out there will know how to drive their hardware
in the best possible manner to get this simple protein pipeline to
deliver the results in the shortest possible time.

We can (later) have further 'contests' for gromacs, dna pipelines etc,
but it might be worth us taking a stab at this first one.  

Just in case folk think I've been 'drinking the koolaid' (as they seem
to say a lot here) let me know if you would be at all interested in
participating by return email to me.  I'll collect the results and let
folk know if we have enough critical mass.

If we get enough numbers we can get the ball rolling: I'll post up the
data files, and to be honest that should be all we need, since everyone on
this list knows how to compile and run these tools.  

Assuming good luck and a following wind, we may even be able to have a
bioclusters meeting to discuss the fallout, and all the things we have
discovered.  But I'm getting ahead of myself.  Again, let me know directly
if you would be willing to take part in the experiment, ok 'contest' :-)


Best regards,

J.

 


