[Bioclusters] Clusters for bioinformatics... Some numbers or statistics?

Ivo Grosse grosse@cshl.org
Thu, 30 Aug 2001 22:24:13 -0400


Part II ... more questions ...


> My problems with the NOW approach have little to do with power and more 
> to do with (a) unpredictable available CPU power 

What do you mean by that?

Do you mean that the foreground user may steal all the CPU power from 
the background process once in a while?  Does that have any negative 
effect on the background job (other than stealing some time) if the 
background code is not parallel?


> and (b) non-trivial 
> administrative burden 

That is a VERY important point, at least for us!!!

In which sense is the administration of a network of, say, 20 machines 
harder than the administration of a cluster of, say, 20 nodes?

Just to give you an idea of how inexperienced and naive I am: I thought 
setting up a Beowulf cluster with all the special cluster software and 
Mosix on top and whatever else would be some extra work, making the 
total installation and administration work *harder* and not *easier*?

Even worse: given the fact that we have to administer our 20 machines 
anyway (A), how could the extra installation and administration of a 
cluster (B) make the total installation and administration time (A+B) 
smaller than (A)?


> network or intranet. It may work nice in a lab, department or workgroup 
> but can quickly get hairy in a building, campus or enterprise.

Okay, this sounds encouraging ... so you think for a setup of fewer 
than 20 machines, all in our building, all attached to the same 
network, all under our control, all being sysadmin-ed by us anyway, ... 
a "NOW" (as opposed to a separate cluster) would perhaps make some 
sense?


> o _many_ life science applications are rate limited by I/O throughput 
> and the way they get their data is via the network. This means that the 
> performance of your NOW system is going to be dependent on the speed and 
> uptime of the regular internal intranet. All it takes is the start of 
> your IT group's backup server or a couple of porn-downloadin', 
> net-radio-listenin' people to trash your network performance. 

Very good point!  Could that problem be circumvented by storing the 
data on the local hard disks rather than fetching it over the net?
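
To make the question concrete, here is a minimal sketch of the idea, 
assuming the shared data is visible as ordinary files (for example on 
a network mount) and that each machine has a local scratch directory. 
The function name stage_locally and both paths are hypothetical and 
only for illustration; nothing here comes from an existing tool.

```python
import os
import shutil


def stage_locally(remote_path, cache_dir):
    """Copy remote_path into cache_dir unless a current copy is already there.

    Returns the path of the local copy. Both arguments are illustrative
    assumptions: remote_path is a file on the shared network storage,
    cache_dir is a directory on the machine's own hard disk.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(remote_path))

    src = os.stat(remote_path)
    if os.path.exists(local_path):
        dst = os.stat(local_path)
        # Treat the local copy as current if the size matches and it is
        # not older than the source (compared at whole-second precision).
        if dst.st_size == src.st_size and int(dst.st_mtime) >= int(src.st_mtime):
            return local_path

    # Copy to a temporary name first, then rename, so that an interrupted
    # transfer never leaves a truncated file under the final name.
    tmp_path = local_path + ".part"
    shutil.copy2(remote_path, tmp_path)
    os.replace(tmp_path, local_path)
    return local_path
```

A job would call stage_locally once before it starts and then read 
only the returned local path, so the network is touched once per 
data update rather than on every run.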


> network performance can do much more than slow a system down; it can 
> cause jobs & data to disappear and other nastyness.

Could you please give a few more details on those nasty things, and how 
frequently they may happen?


> One company that is doing the NOW thing in life sciences that I've heard 
> of is TurboGenomics. They have TurboBlast available and are apparently 
> porting that system into a more general application framework. I had an 
> interesting interaction with a TurboGenomics employee at the Drug 
> Discovery Conference a few weeks ago, his first words were "Blackstone? 
> I'm not allowed to talk to you." heh. Very nice people though despite 
> being competitors.

Sorry, I think I am lost.  How does the second part of the paragraph 
relate to the first?  What does the fact that they are nice people and 
your competitors tell me about whether a NOW is a good or a bad idea?  
Could you please clarify?


Again, thanks for all your great and detailed answers, and again I 
apologize for all of my naive questions.

Best regards,

Ivo