[Bioclusters] RE: non linear scale-up issues?

David Gayler bioclusters@bioinformatics.org
Wed, 12 May 2004 20:16:32 -0500

This is all great feedback. Thanks.
As you have stated, data access/network bandwidth issues are definitely a
difficult problem to solve with no silver bullet in sight (fiber or at least
gigabit would be a good start though). I certainly understand the idea of
just building a dedicated cluster and calling it a day if that gives the
results back in the time needed. It certainly minimizes the amount of
management housekeeping that needs to be done. As stated, some problems may
not be well-suited to being distributed to 100 administrative assistants'
desktops.

However, for the problems that do work well with the cycle-stealing
solutions that y'all are using:
1) Is the management such a royal pain in the rear that ultimately you say
screw it?
2) Are the political issues of harvesting from a couple hundred or thousand
machines a nightmare? If so, where do the security issues rank?

Thanks again.

Message: 2
From: Chris Dwan <cdwan@mail.ahc.umn.edu>
Subject: Re: [Bioclusters] RE: non linear scale-up issues?
Date: Tue, 11 May 2004 18:56:46 -0500
To: bioclusters@bioinformatics.org
Reply-To: bioclusters@bioinformatics.org

> Slinging a terabyte or two of traffic over the same worm-rotten, 
> occasionally-managed corporate network that handles things like 
> payroll, HR, business apps etc. just to get some CPU cycles from a 
> bunch of cheap $900 desktop CPUs can be, um... problematic.

I agree with this completely.

I try to treat data motion as an "out of band" problem which is 
completely decoupled from the CPU scheduling and access problem.  I 
have found that we can get good use out of those $900 desktops provided 
that I'm allowed to reserve 20GB (or so) for my target set and that I 
can populate that 20GB with my target data via cron / rsync / whatever 
on an automatic basis.  All the scheduler really needs to know is 
whether or not the data is already on a particular node.
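
As a rough illustration of that last point, here is a minimal sketch of a
scheduler-side check that only considers nodes where the data is already
staged. Everything in it is hypothetical: the Node record, the node names,
and the "target_set" label are assumptions made for the example, not part
of any particular scheduler. It assumes the cron / rsync step runs
separately and simply records which datasets each node holds.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical record of one desktop as seen by the scheduler."""
    name: str
    idle: bool
    # Names of datasets already staged on local disk (e.g. by cron + rsync).
    datasets: set = field(default_factory=set)


def eligible_nodes(nodes, dataset):
    """Return idle nodes that already hold `dataset` locally.

    Data motion is handled out of band; this function never copies
    anything, it only checks where the data already is.
    """
    return [n for n in nodes if n.idle and dataset in n.datasets]


# Illustrative inventory; names and dataset labels are made up.
nodes = [
    Node("desk01", idle=True, datasets={"target_set"}),
    Node("desk02", idle=True),                            # data not staged yet
    Node("desk03", idle=False, datasets={"target_set"}),  # busy
]

print([n.name for n in eligible_nodes(nodes, "target_set")])
```

The idea is that a node without the data is just skipped until the next
sync populates it, so job dispatch never triggers a bulk transfer over the
shared network.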

This comes back to a very old saw indeed:  Not all problems are suited 
to parallel computing.

-Chris Dwan

