[Bioclusters] RE: non linear scale-up issues?
Chris Dwan
bioclusters@bioinformatics.org
Tue, 11 May 2004 18:56:46 -0500
> Slinging a terabyte or two of traffic over the same worm-rotten,
> occasionally-managed corporate network that handles things like
> payroll, HR, business apps etc. just to get some CPU cycles from a
> bunch of cheap $900 desktop CPUs can be, um... problematic.
I agree with this completely.
I try to treat data motion as an "out of band" problem which is
completely decoupled from the CPU scheduling and access problem. I
have found that we can get good use out of those $900 desktops provided
that I'm allowed to reserve 20GB (or so) for my target set and that I
can populate that 20GB with my target data via cron / rsync / whatever
on an automatic basis. All the scheduler really needs to know is
whether or not the data is already on a particular node.
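
As a rough illustration of that last point, here is a minimal Python sketch of a placement check that only considers nodes where the target data is already staged. Everything in it is hypothetical: the Node class, the pick_node function, the node names and the dataset labels are invented for the example and do not correspond to any particular scheduler's API. It assumes the staged-data inventory is reported by some out-of-band process (for instance, whatever cron / rsync job populates the reserved space).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical view of one desktop as the scheduler sees it."""
    name: str
    free_slots: int
    # Names of target data sets already synced to the node's reserved space.
    staged: Set[str] = field(default_factory=set)


def pick_node(nodes: List[Node], dataset: str) -> Optional[Node]:
    """Return a node that already holds `dataset` and has a free slot.

    The scheduler never moves data itself. If no node is ready it returns
    None, and the job simply waits until the out-of-band sync catches up.
    """
    ready = [n for n in nodes if dataset in n.staged and n.free_slots > 0]
    if not ready:
        return None
    # Prefer the least loaded of the nodes that have the data.
    return max(ready, key=lambda n: n.free_slots)


if __name__ == "__main__":
    nodes = [
        Node("desk01", free_slots=1, staged={"nr"}),
        Node("desk02", free_slots=2, staged={"nr", "est"}),
        Node("desk03", free_slots=4),  # idle, but nothing staged yet
    ]
    chosen = pick_node(nodes, "nr")
    print(chosen.name if chosen else "no node ready")
```

Note that desk03 is never chosen even though it has the most free slots: without the data on local disk, using it would mean pulling the target set across the network at job time, which is exactly the traffic this approach tries to avoid.
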
This comes back to a very old saw indeed: Not all problems are suited
to parallel computing.
-Chris Dwan