[Bioclusters] Using semi-public PCs for heavy computation jobs

Chris Dwan (CCGB) bioclusters@bioinformatics.org
Sun, 15 Feb 2004 14:10:13 -0600 (CST)


> As part of my graduate research, I need to run a job of genome-wide
> scale. Using all of the computers available to me at my lab, this can
> take about 6 months. We don't have a cluster...

That must be a pretty impressive job.  I'd love to hear more about your
research that takes so much CPU time.  That, however, is probably a
different thread.  I have some experience with the situation you describe.

I've built up, by hook or by crook, semi-integrated access to a moderate
number of compute resources (a few hundred CPUs) spread over several
campuses.  I suspect we're in something of the same boat, resource-wise.
You could call this hodgepodge a "grid," but that lends a dignity and
maturity to a system whose only really positive attribute is that it
functions to get real work done.

Below is a list, easiest to hardest, of systems I've hooked in:

* My very own cluster that I admin
* The cluster maintained by our local supercomputing center
* Clusters maintained by collaborators at other institutions
* Lab workstations that I admin, running Linux or OS X
* Lab workstations maintained by someone else, running Linux or OS X
* Lab workstations which usually run Windows, maintained by someone else,
    which can be rebooted into Linux or OS X at night

Then, of course, there are the systems which I decided would be too much
trouble, particularly given the number of CPUs in question:

* Lab workstations running OS 9 or Windows, which I can't get rebooted
    into Linux.

> I am already making use of a student computer lab, after hours. Those
> computers run Linux, and it was a no-brainer: I just hacked some scripts
> to rshell into the machines, activated by the crond. It's not enough,
> though.

The major queuing systems for clusters (LSF, PBS-Pro (Torque?), and SGE)
each have facilities for cycle stealing from workstations.  The very
best way to approach this situation is to convince the lab admin to set
up a queuing system that runs jobs on those machines only during certain
hours, or (better) only when nobody is logged in at the console, the
load is below the number of CPUs, and the mouse and keyboard have been
idle for 15 minutes -- or whatever policy they're most comfortable with.
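
Whatever tool ends up enforcing it, the policy itself is only a handful
of checks.  To make that concrete, here is a rough Python sketch of the
sort of test I mean -- the thresholds are just the example numbers
above, and the console-idle check is a crude stand-in for a real
keyboard/mouse check:

    #!/usr/bin/env python
    # Rough sketch of the "is this workstation idle?" policy above.
    # Assumes Linux with the usual who/uptime tools; the 15-minute and
    # load thresholds are the example numbers from the text, nothing more.

    import os
    import subprocess
    import time

    IDLE_MINUTES = 15            # console quiet this long before we start
    MAX_LOAD = os.cpu_count()    # keep load below the number of CPUs

    def nobody_logged_in():
        """True if 'who' reports no interactive sessions."""
        out = subprocess.run(["who"], capture_output=True, text=True).stdout
        return out.strip() == ""

    def console_idle_minutes():
        """Minutes since the console device was last touched.  A crude
        proxy for 'no key struck, mouse not moved'; an X-aware check
        would be better on a real desktop."""
        try:
            last_touch = os.stat("/dev/console").st_atime
        except OSError:
            return float(IDLE_MINUTES)   # can't tell; assume it's idle
        return (time.time() - last_touch) / 60.0

    def ok_to_run():
        return (nobody_logged_in()
                and os.getloadavg()[0] < MAX_LOAD
                and console_idle_minutes() >= IDLE_MINUTES)

    if __name__ == "__main__":
        print("start work" if ok_to_run() else "stay out of the way")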

Many folks on this list (myself included) have written our very own
rsh/cron-based remote job execution systems.  Unless there's a really
good reason to do it (infinite CPUs available, but no chance of getting
a queuing system installed, so you have to hack), experience says that
it's better to use an established package with a user support base and
code maintained by somebody else.
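
For the curious, the home-grown version usually boils down to something
like the sketch below (cron fires it every few minutes; the host names,
file locations, and "free" test are all made up).  It's also a decent
argument for letting somebody else maintain the bookkeeping and error
handling:

    #!/usr/bin/env python
    # The classic home-grown dispatcher, boiled down: cron runs this
    # every few minutes and it ssh-es one pending job to each free host.
    # Hosts, the job file, and the load check are invented; substitute
    # rsh for ssh if that's what your site allows.

    import subprocess

    HOSTS = ["lab01", "lab02", "lab03"]   # hypothetical Linux workstations
    PENDING = "jobs.todo"                 # one shell command per line

    def load_of(host):
        """1-minute load average on a remote Linux host, via 'uptime'."""
        out = subprocess.check_output(["ssh", host, "uptime"], text=True)
        return float(out.rsplit("load average:", 1)[1].split(",")[0])

    def main():
        with open(PENDING) as f:
            jobs = [line.strip() for line in f if line.strip()]
        for host in HOSTS:
            if not jobs:
                break
            if load_of(host) < 1.0:       # crude "this machine is free" test
                cmd = jobs.pop(0)
                # fire and forget; a real system would record the exit
                # status somewhere so failures don't silently disappear
                subprocess.Popen(
                    ["ssh", host, "nohup %s >/dev/null 2>&1 &" % cmd])
        with open(PENDING, "w") as f:
            f.write("\n".join(jobs) + ("\n" if jobs else ""))

    if __name__ == "__main__":
        main()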

A system which has been around a long time, is quite mature, and doesn't
get nearly enough credit is Condor (from the University of Wisconsin).
It's explicitly designed as a cycle scavenger.  I know some cluster
admins who run Condor to "backfill" their tightly scheduled clusters.
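
Getting work into a Condor pool is about this much effort per batch of
jobs.  This is a hypothetical submit description -- the executable and
file names are placeholders, not anything I actually run:

    # hypothetical Condor submit description; save as search.sub and
    # hand it to condor_submit
    universe    = vanilla
    executable  = run_search.sh
    arguments   = chunk_$(Process).fasta
    output      = chunk_$(Process).out
    error       = chunk_$(Process).err
    log         = search.log
    queue 100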

Integrating a set of queuing systems across administrative domains
remains tricky.  I've found that, despite unlimited hype, grid software
(including the Globus package) remains best suited to a single
administrative domain with a single administrator.  (Exclude Sun's "Grid
Engine" from that statement; they have a slightly different definition
of a "grid" than the one we're talking about here.)  Of course, I
haven't tried all of the offerings, and I haven't installed Globus this
week...so things might have changed.  I encourage you to try the options
available and see what works for you.

Anyway, I've got a horrible, hacked-together "metascheduler" that has
nothing going for it except the fact that it works.  I would happily
throw it away if someone came out with a product or tool that did the
same thing.  On a user-by-user basis, it maintains a list of the
resources to which that user can connect.  It loops over a queue of
jobs, checks which resources are not overloaded, and sends jobs out as
appropriate.
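
In outline it amounts to nothing more than the skeleton below.  The
resource names, capacities, and per-user access lists are all invented,
and the calls out to the remote queuing systems are stubbed:

    #!/usr/bin/env python
    # Skeleton of the "metascheduler" described above: each user carries
    # a list of the resources they're allowed to touch, and a loop
    # matches pending jobs to whichever of those has headroom.  Every
    # name, number, and stub below is invented for illustration.

    import time

    # resource -> a rough ceiling on concurrent jobs (made-up numbers)
    CAPACITY = {
        "my-cluster":       64,
        "supercomputing":  128,
        "collab-cluster":   32,
    }

    # per-user list of resources that user can connect to
    ACCESS = {
        "cdwan":   ["my-cluster", "supercomputing", "collab-cluster"],
        "student": ["my-cluster"],
    }

    def jobs_running(resource):
        """In real life this wraps qstat / bjobs / condor_q over ssh and
        counts our jobs; stubbed out here."""
        return 0

    def submit(resource, user, job):
        """Hand the job to the remote queuing system; also a stub."""
        print("sending %s's job %r to %s" % (user, job, resource))

    def schedule_once(pending):
        """Try to place each (user, job) once; return what wouldn't fit."""
        leftover = []
        for user, job in pending:
            for resource in ACCESS.get(user, []):
                if jobs_running(resource) < CAPACITY[resource]:
                    submit(resource, user, job)
                    break
            else:
                leftover.append((user, job))   # nowhere to put it right now
        return leftover

    if __name__ == "__main__":
        queue = [("cdwan", "blast chunk 001"), ("student", "blast chunk 002")]
        while queue:
            queue = schedule_once(queue)
            if queue:
                time.sleep(60)                 # everything full; wait a bit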

I've used both PBS's and SGE's facility for "sloughing off" jobs from
one queue to another.  These are neat, unless you need to hand off jobs
for some users but not others, to move jobs between SGE and PBS, or to
really keep track of errors.  They're probably best suited to a setup
with several queues maintained by a single admin or a single team.

> While I'm looking at the option of getting or buying CPU time on a
> cluster, I am also tempted to make use of other public PCs at the
> campus. The ideal thing here is to have something like SETI@home or
> Fold@home, but I would go for anything that will allow me to have my
> jobs running on as many PCs as possible here, while not making me the
> enemy of the system admins...

United Devices sells a software package to do this, and some very large
corporate installations (2000+ CPUs) have been brought online.  The
trick with systems like this is getting enough machines to offset the
high latencies and (generally) low-performance CPUs.  If you can
convince your university IT department to build a campus-wide resource
of this sort, it will be terrific.  On the other hand, you'll probably
have to share it.

In any solution you build, data motion and error detection / correction
will be the biggest time-sinks.

I've found that, for jobs requiring a moderate-sized dataset (my core
set of BLAST targets is around 14GB), data motion should be decoupled
from CPU finding.  That is, I have one process that pushes data out to
compute resources on a regular basis, and jobs are only scheduled onto
nodes that already have the needed data.  This means that I have to ask
my partners not just for access to their CPUs, but for a bit of storage
dedicated to me, as close to the compute nodes as possible.
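
Concretely, the split looks something like the following sketch.  The
hosts, paths, and the rsync-as-manifest trick are all made up for
illustration:

    #!/usr/bin/env python
    # Sketch of decoupling data motion from CPU finding, as described
    # above: one cron-driven job rsyncs the target data out on its own
    # schedule, and the scheduler only considers nodes whose copy is
    # current.  Hosts and paths are invented for illustration.

    import subprocess

    DATASET = "/data/blast_targets/"      # the ~14GB of BLAST targets
    NODES = ["lab01", "collab-node1"]     # hypothetical compute hosts

    def push_data(node):
        """Run periodically from cron, independent of job scheduling."""
        subprocess.check_call(
            ["rsync", "-a", "--delete", DATASET, "%s:%s" % (node, DATASET)])

    def has_current_data(node):
        """A dry-run rsync that reports nothing to change means the
        node's copy is up to date -- one crude way to keep a manifest."""
        out = subprocess.check_output(
            ["rsync", "-ain", "--delete", DATASET, "%s:%s" % (node, DATASET)],
            text=True)
        return out.strip() == ""

    def eligible_nodes():
        """The scheduler hands jobs only to nodes that hold the data."""
        return [node for node in NODES if has_current_data(node)]

    if __name__ == "__main__":
        print("nodes ready for work:", eligible_nodes())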

> I think it would be best if I can approach the authorities with a
> sensible plan - first impressions are very important...

The social aspects of this sort of distributed computation are, by far,
the most important.  If there is trust between the administrative domains,
the rest is really just technical work.  Without the trust, it's nearly
impossible to make even the best plan succeed.

> I would like to hear anything about this subject: configuration
> suggestions, past experience, encouragements, discouragements, etc.

Me too.  I've got my experiences and opinions, and I'm always interested
in other takes on similar problems.  On a totally selfish note, if anyone
wants to share CPUs with me, we can expand each other's grids.

Any takers?

-Chris Dwan
 The University of Minnesota