[Bioclusters] Distributed Load Management on the desktop
Tim Cutts
tjrc at sanger.ac.uk
Mon Apr 10 12:25:37 EDT 2006
On 10 Apr 2006, at 5:09 pm, Andrew D. Fant wrote:
> I've been thinking about the problem of job submission and the tendency of
> head nodes to become bottlenecks, both in bandwidth and cycles. I also know
> that many of the batch management systems out there can be installed on the
> desktop for direct submission. Sadly, some animals are more equal than others
> and some installations are easier to configure and support than others. Since
> I don't have control over desktop installs and the technical support can vary
> in quality, making things really simple is a good thing.
>
> I'd appreciate feedback from people who have submission-only clients (or
> whatever term your package uses) on heterogeneous desktop environments,
> expressing how well it works with your clusters, how hard the installations
> and configuration were, and how it was accepted by the user community. If
> people don't want to post here, I'll accumulate the posts and summarize,
> though I'd love to see some discussion here.
Even if users are submitting directly from their workstations, it
doesn't alleviate the head node problem much, and may make it worse.
1) The queue system will still have a single node somewhere which is
the master and actually performs the scheduling; all the submitting
clients will still have to contact this single node. Since each
submission then comes over the network, there is actually slightly
more overhead this way than if users submit on the master node itself.
2) The likelihood is that, to make your administration easier, you
have filesystems such as home directories NFS/CIFS mounted on these
desktops. Desktop submissions are likely to create a lot of network
filesystem traffic that doesn't exist if you use one or two head
nodes with the data on physically attached storage.
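To put rough numbers on those two points, here is a back-of-envelope
sketch in Python. It is only a toy model: the class, the function and
every figure in it are made-up assumptions for illustration, not
measurements from any real cluster or anything specific to LSF.

```python
from dataclasses import dataclass


@dataclass
class SubmissionBatch:
    """A batch of job submissions (all fields are hypothetical figures)."""
    jobs: int                # number of jobs submitted
    request_bytes: int       # size of one submission request to the master
    file_bytes_per_job: int  # job files touched at submit time


def network_traffic(batch: SubmissionBatch, from_desktop: bool) -> dict:
    """Bytes that cross the network, split by cause.

    Assumes that a head node submits to a local master and keeps its data
    on physically attached storage, so neither kind of traffic leaves the
    machine in that case.
    """
    if from_desktop:
        # Point 1: every request travels over the network to the master.
        scheduler = batch.jobs * batch.request_bytes
        # Point 2: job files are read and written over NFS/CIFS.
        filesystem = batch.jobs * batch.file_bytes_per_job
    else:
        scheduler = 0
        filesystem = 0
    return {"scheduler": scheduler, "filesystem": filesystem}


if __name__ == "__main__":
    batch = SubmissionBatch(jobs=1000, request_bytes=2_000,
                            file_bytes_per_job=1_000_000)
    for label, from_desktop in (("desktop", True), ("head node", False)):
        print(label, network_traffic(batch, from_desktop))
```

With these invented figures the scheduler traffic is a few megabytes
while the filesystem traffic is about a gigabyte. The real ratio
depends entirely on what your users' jobs read and write.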
As far as LSF is concerned, a client-only installation is quite easy
to do. I can't speak for other batch systems, but I expect the same
is true of SGE and PBS.
Of the two problems I outline above, I suspect that (1) will not really
be a problem, since it's hardly any worse than having a single
head node. But (2) could bite you hard, depending on how disciplined
your users are.
Here, we tend to treat desktop machines as fairly dumb terminals,
used for X sessions, WWW and e-mail (and office applications in the
case of Windows machines). We used to have a few Tru64 workstations
(about 10) working as submission-only hosts. They became very
awkward to maintain, and have steadily been replaced with the dumb
Linux terminals everyone else has. I think there's only one left
now, and I pretend it doesn't exist. :-)
If you encourage users to start doing real work on their local
processors with local data, it can easily become a management
nightmare for your desktop support folks, to say nothing of coming up
with a backup strategy for the machines' local data. Better to keep
the desktop machine dumb, so if it fails you just bin it (or at least
take it away to fix at leisure) and give them another one.
Just my 0.02
Tim