[Bioclusters] Distributed Load Management on the desktop

Mon Apr 10 12:25:37 EDT 2006

On 10 Apr 2006, at 5:09 pm, Andrew D. Fant wrote:

> I've been thinking about the problem of job submission and the
> tendency of head nodes to become bottlenecks, both in bandwidth and
> cycles.  I also know that many of the batch management systems out
> there can be installed on the desktop for direct submission.  Sadly,
> some animals are more equal than others and some installations are
> easier to configure and support than others.  Since I don't have
> control over desktop installs and the technical support can vary in
> quality, making things really simple is a good thing.
>
> I'd appreciate feedback from people who have submission-only clients
> (or whatever term your package uses) on heterogeneous desktop
> environments, expressing how well it works with your clusters, how
> hard the installations and configuration were, and how it was
> accepted by the user community.  If people don't want to post here,
> I'll accumulate the posts and summarize, though I'd love to see some
> discussion here.

Even if users are submitting directly from their workstations, it  
doesn't alleviate the head node problem much, and may make it worse.

1)  The queue system will still have a single node somewhere which is
the master and actually performs the scheduling; all the submitting
clients still have to contact this single node.  Since the
submission comes over the network, this actually adds slightly more
overhead than submitting on the master node itself.

2)  To make your administration easier, you probably have filesystems
such as home directories NFS/CIFS mounted on these desktops.  Desktop
submissions are likely to create a lot of network filesystem traffic
which doesn't exist if you use one or two head nodes which have the
data stored on physically attached storage.

As far as LSF is concerned, a client-only installation is quite easy  
to do.  I can't speak for other batch systems, but I expect the same  
is true of SGE and PBS.

Of the two problems I outline above, I suspect that 1 will not be much
of a problem, since it's barely any worse than having a single head
node.  But 2 could bite you hard, depending on how disciplined your
users are.
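
To put a rough shape on problem 2, here is a back-of-envelope sketch
in Python.  Every number in it (user count, jobs per day, data per
job) is a hypothetical placeholder rather than a measurement, and the
function name is made up for illustration.  It only counts traffic
between the submission host and the file server, not whatever the
compute nodes read later.

```python
# Rough estimate of submission-side network filesystem traffic.
# All inputs are hypothetical assumptions; substitute your own figures.

def submission_traffic_gb(users, jobs_per_user, gb_per_job, on_network_fs=True):
    """Return GB per day moved over NFS/CIFS while preparing and submitting jobs."""
    if not on_network_fs:
        # Data lives on storage physically attached to the head node,
        # so staging a job there puts nothing on the network filesystem.
        return 0.0
    # Each job's working data crosses the network to or from the desktop.
    return users * jobs_per_user * gb_per_job

desktop = submission_traffic_gb(users=50, jobs_per_user=20, gb_per_job=0.5)
head_node = submission_traffic_gb(users=50, jobs_per_user=20, gb_per_job=0.5,
                                  on_network_fs=False)

print(f"desktop submission:   {desktop:.0f} GB/day over NFS/CIFS")
print(f"head-node submission: {head_node:.0f} GB/day over NFS/CIFS")
```

The point is only that the desktop figure scales with how much data
users touch per job, which is why user discipline matters so much.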

Here, we tend to treat desktop machines as fairly dumb terminals,  
used for X sessions, WWW and e-mail (and office applications in the  
case of Windows machines).  We used to have a few Tru64 workstations  
(about 10) working as submission-only hosts.  They became very  
awkward to maintain, and have steadily been replaced with the dumb  
Linux terminals everyone else has.  I think there's only one left
now, and I pretend it doesn't exist.  :-)

If you encourage users to start doing real work on their local  
processors with local data, it can easily become a management  
nightmare for your desktop support folks, to say nothing of coming up  
with a backup strategy for the machines' local data.  Better to keep  
the desktop machine dumb, so if it fails you just bin it (or at least  
take it away to fix at leisure) and give them another one.

Just my 0.02

Tim