[Bioclusters] SGE 6.1?

Chris Dagdigian bioclusters@bioinformatics.org
Sun, 26 Sep 2004 22:53:53 -0400


Grid Engine 6.0 through 6.0u1 has a bit of a catch-22 functionality hole 
right now when it comes to building fault-tolerant SGE installations:

o You can use shared NFS with the time-tested shadow master feature, but 
you have to use "classic" spooling instead of berkeleydb spooling. 
Classic spooling is slower and less scalable. Honestly though, for many 
people it is not going to make much of a throughput or performance 
difference.

o You can get around the "can't write berkeleydb files to shared NFS 
mount" problem by running the berkeley RPC spooling server. In this 
mode, spooling is done over the network to a remote RPC server -- this 
allows shadow masters to pick up the pieces after the qmaster falls 
over. You get the "fast/new" spooling technology with shadow master 
functionality, but...

The trouble with RPC server spooling (besides it being characterized as 
incredibly insecure) is that you can currently have only one RPC server. 
This makes the use of shadow masters quite silly, as you still have a 
single point of failure: the RPC server is now your critical failure 
point.

Rayson mentioned one possible workaround -- use NFSv4 and berkeleydb 
spooling. Have not tested this yet myself.
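
For anyone wanting to try the NFSv4 route, a first sanity check is 
whether the spool directory really sits on an NFSv4 mount. Below is a 
minimal sketch, assuming a Linux host where the mount table is in 
/proc/mounts format and NFSv4 mounts are reported with the type "nfs4". 
The function name, the sample mount table and the /opt/sge paths are 
made up for illustration; none of this comes from SGE itself.

```python
import posixpath


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is assumed to be in /proc/mounts format:
        device mountpoint fstype options dump pass
    Mount points with escaped characters (e.g. \\040 for a space) are
    not handled in this sketch.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with a trailing slash so /opt/sge does not match /opt/sgeother
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # The longest matching mount point wins; on a tie the later entry wins
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount table, for illustration only
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
filer:/export/sge /opt/sge nfs4 rw,hard 0 0
filer:/export/home /home nfs rw 0 0
"""

if __name__ == "__main__":
    spool_dir = "/opt/sge/default/spool/qmaster"  # hypothetical path
    print(spool_dir, "is on", fs_type_for(spool_dir, SAMPLE_MOUNTS))
```

On a real Linux host you would pass in the contents of /proc/mounts 
instead of the sample text. This only tells you how the volume is 
mounted, not whether berkeleydb spooling actually behaves on it.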

Another approach that we have tested and seen work is to use a shared 
SAN volume between SGE master hosts. Our testbed for this was a 100+ 
node Apple G5 Xserve cluster in which the four "head nodes" shared an 
SGE 6.0u1 spool volume via Apple's XSAN software. Failover worked fine 
when we knocked over head nodes. Our setup used a beta release of the 
XSAN product, so I would not call this rock solid or production-ready 
yet.
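
To keep the options straight, here is a rough sketch of the 
combinations discussed so far as a small lookup table. The labels 
("nfs", "rpc", "nfsv4", "san") and the function name are mine, purely 
for illustration, and the table only restates what is said in this 
thread; it is not an authoritative SGE compatibility matrix.

```python
from typing import NamedTuple


class Assessment(NamedTuple):
    shadow_master: bool  # can a shadow master take over after qmaster failure?
    caveat: str


# Keys are (spooling method, where the spool lives). Labels are hypothetical.
_RULES = {
    ("classic", "nfs"): Assessment(
        True, "slower and less scalable spooling"),
    ("berkeleydb", "nfs"): Assessment(
        False, "berkeleydb files cannot be written to a shared NFS mount"),
    ("berkeleydb", "rpc"): Assessment(
        True, "the one RPC server becomes the single point of failure"),
    ("berkeleydb", "nfsv4"): Assessment(
        True, "suggested by Rayson, not yet tested here"),
    ("berkeleydb", "san"): Assessment(
        True, "worked on a beta XSAN testbed, not production-proven"),
}


def assess(spooling: str, location: str) -> Assessment:
    """Look up what this thread says about one spooling setup."""
    try:
        return _RULES[(spooling, location)]
    except KeyError:
        raise ValueError(
            f"combination not covered here: {spooling}/{location}") from None


if __name__ == "__main__":
    for (spooling, location), result in _RULES.items():
        print(f"{spooling:<10} {location:<6} "
              f"shadow_master={result.shadow_master}  ({result.caveat})")
```

Anything not listed raises a ValueError rather than guessing, since 
the thread does not cover other combinations.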

This functionality hole is just a byproduct of the new adoption of 
berkeleydb under the hood. I'm guessing NFSv4, and hopefully some way to 
run multiple RPC servers in the future, will make this a non-issue.

-chris

Rayson Ho wrote:

> It's rather a limitation of NFS -- but if you are using NFSv4 for the
> spool directory, you can use shadow master with Berkeley spooling.
> 
> Rayson
> 
> --- Steve <slitster@rcn.com> wrote:
> 
>>Not positive on this, but I think 6.1 may allow the use of a shadow 
>>master in combination with the Berkeley database functionality.
>>
>>
>>Steve