[Bioclusters] High Availability Clustering

Chris Dagdigian bioclusters@bioinformatics.org
Wed, 23 Jul 2003 09:50:29 -0400 (EDT)

Grid Engine is free; see http://gridengine.sunsource.net. That is the same 
code that Sun packages up and sells through its own channels. The only 
differences seem to be (a) extra QA, (b) internationalization and (c) 
formal enterprise-level support. The binaries are 100% the same, both for 
SGE and for the enterprise edition (SGEEE). 

Your experience may vary, but I've found SGE and LSF to be superior to all 
versions of PBS that I've ever used. They will all work at the end of the 
day, but some packages require more care, feeding and operational overhead 
over time than others.  

If you are running OpenPBS rather than at least PBSPro, I'd argue 
semi-seriously that you are silly even to be considering HA hardware 
techniques for your cluster :)


On Wed, 23 Jul 2003, Osborne, John wrote:

> Hi Joe,
> Thanks for your comments; you definitely gave me enough to scare my boss, who
> seems more interested in high availability than is worthwhile.  We are
> running PBS now (I wasn't aware that it could or couldn't do HA, so thanks
> for the tip) and I don't think they are going to be interested in paying the
> extra money for LSF or SGE yet.  I'm also not too sure what a dual-ported FC
> disk is, so I should definitely avoid this for now!
> Is your experience similar to Chris's with regard to the unimportance of HA
> for research work?
> Thanks,
>  -John
> -----Original Message-----
> From: Joseph Landman [mailto:landman@scalableinformatics.com]
> Sent: Tuesday, July 22, 2003 3:22 PM
> To: biocluster
> Subject: Re: [Bioclusters] High Availability Clustering
> Hi John:
>   You have to look at what services your master node is providing and
> decide on your failover plan.  Specifically, you need to consider how you
> want to do heartbeat detection (usually over a serial cable or other
> physical connection).  You need to look at file system issues.  You
> might need to invest in specific file system gear (dual-ported FC disks,
> redundant NAS units, etc.).  You would need to look carefully at your
> scheduler.  PBS cannot handle HA now, and there are good reasons to look
> at other schedulers.  SGE may be able to do HA, and LSF can do HA.
>   Have a look at http://www.linux-ha.org/.  Look at Mon (for providing
> basic monitoring and triggering).  Look at
> http://www.linuxvirtualserver.org/ and see if you could use that for
> some of your services.  It depends strongly upon the services you need
> the head node to provide.
>   You should look at GFS if you want the file system to be Linux based
> rather than appliance based.
> Joe
> On Tue, 2003-07-22 at 14:58, Osborne, John wrote:
> > Hello,
> > 
> > I'm the unofficial admin for a 20-node (40-CPU) Linux cluster here at the
> > CDC and I'm looking for some advice.  Our setup here relies upon a *single*
> > master node, which acts as a gateway to the internal cluster network.  If
> > something were to happen to the master node, we'd be in serious trouble if
> > we were aiming for 100% uptime.  So far we aren't that serious about 100%
> > uptime (although we've had it for this master node thus far), but as the
> > cluster grows in popularity it is becoming more important.  I am
> > wondering what the best way is to ensure failover for a master node in a
> > cluster.  Right now I just write out a master node image to network storage
> > every night, and if something goes wrong, the cluster is effectively down and
> > it could take hours to get it fixed.
> > 
> > Is it possible to have 2 master nodes with a single virtual IP address?  How
> > are other people solving this problem?
> > 
> >  -John
> > 
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
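Joe's heartbeat suggestion and John's question about two master nodes sharing one virtual IP address can be illustrated with a minimal sketch of the standby node's failure-detection logic. This is not code from linux-ha.org or Mon; the class name `HeartbeatMonitor`, the `deadtime` parameter and the `on_failover` hook (which stands in for whatever actually brings up the shared virtual IP) are all hypothetical. Time is passed in explicitly so the logic can be exercised without a real serial link or network.

```python
from typing import Callable, List


class HeartbeatMonitor:
    """Hypothetical standby-side failure detector (illustrative sketch only)."""

    def __init__(self, deadtime: float, on_failover: Callable[[], None], start: float) -> None:
        self.deadtime = deadtime          # seconds of silence before the peer is declared dead
        self.on_failover = on_failover    # assumed hook, e.g. bring up the shared virtual IP
        self.last_seen = start            # treat monitor start-up as the last known-good time
        self.failed_over = False

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat from the active master (e.g. received over a serial cable)."""
        # Once this node has taken over, it does not automatically hand back control.
        if not self.failed_over:
            self.last_seen = now

    def check(self, now: float) -> bool:
        """Poll periodically; fire on_failover exactly once after too much silence."""
        if not self.failed_over and now - self.last_seen > self.deadtime:
            self.failed_over = True
            self.on_failover()
        return self.failed_over


if __name__ == "__main__":
    actions: List[str] = []
    mon = HeartbeatMonitor(deadtime=10.0,
                           on_failover=lambda: actions.append("take over virtual IP"),
                           start=0.0)
    mon.heartbeat(now=2.0)
    print(mon.check(now=11.0))   # 9 s of silence: False
    print(mon.check(now=13.0))   # 11 s of silence: True, hook fires
    print(mon.check(now=20.0))   # stays True, hook does not fire again
    print(actions)               # ['take over virtual IP']
```

A missed heartbeat alone cannot distinguish a dead master from a broken heartbeat link; if both nodes then claim the virtual IP you get a "split brain", which is one reason a dedicated physical heartbeat path such as the serial cable Joe mentions is commonly used.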

Chris Dagdigian, <dag@sonsorol.org>
BioTeam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net