[Bioclusters] High Availability Clustering

Joseph Landman bioclusters@bioinformatics.org
23 Jul 2003 10:16:03 -0400

On Wed, 2003-07-23 at 09:49, Osborne, John wrote:
> Hi Joe,
> Thanks for your comments, you definitely gave me enough to scare my boss who
> seems more interested in high availability than is worthwhile.  We are
> running PBS now (I wasn't aware that it could or couldn't do HA so thanks
> for the tip) and I don't think they are going to be interested in paying the
> extra money for LSF or SGE yet.  I'm also not too sure what a dual ported FC
> disk is, so I should definitely avoid this for now!

SGE is open source, and available for "free" (download time, install
time, self-education time are not free).

Dual ported fibre channel disks are disks with 2 physical IO port
connections.  They allow multiple fibre channel host bus adaptors (HBAs)
to communicate with them.  You typically need dual ported disks for
failover capability in the older view of failover.  There are SAN
systems and NASes that do completely mirrored disk writes/reads, though
I do not know how well they work.

> Is your experience similar to Chris's with regards to the unimportance of HA
> for research work?

True production shops need their data intact and available, and the
servers with applications and front-ends stable.  The rest of the
computing facility may not be as well protected.  Protecting compute
nodes from power outages may make sense if your computing is
mission-critical and you need to minimize time-to-results.  If it is
not, and you can tolerate down time while you have no power, then you
can focus upon protecting your data.

An important question I ask my customers is:  how much would it cost you
(time and money) to rebuild the data that you would lose if this
component went away?  This component may be a disk, a server, an
application machine, a network.  When you start to ask those questions
in that manner, you'll see the important stuff is your data, your access
and protection/backup of your data.

People often bat about a "5-nines" number for uptime, and what they are
willing to pay for that.  Well, 5-nines is 99.999% uptime, or 0.001%
downtime.  For the ~31.6M seconds/year, 0.001% represents about 316 s of
unscheduled downtime per year.  That's about 5 and a quarter minutes.  Do
you really need that?  4-nines is about 52.6 minutes, 3-nines is about
8.8 hours.  This is relevant because of the costs to achieve such
high levels of risk avoidance.
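
As a rough sketch of that arithmetic (Python; the function name is just
illustrative, and I am assuming a 365.25 day year):

```python
# Allowed unscheduled downtime per year for a given number of "nines".
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~31.6M seconds (assumed year length)


def downtime_seconds(nines: int) -> float:
    """Seconds of downtime per year at an availability of N nines.

    5 nines -> 99.999% uptime -> a downtime fraction of 10**-5.
    """
    return SECONDS_PER_YEAR * 10.0 ** (-nines)


if __name__ == "__main__":
    for n in (3, 4, 5):
        s = downtime_seconds(n)
        print(f"{n}-nines: {s:9.1f} s = {s / 60:7.1f} min = {s / 3600:6.2f} h")
```

Each extra nine cuts the allowed downtime by a factor of 10, which is
roughly why each extra nine gets so much more expensive to buy.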

Rarely do high performance computing shops need such risk avoidance,
though with many machines, the MTBF of the entire unit often
means that specific components will fail at some point.  What you need
is to understand how you might continue to operate in the face of such
failures, whether or not you have "resilient" computing (insensitive to
single points of failure).
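
To put a rough number on the MTBF point, here is a minimal sketch in
Python.  It assumes node failures are independent with a constant
failure rate (an exponential model), which is a simplification.  The
50,000 hour per-node MTBF and the function names are made up for
illustration; the 20 nodes match John's cluster.

```python
import math


def cluster_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """Mean time between failures anywhere in the cluster.

    With independent nodes and constant failure rates, the rates add,
    so the cluster-wide MTBF is the per-node MTBF divided by node count.
    """
    return node_mtbf_hours / nodes


def prob_any_failure(node_mtbf_hours: float, nodes: int, window_hours: float) -> float:
    """Probability that at least one node fails within the window."""
    rate = nodes / node_mtbf_hours  # failures per hour, cluster-wide
    return 1.0 - math.exp(-rate * window_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 20 nodes, 50,000 h MTBF each, a 30 day window.
    print(cluster_mtbf_hours(50_000, 20))
    print(prob_any_failure(50_000, 20, 30 * 24))
```

The point is only that the more boxes you have, the more often
something is broken, so plan for operating through the failure rather
than for avoiding it.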

More often than not, what Chris indicated is the case.  You get critical
component failures (NFS, scheduler, etc) which bring the system down. 
Which of these are most critical?

1) Disk:  Good disks are important.  You do not need to go SCSI/FC RAID
everywhere.  I do know people who insist upon SCSI everywhere (including
compute nodes).  You need a good cluster storage infrastructure
(central).  You need a good platform storage infrastructure (outside
cluster) with good backup capability. What are the risks to your work if
the data is inaccessible?

2) application servers: Look at DRM as an application (so LSF/SGE/PBS
are examples of this).  Look at web servers and applications built upon
them as applications.  What are the risks to your work if they are
inaccessible?

3) compute nodes:  What are the risks to your work if they are
inaccessible?

4) head node: What are the risks to your work if it is inaccessible?

Your plan of nightly snapshots may be fine.  I have found great benefit
to booting my servers every now and then with Knoppix (www.knoppix.net),
and running partimage to copy the drive image over the net to an NFS
mount, or to a fast USB2 drive.  A good tape backup system (a la Arkeia
and others) could do a similar job.


> Thanks,
>  -John
> -----Original Message-----
> From: Joseph Landman [mailto:landman@scalableinformatics.com]
> Sent: Tuesday, July 22, 2003 3:22 PM
> To: biocluster
> Subject: Re: [Bioclusters] High Availability Clustering
> Hi John:
>   You have to look at what services your master node is providing, and
> decide your failover plan.  You need specifically to consider how you
> want to do a heartbeat (usually a serial cable or other physical
> connection) detection.  You need to look at file system issues.  You
> might need to invest in specific file system gear (dual ported FC disks,
> redundant NAS's, etc).  You would need to look carefully at your
> scheduler.  PBS cannot handle HA now, and there are good reasons to look
> at other schedulers.  SGE may be able to do HA, and LSF can do HA.
>   Have a look at http://www.linux-ha.org/.  Look at Mon (for providing
> basic monitoring and triggering).  Look at
> http://www.linuxvirtualserver.org/ and see if you could use that for
> some of your services.  It depends strongly upon the services you need
> the head node to provide.
>   You should look at GFS if you want the file system to be Linux based
> rather than appliance based.
> Joe
> On Tue, 2003-07-22 at 14:58, Osborne, John wrote:
> > Hello,
> > 
> > I'm the unofficial admin for a 20 node (40 CPU) linux cluster here at the
> > CDC and I'm looking for some advice.  Our setup here relies upon a *single*
> > master node which acts as a gateway to the internal cluster network.  If
> > something were to happen to the master node, we'd be in serious trouble if
> > we are aiming for 100% uptime.  So far we aren't that serious about 100%
> > uptime (although we've had it for this master node thus far) but as the
> > popularity of the cluster grows it is becoming more important.  I am
> > wondering what is the best way to ensure failover for a master node in a
> > cluster.  Right now I just write out a master node image to network storage
> > every night and if something goes wrong, the cluster is effectively down and
> > it could take hours to get it fixed.
> > 
> > Is it possible to have 2 master nodes with a single virtual IP address?  How
> > are other people solving this problem?
> > 
> >  -John
> > 
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615