[Bioclusters] High Availability Clustering

Osborne, John bioclusters@bioinformatics.org
Wed, 23 Jul 2003 09:49:24 -0400


Hi Joe,

Thanks for your comments.  You definitely gave me enough to scare my boss, who
seems more interested in high availability than is worthwhile.  We are
running PBS now (I wasn't aware that it could or couldn't do HA so thanks
for the tip) and I don't think they are going to be interested in paying the
extra money for LSF or SGE yet.  I'm also not too sure what a dual ported FC
disk is, so I should definitely avoid this for now!

Is your experience similar to Chris's with regard to the unimportance of HA
for research work?

Thanks,

 -John


-----Original Message-----
From: Joseph Landman [mailto:landman@scalableinformatics.com]
Sent: Tuesday, July 22, 2003 3:22 PM
To: biocluster
Subject: Re: [Bioclusters] High Availability Clustering


Hi John:

  You have to look at what services your master node is providing, and
decide on your failover plan.  Specifically, you need to consider how you
want to do heartbeat detection (usually over a serial cable or other
physical connection).  You need to look at file system issues.  You
might need to invest in specific file system gear (dual ported FC disks,
redundant NAS's, etc).  You would need to look carefully at your
scheduler.  PBS cannot handle HA now, and there are good reasons to look
at other schedulers.  SGE may be able to do HA, and LSF can do HA.
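
  The heartbeat idea above can be sketched roughly as follows.  This is
only a minimal illustration, assuming a standby node that counts missed
beats from the master and takes over after a threshold.  The class name,
the 2 second interval, and the threshold of 3 are hypothetical choices,
and a real setup would more likely use the heartbeat software from
Linux-HA than hand-rolled code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical failure detector run on the standby master node."""

    interval: float = 2.0   # assumed seconds between expected heartbeats
    max_missed: int = 3     # assumed number of missed beats before failover
    last_seen: float = field(default_factory=time.monotonic)

    def beat(self, now=None):
        """Record that a heartbeat arrived from the active master."""
        self.last_seen = time.monotonic() if now is None else now

    def missed(self, now=None):
        """Return how many whole heartbeat intervals have passed in silence."""
        now = time.monotonic() if now is None else now
        return int(max(0.0, now - self.last_seen) // self.interval)

    def should_take_over(self, now=None):
        """True once the master has been silent for max_missed intervals."""
        return self.missed(now) >= self.max_missed
```

  The standby would call beat() whenever a heartbeat arrives over the
serial line or network link, and poll should_take_over() on a timer.
What "taking over" means (claiming the service address, mounting shared
storage, starting the scheduler) depends on the services the head node
provides.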

  Have a look at http://www.linux-ha.org/.  Look at Mon (for providing
basic monitoring and triggering).  Look at
http://www.linuxvirtualserver.org/ and see if you could use that for
some of your services.  It depends strongly upon the services you need
the head node to provide.

  You should look at GFS if you want the file system to be Linux based
rather than appliance based.

Joe

On Tue, 2003-07-22 at 14:58, Osborne, John wrote:
> Hello,
> 
> I'm the unofficial admin for a 20 node (40 CPU) Linux cluster here at the
> CDC and I'm looking for some advice.  Our setup here relies upon a *single*
> master node which acts as a gateway to the internal cluster network.  If
> something were to happen to the master node, we'd be in serious trouble if
> we are aiming for 100% uptime.  So far we aren't that serious about 100%
> uptime (although we've had it for this master node thus far) but as the
> popularity of the cluster grows it is becoming more important.  I am
> wondering what is the best way to ensure failover for a master node in a
> cluster.  Right now I just write out a master node image to network storage
> every night and if something goes wrong, the cluster is effectively down and
> it could take hours to get it fixed.
> 
> Is it possible to have 2 master nodes with a single virtual IP address?  How
> are other people solving this problem?
> 
>  -John
> 
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615

