[Bioclusters] Request for discussions-How to build a biocluster Part 3 (the OS)

Andrew Shewmaker bioclusters@bioinformatics.org
Tue, 14 May 2002 10:56:19 -0600


On Mon, 13 May 2002 22:17:48 -0500
Mike Coleman <mkc@mathdogs.com> wrote:

> I think we're using the word 'distribution' in slightly different ways.  When
> I think of my favored distribution, Debian Linux, I think primarily of the
> packaging tools and format (apt-get, dpkg, .deb files) and the philosophical
> and engineering structures within which the packages get created and
> maintained.  Red Hat, another popular Linux distribution, has analogs which
> are quite different.

I think you are right that you are using the word 'distribution' differently 
than Don.  My opinion is that Scyld is its own distribution, or is at least 
becoming one in the same way that Mandrake has.  I'm sure it will stay 
compatible with RedHat to a very large degree, but IMO it is a derived 
distribution.  

Your definition may not be as precise as you thought because of projects like 
www.gentoo.org.  Gentoo calls itself a meta-distribution because it compiles 
each of its packages and lets each user build a custom distribution.  
The custom distributions all share the same package tools and format, but they 
can be very different: binaries from one may well not run on another.  
It's a fuzzy situation.
 
> Beowulf software, though it may require special versions of libraries and
> patched kernels, is, from the distribution perspective, just a set of
> packages.  As far as I can see, the pieces could be packaged for the different
> distributions with little difficulty.  Considered this way, the Beowulf
> software seems almost completely orthogonal to the choice of distribution, and
> therefore should not determine its choice.

You are partly correct.  I have used RedHat and Mandrake to build beowulf 
clusters, and it is easy to make a cluster out of them.  They come with many 
beowulf packages, like pvm, mpich, and lam-mpi, and others are easy to compile 
yourself.  I have also used Scyld, though, and I could tell that they put a 
lot of effort into quality assurance for specifically beowulf-related issues.  
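
To give a feel for what those packages buy you, here is roughly what a 
minimal MPI program looks like.  It should compile against either mpich or 
lam-mpi with their mpicc wrapper; the launch syntax differs a bit between 
the two, so take the details as a sketch rather than a recipe:

  /* Minimal MPI "hello world": each rank reports where it is running. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);                  /* start the MPI runtime     */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank       */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
      MPI_Get_processor_name(name, &len);      /* node this rank landed on  */

      printf("rank %d of %d on %s\n", rank, size, name);

      MPI_Finalize();
      return 0;
  }

Something like 'mpicc hello.c -o hello' followed by 'mpirun -np 4 ./hello' 
is the usual pattern, though the exact flags depend on which implementation 
you installed.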

It was said either here or on the beowulf list that Scyld did a lot of work 
to make sure things like large files really worked (see the sketch below).  
While the clusters I made myself were functional, they were not elegant, 
they were not easy to administer, and I kept bumping into problems.  The 
problems I ran into were there because RedHat and Mandrake concentrate 
mostly on servers and desktops, as they should.
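
By "large files" I mean 64-bit file offsets: on a 2.4-era Linux you ask for 
them by defining _FILE_OFFSET_BITS=64 before the headers, which makes off_t 
64 bits wide.  A rough sketch (the path is made up, and the filesystem 
underneath has to support large files too):

  /* Sketch of large-file support: _FILE_OFFSET_BITS=64 makes off_t 64 bits
   * so we can seek and write past the old 2GB limit.  Path is made up. */
  #define _FILE_OFFSET_BITS 64
  #define _LARGEFILE_SOURCE      /* declares fseeko()/ftello() */

  #include <stdio.h>

  int main(void)
  {
      FILE *fp = fopen("/scratch/bigfile.dat", "w");
      if (fp == NULL) {
          perror("fopen");
          return 1;
      }

      /* Seek 3GB into the file, well past 2^31, and write a byte there. */
      if (fseeko(fp, (off_t)3 * 1024 * 1024 * 1024, SEEK_SET) != 0) {
          perror("fseeko");
          return 1;
      }
      fputc('x', fp);
      printf("offset is now %lld\n", (long long)ftello(fp));

      fclose(fp);
      return 0;
  }

Getting that sort of thing to work reliably across the kernel, glibc, and 
NFS on every node is exactly what a vendor can QA better than I can.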

A beowulf isn't just software; it is also a hardware architecture.  When I 
first started working with clusters, I thought of them as a network of 
workstations without monitors.  Now that I have more experience I try to 
get away from that perspective.  Administering and using a cluster as a 
(pseudo) single system image is much preferable to me.  The Scyld/Bproc 
method of having minimal slave nodes (with basically a kernel, some libraries, 
and a daemon) instead of full installations of a Linux distribution is a godsend.
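
To make that concrete, here is roughly what programming against bproc looks 
like.  I am going from memory of the libbproc interface (bproc_numnodes(), 
bproc_currnode(), bproc_rfork()), so check the headers that ship with your 
release before trusting the details:

  /* Sketch of a BProc-style program: fork one child onto each slave node.
   * Assumes libbproc provides bproc_numnodes(), bproc_currnode(), and
   * bproc_rfork(); check sys/bproc.h on your own system. */
  #include <sys/bproc.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int nodes = bproc_numnodes();   /* slave nodes the master knows about */
      int node;

      for (node = 0; node < nodes; node++) {
          pid_t pid = bproc_rfork(node);   /* like fork(), but the child
                                            * starts out on 'node'         */
          if (pid < 0) {
              perror("bproc_rfork");
          } else if (pid == 0) {
              /* Child: now running on the slave node. */
              printf("pid %d running on node %d\n",
                     (int)getpid(), bproc_currnode());
              _exit(0);
          }
      }

      /* Parent: the remote children are still in our process space,
       * so we can wait() for them like ordinary local children. */
      while (wait(NULL) > 0)
          ;
      return 0;
  }

The nice part is that the remote children show up in the master's process 
table, which is what makes the (pseudo) single system image work.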

 
> Of course, some situations are indifferent to distribution, and may value a
> turn-key solution, but that's not my situation.
> 
> 
> It does seem that the only reasonable way to structure a cluster of any size
> is for the compute nodes to somehow mirror some canonical copy of their
> filesystems.  (No one wants to do any manual operation O(N) times.)
> 
> The two obvious ways to accomplish this mirroring would be to use nfsroot and
> run the compute nodes diskless (or OS-diskless, anyway), or otherwise to use
> something like SystemImager to clone the systems at boot time.  I can see
> advantages to both, but currently I'm partial to the nfsroot alternative.  It
> just seems simpler, and these days I worship simplicity.  :-)
> 
> The Bproc stuff looks interesting.  I worry, though, about the amount of brain
> surgery being done to the Linux/Unix model (which goes back to that simplicity
> thing).

Or, mirror little or nothing and use the local disks for caching and scratch 
space.  Mirroring takes bandwidth away from the applications.  You may have 
plenty of bandwidth or not require much, but why mirror things like the password 
file and other configuration files when you don't have to?  Bproc moves 
complexity into the kernel (a 48k patch) in order to make things simpler for 
administrators and users.

I don't have a stake in Scyld or Bproc, but I had to transition from homebrewed 
clusters to Scyld and then back to homebrewed.  The main reason we went with 
homebrewed the last time was "in order to be flexible".  It is flexible, but it 
is also brittle.  Please learn from my group's mistake: avoid older-style 
clusters where each node has a full install, and go with something like 
Scyld/Bproc if you can.  There may be some circumstance where you want full 
installs... we thought we had one, but now I think we were wrong.

I have been part of the construction and administration of four clusters, and 
a lot has changed since even a year ago in terms of software for clusters.  
The community has accumulated years of experience with beowulf clusters built on 
full distributions using rsh/ssh and NFS.  It has identified the weaknesses of 
the original methods and tools and has produced next-generation methods and 
tools like bproc, pvfs, and others (more arriving every day).  I'm going to 
take advantage of that hard work.

For more information about next-generation cluster packages/distributions (or at 
least ones that minimize administrative overhead by avoiding full installs):

www.scyld.com
www.clustermatic.org
bproc.sf.net
psoftware.org/clumpos (MOSIX)

Most other cluster software, like OSCAR, Score, OpenSCE, or NPACI Rocks, seeks to 
minimize administrative overhead, but still keeps full installations on each 
node.  See http://www.lcic.org for links.

> > Doing diskless boots over NFS works for a handful of nodes, but is a
> > significant performance and scalability bottleneck.  There are ways to
> > mitigate the performance problems with tuning, but you have to measure and
> > monitor the system to verify that it's performing as you expect.
> 
> I suspect there are a lot of little issues here to bump into.  I hadn't
> considered that NFS attribute stuff you pointed out.  (Does Scyld do technical
> consulting as well?)
> 
> As before, since I have yet to build my first Beowulf, take all this with an
> extra large grain of salt.
> 
> Mike