[Bioclusters] Request for discussions-How to build a biocluster Part 3 (the OS)

Donald Becker bioclusters@bioinformatics.org
Tue, 14 May 2002 13:23:07 -0400 (EDT)


On 13 May 2002, Mike Coleman wrote:

> Donald Becker <becker@scyld.com> writes:
> > The Scyld distribution includes features such as unified process space,
> > single point/ single binary application installation, cluster directory
> > services and zero-administration compute nodes.  These features require
> > kernel and library support.
>
> I think we're using the word 'distribution' in slightly different
> ways.

I don't think so -- the Scyld distribution is a complete OS
installation, with integrated cluster tools.  The CD set is a full
distribution.

> Beowulf software, though it may require special versions of libraries and
> patched kernels, is, from the distribution perspective, just a set of
> packages.  As far as I can see, the pieces could be packaged for the
> different distributions with little difficulty.

Viewed that way, there is little difference between Linux distributions.
They are just sets of packages with an installation program.  They all
use approximately the same kernels, libraries, compilers, and utilities.

But that view discounts the value of a distribution.
Unless you have an integrated distribution, you can't provide a
complete, tested solution.  Large file support (LFS) is an example: two
years ago we were the first to ship a distribution with tested LFS, a
feature the workstation-oriented distributions didn't see as a
priority.  That wouldn't have been feasible with add-on tools for
arbitrary distributions.
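
For anyone who hasn't run into LFS: the point is that a program
compiled against an LFS-capable libc, running on an LFS-capable kernel
and filesystem, can work with files past the old 2 GB limit.  A
minimal sketch, using only the standard glibc feature-test macro
(nothing here is Scyld-specific):

#define _FILE_OFFSET_BITS 64  /* make off_t 64 bits even on 32-bit systems */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    /* Seek to 3 GB -- past the 2 GB barrier -- and write one byte.
     * If any layer (kernel, libc, filesystem) lacks LFS support,
     * this fails instead of producing a sparse 3 GB file. */
    if (lseek(fd, 3LL * 1024 * 1024 * 1024, SEEK_SET) == (off_t)-1)
        return 1;
    write(fd, "x", 1);
    close(fd);
    return 0;
}

Every one of those layers has to cooperate, which is exactly why
"tested LFS" is a distribution-level property rather than an add-on
package.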

> It does seem that the only reasonable way to structure a cluster of any size
> is for the compute nodes to somehow mirror some canonical copy of their
> filesystems.

Not at all.  In a cluster, compute nodes exist to run jobs on behalf of
the master systems.  Putting a full installation on a compute node
increases the complexity, administrative burden, and opportunity for
failure.

With the Scyld system, compute nodes are dramatically simplified.  They
run a fully capable standard kernel with extensions, and start out with
no installed file system at all -- just an empty RAM-based filesystem
(sketched after the list below).

There are many advantages to this approach:
  Adding new compute nodes is fast and automatic
  The system easily scales to over a thousand nodes
  Single-point updates for kernel, device drivers, libraries and applications
  Jobs run faster than they do on a full installation
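
To make the RAM-rooted model concrete, here is a minimal sketch of
what such a node's init could look like.  This is not Scyld's actual
node-boot code; the daemon name ("bpslave", BProc's slave daemon) and
the paths are assumptions for illustration.  The kernel has already
unpacked a tiny RAM-based root holding little more than this program
and the daemon; everything else comes from the master on demand:

#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* pseudo-filesystems and scratch space a job needs; no disk,
       so there is nothing to install or to get out of sync */
    mkdir("/proc", 0555);
    mount("none", "/proc", "proc", 0, NULL);
    mkdir("/tmp", 01777);
    mount("none", "/tmp", "tmpfs", 0, "size=64m");

    /* hand control to the per-node daemon that connects to the
       master and accepts migrated processes */
    execl("/sbin/bpslave", "bpslave", "master", (char *)0);
    return 1;   /* only reached if the exec failed */
}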

Presenting a simple model to the user is very important.  Using an NFS
root makes things simple for the person installing the system, but that
is a hack, not an architected system.  System administration requires
detailed knowledge of which types of files belong on which file
systems, NFS has significant performance and scaling bottlenecks, and
users have to deal with NFS consistency and caching quirks.

> The Bproc stuff looks interesting.  I worry, though, about the amount
> of brain surgery being done to the Linux/Unix model (which goes back
> to that simplicity thing).

Our system is much more than BProc, although many of the tools are built
using BProc as a base.  The system has dozens of integrated pieces
such as a cluster name service, status monitoring tools, integrated MPI,
and a unified administration system.

The user sees BProc as the unified process space over the cluster.  They
can see and control all the processes of their job using Unix tools they
already know, such as 'top', 'ps', 'suspend', and 'kill'.
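
The same holds from within a program: because a remote child appears
in the master's process table, ordinary POSIX calls operate on it.  A
sketch, with the caveat that the BProc call and header are cited from
memory -- bproc_rfork(node) forks a child onto a compute node, but
treat the exact name, header path, and signature as assumptions:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/bproc.h>   /* BProc user library; header path assumed */

int main(void)
{
    pid_t pid = bproc_rfork(0);   /* like fork(), but the child lands
                                     on compute node 0 */
    if (pid == 0) {
        /* child: now running on the compute node */
        /* ... do the work ... */
        _exit(0);
    }

    /* parent, on the master: the remote child is visible in the
       local process table, so plain kill() and waitpid() work on
       it, exactly as 'kill <pid>' would from a shell */
    kill(pid, SIGTERM);
    waitpid(pid, NULL, 0);
    return 0;
}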

> As before, since I have yet to build my first Beowulf, take all this with an
> extra large grain of salt.

You should look at the Scyld system -- we have innovative approaches to
solving problems that people have accepted as inherent to building clusters.
You can buy a low-cost CD (note that it is an older version and comes
with no support) from Linux Central.  Or you can buy integrated
clusters from about a dozen vendors, including HP-Compaq.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993