[Bioclusters] Re: blast and nfs

Donald Becker bioclusters@bioinformatics.org
Thu, 24 Apr 2003 12:08:45 -0400 (EDT)


On Thu, 24 Apr 2003, Bruce O'Neel wrote:

> basically a fast serial connection is a bad idea.  If you use 100
> megabit ethernet you max out somewhere around 40 megabits or so
> because you can't use the full channel bandwidth.

This is an "urban myth" spread 15 years ago by people trying to push
Token Ring.  Astonishingly, it's still repeated despite being trivially
disproved by everyday experience.  Typical TCP/IP/Ethernet bandwidth
delivered to an application is well over 90%.  You should see
11-12MB/sec for a Fast Ethernet file transfer.
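A back-of-the-envelope check confirms that figure (a sketch using standard Ethernet/TCP frame-overhead values, not numbers from the original post):

```python
# Theoretical goodput of 100 Mbit/s Fast Ethernet for a bulk TCP transfer.
# Per-frame overhead: preamble+SFD 8, Ethernet header 14, FCS 4,
# inter-frame gap 12 bytes; inside the frame, IP 20 + TCP 20 byte headers.
LINK_MBPS = 100
MTU = 1500
payload = MTU - 20 - 20                 # TCP payload bytes per frame: 1460
wire_bytes = 8 + 14 + MTU + 4 + 12      # bytes on the wire per frame: 1538
efficiency = payload / wire_bytes       # ~0.949
goodput_MBps = LINK_MBPS * 1e6 / 8 * efficiency / 1e6
print(f"{efficiency:.1%} efficient, ~{goodput_MBps:.1f} MB/sec")
# -> 94.9% efficient, ~11.9 MB/sec
```

So roughly 95% of the raw link rate reaches the application, right in the 11-12 MB/sec range.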

The 35% or 40% number comes from a too-simple analysis of CSMA
(e.g. Aloha) instead of CSMA/CD/EBO that Ethernet uses.  But even that
is moot, since every current Ethernet deployment uses a heavily buffered
switch, likely with flow control, rather than a collision domain.
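The ~37% number falls out of the slotted-Aloha throughput model, where throughput S = G * exp(-G) for offered load G peaks at 1/e. A quick check (illustrative only; the model is textbook Aloha, not anything Ethernet actually does):

```python
import math

# Slotted Aloha: with Poisson offered load G (frames per slot), the
# probability a slot carries exactly one frame -- a success -- gives
# throughput S = G * exp(-G), which is maximized at G = 1.
def slotted_aloha_throughput(G):
    return G * math.exp(-G)

peak = max(slotted_aloha_throughput(g / 1000) for g in range(1, 5000))
print(f"peak channel utilization ~{peak:.1%}")
# -> peak channel utilization ~36.8%, the source of the old "35-40%" myth
```

Ethernet's CSMA/CD with exponential backoff does far better than this model, and a switched network has no shared channel to contend for at all.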

> That, combined with modern OSs hard work to cache disks well, and then
> combined with cheap IDE hard disks, means that it almost always is a
> win to put your data locally.

That I completely agree with: disk bandwidth is still the least
expensive, most effective bandwidth available.  The key is managing the
complexity of many disks: knowing when they are being used as a cache
versus as a persistent store of unique data.

> NFS is good for things like login directories where you read small
> files once or twice and for source code repositories where you don't
> keep re-reading the files.

NFS is an exceptionally efficient protocol for read-only small files,
and login directories are an excellent example of rarely changed,
concise configuration files.


> NFS is very bad for big files since (basically) every 8k bytes or so
> requires the file to be reopened on the server, then you have to seek,
> then 8k bytes is read, and then closed again.

That's a mis-characterization: each NFS request is independent (more
precisely, idempotent), but the server isn't implemented as an
open()/seek()/read()|write()/close() sequence per request.  I wrote one
of the first non-Sun NFS servers, implemented as user-level code, so I'm
familiar with the implementation options.

That's not to say that NFS is _good_ at reading large files.  It's not.
But the real weakness of NFS is the consistency model when writing or
doing directory modifications.

> To make things worse some labs then do the incremental approach to
> NFS...
> Far better is to have a central NFS server for all of your home
> directories, and then have your central archives
> mirrored/rsynced/whatever to your different compute nodes.

I completely agree that NFS isn't a good cluster construction solution.

While it's possible to build a reasonable system by tuning the NFS
attribute and data caching parameters on a per-directory basis, this
   adds major complexity to the file system configuration,
   demands considerable expertise, and
   requires revisiting the decisions whenever the software or usage changes.
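As an illustration of that per-directory tuning, a read-mostly data export can be given long attribute-cache timeouts while home directories keep the default consistency behavior.  The options below are standard Linux NFS client mount parameters; the server name, paths, and values are hypothetical:

```shell
# Hypothetical /etc/fstab entries: aggressive caching for a read-only
# reference-data export, default consistency for home directories.
# actimeo sets all four attribute-cache timeouts (in seconds); rsize is
# the transfer size requested per NFS READ.
fileserver:/export/genomes  /data  nfs  ro,actimeo=3600,rsize=32768  0 0
fileserver:/export/home     /home  nfs  rw,hard,intr                 0 0
```

Every such entry is a decision that has to be revisited when the data or the workload changes, which is exactly the complexity argued against above.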

There are better, more efficient and manageable solutions for building
clusters than just blindly mirroring/rsyncing/imaging a whole
installation to each node.  PowerCockpit is an example of where that
leads.  The copying part is trivial.  The complexity is in the scripts
that configure each application.  And when you are finished, you just
have a quick way to do a reinstall, not an effective way to manage many
similar machines.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993