Donald Becker <becker@scyld.com> writes:

> Not at all.
> Your selection will depend on the type of cluster you want, and how much
> effort and expertise you will need to build and maintain the cluster.
>
> You can build a first-generation Beowulf cluster by putting together a
> collection of independent machines running your favorite distribution.
> That will require loading the distribution on each machine, and careful
> control of system and application updates. It's suitable for one person
> being the combined builder, administrator and user.
>
> The Scyld distribution includes features such as unified process space,
> single point/single binary application installation, cluster directory
> services and zero-administration compute nodes. These features require
> kernel and library support.

I think we're using the word 'distribution' in slightly different ways.
When I think of my favored distribution, Debian Linux, I think primarily
of the packaging tools and format (apt-get, dpkg, .deb files) and the
philosophical and engineering structures within which the packages get
created and maintained. Red Hat, another popular Linux distribution, has
analogs which are quite different.

Beowulf software, though it may require special versions of libraries and
patched kernels, is, from the distribution perspective, just a set of
packages. As far as I can see, the pieces could be packaged for the
different distributions with little difficulty.

Considered this way, the Beowulf software seems almost completely
orthogonal to the choice of distribution, and therefore should not
determine that choice. Of course, some situations are indifferent to
distribution, and may value a turn-key solution, but that's not my
situation.

It does seem that the only reasonable way to structure a cluster of any
size is for the compute nodes to somehow mirror some canonical copy of
their filesystems. (No one wants to do any manual operation O(N) times.)
The two obvious ways to accomplish this mirroring would be to use nfsroot
and run the compute nodes diskless (or OS-diskless, anyway), or otherwise
to use something like SystemImager to clone the systems at boot time. I
can see advantages to both, but currently I'm partial to the nfsroot
alternative. It just seems simpler, and these days I worship
simplicity. :-)

The Bproc stuff looks interesting. I worry, though, about the amount of
brain surgery being done to the Linux/Unix model (which goes back to that
simplicity thing).

> Doing diskless boots over NFS works for a handful of nodes, but is a
> significant performance and scalability bottleneck. There are ways to
> mitigate the performance problems with tuning, but you have to measure and
> monitor the system to verify that it's performing as you expect.

I suspect there are a lot of little issues here to bump into. I hadn't
considered that NFS attribute stuff you pointed out. (Does Scyld do
technical consulting as well?)

As before, since I have yet to build my first Beowulf, take all this with
an extra large grain of salt.

Mike