> Now that we have all 63 up and running, it looks like we are
> getting performance issues with NFS, much in the same way
> that others have reported here. Even moderate job loads
> produce trouble (nfsstat -c shows lots of retransmissions),

Are you using NFS over TCP? If not, you probably should. That
introduces some reliability problems, in that NFS/TCP is no longer
stateless: if the file server goes down, clients may hang. But since
your file server is your head node, it's mostly a moot point. Lose
the head node, and you lose the cluster anyway.

> grid engine execds don't report back in, so qhost shows nodes not
> responding, though eventually they will return. On occasion one of
> the switches stops and that whole "side" of the cluster disappears,
> so we reboot the switch and are back in action. Anyway, here are my
> questions (thanks for your patience in reading through this):
>
> Has anyone had similar problems with these SMC switches?
> I'm not accustomed to having switches die like this.
>
> In terms of improving NFS performance, I've already
> put the SGE spool onto the local nodes to try to improve things,
> but it only helps a little. There are various NFS tuning
> documents with respect to clusters (using the tcp, atime, rsize,
> wsize, etc. options to mount). I've experimented with a few of
> these (rsize, wsize), though with only very marginal positive impact.
> For those with larger clusters and similar issues, have you found
> a subset of these options to be more key or influential than others?

If you use NFS/TCP, the "rsize" and "wsize" parameters are irrelevant.
The Linux NFS how-to suggests raising the 'sysctl' values of
"net.core.rmem_max" and "net.core.rmem_default" above their usual
values of 64k.

You should also pay attention to the number of 'nfsd' processes
running on your server. The rule of thumb is eight per CPU. In
principle, the more clients you have, the more 'nfsd' processes you
want.
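For what it's worth, a minimal sketch of a TCP mount from a client. The
hostname and paths here are placeholders, not anything from your setup:

```shell
# Hypothetical example: mount an export from the head node over TCP.
# "headnode" and both paths are illustrative placeholders.
mount -t nfs -o tcp,hard,intr headnode:/home /home

# Or as a persistent /etc/fstab entry:
# headnode:/home  /home  nfs  tcp,hard,intr  0 0
```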
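On the server side, the buffer and thread-count tuning might look
something like the following. The 256k buffer value is illustrative,
not a recommendation:

```shell
# Raise default and maximum socket receive buffers above the usual 64k,
# as the Linux NFS how-to suggests (262144 bytes here is illustrative).
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.rmem_max=262144

# Eight nfsd threads per CPU, per the rule of thumb above.
cpus=$(grep -c ^processor /proc/cpuinfo)
rpc.nfsd $((cpus * 8))
```

To make the sysctl settings survive a reboot, put them in
/etc/sysctl.conf as well.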
But multiple server processes contend for resources themselves, so you
reach a point of diminishing returns in starting more.

> One scenario that has been discussed is bonding two NICs
> on the v40z in conjunction with switch trunking. Does anyone
> have any opinions or ideas on this?

If your switch can trunk, go ahead. I trunk together gigabit ethernet
interfaces on a FreeBSD file server. I've heard rumours to the effect
that a four-way trunk on Linux can be slower than a two-way one, due
to problems in the bonding driver. Regard that as just hearsay,
however, because I don't have any experience with such things on
Linux.

You might also consider using jumbo frames, if your switches support
them.

> Lastly, is it even worth
> it to keep messing with NFS? And maybe go for GFS.

There are a number of parallel or cluster file systems in addition to
GFS, like PVFS2 (free), Lustre (sort of free), GPFS (free to
universities), TeraFS (commercial), and Ibrix (commercial). They may
not work well for hosting home directories, because they're not
optimized for that sort of I/O load. They're also, in my experience,
rather less than stable. We built a fifty-node cluster with just GPFS,
no NFS, and very little local disk. The results were quite
disappointing.

File I/O is one of the major unsolved problems of cluster computing.
Anybody who tells you otherwise is trying to sell you something.

David S.
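P.S. In case it helps, a rough sketch of a two-NIC bond on Linux with
the stock bonding driver. Interface names, addresses, and the choice of
802.3ad mode are all assumptions; they have to match what your switch's
trunking actually speaks:

```shell
# Illustrative two-NIC bond; eth0/eth1 and the address are placeholders.
# 802.3ad (LACP) mode assumes the switch trunk is configured to match.
modprobe bonding mode=802.3ad miimon=100
ip link set bond0 up
ifenslave bond0 eth0 eth1
ip addr add 10.0.0.1/24 dev bond0

# Jumbo frames, only if every NIC and switch port in the path supports them:
ip link set bond0 mtu 9000
```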