[Bioclusters] Login & home directory strategies for PVM?
    Tim Cutts 
    tjrc at sanger.ac.uk
       
    Fri Feb  4 04:12:45 EST 2005
    
    
  
On 4 Feb 2005, at 6:46 am, Michael Gutteridge wrote:
>
> I don't believe this problem to be specific to PVM, but could be an 
> issue with any parallel machine using large node sets.  I'm curious as 
> to strategies anyone else has used to mitigate the problem I've 
> described, especially for circumstances such as this, where the slave 
> nodes are merely compute donors.
Most very large clusters in the HPC world don't allow NFS at all, or 
minimise it.

Our 1000-node cluster does allow some NFS, but only for scratch 
directories and, in general, *not* for users' home directories.

Even then, we are in the process of replacing our NFS scratch 
directories with true cluster filesystems (GPFS and/or Lustre), largely 
for performance reasons.  NFS really does suck, and NFS abuse by users 
is the primary cause of cluster failure here.

But to answer your question:  it sounds like you're automounting your 
users' home directories.  We rapidly found that automount really 
doesn't work on clusters.  Although it's easy to administer, you get 
the behaviour you're seeing:  large numbers of simultaneous mount 
requests, which overwhelm the NFS server.

Consequently, the few NFS filesystems we allow our farm nodes to see, 
we mount statically in /etc/fstab.  We don't automount anything.
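
To make the static-mount idea concrete, here is a minimal Python sketch 
that formats one NFS line for /etc/fstab.  The server name, export path, 
mount point and the "rw,hard" options are illustrative assumptions only, 
not the settings used on the cluster described above.

```python
from typing import Sequence


def fstab_nfs_entry(server: str, export: str, mount_point: str,
                    options: Sequence[str] = ("rw", "hard")) -> str:
    """Return one /etc/fstab line that mounts server:export statically.

    The six fstab fields are: device, mount point, filesystem type,
    mount options, dump frequency, fsck pass number.
    """
    for field in (server, export, mount_point):
        # fstab fields are whitespace-separated, so reject embedded spaces.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"invalid fstab field: {field!r}")
    opts = ",".join(options) if options else "defaults"
    # Network filesystems are neither dumped nor fsck'd, hence "0 0".
    return f"{server}:{export}\t{mount_point}\tnfs\t{opts}\t0\t0"


if __name__ == "__main__":
    # Hypothetical scratch export; substitute your own server and paths.
    print(fstab_nfs_entry("nfs1.example.org", "/export/scratch", "/scratch"))
```

A line like this is mounted once at boot, rather than on first access as 
with the automounter.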
You still get the multiple mount requests problem when you switch the 
cluster on (say, after a power failure), so on the rare occasions we 
have to power cycle the whole cluster we have to be careful to switch 
on only a few dozen machines at a time until they're all up.
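
The staggered power-on can be sketched roughly as follows, in Python.  
This is only an illustration of the batching idea:  the node names, the 
batch size of 24, the pause between batches and the power_on callback 
(which would wrap whatever remote power control a site has) are all 
hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(nodes: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` nodes."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for node in nodes:
        batch.append(node)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch.
        yield batch


def staggered_power_on(nodes: Iterable[str],
                       power_on: Callable[[str], None],
                       size: int = 24,
                       settle_seconds: float = 300.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Power nodes on a batch at a time, pausing between batches.

    The pause gives each batch time to boot and finish its NFS mounts
    before the next batch starts.  Returns the number of batches started.
    """
    count = 0
    for batch in batches(nodes, size):
        if count:
            # Wait before every batch except the first.
            sleep(settle_seconds)
        for node in batch:
            power_on(node)
        count += 1
    return count
```

A fixed pause is the simplest choice; waiting until each batch reports 
its filesystems mounted would be more robust, but needs site-specific 
monitoring.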
Tim
-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233
    
    