Which of these (PVFS2, Lustre, GPFS) have some level of redundancy?

===============================================
David Coornaert (dcoorna at dbm.ulb.ac.be)
Belgian Embnet Node (http://www.be.embnet.org)
Université Libre de Bruxelles
Laboratoire de Bioinformatique
12, Rue des Professeurs Jeener & Brachet
6041 Gosselies
BELGIQUE
Tél: +3226509975   Fax: +3226509998
===============================================

DGS wrote:

>> Now that we have all 63 up and running, it looks like we are
>> getting performance issues with NFS much in the same way
>> that others have reported here. Even moderate job loads
>> produce trouble (nfsstat -c shows lots of retransmissions).
>
> Are you using NFS over TCP? If not, you probably should. That
> introduces some reliability problems, in that NFS/TCP is no
> longer stateless. If the file server goes down, clients may
> hang. But since your file server is your head node, it's mostly
> a moot point. Lose the head node, and you lose the cluster
> anyway.
>
>> grid engine execds don't report back in, so qhost shows nodes not
>> responding, though eventually they will return. On occasion one of
>> the switches stops and that whole "side" of the cluster disappears,
>> so we reboot the switch and are back in action. Anyway, here are my
>> questions (thanks for your patience in reading through this):
>>
>> Has anyone had similar problems with these SMC switches?
>> I'm not accustomed to having the switches die like this.
>>
>> In terms of improving NFS performance, I've already put the SGE
>> spool onto the local nodes to try to improve things, but it only
>> helps a little. There are various NFS tuning documents with
>> respect to clusters (using the tcp, atime, rsize, wsize, etc.
>> options to mount). I've experimented with a few of these (rsize,
>> wsize), though with only very marginal positive impact. For those
>> with larger clusters and similar issues, have you found a subset
>> of these options to be more key or influential than others?
>
> If you use NFS/TCP, the "rsize" and "wsize" parameters are
> irrelevant. The Linux NFS how-to suggests raising the 'sysctl'
> values of "net.core.rmem_max" and "net.core.rmem_default" higher
> than their usual values of 64k. You should also pay attention
> to the number of 'nfsd' processes running on your server. The
> rule of thumb is eight per CPU. In principle, the more clients
> you have, the more 'nfsd' processes you want. But multiple server
> processes contend for resources themselves, so you reach a point
> of diminishing returns in starting more.
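For reference, this is roughly what those suggestions translate to on a
Linux client and server. The export path, mount point and numeric values
below are purely illustrative, not settings anyone in this thread has
actually tested:

  # client side: mount home directories over TCP
  # (rsize/wsize shown only for completeness; with NFS/TCP they matter far less)
  mount -t nfs -o tcp,hard,intr,rsize=32768,wsize=32768 head:/export/home /home

  # server side: raise the socket buffer limits, as the Linux NFS how-to suggests
  sysctl -w net.core.rmem_max=262144
  sysctl -w net.core.rmem_default=262144

  # server side: run more nfsd threads (rule of thumb: about eight per CPU);
  # on Red Hat-style systems this is RPCNFSDCOUNT in /etc/sysconfig/nfs
  rpc.nfsd 16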
>> One scenario that has been discussed is bonding two NICs
>> on the v40z in conjunction with switch trunking. Does anyone
>> have any opinions or ideas on this?
>
> If your switch can trunk, go ahead. I trunk together gigabit
> ethernet interfaces on a FreeBSD file server. I've heard rumours
> to the effect that a four-way trunk on Linux can be slower than
> a two-way one, due to problems in the bonding driver. Regard that
> as just hearsay, however, because I don't have any experience
> with such things on Linux. You might consider using jumbo
> frames, if your switches support that.
>
>> Lastly, is it even worth it to keep messing with NFS, or should
>> we maybe just go for GFS?
>
> There are a number of parallel or cluster file systems in
> addition to GFS, like PVFS2 (free), Lustre (sort of free),
> GPFS (free to universities), TeraFS (commercial), and Ibrix
> (commercial). They may not work well for hosting home
> directories, because they're not optimized for that sort of
> I/O load. They're also, in my experience, rather less than
> stable. We built a fifty-node cluster with just GPFS, no NFS,
> and very little local disk. The results were quite disappointing.
>
> File I/O is one of the major unsolved problems of cluster
> computing. Anybody who tells you otherwise is trying to
> sell you something.
>
> David S.
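On the bonding question, a minimal sketch of what two-NIC bonding looks
like on a Red Hat-style Linux node, purely as a starting point; the
interface names, addresses and the 802.3ad mode are assumptions, and
802.3ad in particular needs the matching trunk (LACP) configuration on
the switch ports:

  # /etc/modprobe.conf -- load the bonding driver in LACP mode with link monitoring
  alias bond0 bonding
  options bonding mode=802.3ad miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface carries the IP
  DEVICE=bond0
  IPADDR=192.168.1.10
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 -- enslave the physical NIC
  # (ifcfg-eth1 is identical apart from DEVICE)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

If the SMC switches turn out not to support LACP, mode=balance-alb needs no
switch-side configuration at all; and jumbo frames are just a matter of adding
MTU=9000 to the same ifcfg files, provided every port along the path is set
the same way.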