Hi David:

See http://nfs.sourceforge.net/. This could be a network card issue, a
driver issue, or a physical network issue. In my experience you
typically run into this when the NFS server is connected on the same
speed link as the rest of the net (e.g. all 100baseT), when the NFS
server machine isn't particularly fast, or when you have a buggy driver.

As you are using Red Hat 7.1, I would recommend that you at least update
the kernel (if it is not a recent one) and the NFS tools.

I would also suggest changing the soft mount to a hard mount. All a hard
mount does is force an infinite number of retries, while a soft mount
gives up and fails silently. The silent failure does not pass
information back to the application properly, so the application blocks
on I/O, consumes cycles, and eventually causes problems because it is
unkillable. Also make sure you use intr as an option, so that a process
stuck on a hard mount can still be interrupted.

Joe

On Wed, 2003-07-23 at 08:37, david speed (RI) wrote:
> Hi All,
>
> We have installed SGE onto our 15-node Linux (Red Hat 7.1) cluster
> (30 Intel CPUs). There is an NFS export mounted from the head node to
> each slave node solely to contain the SGE tools and directories. We
> have installed the ncbi blast tools and the databases to be blasted
> against locally on each node.
>
> When running test batches of blasts on Grid Engine, (random) nodes
> will go into an error state due to (we think) the node being unable to
> access the SGE mount. The running job process remains in a RW status
> till the machine is rebooted (by pulling the plug; the shutdown
> command fails). The process is running at 99.9 %cpu, and the
> sge_shepherd process has S< status.
>
> Running the mount command lists the SGE mount as normal and we can cd
> into the SGE mount as normal. However, df causes the shell to hang (it
> outputs info on the other mounts but hangs just as it should output
> the SGE mount info).
>
> The options we have used in fstab for the SGE mount are
>
>   nfs exec,dev,suid.rw,bg,soft,intr 0 0
>
> The /var/log/messages file has entries similar to
>
>   kernel: nfs: task 3077 can't get a request slot
>
> Anyone any idea what the problem is?
>
> David
>
> David Speed
> Programmer
> Roslin Institute
> Bioinformatics Group
> Roslin,
> Midlothian,
> EH25 9PS,
> UK
> Telephone: +44 (0)131 527 4200 (switchboard)
> Fax: +44 (0)131 440 0434
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
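
A minimal sketch of the hard-mount suggestion in the reply above: the
SGE entry in /etc/fstab on each slave node might look roughly like the
line below. The server name headnode and the paths /export/sge and
/usr/local/sge are placeholders, not taken from the original message,
and the sketch assumes the "suid.rw" in the quoted options was meant to
be "suid,rw". The only intended change is soft to hard.

```
# /etc/fstab on each slave node (server name and paths are hypothetical)
# device              mount point     type  options                        dump pass
headnode:/export/sge  /usr/local/sge  nfs   exec,dev,suid,rw,bg,hard,intr  0    0
```

The export would need to be unmounted and mounted again (or the node
rebooted) for the new options to apply. Running mount with no arguments
should then show hard in the option list for the SGE mount.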