[Bioclusters] Mount problems

david speed (RI) bioclusters@bioinformatics.org
Wed, 23 Jul 2003 13:37:15 +0100


Hi All,

We have installed SGE on our 15-node Linux (Red Hat 7.1) cluster (30 Intel CPUs). An NFS export from the head node is mounted on each slave node solely to hold the SGE tools and directories. We have installed the NCBI BLAST tools, and the databases to be blasted against, locally on each node.

When running test batches of BLAST jobs on Grid Engine, random nodes go into an error state because (we think) the node is unable to access the SGE mount. The running job process remains in RW status until the machine is rebooted (by pulling the plug, since the shutdown command fails). The process runs at 99.9 %CPU, and the sge_shepherd process has S< status.

Running the mount command lists the SGE mount as normal, and we can cd into the SGE mount as normal. However, df causes the shell to hang: it outputs info on the other mounts but hangs just where it should output the SGE mount info.
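
A check like the df test can be scripted so that it does not hang the calling shell. The following is a minimal sketch, not part of the setup described in this post: the function name mount_responds and the timeout value are illustrative assumptions. It asks a child process to make the same kind of filesystem-statistics call that df relies on and gives up after a deadline.

```python
import subprocess
import sys
import time


def mount_responds(path, timeout=5.0):
    """Return True if os.statvfs(path) completes in a child process
    within `timeout` seconds, False if it fails or does not finish.

    mount_responds is a hypothetical helper name used for illustration.
    """
    # Run the call in a separate process so that a stuck NFS request
    # blocks the child rather than this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = child.poll()
        if rc is not None:
            # Exit code 0 means statvfs returned without raising.
            return rc == 0
        time.sleep(0.1)
    # Best effort only: a process blocked in NFS I/O may ignore the signal,
    # so we deliberately do not wait for it to exit.
    child.kill()
    return False
```

Calling mount_responds on the SGE mount point of each node (the path depends on the local setup) would then return False on a node where df hangs, assuming the statistics call is what blocks.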

The options we have used in fstab for the SGE mount are nfs	exec,dev,suid,rw,bg,soft,intr 0 0

The /var/log/messages file has entries similar to:

kernel: nfs: task 3077 can't get a request slot
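
To see how often this message appears and for which task numbers, the log can be scanned with a short script. This is a minimal sketch that assumes the entries match the wording quoted above; the names SLOT_ERROR and count_slot_errors are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern based on the log entry quoted in this post; the number after
# "task" is captured so that entries can be grouped by it.
SLOT_ERROR = re.compile(r"nfs: task (\d+) can't get a request slot")


def count_slot_errors(lines):
    """Count "can't get a request slot" entries per task number.

    `lines` is any iterable of log lines, for example an open file.
    """
    counts = Counter()
    for line in lines:
        match = SLOT_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open handle on /var/log/messages to count_slot_errors returns a Counter keyed by task number, which gives a rough idea of whether a few requests are stuck or many are failing.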

Anyone any idea what the problem is?

David


David Speed
Programmer
Roslin Institute
Bioinformatics Group
Roslin, 
Midlothian, 
EH25 9PS, 
UK
Telephone: +44 (0)131 527 4200 (switchboard) 
Fax: +44 (0)131 440 0434
