Chris Iacovella wrote:
> The instructions are not clear to me as to how to properly install
> gridengine on other execution hosts, and have those nodes communicate
> with the headnode.

OK, this is clearer...

The short answer is this: you do not have to do anything to the execution
hosts other than install the SGE startup script (and make sure the exec
hosts are NFS-mounting the SGE directory).

The medium-length answer is this:

1. During the SGE install process on the head node you will have been
   asked for the hostnames of your execution hosts. If you enter the
   hostnames there, SGE will automatically preconfigure itself to "know"
   about the compute nodes. It will also create the default queues for you
   from template files if you just answer "Y" to the defaults when it asks
   about this.

2. If the SGE head node is already aware of the exec hosts and the queues
   have all been set up, then you only need a few minor things on the
   compute nodes:

   a. an entry for sge_commd in /etc/services (an environment variable
      will override this if you don't want to edit services or netinfo)
   b. an NFS mount of the SGE_ROOT directory
   c. run the "rcsge" script to start the daemons

That should be it. The basic rule of thumb is that you can configure the
head node to be aware of compute nodes and queues either during the
install process or afterwards (by issuing manual qconf commands). Once this
is done, the compute nodes just need an NFS mount and a startup script.

All the config work is done on your head node, which you have already said
is working fine. The clients just need to NFS-mount SGE_ROOT, start the
daemons and check in with the qmaster process. If this fails, it is usually
due to network routing or DNS issues.

If setting up execution hosts and default queues fails during the install
script, you can still set them up manually as an SGE admin user on your
working head node. The docs on this are easy: just look up how to "add
queues" and "add execution hosts".
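For step 2a, a quick way to confirm that a compute node actually has the sge_commd entry is to parse its services file. This is a minimal Python sketch, not part of SGE; the function name `find_service` is hypothetical, and the port 535 in the sample data is only a placeholder, so use whatever port your installation was set up with.

```python
def find_service(services_text: str, name: str, proto: str = "tcp"):
    """Return the port for name/proto in /etc/services-style text, or None.

    Each non-comment line has the form:  <name> <port>/<proto> [aliases...]
    """
    for raw in services_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a service definition line
        port, _, line_proto = fields[1].partition("/")
        names = [fields[0]] + fields[2:]  # official name plus any aliases
        if name in names and line_proto == proto and port.isdigit():
            return int(port)
    return None


# Placeholder services text; 535 is an example value, not a recommendation.
sample = "# local additions\nsge_commd   535/tcp   # Grid Engine\n"
print(find_service(sample, "sge_commd"))
```

On a compute node you would call something like `find_service(open("/etc/services").read(), "sge_commd")`; `None` means the entry is missing (or that you are relying on the environment-variable override instead). This only reads the flat file, so on hosts that resolve services through netinfo or another directory service, `socket.getservbyname("sge_commd", "tcp")` is the more faithful check.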
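Since failed check-ins are usually down to DNS, it can help to confirm on each host that forward and reverse lookups agree. Below is a hedged Python sketch using only the standard `socket` module; the function name `check_resolution` is hypothetical, and the resolver functions are injectable so the logic can be exercised without a real DNS setup. The loopback check is an extra assumption based on a common general pitfall (a node's own hostname mapped to a 127.x address in /etc/hosts).

```python
import socket
from typing import Callable, List, Tuple

ReverseResult = Tuple[str, List[str], List[str]]


def check_resolution(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], ReverseResult] = socket.gethostbyaddr,
) -> List[str]:
    """Return a list of problems found; an empty list means the lookups agree."""
    problems: List[str] = []
    try:
        addr = forward(hostname)  # name -> IPv4 address
    except OSError as exc:  # socket.gaierror is an OSError subclass
        return [f"{hostname}: forward lookup failed ({exc})"]
    if addr.startswith("127."):
        problems.append(f"{hostname}: resolves to loopback address {addr}")
    try:
        primary, aliases, _ = reverse(addr)  # address -> (name, aliases, addrs)
    except OSError as exc:  # socket.herror is an OSError subclass
        problems.append(f"{hostname}: reverse lookup of {addr} failed ({exc})")
        return problems
    # Compare short names, since some resolvers return FQDNs and others don't.
    short = hostname.split(".")[0]
    known = {n.split(".")[0] for n in [primary, *aliases]}
    if short not in known:
        problems.append(f"{hostname}: {addr} reverses to {primary}, not {hostname}")
    return problems
```

You would run `check_resolution("headnode")` on each exec host, and `check_resolution("node01")` and so on from the head node (all hostnames here are placeholders).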
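To tell a network or firewall problem apart from a daemon that simply isn't running, a plain TCP connection attempt from an exec host to the head node is often enough. This is a minimal sketch with a hypothetical `probe_port` helper; the host and port you pass in are whatever your cluster uses (for example the sge_commd port from /etc/services).

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # the TCP handshake completed
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing is listening on that port
    except socket.timeout:
        return "timeout"  # no answer at all; often packets silently dropped
    except OSError as exc:
        return f"error: {exc}"  # e.g. unresolvable hostname or no route to host
```

A "timeout" result often points to a firewall dropping packets, while "refused" means the host is reachable but no daemon is listening on that port. An "open" result only shows the network path works; it does not prove SGE itself is healthy.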
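For NFS permission problems, one thing worth checking is that user and group IDs match between the head node and each exec host. This Python sketch compares two /etc/passwd-format files; the function names and the `sgeadmin` account in the sample data are hypothetical, and it ignores users that come from NIS/LDAP rather than the flat file.

```python
from typing import Dict, List, Tuple


def parse_passwd(text: str) -> Dict[str, Tuple[int, int]]:
    """Map user name -> (uid, gid) from /etc/passwd-format text."""
    users: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip short or non-numeric entries (e.g. NIS "+::::::" lines).
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def id_mismatches(head_text: str, node_text: str) -> List[str]:
    """List users present on both hosts whose uid/gid differ."""
    head, node = parse_passwd(head_text), parse_passwd(node_text)
    report: List[str] = []
    for user in sorted(head.keys() & node.keys()):
        if head[user] != node[user]:
            report.append(f"{user}: head uid/gid {head[user]} vs node {node[user]}")
    return report


head = "sgeadmin:x:500:500::/home/sgeadmin:/bin/sh\n"
node = "sgeadmin:x:501:500::/home/sgeadmin:/bin/sh\n"
print(id_mismatches(head, node))
```

Copy the exec host's /etc/passwd to the head node and pass both files' contents to `id_mismatches`; any user it reports will see different file ownership on the NFS-mounted SGE_ROOT depending on which host they are on.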
Various problems could be caused by:

  o bad hostname resolution or DNS issues within the cluster
  o permission or uid/gid mismatch errors on the NFS mount
  o firewalls blocking sge_commd traffic

-Chris