Hello,

This is your problem:

On Jul 6, 2006, at 8:46 PM, francois.fauteux2 at mail.mcgill.ca wrote:

> The qstat -f command outputs:
>
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@mac2                     BIP   0/2       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac1                     BIP   0/1       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac3                     BIP   0/2       -NA-     -NA-          au

The reason you can't run jobs is that you have no available job slots.
The reason you have no job slots is that Grid Engine is either not
running on your three systems or, if it is running, it is having
firewall, routing or nameserver issues.

The main indication here is the "au" entry in the states column for each
of your queue instances. State "au" means 'alarm + unknown' (think
'unreachable' or 'unheard'): the SGE qmaster process has not been
receiving the periodic state and status reports from the sge_execd
daemons running on the compute nodes.

On working clusters this almost always means that SGE is simply not
running on the cluster node, and the fix is to restart SGE on the nodes
in question.

I'm not sure about the root cause on your system. Since this is a new
install, it could also be an artifact of a configuration problem or
installation issue. Typically that would be a firewall blocking the
ports that SGE uses, a routing issue or (very, very common) some sort of
hostname or DNS lookup issue.

Hopefully this is just a "SGE is not running" issue. To check, log in to
one of the compute nodes and run "ps ax | grep sge". You should at least
see a "sge_execd" daemon running on each compute node. If you don't,
run the SGE startup script on that node and then redo the "qstat -f"
command. If SGE starts up OK you will see the "au" status disappear and
real numbers appear in place of "-NA-".
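
If you want to recheck the "u" state after each restart without reading the table by eye, something like the following might help. It is only a rough sketch: the function name unreachable_queues is made up for illustration, and it assumes the six-column "qstat -f" layout quoted above (queuename, qtype, used/tot., load_avg, arch, states). Other SGE versions may print different columns, so adjust the field numbers if yours does.

```shell
# Hypothetical helper (not part of SGE): read "qstat -f" output on stdin
# and print the name of every queue instance whose states column
# contains "u", i.e. whose sge_execd the qmaster is not hearing from.
#
# Assumption: queue lines have exactly the six columns shown in the
# quoted output, with used/tot. in field 3 and states in field 6.
# A healthy queue instance has an empty states column (five fields),
# so it is skipped, as are the header and the dashed separator lines.
unreachable_queues() {
    awk 'NF >= 6 && $3 ~ "^[0-9]+/[0-9]+$" && $6 ~ "u" { print $1 }'
}
```

On the qmaster host you would run it as "qstat -f | unreachable_queues". Empty output means no queue instance is reported in the "u" state. With the output quoted above it should list all three queue instances.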