[Bioclusters] Assembly_contd
Chris Dagdigian
dag at sonsorol.org
Fri Jul 7 06:57:17 EDT 2006
Hello,
This is your problem:
On Jul 6, 2006, at 8:46 PM, francois.fauteux2 at mail.mcgill.ca wrote:
> The qstat -f command outputs:
>
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@mac2                     BIP   0/2       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac1                     BIP   0/1       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac3                     BIP   0/2       -NA-     -NA-          au
The reason you can't run jobs is that you have no available job
slots. You have no job slots because Grid Engine is either not
running on your three systems or, if it is running, is being blocked
by a firewall, routing or nameserver problem.
The main indication here is the "au" entry in the states column for
each of your queue instances. State "au" means 'alarm + unreachable'
or 'alarm + unheard' (qstat itself documents the "u" as "unknown"):
the SGE qmaster process has not been receiving the periodic state and
status reports from the sge_execd daemons running on the compute
nodes.
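As a rough sketch of how to spot this state without reading the table
by eye, the function below filters "qstat -f" output for queue
instances whose states column contains "u". The function name
unreachable_queues is made up for illustration. It assumes queue
instances are named queue@host and that a queue line carries a sixth,
final field only when a state is set.

```shell
# Hypothetical helper: print queue instances whose state contains "u".
# Assumes `qstat -f` queue lines have five fields when healthy
# (queuename, qtype, used/tot., load_avg, arch) and a sixth "states"
# field only when a state such as "au" is set.
unreachable_queues() {
    awk 'NF == 6 && $1 ~ /@/ && $6 ~ /u/ { print $1 }'
}
```

Typical use would be "qstat -f | unreachable_queues", which should
print nothing on a healthy cluster.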
On working clusters this almost always means that SGE is not running
on the cluster node, and the fix is simply to restart SGE on the
nodes in question.
I'm not sure about the root cause on your system. Since this is a new
install, it could also be an artifact of a configuration or
installation problem. Typically that would be a firewall blocking the
ports that SGE uses, a routing issue or (very, very common) some sort
of hostname or DNS lookup issue.
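A quick way to rule out the hostname case is to check that every
machine can look up every other machine by name. The sketch below is
only an illustration: the function name check_lookups and the RESOLVER
variable are invented here, and the default lookup command
"getent hosts" is an assumption that holds on most Linux systems but
not everywhere, so substitute whatever your platform provides.

```shell
# Hypothetical helper: report which host names fail a name lookup.
# RESOLVER is the lookup command to run against each name. The default
# is an assumption (common on Linux); override it for your platform.
RESOLVER=${RESOLVER:-getent hosts}

check_lookups() {
    for h in "$@"; do
        # $RESOLVER is left unquoted on purpose so that a command with
        # arguments (such as "getent hosts") is split into words.
        if $RESOLVER "$h" >/dev/null 2>&1; then
            echo "$h: resolves"
        else
            echo "$h: LOOKUP FAILED"
        fi
    done
}
```

For example, "check_lookups mac1 mac2 mac3" run on the qmaster host
and again on each compute node would show whether any machine fails
to resolve one of the others.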
Hopefully this is just an "SGE is not running" issue. To check, log
in to one of the compute nodes and run "ps ax | grep sge"; you should
see at least an "sge_execd" daemon running on each compute node. If
you don't, run the SGE startup script and redo the "qstat -f"
command. If SGE starts up OK, the "au" status will disappear and you
will see real numbers instead of "-NA-".
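The process check can be wrapped up so it gives a yes/no answer
instead of a listing to read. This is a minimal sketch: the function
name execd_in_listing is hypothetical, and it only inspects whatever
process listing it is given on standard input.

```shell
# Hypothetical helper: succeed if the process listing on stdin
# includes an sge_execd process.
# The [s] bracket keeps a grep command line that contains this same
# pattern from matching itself in the listing.
execd_in_listing() {
    grep -q '[s]ge_execd'
}
```

On a compute node, something like
"ps ax | execd_in_listing || echo 'sge_execd is not running'" would
tell you whether the startup script needs to be run before checking
"qstat -f" again.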