[Bioclusters] SGE jobs staying in dr state

Chris Dagdigian dag at sonsorol.org
Wed Mar 30 13:45:10 EST 2005


Some questions -

  - Which <major> version of Grid Engine are you running? (5.x vs 6.x)

  - If using 6.x are you configured with berkeley-db spooling or 
"classic" spooling?

  {In SGE 5, or in SGE 6 with classic-mode spooling, state and 
configuration data for running/active jobs is stored in text files. If 
you can live with shutting down the SGE master for a few seconds, you 
can make some "big hammer" type changes by editing or deleting the spool 
and active job data files. Not recommended for novice users/admins, 
though.}

  - Is sge_execd running on the nodes where the phantom jobs are, or did 
you shut SGE down? You may have to fire the daemons back up just so the 
job deletion state messages can pass back and forth between the nodes 
and the master.

  - any interesting logfile messages?

Look in $SGE_ROOT/$SGE_CELL/spool/qmaster/messages as well as 
$SGE_ROOT/$SGE_CELL/spool/<nodename>/messages on each affected node to 
see if anything obvious shows up.
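
If it helps, the checks above can be sketched as a few small shell 
helpers. This is only a rough sketch under some assumptions: the 
function names are made up for illustration and are not part of Grid 
Engine, the spooling check assumes an SGE 6 install that records a 
"spooling_method" line in $SGE_ROOT/$SGE_CELL/common/bootstrap, and the 
log search is a plain whole-word grep for the job id, so it can also 
hit unrelated numbers such as parts of a timestamp.

```shell
#!/bin/sh
# Hypothetical helpers; none of these function names come from Grid Engine.

# Print the spooling method (e.g. "classic" or "berkeleydb") found in an
# SGE 6 bootstrap file.
# $1: path to the file, assumed to be $SGE_ROOT/$SGE_CELL/common/bootstrap
sge_spooling_method() {
    awk '$1 == "spooling_method" { print $2 }' "$1"
}

# Succeed if a process list read from stdin mentions sge_execd.
# Intended to be fed from `ps` on the node being checked.
execd_in_ps_output() {
    grep -q 'sge_execd'
}

# Print every line of a messages file that mentions a job id as a whole
# word, so that job 123 does not also match job 1234.
# $1: messages file   $2: numeric job id
job_log_lines() {
    grep -w -- "$2" "$1"
}
```

With SGE_ROOT and SGE_CELL set in the environment, usage might look 
like this (12345 stands in for one of the stuck job ids):

  sge_spooling_method "$SGE_ROOT/$SGE_CELL/common/bootstrap"
  ps -e -o comm= | execd_in_ps_output && echo "sge_execd is running"
  job_log_lines "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages" 12345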

Also there is a dedicated Grid Engine mailing list 
(users at gridengine.sunsource.net) with an active crowd of experts willing 
to help with issues like this. The list is worth monitoring if you are a 
heavy Grid Engine user and it is worth searching the list archives if 
you experience odd problems. More info is here: 
http://gridengine.sunsource.net/servlets/ProjectMailingListList



-Chris



Shane Brubaker wrote:

> 
> Hi, I have some SGE jobs which stay in a "dr" state and will not go 
> away.  I have issued a qdel command on these jobs, so they are in a 
> "deleted, running" state.  Usually such jobs will go away after a few 
> minutes, but these won't.  I also can't delete that queue now because 
> it has jobs in it.
> 
> These happened to be fairly long jobs that ran a day or two.  Also, 
> these jobs do not show up on the actual nodes, so they aren't really 
> running anymore.  They only appear in qstat.
> 
> Any help would be much appreciated.
> 


