[Bioclusters] SGE jobs staying in dr state

Rayson Ho raysonlogin at yahoo.com
Wed Mar 30 22:19:48 EST 2005


As sgeadmin (or whichever account is the "admin user" of your SGE cluster), run:

% qdel -f <jobid>

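You are most likely seeing this because one of the execution nodes went
down, or because of OS/network problems: the qmaster never gets a reply
to its delete request from the execd on that node, so the job stays in
"dr". The -f flag removes the job from the qmaster's job list even
though the execd did not respond.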
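
If there are many such jobs, something like the following might save some typing. It is only a rough sketch: the helper name stuck_job_ids is made up for illustration, and it assumes the default qstat layout with the job ID in column 1 and the state in column 5. Check that against your own qstat output before trusting it.

```shell
# Hypothetical helper: read `qstat` output on stdin and print the IDs of
# jobs whose state column is exactly "dr" (deleted, but still listed as
# running). Assumes job ID is field 1 and state is field 5.
stuck_job_ids() {
    awk '$5 == "dr" { print $1 }' | sort -u
}

# Intended use, as the SGE admin user. Review the list first:
#   qstat | stuck_job_ids
# then force-delete each job that was listed:
#   qstat | stuck_job_ids | while read -r id; do qdel -f "$id"; done
```

Note that qdel -f on a plain job ID applies to the whole job, so for array jobs it is worth looking at the list by hand before running the loop.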

Rayson




--- Shane Brubaker <brubaker2 at llnl.gov> wrote:
> Hi, I have some SGE jobs which stay in a "dr" state and will not go
> away.  I have issued a qdel command on these jobs, so they are in a
> "deleted, running" state.  Usually such jobs will go away after a few
> minutes, but these won't.  I also can't delete that queue now because
> it has jobs in it.
>
> These happened to be fairly long jobs that ran a day or two.  Also,
> these jobs do not show up on the actual nodes, so they aren't really
> running anymore.  They only appear in qstat.
> 
> Any help would be much appreciated.
> 
> 
> Thank You,
> Shane Brubaker
> JGI