[Bioclusters] SGE and local output

Joe Landman bioclusters@bioinformatics.org
15 May 2002 14:02:37 -0400


  I will need to refer you to an SGE expert for this.  These are
SGE-specific questions, and I don't know SGE well enough to comment.
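
  The local-output scheme quoted below does not depend on SGE, so here is
a rough sketch of it in Python.  Everything in it is an assumption made
for illustration: the function names, the ".out" and ".partial" suffixes,
and the directory layout are hypothetical, and the stand-in command is
not a real Blast invocation.  Each job writes to local disk; a separate
collection step later copies the finished files to the central directory
one at a time in sorted order, so the file server sees a single ordered
stream of writes instead of many random ones.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_job_local(cmd, job_id, local_dir):
    """Run cmd and write its stdout to a file on the node's local disk."""
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, f"{job_id}.out")
    with open(local_path, "wb") as out:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, stdout=out, check=True)
    return local_path


def collect_outputs(local_dir, central_dir):
    """Copy finished outputs to central_dir one by one, in sorted order."""
    os.makedirs(central_dir, exist_ok=True)
    collected = []
    for name in sorted(os.listdir(local_dir)):
        if not name.endswith(".out"):
            continue
        src = os.path.join(local_dir, name)
        partial = os.path.join(central_dir, name + ".partial")
        final = os.path.join(central_dir, name)
        # Copy under a temporary name, then rename, so a file with the
        # final name in central_dir is always a complete copy.
        shutil.copyfile(src, partial)
        os.replace(partial, final)
        os.remove(src)
        collected.append(final)
    return collected


if __name__ == "__main__":
    # Hypothetical stand-in for a Blast command line.
    fake_job = [sys.executable, "-c", "print('hit')"]
    with tempfile.TemporaryDirectory() as local, \
            tempfile.TemporaryDirectory() as central:
        for job_id in ("job2", "job1"):
            run_job_local(fake_job, job_id, local)
        for path in collect_outputs(local, central):
            print(os.path.basename(path))
```

  Note that this sketch says nothing about what SGE does when a node
dies.  If a job is interrupted, its local file is simply left behind in
an incomplete state, and deciding what to do with it is exactly the kind
of question that needs the SGE expert.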


On Wed, 2002-05-15 at 13:34, Ivo Grosse wrote:
> Hi Joe and others,
> in our case of running 30,000 Blast jobs on a 100-CPU cluster you 
> recommended not writing the output directly to the central file 
> server, but writing it to the local node and collecting the 
> output at the end in a non-random manner, in order to avoid NFS server 
> hiccups and the like.
> I love that idea, but people from Germany have the strange habit of 
> always trying to think of the worst possible scenario before accepting 
> a new idea, so here comes a set of German questions:
> Assume one slave node (A) dies.  I suppose that SGE will restart the 
> non-finished jobs X from node A on a new node B.
> Question 1: Is that correct?
> Assume the dead node (A) comes back to life at some point.
> Question 2: Is SGE smart enough to notice that jobs X that were started 
> before node A went down have been restarted on node B, and is SGE smart 
> enough to remove the old (and useless) output of jobs X on node A?
> Question 3: Alternatively, can SGE be told to try to restart jobs X on 
> node A after that node is back to life?  How?
> Question 4: If the answer to Q3 is yes, can SGE restart jobs X at the 
> point where they stopped, or does SGE always restart jobs from the 
> beginning?  I mean: does SGE support checkpointing?  How?
> Best regards, Ivo
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/bioclusters
Joe Landman,
email: landman@scientificappliance.com
web  : http://scientificappliance.com