[Bioclusters] Is the "OR" job dependency useful??

Tim Cutts tjrc at sanger.ac.uk
Fri Jan 7 05:15:57 EST 2005


On 6 Jan 2005, at 5:49 pm, Malay wrote:

> Rayson Ho wrote:
>> Gridengine currently has the "AND" operator job dependency:
>> A,B -> C
>> i.e. we need to wait for jobs A and B to finish before we start job C.
>> There are discussions on the SGE dev mailing list about adding the OR
>> job dependency:
>> A|B -> C
>> So job C will start as soon as job A or job B finishes.
>> I am wondering if this is useful in bioinformatics job flows??
>
> As far as bioinformatics goes, I am afraid most bioinformatics 
> applications are embarrassingly independent :) Such dependency 
> resolution will have its niche applications, but I guess they are 
> very limited as far as bioinformatics goes.

I don't think that's true - when you consider something like a gene 
annotation process, there are lots of dependencies.  Consider what goes 
on with Ensembl; before any analyses are performed, the sequences have 
to be dusted and RepeatMasked.  After that, raw features such as BLAST 
hits, ab initio gene predictions and EST alignments can be calculated.  
Once the BLAST hits have been done, genewise alignments can be 
performed (using the BLAST results to narrow down the areas genewise 
needs to analyse). Only once the EST alignments, ab initio predictors 
and genewise are complete can the code be run to combine these into a 
coherent set of gene structures.

Although each of these processes consists of thousands of independent 
jobs, each type of analysis is dependent on the completion of the 
previous ones.
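
The stage ordering described above can be sketched as a small dependency 
graph.  This is only an illustration: the stage names are labels made up 
here, not Ensembl identifiers, and it uses Python's standard graphlib 
module (3.9 or later) rather than anything from the real pipeline.  
Each stage lists the stages that must all be complete before it starts.

```python
from graphlib import TopologicalSorter

# Stage -> set of stages that must ALL finish first (AND dependencies).
# The names are illustrative labels, not real Ensembl analysis names.
PIPELINE = {
    "dust": set(),
    "repeatmask": set(),
    "blast": {"dust", "repeatmask"},
    "ab_initio": {"dust", "repeatmask"},
    "est_align": {"dust", "repeatmask"},
    "genewise": {"blast"},
    "gene_build": {"est_align", "ab_initio", "genewise"},
}


def waves(deps):
    """Group stages into waves; stages in one wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        # get_ready() returns every stage whose predecessors are all done.
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for number, wave in enumerate(waves(PIPELINE), start=1):
    print(number, ", ".join(wave))
```

In this sketch genewise lands in a later wave than the other raw 
features because it waits on the BLAST stage, and the gene build comes 
last.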

As it happens, all of these dependencies are handled in the Ensembl 
RuleManager rather than by the scheduling system.

They're all AND dependencies as far as I can tell, and I've never 
needed anything other than AND dependencies in my own pipelines, but I 
wouldn't like to claim that OR dependencies aren't useful to someone.
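
For what it's worth, the difference between the two semantics is small 
when written down.  The following is a rough sketch, not how Gridengine 
or any other scheduler implements it; the function name and the "mode" 
argument are invented for illustration.

```python
def is_ready(predecessors, finished, mode="and"):
    """Return True if a job may start, given the set of finished jobs.

    mode="and": every predecessor must have finished (A,B -> C).
    mode="or":  at least one predecessor must have finished (A|B -> C).
    Hypothetical helper; not part of any scheduler's API.
    """
    if not predecessors:
        return True  # a job with no dependencies can always start
    done = [job in finished for job in predecessors]
    if mode == "and":
        return all(done)
    if mode == "or":
        return any(done)
    raise ValueError("mode must be 'and' or 'or'")


# With only A finished, C is held under AND but released under OR.
print(is_ready(["A", "B"], {"A"}, mode="and"))
print(is_ready(["A", "B"], {"A"}, mode="or"))
```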

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233


