[Bioclusters] BLAST job time estimates

Tim Cutts bioclusters@bioinformatics.org
Tue, 1 Jun 2004 14:00:07 +0100

On 1 Jun 2004, at 1:33 pm, Micha Bayer wrote:

> Hi Tim,
> thanks for that. Can you just clarify what n and m are in your response
> below?

For a given pair of sequences being aligned, n & m are the lengths of 
the two sequences.  So in the case of your BLAST search, you need to 
know the lengths of the largest query sequence and the largest target 
sequence.
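
As a rough illustration of how n and m could be obtained, here is a
minimal Python sketch that scans FASTA-formatted input and reports the
longest query and the longest target.  The helper names and the tiny
in-line sequences are hypothetical, and treating the product n * m as
the worst-case figure is an assumption; how it feeds into a time
estimate depends on the formula being used.

```python
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FASTA-formatted text."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequence data may span several lines
    if length is not None:
        yield length  # finish the last record


def longest(lines: Iterable[str]) -> int:
    """Length of the longest sequence, or 0 if there are none."""
    return max(fasta_lengths(lines), default=0)


# Hypothetical example data; in practice, pass open file handles instead.
queries = [">q1", "ACGTACGT", ">q2", "ACG"]
targets = [">t1", "ACGT", "ACGTAC", ">t2", "AC"]

n = longest(queries)  # longest query sequence
m = longest(targets)  # longest target sequence
print(n, m, n * m)
```

For a real search the same functions accept an open file, e.g.
longest(open("queries.fa")), where the file name is again only a
placeholder.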

> It looks like I'm stuck with doing the time prediction because we are
> plugging into an existing cluster with existing rules, much as I would
> like to avoid this issue altogether.... :-)

All I can suggest then is an iterative procedure - submit jobs with a 
very conservative estimate of CPU time.  They'll get low priority, but 
that's better than them being killed because they've been running too 
long.  Then reduce the requirement when you've got a feel for the real 
requirements of the job.
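
The iterative procedure above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function
name, the 1.5x safety margin and the example timings are all made up,
and how the resulting limit is passed to the scheduler depends on the
queueing system in use.

```python
def next_cpu_limit(current_limit, observed_cpu_times, margin=1.5):
    """Suggest the CPU-time limit (in seconds) for the next batch of jobs.

    current_limit      -- the limit requested for the previous batch
    observed_cpu_times -- CPU seconds actually used by completed jobs
    margin             -- safety factor on the slowest job (an assumption)
    """
    if not observed_cpu_times:
        # Nothing has finished yet, so keep the conservative estimate.
        return current_limit
    proposed = max(observed_cpu_times) * margin
    # Only ever tighten the limit; never raise it above the current one.
    return min(current_limit, proposed)


# Start with a deliberately generous request, e.g. 24 hours.
limit = 24 * 60 * 60

# Hypothetical CPU times (seconds) reported for the first batch.
limit = next_cpu_limit(limit, [3100, 2800, 3500])
print(limit)  # 5250.0

# A slower second batch leaves the limit unchanged rather than raising it.
limit = next_cpu_limit(limit, [3600, 4000])
print(limit)  # 5250.0
```

Basing the new limit on the slowest observed job, rather than the
average, keeps the estimate on the safe side of the kill threshold.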


PS.  I wish the powers that be would let me be as draconian with our 
cluster as your guys are.  It would solve a whole heap of trouble.  :-)

Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK