[Bioclusters] BLAST job time estimates
Chris Dwan
bioclusters@bioinformatics.org
Mon, 7 Jun 2004 11:04:51 -0500
> It looks like I stuck with doing the time prediction because we are
> plugging into an existing cluster with existing rules, much as I would
> like to avoid this issue altogether.... :-)
I find that BLAST run time prediction is pretty consistent (within 5%
or so) based only on query length, provided that you're allowed to run
a set of tests on the exact target in question, on the exact machines
in question. I've got an instrumented version of the EnsEMBL pipeline
which saves runtimes (and queue waits, and all sorts of other goodies)
for later perusal. On an analysis containing 627 contigs from
Medicago truncatula the times for blastn vs NCBI NT on our Xserves
(bins of 10,000bp length) look like this:
mysql> select count(distinct(contig_id)) as num_contigs, floor(length /
10000) as bp, avg(runtime), \
std(runtime), run_queue from contig, input_id_analysis
where input_id = name and analysis_id = 3 \
and run_queue = "CCGB_XSERVE" group by bp, run_queue order
by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp | avg(runtime) | std(runtime) | run_queue |
+-------------+------+--------------+--------------+-------------+
| 3 | 0 | 246.0000 | 6.5320 | CCGB_XSERVE |
| 4 | 1 | 424.7500 | 39.3407 | CCGB_XSERVE |
| 1 | 2 | 803.0000 | 0.0000 | CCGB_XSERVE |
| 3 | 3 | 790.6667 | 65.6523 | CCGB_XSERVE |
| 6 | 4 | 1063.8333 | 64.2117 | CCGB_XSERVE |
| 5 | 5 | 1217.8000 | 90.1341 | CCGB_XSERVE |
| 5 | 6 | 1354.4000 | 65.0372 | CCGB_XSERVE |
| 8 | 7 | 1630.7500 | 70.1334 | CCGB_XSERVE |
| 5 | 8 | 1886.8000 | 70.8220 | CCGB_XSERVE |
| 7 | 9 | 2065.2857 | 99.2928 | CCGB_XSERVE |
| 20 | 10 | 2299.0000 | 99.3700 | CCGB_XSERVE |
| 18 | 11 | 2523.0000 | 125.7763 | CCGB_XSERVE |
| 23 | 12 | 2714.9565 | 157.0911 | CCGB_XSERVE |
| 14 | 13 | 3016.5714 | 81.9658 | CCGB_XSERVE |
| 6 | 14 | 3264.6667 | 64.0356 | CCGB_XSERVE |
| 1 | 16 | 3817.0000 | 0.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+
BLASTX vs Uniref looks like:
mysql> select count(distinct(contig_id)) as num_contigs, floor(length /
10000) as bp, avg(runtime), \
std(runtime), run_queue from contig, input_id_analysis
where input_id = name and analysis_id = 14 \
and run_queue = "CCGB_XSERVE" group by bp, run_queue order
by run_queue, bp;
+-------------+------+--------------+--------------+-------------+
| num_contigs | bp | avg(runtime) | std(runtime) | run_queue |
+-------------+------+--------------+--------------+-------------+
| 4 | 0 | 96.2500 | 18.9126 | CCGB_XSERVE |
| 5 | 1 | 515.2000 | 85.6491 | CCGB_XSERVE |
| 2 | 2 | 810.0000 | 2.0000 | CCGB_XSERVE |
| 2 | 3 | 1326.0000 | 115.0000 | CCGB_XSERVE |
| 3 | 4 | 1931.6667 | 46.6571 | CCGB_XSERVE |
| 5 | 5 | 2712.2000 | 150.0325 | CCGB_XSERVE |
| 3 | 6 | 3104.0000 | 99.5624 | CCGB_XSERVE |
| 6 | 7 | 3799.5000 | 218.6342 | CCGB_XSERVE |
| 3 | 8 | 5052.0000 | 205.0870 | CCGB_XSERVE |
| 7 | 9 | 5697.5714 | 480.6186 | CCGB_XSERVE |
| 7 | 10 | 6887.2857 | 385.0632 | CCGB_XSERVE |
| 19 | 11 | 7707.6316 | 342.8089 | CCGB_XSERVE |
| 18 | 12 | 8812.0000 | 502.8817 | CCGB_XSERVE |
| 10 | 13 | 9638.8000 | 726.1260 | CCGB_XSERVE |
| 6 | 14 | 10457.5000 | 742.9995 | CCGB_XSERVE |
| 2 | 16 | 13521.0000 | 58.0000 | CCGB_XSERVE |
+-------------+------+--------------+--------------+-------------+
-Chris Dwan
The University of Minnesota