[Bioclusters] Apple/Genentech's new version of Blast

Fri, 22 Nov 2002 23:12:01 +0000 (GMT)

I just ran the following experiment on my PowerBook (no, not an Xserve,
but a wimpy 550 MHz). The query was a C. elegans transcript (F44B9.10) and
the database was the C. briggsae genome (build cb25.agp8). The following
table shows the user+system time for various word sizes using default
parameters, with the number of database hits in parentheses (WU defaults were
changed just a bit to make the target frequencies match the NCBI defaults).

 W      NCBI         AG          WU
===  ==========  ==========  ==========
 7   80.6 (115)  26.8   (0)  47.1 (796)
 8   56.9 (115)  37.9 (115)  29.6 (792)
 9   50.0 (115)   9.5 (115)  22.6 (772)
10   46.6 (115)   5.5 (115)  19.1 (732)
11    2.9 (114)   2.8 (113)   8.6 (697)
12    2.3 (110)   3.1 (110)   7.7 (653)
15    2.1  (98)   2.1  (98)   5.3 (238)
20    1.4   (7)   1.0  (20)   5.0  (10)
30    1.4   (1)   0.6   (1)   5.4   (1)
40    1.4   (0)   0.5   (0)   5.8   (1)
W=12 WINK=12                  1.8  (41)
W=15 WINK=15                  1.4  (13)

NCBI-BLAST gets a huge speedup at -W 11 (46.6 down to 2.9 from -W 10), yet
still finds about the same number of hits (114 vs. 115). Interesting.
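For a rough sense of scale, here is a back-of-the-envelope estimate. It rests on my own simplifying assumption, not on anything in the runs above: bases are independent and uniformly distributed. Under that model, the expected number of chance exact W-letter word matches between a query of length m and a database of length n is approximately

$$E[\text{word hits}] \approx \frac{(m - W + 1)(n - W + 1)}{4^{W}}$$

so each extra letter of W should cut chance seeds by roughly a factor of 4. NCBI-BLAST's time drops about 16-fold between -W 10 and -W 11 (46.6 / 2.9), which is more than this simple model alone would predict.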

AG-BLAST doesn't actually work at -W 7 (it reports no hits) and has a nice
gain at -W 9. I consider 9 and 10 to be good choices for cross-species work,
and in this range AG-BLAST is clearly faster than NCBI-BLAST while finding
the same 115 hits. It's harder to compare to WU-BLAST because of the
different sensitivities. AG-BLAST also shines at high word sizes.
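To put numbers on the AG vs. NCBI comparison, here is a small sketch that takes the times and hit counts from the table above and computes the NCBI/AG time ratio, skipping rows where the two programs found different numbers of hits (the -W 7 row, where AG-BLAST found nothing, drops out automatically). The `RESULTS` dictionary and `ag_speedups` function are illustrative names of my own, not part of any BLAST distribution.

```python
# (user+system time, hits) per word size, copied from the table above.
RESULTS = {
    7: {"NCBI": (80.6, 115), "AG": (26.8, 0)},
    8: {"NCBI": (56.9, 115), "AG": (37.9, 115)},
    9: {"NCBI": (50.0, 115), "AG": (9.5, 115)},
    10: {"NCBI": (46.6, 115), "AG": (5.5, 115)},
    11: {"NCBI": (2.9, 114), "AG": (2.8, 113)},
    12: {"NCBI": (2.3, 110), "AG": (3.1, 110)},
    15: {"NCBI": (2.1, 98), "AG": (2.1, 98)},
    20: {"NCBI": (1.4, 7), "AG": (1.0, 20)},
    30: {"NCBI": (1.4, 1), "AG": (0.6, 1)},
    40: {"NCBI": (1.4, 0), "AG": (0.5, 0)},
}


def ag_speedups(results):
    """Return {W: NCBI time / AG time}, rounded to one decimal place,
    only for word sizes where both programs reported the same hit count."""
    speedups = {}
    for w, row in sorted(results.items()):
        ncbi_time, ncbi_hits = row["NCBI"]
        ag_time, ag_hits = row["AG"]
        if ncbi_hits == ag_hits:  # compare only runs with equal hit counts
            speedups[w] = round(ncbi_time / ag_time, 1)
    return speedups


if __name__ == "__main__":
    for w, ratio in ag_speedups(RESULTS).items():
        # ratio > 1 means AG-BLAST was faster; < 1 means NCBI-BLAST was faster
        print(f"W={w:2d}: NCBI/AG time ratio {ratio}")
```

For -W 9 and -W 10 this gives ratios of about 5.3 and 8.5; at -W 12 the ratio is below 1, so NCBI-BLAST was the faster of the two there.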

WU-BLAST is relatively fast when W is low. Large word sizes don't really
make it fly (its times level off at about 5 to 6), but WINK does.
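As a toy illustration only, here is an exact-match word seeder showing why larger W and sparser word sampling both cut the number of seeds a search has to extend. It is not how NCBI-, AG- or WU-BLAST actually find seeds. The `stride` parameter is a hypothetical stand-in for WINK, and it assumes WINK means looking up words only at every stride-th query position. Check the WU-BLAST documentation for the option's exact behavior.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        table[seq[i:i + w]].append(i)
    return table


def word_hits(query, subject, w, stride=1):
    """Return (query_pos, subject_pos) pairs of exact length-w word matches.

    stride > 1 is a hypothetical WINK-like option: only every stride-th
    query position is looked up, so fewer seeds are generated. A shared
    stretch of at least w + stride - 1 letters still yields at least one hit.
    """
    table = index_words(subject, w)
    hits = []
    for q in range(0, len(query) - w + 1, stride):
        for s in table.get(query[q:q + w], []):
            hits.append((q, s))
    return hits


if __name__ == "__main__":
    print(word_hits("ACGTACGT", "TTACGTAA", 4))            # every query position
    print(word_hits("ACGTACGT", "TTACGTAA", 4, stride=2))  # every 2nd position
    print(word_hits("ACGTACGT", "TTACGTAA", 5))            # longer words
```

The design trade-off is the same one the table shows: raising W or the stride lowers the seed count (and run time), at the cost of missing matches shorter than the guaranteed detection length.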

-Ian