[Bioclusters] Apple/Genentech's new version of Blast

chris dagdigian bioclusters@bioinformatics.org
Fri, 22 Nov 2002 09:44:19 -0500


bioinfo wrote:

> Hello Everyone,
>     Has anyone had a chance to take a look at Apple/Genentech's new 
> version of Blast? Are the performance gains as great as they say?


We are in the middle of a series of benchmark runs on Linux/Intel
cluster nodes and Apple Xserves, using both 'normal' NCBI BLAST and the
altivec-enhanced 'agblast', plus the new altivec-enhanced version of
HMMER that I mentioned on this list a while back. (Altivec HMMER is
still not properly posted on our website, so email me if you want the
source code or a pre-packaged installer.) The work is being done for a
client who is about to make a big cluster purchase decision, and we are
trying to secure permission to make the figures public after we deliver
them.

To set the record straight, we (Bioteam) did not do any work on the
AG-BLAST project. Our involvement comes from the fact that Bill Van
Etten from our group did the original port of the NCBI BLAST codebase
to Mac OS X. He gave the patches to NCBI, who incorporated them into the
codebase, and Apple and Genentech then built their altivec optimizations
on top of that code.

To answer some of Mark's questions:

1. Yep, the performance gains are real for blastn. Especially cool is
the ability to change word sizes without taking an unreasonable
performance hit. It is very nice.

2. People still need to benchmark their typical use cases on the
Xserve. Some people find it perfect for what they want to do, and others
don't see dramatic gains. I remember hearing from Brian Gillman at the
Whitehead that in some of his testing altivec-blastn was not all that
much faster for his specific needs.

3. One thing to remember is that the Xserve has a physical RAM limit of
2 GB, which may hurt people who need to search very large databases all
the time. I don't consider this a problem from a BLAST-farming
perspective, since 2 GB is what I'd put in a Linux cluster node anyway,
but it is something to keep in mind.
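Points 1 and 2 come down to timing your own typical searches at the word sizes you actually use. Below is a minimal benchmarking sketch, assuming the legacy NCBI `blastall` executable is on the PATH and accepts its usual `-p` (program), `-d` (database), `-i` (query), `-W` (word size), `-a` (processors) and `-o` (output) options. Whether the agblast build installs under the same executable name is an assumption, so the executable is a parameter; the query, database and output file names are hypothetical.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List, Optional


def blastall_cmd(query: str, db: str, word_size: int, out: str,
                 program: str = "blastn", exe: str = "blastall") -> List[str]:
    """Build a legacy blastall command line pinned to one CPU (-a 1).

    Assumes blastall-style flags; 'exe' can point at another build."""
    return [exe, "-p", program, "-d", db, "-i", query,
            "-W", str(word_size), "-a", "1", "-o", out]


def time_word_sizes(query: str, db: str, word_sizes: Iterable[int],
                    runner: Optional[Callable[[List[str]], object]] = None,
                    repeats: int = 3) -> Dict[int, float]:
    """Return the best wall-clock time in seconds for each word size."""
    if runner is None:
        # Default: really run the search and raise on a non-zero exit code.
        runner = lambda cmd: subprocess.run(cmd, check=True)
    results: Dict[int, float] = {}
    for w in word_sizes:
        # Hypothetical output naming: one result file per word size.
        cmd = blastall_cmd(query, db, w, out=f"bench_W{w}.out")
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            runner(cmd)
            # Keep the fastest repeat to damp noise from disk caching
            # and other jobs on the node.
            best = min(best, time.perf_counter() - start)
        results[w] = best
    return results
```

For example, `time_word_sizes("typical_queries.fa", "nt", [11, 16, 24])` would run each word size three times on both the Linux nodes and the Xserves. Taking the minimum reports warm-cache times. If your real workload usually reads the database cold from disk, the first repeat may be the more honest number.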

Elia also mentioned that 'blastn does not work as well on 2 CPUs', and
I'd have to echo that thought from my experience in years past at
Genetics Institute. I _always_ got better BLAST throughput by
constraining each search to run on a single CPU. Most of the people I
know who run BLAST farms do the same thing, I believe: they constrain
BLAST to a single CPU and make up the throughput by loading each
dual-CPU machine with two searches at a time.
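The "one CPU per search, two searches per dual-CPU box" approach can be sketched as below. This assumes legacy `blastall`, where `-a 1` limits a search to one processor; the two-slot limit stands in for the two CPUs, and the query and database names are hypothetical. In a real farm a batch scheduler would normally handle the slot counting, so treat this only as an illustration of the idea on a single machine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def single_cpu_blast(query: str, db: str, out: str,
                     program: str = "blastn", exe: str = "blastall") -> List[str]:
    # '-a 1' asks legacy blastall to use a single processor.
    return [exe, "-p", program, "-d", db, "-i", query, "-a", "1", "-o", out]


def run_farm(queries: Sequence[str], db: str, slots: int = 2,
             runner: Optional[Callable[[List[str]], int]] = None) -> List[int]:
    """Run one single-CPU search per query, at most 'slots' at a time.

    Returns the exit codes in the same order as 'queries'."""
    if runner is None:
        # Default: launch the real process and report its exit code.
        runner = lambda cmd: subprocess.run(cmd).returncode
    cmds = [single_cpu_blast(q, db, q + ".out") for q in queries]
    # Threads are enough here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return list(pool.map(runner, cmds))
```

Running `run_farm(["a.fa", "b.fa", "c.fa", "d.fa"], "nt")` keeps both CPUs busy with independent searches instead of splitting one search across them.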

-Chris
-- 

Chris Dagdigian, <dag@sonsorol.org>
Bioteam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net