[Bioclusters] BLAST Memory Benchmarks revisited

Aaron Darling darling at cs.wisc.edu
Wed Dec 1 17:40:32 EST 2004


Juan Carlos Perin wrote:

>Second, to accommodate this memory restriction, and perhaps test this on our
>own, I was considering removing a CPU as to force all the memory slots to be
>allocated to a single CPU.  I am wondering if this would actually work? Or
>if the architecture is actually segregated and each 4 slots is for an
>individual CPU?  
>  
>
I don't know the specific details of the G5 architecture, but I'd be
surprised if removing a CPU had any effect on the behavior of memory
allocation as it pertains to BLAST searches.  I'm fairly certain that
BLAST uses memory-mapped file I/O on OS X to access the blast databases,
which means that it relies on the OS to keep the database in memory
in the file system buffer cache.  Unless OS X has CPU-specific buffer
caches, which I doubt, all installed memory gets used to cache your
blast db regardless of the number of CPUs.
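
To illustrate what memory-mapped access means here, the following is a
minimal Python sketch, not BLAST's actual code.  The file contents and
the function names (scan_mapped_file, make_demo_file) are made up for
the example.  The program only maps the file and reads from the
mapping; whether those reads are served from RAM or from disk is left
to the operating system's cache.

```python
import mmap
import os
import tempfile


def scan_mapped_file(path, needle):
    """Count occurrences of needle in a file accessed through mmap."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = 0
            pos = mapped.find(needle)
            while pos != -1:
                count += 1
                pos = mapped.find(needle, pos + 1)
            return count


def make_demo_file():
    """Write a small stand-in for a database file (hypothetical data)."""
    fd, path = tempfile.mkstemp(suffix=".demo")
    with os.fdopen(fd, "wb") as f:
        f.write(b"ACGTACGTTTGACA" * 1000)
    return path


if __name__ == "__main__":
    demo = make_demo_file()
    try:
        print(scan_mapped_file(demo, b"ACGT"))
    finally:
        os.remove(demo)
```

A second scan of the same file is typically faster if the pages are
still cached, and nothing in the program ties that cache to a
particular CPU.
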

Last I checked, the nt database weighed in at over 3GB.  When blasting
against nt on a 4GB system the entire DB can be cached by the OS,
whereas a 2GB system has to fall back on slower disk I/O to page the
database into memory as it's needed.

In order to get good performance searching nt, your options seem to be
(1) put 4GB RAM in each compute node, (2) use query concatenation for
blastn, or (3) use one of the several BLAST database segmentation
packages.  Last February I posted a message describing many of these:
http://bioinformatics.org/pipermail/bioclusters/2004-February/001500.html
Since then btblastall has become another option.
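
As a rough way to size option (3), here is a back-of-the-envelope
Python sketch that estimates how many database fragments are needed so
that each fragment fits in one node's cache.  The function name, the
512MB reserved for the OS and the BLAST process, and the 3GB figure
standing in for nt are all assumptions for illustration; none of the
segmentation packages necessarily compute it this way.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def fragments_needed(db_bytes, ram_bytes, reserved_bytes=512 * MIB):
    """Estimate how many equal fragments let each one fit in the cache.

    reserved_bytes is an assumed allowance for the OS and the search
    process itself; the real figure depends on the system.
    """
    usable = ram_bytes - reserved_bytes
    if usable <= 0:
        raise ValueError("no memory left over for caching the database")
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-db_bytes // usable))


if __name__ == "__main__":
    nt_bytes = 3 * GIB  # placeholder; the real nt was "over 3GB"
    for ram_gib in (2, 4):
        n = fragments_needed(nt_bytes, ram_gib * GIB)
        print(f"{ram_gib}GB node: {n} fragment(s)")
```

Under these assumptions a 4GB node can hold the whole database, while
2GB nodes would need it split into at least two fragments.
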

I help develop the mpiBLAST package.  Since you are running a Mac
cluster, you may be interested to know that another mpiBLAST user reports
that our most recent version 1.3.0 release candidate works well with the
Apple/Genentech BLAST optimizations on OS X.

With respect to your request for "hard evidence" that blast runs much
faster when the database fits in core memory, you may be interested in
the mpiBLAST ClusterWorld 2003 paper.  The first figure shows how
execution time and disk activity during blast searches increase as
the database grows larger than core memory.
http://scholar.google.com/scholar?q=mpiblast&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search

Good luck,
-Aaron


