[Bioclusters] mpiblast and mpiformatdb

Jason D. Gans bioclusters@bioinformatics.org
Tue, 20 May 2003 15:46:17 -0600


Hello, 

I have uploaded files (in the patches section of http://sourceforge.net/projects/mpiblast/) 
that implement the following changes to mpiformatdb and mpiblast: 

mpiformatdb: 
* Fixed fragment size calculation to increase the size of the "runt" (i.e. last) fragment. 
By giving each worker process approximately the same amount of work to do, load balancing is
improved.
* Changed the definition of the command line argument "-N X" so that, when possible, X fragments are
created and numbered from [0, ..., X - 1]. This addresses a common complaint on the bioclusters
list.
* Added a command line argument "--decomp" that will display all possible fragment decompositions
and suggest decompositions that maximize the relative size of the runt fragment (to improve load
balancing). 
Because BLAST restricts fragment sizes to whole-number multiples of a megabyte, not all fragment
decompositions are (a) allowed and (b) equally efficient.
* Changed the return value to be the actual number of fragments created (useful for automation).
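
To illustrate why only some fragment counts are reachable, here is a minimal Python sketch of the
kind of enumeration "--decomp" describes. It is not the mpiformatdb code: the function name, the
assumption that the database size is already rounded to whole megabytes, and the rule of ranking
decompositions by runt size relative to a full fragment are all mine, for illustration only.

```python
import math


def decompositions(db_mb):
    """Map each achievable fragment count to (fragment_mb, runt_mb, runt_ratio).

    Hypothetical helper: db_mb is the database size in whole megabytes.
    Every whole-megabyte fragment size splits the database into
    ceil(db_mb / fragment_mb) fragments, and the last fragment (the runt)
    holds whatever is left over. When several fragment sizes give the same
    count, keep the one whose runt is largest relative to a full fragment.
    """
    best = {}
    for fragment_mb in range(1, db_mb + 1):
        count = math.ceil(db_mb / fragment_mb)
        runt_mb = db_mb - (count - 1) * fragment_mb
        ratio = runt_mb / fragment_mb
        if count not in best or ratio > best[count][2]:
            best[count] = (fragment_mb, runt_mb, ratio)
    return best


if __name__ == "__main__":
    for count, (size, runt, ratio) in sorted(decompositions(10).items()):
        print(f"{count:3d} fragments of {size} MB, runt {runt} MB ({ratio:.0%} of full)")
```

Under these assumptions a 10 MB database can be split into 1, 2, 3, 4, 5 or 10 fragments, but not
into 6, 7, 8 or 9, and the 3-fragment split leaves a runt only half the size of the others.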

mpiblast: 
* Improved memory management.
* Added a new function, sendResults(), that returns the BLAST results from a worker to the master
using less memory than sendFile(). This allows the use of larger queries: for example, querying the
entire Y. pestis genome against the nr database can now be completed on a master node with 2 GB of
RAM. Before this change, that query exhausted all available memory. Rewrote the receiveResults()
function to work with the new sendResults() function.
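
This post does not describe how sendResults() saves memory. One common way to get that effect is to
stream results in fixed-size chunks rather than holding a whole result file in memory, and the
Python sketch below shows that general idea only. The function names, the 1 MiB chunk size, and the
send/recv callables (stand-ins for whatever transport is used, such as MPI calls) are all
hypothetical and are not taken from the mpiblast source.

```python
import io

CHUNK_BYTES = 1 << 20  # hypothetical buffer size (1 MiB)


def send_in_chunks(source, send, chunk_bytes=CHUNK_BYTES):
    """Read `source` piece by piece and pass each piece to `send`.

    Only one chunk is held in memory at a time. An empty bytes object is
    sent last to mark the end of the stream. Returns the bytes sent.
    """
    total = 0
    while True:
        chunk = source.read(chunk_bytes)
        send(chunk)
        if not chunk:
            return total
        total += len(chunk)


def receive_chunks(recv, sink):
    """Write chunks obtained from `recv` to `sink` until the empty end marker.

    Returns the number of bytes received.
    """
    total = 0
    while True:
        chunk = recv()
        if not chunk:
            return total
        sink.write(chunk)
        total += len(chunk)


if __name__ == "__main__":
    queue = []  # stand-in for the worker-to-master channel
    results = io.BytesIO(b"x" * 2500)
    send_in_chunks(results, queue.append, chunk_bytes=1000)
    merged = io.BytesIO()
    print(receive_chunks(lambda: queue.pop(0), merged), "bytes received")
```

With this pattern the peak buffer size on either side is bounded by the chunk size rather than by
the size of the result set.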

Comments and feedback are welcome.

Regards, 

Jason Gans 

Biosciences Division, B-1
Los Alamos National Laboratory