[Bioclusters] free parallel versions of BLAST

Aaron Darling bioclusters@bioinformatics.org
Fri, 27 Feb 2004 16:09:26 -0600 (CST)

On Fri, 27 Feb 2004, Micha Bayer wrote:

> Thanks to Aaron, Jason and Dan for their help, this is very useful.
> On a related note: does anyone know how the NCBI BLAST executable deals
> with the query and the database in terms of memory? I have had a
> discussion with a colleague of mine who claims that BLAST never loads
> the database into memory at all but the query does get loaded into
> memory.

On modern operating systems, BLAST accesses the database using
memory-mapped I/O.  In practice, this means that it's up to the operating system
to decide what part and how much of the BLAST database to load into
memory.  Different operating systems have different buffer cache page
replacement policies, but they commonly follow some variation on the LRU or
LFU (least recently used, least frequently used) scheme, so the most
recently or most frequently used pages tend to stay in memory.
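
As a rough illustration of what memory-mapped access looks like from a
program's point of view, here is a minimal Python sketch.  It is not how
BLAST itself is implemented; the function name read_slice_mmap is made up
for the example, and it only assumes a non-empty file on disk.

```python
import mmap


def read_slice_mmap(path, offset, length):
    """Return up to `length` bytes starting at `offset` via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping itself does not copy the
        # file into the process; the OS decides when pages are brought in.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing touches only the pages covering this range, and the
            # slice is copied out as a bytes object before the map is closed.
            return mm[offset:offset + length]
```

Which pages then stay resident after the call is entirely up to the
operating system's page replacement policy.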

The query sequences do get loaded into memory and indexed for the search.

> Is it possible to tell BLAST what to load into memory (availability
> permitting obviously)?

I'm not aware of any direct way to keep particular parts of a database in
the OS's buffer cache, but it could be accomplished indirectly by having a
program periodically read the part of the database to be kept in memory.
If you've got cluster resources, you'd probably be better off using one of
the established database segmentation packages, though.
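
For what it's worth, the indirect approach of periodically re-reading part
of the database could be sketched in Python roughly as follows.  The names
touch_range and keep_warm, the 1 MB chunk size, and the interval and round
counts are all assumptions for illustration.  Nothing here guarantees that
the pages stay resident; it only makes them look recently used to the OS.

```python
import time


def touch_range(path, offset=0, length=None, chunk_size=1 << 20):
    """Read a byte range of `path` and discard it; return the bytes read.

    With length=None the file is read from `offset` to its end.
    """
    total = 0
    with open(path, "rb") as f:
        f.seek(offset)
        while length is None or total < length:
            want = chunk_size if length is None else min(chunk_size, length - total)
            data = f.read(want)
            if not data:
                break  # reached end of file
            total += len(data)
    return total


def keep_warm(path, interval_seconds, rounds, offset=0, length=None):
    """Re-read the range `rounds` times, sleeping between passes."""
    for i in range(rounds):
        touch_range(path, offset, length)
        if i + 1 < rounds:
            time.sleep(interval_seconds)
```

Something like keep_warm("/path/to/db_volume", 60, 10) (placeholder path)
would re-read the whole file once a minute for ten passes.  If the range is
larger than the memory the OS is willing to give the buffer cache, this
will just evict other useful pages.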