[Bioclusters] Daemonizing blast, i.e. running many sequences through 1 process

bioclusters@bioinformatics.org
Fri, 7 Nov 2003 17:07:36 +1100


We have a problem with 66 nodes becoming NFS-bound
 when blasting many (>10,000) sequences
 against the same database set.

One approach (which we are trying) is to cache database files locally,
 so nodes can re-read their files without bottlenecking on NFS.
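For illustration, the local-cache idea could be sketched as below.
This is only a sketch under assumptions: the function name
cache_database, the directory arguments, and the "db_name.*" file layout
are hypothetical, not a description of our actual setup.

```python
import os
import shutil


def cache_database(nfs_dir, local_dir, db_name):
    """Copy the files of one database from NFS to local disk if needed.

    Assumes (hypothetically) that a database is a set of files named
    db_name.<extension> in nfs_dir. Returns the local path prefix.
    """
    os.makedirs(local_dir, exist_ok=True)
    for fname in sorted(os.listdir(nfs_dir)):
        if not fname.startswith(db_name + "."):
            continue  # belongs to some other database
        src = os.path.join(nfs_dir, fname)
        dst = os.path.join(local_dir, fname)
        src_stat = os.stat(src)
        stale = (
            not os.path.exists(dst)
            or os.path.getsize(dst) != src_stat.st_size
            or int(os.path.getmtime(dst)) < int(src_stat.st_mtime)
        )
        if stale:
            tmp = dst + ".part"
            shutil.copy2(src, tmp)  # copy2 keeps the source mtime
            os.replace(tmp, dst)    # rename so readers never see a partial file
    return os.path.join(local_dir, db_name)
```

A job wrapper would call this once before starting its searches and then
point the search program at the returned local prefix, so only the first
job on a node reads the database over NFS.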

A totally different approach, with even better performance potential,
 would be for a blast process to start up, load its database(s) once,
 and process multiple queries until told to exit.

That would spread the startup cost across all the jobs run on that node.
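To make the idea concrete, here is a minimal sketch of the
load-once, serve-many loop. It is not blast and uses no blast API: the
search function is a placeholder doing exact substring matching, and the
names load_database, search and serve, as well as the "exit" command,
are hypothetical.

```python
def load_database(path):
    """Read a FASTA file once into memory as {sequence_id: sequence}."""
    db, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    db[name] = "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        db[name] = "".join(chunks)
    return db


def search(db, query):
    """Placeholder for the real alignment step: exact substring match only."""
    return sorted(name for name, seq in db.items() if query in seq)


def serve(db, requests, out):
    """Answer one query per input line until EOF or a line reading 'exit'."""
    handled = 0
    for line in requests:
        query = line.strip()
        if query == "exit":
            break
        if not query:
            continue  # ignore blank lines
        hits = ",".join(search(db, query)) or "-"
        out.write("%s\t%s\n" % (query, hits))
        out.flush()  # let the caller see each answer immediately
        handled += 1
    return handled
```

On a node one would call load_database once and then
serve(db, sys.stdin, sys.stdout), feeding queries through a pipe or
socket. The database load is paid once per process rather than once per
query, which is the whole point.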

Does NCBI blast do this?
Is there a blast that does?
Anyone interested in writing one?
What's involved?

Thanks for any pointers,
michaelj

-- 
Michael James				michael.james@csiro.au
System Administrator			voice:	02 6246 5040
CSIRO Bioinformatics Facility	fax:		02 6246 5166