[Bioclusters] Versions of Blast that run on a cluster?

Malay mbasu at mail.nih.gov
Wed Jan 5 13:23:25 EST 2005

Bernard Li wrote:
> Hi Malay:
>>Oops, I forgot to mention the third option. This is for a
>>production machine scaling up to very high throughput, and it
>>requires an ample amount of disk space on each node. The idea is to
>>give each node its own local copy of the database and use input
>>spitting through SGE. This is the best way to scale up to ~1000 jobs
>>at a time. But because of the database maintenance issue, this
>>method is advisable only for a dedicated BLAST farm.
> You meant 'input splitting', right? And how would you accomplish that
> using SGE? By scripting it in your job script?

I meant submitting each sequence as a separate job.
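
A minimal sketch of "one sequence, one job" in Python. It splits a multi-FASTA file into one query file per sequence and builds one SGE `qsub` command per query. The sketch assumes the legacy NCBI `blastall` binary (`-p` program, `-d` database, `-i` query, `-o` output) and SGE's `qsub` options `-cwd`, `-b y` (submit a binary rather than a script) and `-N` (job name). It also assumes every node holds its local database copy at the same path. `/local/blastdb/nr` and the helper names are hypothetical. The demo only prints the commands (a dry run) instead of submitting them.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from multi-FASTA text."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_queries(records: List[Tuple[str, str]], outdir: Path) -> List[Path]:
    """Write each sequence to its own FASTA file (one file = one job)."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (header, seq) in enumerate(records, start=1):
        path = outdir / f"query_{i:05d}.fa"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


def qsub_commands(query_files: List[Path], db: str, program: str = "blastp") -> List[str]:
    """Build one qsub command line per query file; db is the node-local path."""
    cmds = []
    for q in query_files:
        argv = ["qsub", "-cwd", "-b", "y", "-N", q.stem,
                "blastall", "-p", program, "-d", db,
                "-i", str(q), "-o", str(q.with_suffix(".out"))]
        cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    demo = ">q1\nMKVLAAGIVG\n>q2\nMSTNPKPQRK\n"
    with tempfile.TemporaryDirectory() as scratch:
        files = write_queries(parse_fasta(demo), Path(scratch))
        for cmd in qsub_commands(files, "/local/blastdb/nr"):
            print(cmd)  # dry run: print instead of submitting
```

Submitting separate jobs lets the SGE scheduler do the load balancing. The trade-off is one scheduler entry per sequence, which is why this setup suits a dedicated farm.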

There is one more way of doing it, which is called the "pull technique".
You store each sequence in an RDBMS. A daemon runs on each node, pulls a
sequence from the RDBMS, runs it against its own local BLAST database,
stores the result in an accessible place, and marks the job in the RDBMS
as "done". A designated node then queries the RDBMS for jobs marked done
and pulls the results from that place. This method is the most efficient
of them all, and it is used in the BLAST server at NCBI.
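
A minimal sketch of the pull technique, assuming SQLite as a stand-in for the RDBMS and a local directory as a stand-in for the "accessible place". A real farm would need a networked database server and a shared filesystem so that every node sees the same queue. The table layout, the status values and all function names are hypothetical. Each worker claims a job with a conditional `UPDATE ... WHERE status = 'pending'`, so two daemons cannot both take the same sequence. `blastall_runner` assumes the legacy NCBI `blastall` reads the query from stdin and writes to stdout when `-i`/`-o` are omitted. The demo substitutes a stub so it runs without BLAST installed.

```python
import sqlite3
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    sequence TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done
    worker TEXT,
    result_path TEXT
)
"""


def init_queue(conn: sqlite3.Connection, sequences) -> None:
    """Load one row per query sequence into the (hypothetical) jobs table."""
    with conn:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO jobs (sequence) VALUES (?)", [(s,) for s in sequences])


def claim_job(conn: sqlite3.Connection, worker: str) -> Optional[Tuple[int, str]]:
    """Atomically claim the next pending job; None when the queue is empty."""
    while True:
        row = conn.execute(
            "SELECT id, sequence FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE jobs SET status = 'running', worker = ? WHERE id = ? AND status = 'pending'",
                (worker, row[0]),
            )
        if cur.rowcount == 1:
            return row
        # Another daemon claimed this row first; look for the next one.


def blastall_runner(db: str, program: str = "blastp") -> Callable[[str], str]:
    """Run legacy blastall against the node-local database (assumed installed)."""
    def run(seq: str) -> str:
        return subprocess.run(["blastall", "-p", program, "-d", db],
                              input=f">query\n{seq}\n", capture_output=True,
                              text=True, check=True).stdout
    return run


def worker_loop(conn, worker: str, shared_dir: Path, run_blast: Callable[[str], str]) -> int:
    """Node daemon: pull, search locally, store result, mark done. Returns jobs run."""
    count = 0
    while (job := claim_job(conn, worker)) is not None:
        job_id, seq = job
        out = shared_dir / f"job_{job_id}.out"
        out.write_text(run_blast(seq))
        with conn:
            conn.execute("UPDATE jobs SET status = 'done', result_path = ? WHERE id = ?",
                         (str(out), job_id))
        count += 1
    return count


def collect_results(conn) -> Dict[int, str]:
    """Designated node: find jobs marked done and read their results."""
    rows = conn.execute(
        "SELECT id, result_path FROM jobs WHERE status = 'done' ORDER BY id").fetchall()
    return {job_id: Path(path).read_text() for job_id, path in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_queue(conn, ["MKVLAAGIVG", "MSTNPKPQRK"])
    with tempfile.TemporaryDirectory() as shared:
        # Stub in place of blastall_runner(...) so the demo runs anywhere.
        worker_loop(conn, "node01", Path(shared), lambda s: f"{len(s)} residues searched\n")
        for job_id, text in collect_results(conn).items():
            print(job_id, text.strip())
```

Because idle nodes fetch work themselves, fast nodes naturally take more sequences and no central scheduler has to push jobs out.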

