[Bioclusters] Versions of Blast that run on a cluster?

Wed Jan 5 10:54:16 EST 2005

Malay wrote:

> Joe Landman wrote:
> 
>> Daniel.G.Roberts at aventis.com wrote:
>>
>>> Hello All
>>> Can anyone point me to example/FAQ resources on BLAST implemented on 
>>> a Linux Cluster?
>>>
>>
> 
> It depends on your blast job. There are two ways to accelerate:
> 
> If you have thousands of sequences, the best way is to have a
> dedicated NFS server running on a gigabit LAN, with ample RAM on each
> node. Run NCBI BLAST through a good job scheduler such as SGE and
> submit each sequence as a separate BLAST job, with the BLAST database
> shared over the gigabit LAN. From day-to-day experience I can say a
> routine NFS mount can handle around 250 sequences at a time.
> 
> If you have very few sequences and just want to run a single job as fast
> as possible, the best way is to split the database. mpiBLAST is the best
> known tool for that, or you can use your own custom script. Remember that
> it will skew the BLAST statistics, because E-values depend on the size
> of the database searched.
> 
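
To make the statistics caveat above concrete: an E-value scales roughly
linearly with the size of the database searched, so a hit found against one
fragment looks more significant than it would against the whole database.
What follows is only a rough Python sketch of correcting for that after the
fact. The function names and the (fragment index, subject ID, E-value) tuple
layout are hypothetical, and the sketch ignores the length adjustments BLAST
applies when it computes effective search space. Setting the effective
database length at search time (the -z option of blastall) avoids the
post-hoc step altogether.

```python
# Sketch only: rescale E-values from a split-database search so they are
# comparable to a search against the whole database.
# Assumption: E-value is proportional to database length (in letters).


def rescale_evalue(evalue, fragment_letters, total_letters):
    """Scale an E-value from one fragment up to the full database size."""
    if fragment_letters <= 0:
        raise ValueError("fragment_letters must be positive")
    if total_letters < fragment_letters:
        raise ValueError("total_letters cannot be smaller than fragment_letters")
    return evalue * total_letters / fragment_letters


def merge_fragment_hits(hits, fragment_sizes):
    """Merge hits from all fragments, best (lowest) rescaled E-value first.

    hits: iterable of (fragment_index, subject_id, evalue) tuples
          (hypothetical layout, not a BLAST output format).
    fragment_sizes: number of letters in each database fragment.
    """
    total = sum(fragment_sizes)
    merged = [
        (subject, rescale_evalue(evalue, fragment_sizes[frag], total))
        for frag, subject, evalue in hits
    ]
    return sorted(merged, key=lambda hit: hit[1])
```

With four equal fragments, for example, every E-value is multiplied by
four before the per-fragment hit lists are merged and re-sorted.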

Oops, I forgot to mention the third option. This is for production
machines that need very high-end scaling, and it requires ample disk
space on each node. Each node keeps its own local copy of the
database, and the input is split through SGE. This is the best way to
scale up to ~1000 jobs at a time. But because of database maintenance
issues, this method is advisable only for a dedicated BLAST farm.
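
For illustration, input splitting could look something like the Python
sketch below, which deals the records of a multi-FASTA file round-robin
into a fixed number of chunks. The function names are hypothetical and this
is not a tool shipped with BLAST or SGE. It is just one simple way to
prepare the per-job inputs.

```python
# Sketch only: split multi-FASTA input into chunks, one chunk per cluster job.


def read_fasta_records(lines):
    """Yield each FASTA record as a list of lines, header line first."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():  # skip blank lines
            record.append(line.rstrip("\n"))
    if record:
        yield record


def split_fasta(lines, n_chunks):
    """Distribute records round-robin over n_chunks; drop empty chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(read_fasta_records(lines)):
        chunks[i % n_chunks].extend(record)
    return [chunk for chunk in chunks if chunk]
```

Each chunk would then be written to its own file and run as one task of an
SGE array job (qsub -t 1-N, with the task picking its file from
$SGE_TASK_ID), searching against the node's local copy of the database.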

-Malay
mbasu(at)ncbi.nlm.nih.gov