[Bioclusters] Versions of Blast that run on a cluster?

Ralf Sigmund sigmund at ipk-gatersleben.de
Thu Jan 6 12:16:53 EST 2005


Bernard Li wrote:

>Hi Malay:
>
>Are there any documents and/or papers which describe such a setup?
>I would assume that there would be general interest in seeing how such a
>setup could be implemented.
>
Hi Bernard,
I am currently working on exactly such a system.

It involves:
- transparent splitting of the input
- fast staging of the BLAST database to the compute nodes
- a transparent merge of all results (a rough sketch of this
  split/submit/merge flow follows below)
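
To make that concrete, here is a rough, hypothetical sketch of such a
split/submit/merge pipeline for SGE. It is illustrative only, not the
actual system; the file names, chunk size and the blastall/qsub command
lines are assumptions.

#!/usr/bin/env python
# Illustrative sketch only: split a multi-FASTA query file into chunks,
# submit one SGE job per chunk with qsub, and later concatenate the
# per-chunk outputs.  File names, chunk size and the blastall options
# are assumptions, not the actual system.

import glob
import subprocess

QUERY = "queries.fasta"   # hypothetical multi-FASTA input
DB = "/scratch/nr"        # database assumed to be staged on every node
CHUNK_SIZE = 50           # query sequences per job (arbitrary)


def split_fasta(path, chunk_size):
    """Yield lists of FASTA records, chunk_size records per list."""
    records, current = [], []
    for line in open(path):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    for i in range(0, len(records), chunk_size):
        yield records[i:i + chunk_size]


def submit(chunk_id, records):
    """Write one chunk plus a one-line job script and hand it to qsub."""
    chunk = "chunk_%04d.fasta" % chunk_id
    with open(chunk, "w") as f:
        f.write("".join(records))
    script = "chunk_%04d.sh" % chunk_id
    with open(script, "w") as f:
        f.write("#!/bin/sh\nblastall -p blastp -d %s -i %s -o %s.out\n"
                % (DB, chunk, chunk))
    subprocess.call(["qsub", "-cwd", script])


def merge(out_path="all_results.out"):
    """Concatenate the per-chunk outputs once all jobs have finished."""
    with open(out_path, "w") as out:
        for part in sorted(glob.glob("chunk_*.fasta.out")):
            out.write(open(part).read())


if __name__ == "__main__":
    for i, chunk in enumerate(split_fasta(QUERY, CHUNK_SIZE)):
        submit(i, chunk)
    # merge() is run separately once the queue reports the jobs done.

In a scheduler-agnostic design, the qsub invocation would sit behind an
adapter, so the same splitting and merging code can drive either Torque
or Sun Grid Engine.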

The system has adapters for Torque and Sun Grid Engine. However, we had
rather bad experiences with Torque and have been happy since we shifted
to Sun Grid Engine.
We intend to publish the whole solution as open source once it reaches
a higher level of maturity.
Please email me directly if you are interested in beta-testing in
February.

Ralf


>I was thinking, instead of duplicating ALL the available databases to
>the local HD, could some file-staging utility be used to simply stage
>the database to be BLASTed against?  Obviously the file-staging utility
>has to work really quickly on the cluster for this method to be viable.
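
For illustration only, staging just the database a job actually needs
could look roughly like the sketch below; the paths, the database name
and the rsync/blastall invocations are assumptions rather than an
existing utility.

#!/usr/bin/env python
# Sketch only: stage just the formatted BLAST database a job needs from
# shared storage to node-local disc, then run the search against the
# local copy.  Paths, the database name and the rsync/blastall calls
# are assumptions.

import os
import subprocess

SHARED_DB_DIR = "/shared/blastdb"   # NFS copy of all formatted databases
LOCAL_DB_DIR = "/scratch/blastdb"   # node-local disc


def stage(db_name):
    """Copy only the files belonging to one database (e.g. nr.*)."""
    os.makedirs(LOCAL_DB_DIR, exist_ok=True)
    subprocess.check_call([
        "rsync", "-a",
        "--include=%s.*" % db_name,
        "--exclude=*",
        SHARED_DB_DIR + "/",
        LOCAL_DB_DIR + "/",
    ])
    return os.path.join(LOCAL_DB_DIR, db_name)


def run_blast(db_name, query, output):
    local_db = stage(db_name)
    subprocess.check_call([
        "blastall", "-p", "blastn",
        "-d", local_db, "-i", query, "-o", output,
    ])


if __name__ == "__main__":
    run_blast("nr", "query.fasta", "query.out")

Whether this is viable depends, as you say, on how fast the copy is;
rsync at least skips files that are already present and unchanged on
the node.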
>
>Thanks,
>
>Bernard 
>
>>-----Original Message-----
>>From: bioclusters-bounces at bioinformatics.org 
>>[mailto:bioclusters-bounces at bioinformatics.org] On Behalf Of Malay
>>Sent: Wednesday, January 05, 2005 10:23
>>To: Clustering, compute farming & distributed computing in 
>>life science informatics
>>Subject: Re: [Bioclusters] Versions of Blast that run on a cluster?
>>
>>Bernard Li wrote:
>>
>>>Hi Malay:
>>>
>>>>Oops, I forgot to mention the third option. This is for a production
>>>>machine, for very high-end scaling, and requires an ample amount of
>>>>disc space on each node. The idea is to give each node its own local
>>>>copy of the database and to use input splitting through SGE. This is
>>>>the best way to scale up to ~1000 jobs at a time. But because of the
>>>>database maintenance issue, this method is advisable only for a
>>>>dedicated BLAST farm.
>>>
>>>You meant 'input splitting', right?  And how would you accomplish
>>>that using SGE?  By scripting it in your job script?
>>>
>>I meant submit each sequence as a separate job.
>>
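
As a simplified illustration of that one-job-per-sequence idea (not
anyone's actual setup), an SGE array job can carry the splitting: each
query sequence is written to its own numbered file and a single array
job is submitted whose tasks pick their input via $SGE_TASK_ID. The
paths, database location and blastall options below are assumptions.

#!/usr/bin/env python
# Illustrative only: one BLAST job per query sequence via a single SGE
# array job.  Each array task handles the sequence whose number equals
# $SGE_TASK_ID.  Paths, database location and blastall options are
# assumptions.

import subprocess

QUERY = "queries.fasta"
DB = "/scratch/blastdb/nr"   # local database copy assumed on every node

# 1. Write each query sequence to its own numbered file.
count, out = 0, None
for line in open(QUERY):
    if line.startswith(">"):
        count += 1
        if out:
            out.close()
        out = open("seq_%d.fasta" % count, "w")
    if out:
        out.write(line)
if out:
    out.close()

# 2. A single task script; SGE expands it into `count` tasks.
with open("blast_task.sh", "w") as f:
    f.write("#!/bin/sh\n"
            "#$ -cwd\n"
            "blastall -p blastp -d %s -i seq_${SGE_TASK_ID}.fasta "
            "-o seq_${SGE_TASK_ID}.out\n" % DB)

# 3. Submit the whole batch as one array job.
subprocess.call(["qsub", "-t", "1-%d" % count, "blast_task.sh"])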
>>There is one more way of doing it, called the "pull technique". You
>>store each sequence in an RDBMS. A daemon runs on each node, pulls a
>>sequence from the RDBMS, runs it against the node's own local BLAST
>>database, stores the result in an accessible place, and marks the job
>>in the RDBMS as "done". A designated node then queries the RDBMS for
>>jobs marked done and collects the results from that place. This
>>method is the most efficient of them all, and is used in the BLAST
>>server at NCBI.
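
A toy version of this pull technique might look like the sketch below.
It is purely illustrative and certainly not NCBI's implementation;
SQLite stands in for the RDBMS, and the table layout (id, sequence,
status, result), paths and BLAST options are assumptions.

#!/usr/bin/env python
# Toy version of the pull idea, not NCBI's implementation.  A jobs
# table holds the query sequences; a worker on each node claims one
# pending row at a time, runs BLAST against its local database, stores
# the output and marks the row done.  SQLite stands in for the RDBMS;
# table layout, paths and BLAST options are assumptions.

import sqlite3
import subprocess
import time

DB_FILE = "/shared/jobs.db"            # job queue (a real RDBMS in practice)
LOCAL_BLAST_DB = "/scratch/blastdb/nr"


def claim_job(conn):
    """Try to claim one pending job; return (id, sequence) or None.
    The UPDATE only succeeds while the row is still 'pending', so two
    workers cannot claim the same job."""
    row = conn.execute(
        "SELECT id, sequence FROM jobs WHERE status = 'pending' LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'running' "
            "WHERE id = ? AND status = 'pending'", (row[0],))
    return row if cur.rowcount == 1 else None


def run_blast(sequence):
    with open("query.fasta", "w") as f:
        f.write(sequence)
    subprocess.check_call([
        "blastall", "-p", "blastp",
        "-d", LOCAL_BLAST_DB, "-i", "query.fasta", "-o", "query.out",
    ])
    return open("query.out").read()


def worker():
    conn = sqlite3.connect(DB_FILE)
    while True:
        job = claim_job(conn)
        if job is None:
            time.sleep(10)             # nothing pending; poll again later
            continue
        job_id, sequence = job
        result = run_blast(sequence)
        with conn:
            conn.execute(
                "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                (result, job_id))


if __name__ == "__main__":
    worker()

A designated collector process would, in the same spirit, poll the
table for rows marked 'done' and gather the stored results.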
>>
>>
>>-Malay
>>




