[Bioclusters] Versions of Blast that run on a cluster?

Wed Jan 5 13:53:35 EST 2005

Hi Bernard:

  We have done this a few times in previous products (SGI 
GenomeCluster(TM), MSC.LIFE(TM)).  It is not hard to explain, though it 
is hard to get right (splitting is quite tunable, and the overall 
performance of the job depends critically on getting reasonable 
splits).  One of the nicer aspects of mpiBLAST is that you don't have to
worry about the input splitting.
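
As a rough illustration of the input splitting described above, here is a minimal Python sketch. It assumes, as a simplification, that BLAST cost per query grows with query length, so it balances chunks by total residue count using a greedy longest-first heuristic. The real tuning knobs (chunk count versus database size, node memory, scheduler overhead) are not modeled, and the function names are illustrative, not taken from any of the products mentioned here.

```python
"""Split a multi-sequence FASTA query set into length-balanced chunks.

Sketch only: assumes per-query BLAST cost is roughly proportional to
query length, so chunks are balanced by total residues, not by count.
"""
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_balanced(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Assign the longest remaining query to the currently lightest chunk."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    # Min-heap of (residues assigned so far, chunk index).
    loads = [(0, i) for i in range(n_chunks)]
    heapq.heapify(loads)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(loads)
        chunks[i].append((header, seq))
        heapq.heappush(loads, (load + len(seq), i))
    return chunks


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA, one chunk file per job."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    demo = ">q1\nMKTAYIAKQR\n>q2\nMKT\n>q3\nMKTAYIAK\n>q4\nMK\n"
    for i, chunk in enumerate(split_balanced(read_fasta(demo), 2)):
        print(i, [h for h, _ in chunk], sum(len(s) for _, s in chunk))
```

Each chunk would then be written out with `to_fasta` and submitted as its own job. Setting `n_chunks` to the number of queries degenerates to one query per job, which is the variant Malay describes further down the thread.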

  Feel free to contact me offline if you want to speak about this more.

Joe

Bernard Li wrote:

>Hi Malay:
>
>Are there any documents and/or papers that describe such a setup?
>I would assume that there would be general interest in seeing how such a
>setup could be implemented.
>
>I was thinking: instead of duplicating ALL the available databases to
>the local HD, could a file-staging utility be used to stage only the
>database to be BLASTed against?  Obviously the file-staging utility has
>to work really quickly on the cluster for this method to be viable.
>
>Thanks,
>
>Bernard 
>
>>-----Original Message-----
>>From: bioclusters-bounces at bioinformatics.org 
>>[mailto:bioclusters-bounces at bioinformatics.org] On Behalf Of Malay
>>Sent: Wednesday, January 05, 2005 10:23
>>To: Clustering, compute farming & distributed computing in 
>>life science informatics
>>Subject: Re: [Bioclusters] Versions of Blast that run on a cluster?
>>
>>Bernard Li wrote:
>>
>>>Hi Malay:
>>>
>>>>Oops, I forgot to mention the third option. This is for production
>>>>machines scaling up to the very high end, and it requires an ample
>>>>amount of disc space on each node. The idea is to give each node its
>>>>own local copy of the database and to use input spitting through SGE.
>>>>This is the best way to scale up to ~1000 jobs at a time. But because
>>>>of the database maintenance issue, this method is advisable only for
>>>>a dedicated BLAST farm.
>>>
>>>You meant 'input splitting' right?  And how would you accomplish that
>>>using SGE?  By scripting it in your job script?
>>
>>I meant: submit each sequence as a separate job.
>>
>>There is one more way of doing it, which is called the "pull technique".
>>You store each sequence in an RDBMS. A daemon runs on each node, pulls a
>>sequence from the RDBMS, runs it against its own local BLAST database,
>>stores the result in an accessible place, and marks the job in the
>>RDBMS as "done". A designated node then queries the RDBMS for jobs
>>marked done and pulls the results from that place. This method is the
>>most efficient of them all, and is used in the BLAST server at NCBI.
>>
>>
>>-Malay
>>
>>_______________________________________________
>>Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>https://bioinformatics.org/mailman/listinfo/bioclusters
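
Bernard's file-staging idea (copy only the one database a job needs to node-local disk) can be sketched in Python as follows. It assumes that a formatted BLAST database is a set of files sharing a base name (formatdb writes files such as `nr.pin`, `nr.phr` and `nr.psq` for a protein database) and that a size-plus-timestamp comparison is an acceptable freshness test. The directory layout and the `stage_database` name are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def stage_database(db_name: str, shared_dir: Path, local_dir: Path) -> List[str]:
    """Copy the files of one BLAST database from shared storage to local disk.

    Hypothetical helper. Files are matched by base name ("nr" matches
    nr.pin, nr.00.psq, nr.pal, ...). A file is skipped when a local copy
    with the same size and an equal-or-newer modification time exists.
    Returns the names of the files actually copied.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for src in sorted(shared_dir.glob(db_name + ".*")):
        if not src.is_file():
            continue
        dst = local_dir / src.name
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            if d.st_size == s.st_size and d.st_mtime >= s.st_mtime:
                continue  # local copy looks current; skip the transfer
        # Copy under a temporary name, then rename, so a BLAST job never
        # opens a half-written file. copy2 also preserves the mtime, which
        # is what makes the freshness check above work on the next call.
        tmp = dst.with_name(dst.name + ".partial")
        shutil.copy2(src, tmp)
        tmp.replace(dst)
        copied.append(src.name)
    return copied
```

A job script would call this before running BLAST against `local_dir`, so only the first job on a node pays the copy cost, which is the speed concern Bernard raises. Concurrent jobs staging the same database on one node would need a lock, which this sketch omits.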

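The pull technique Malay describes can be sketched with Python's built-in `sqlite3` module standing in for the RDBMS. This is only an illustration under stated assumptions: a real cluster would use a networked database server rather than SQLite, the node daemon would sleep and poll instead of exiting when the queue is empty, and `run_blast` is a hypothetical callback that runs BLAST against the node-local database and returns where it stored the output. The point it shows is that a worker claims a job inside a write transaction, so two nodes cannot pull the same sequence.

```python
"""Sketch of the "pull" job queue, with SQLite standing in for the RDBMS."""
import sqlite3
from typing import Callable, List, Optional, Tuple


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None means autocommit; transactions are then
    # controlled explicitly with BEGIN/COMMIT in claim_next().
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id          INTEGER PRIMARY KEY,
               query       TEXT NOT NULL,
               status      TEXT NOT NULL DEFAULT 'pending',
               worker      TEXT,
               result_path TEXT)"""
    )
    return conn


def submit(conn: sqlite3.Connection, queries: List[str]) -> None:
    """Store each query sequence as one pending job."""
    conn.executemany("INSERT INTO jobs (query) VALUES (?)", [(q,) for q in queries])


def claim_next(conn: sqlite3.Connection, worker: str) -> Optional[Tuple[int, str]]:
    """Atomically move one pending job to 'running'; None if none are left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, query FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def mark_done(conn: sqlite3.Connection, job_id: int, result_path: str) -> None:
    conn.execute(
        "UPDATE jobs SET status = 'done', result_path = ? WHERE id = ?",
        (result_path, job_id),
    )


def worker_loop(conn: sqlite3.Connection, worker: str,
                run_blast: Callable[[int, str], str]) -> None:
    """Node daemon body: pull jobs until none are pending, then return."""
    while (job := claim_next(conn, worker)) is not None:
        job_id, query = job
        mark_done(conn, job_id, run_blast(job_id, query))


def collect_done(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """What the designated collector node would poll for."""
    return conn.execute(
        "SELECT id, result_path FROM jobs WHERE status = 'done' ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    conn = open_queue()
    submit(conn, ["MKTAYIAKQR", "MKTAYIAK"])
    # Stand-in for a real BLAST run: just reports a made-up output path.
    worker_loop(conn, "node01", lambda job_id, query: f"/results/job{job_id}.out")
    print(collect_done(conn))
```

Running the demo prints `[(1, '/results/job1.out'), (2, '/results/job2.out')]`. Jobs left in the `running` state by a crashed node would need a timeout-and-requeue rule, which is beyond this sketch.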
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615