[Bioclusters] how to run BLAST on a linux cluster

jie hu h_jie@hotmail.com
Tue, 30 Oct 2001 22:16:38


Thank you very much for your info.  We are considering buying a Linux 
cluster with 30-40 processors.  It will run other applications in addition 
to BLAST.  The BLAST part will be used for batch jobs.  For example, we 
want to BLAST tens of thousands of our sequences every week.  So 
basically, I am trying to find a program that will let me run this 
batch BLAST on the Linux cluster.  I heard there are such programs freely 
available and that I only need to write some shell scripts or Perl to get it 


>From: Chris Dagdigian <dagdigian@blackstonecomputing.com>
>Reply-To: dagdigian@blackstonecomputing.com
>To: bioclusters@bioinformatics.org, h_jie@hotmail.com
>Subject: Re: [Bioclusters] how to run BLAST on a linux cluster
>Date: Tue, 30 Oct 2001 14:21:48 -0500
>
>It's hard to be helpful with so little information... is your cluster
>already built? Will it be only used for BLAST? What result are you going
>for: quick response to ad-hoc BLAST queries submitted by researchers or
>efficient batch processing of thousands of pipelined searches?
>
>If you are at the very beginning of the BLAST-on-Linux-cluster process
>then the first thing you need to understand is that BLAST itself is not
>a parallel application. The way that BLAST is generally run on Linux
>"clusters" is by distributing and running many standalone instances of
>the program on multiple hosts/CPUs. The BLAST program falls into a
>category that people refer to as "embarrassingly parallel".
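
A minimal Python sketch of the idea above: since each BLAST run is independent, the query set can simply be dealt out into chunks, one per CPU. The function names `read_fasta` and `split_round_robin` are illustrative only (they are not part of any BLAST distribution), and the input is assumed to be plain FASTA text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks lists, one list per standalone BLAST run."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks
```

Each chunk would then be written to its own query file and searched by a separate BLAST process. Round-robin dealing is only one possible choice; it spreads sequences evenly by count, not by length.
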
>
>There are a number of ways to distribute and remotely execute BLAST on
>multiple Linux nodes. Some people roll their own solution and others use
>existing distributed resource management (DRM) software to handle the
>task. The most common DRM suites that I have seen in the life sciences
>are: OpenPBS, PBSPro, GridEngine and Platform LSF.
>
>OpenPBS and GridEngine are free and open-source. LSF and PBSPro are
>commercial. You can get the source to PBSPro if you are a paying
>customer. I've spent time with all of them and I still feel that right
>now Platform LSF is technically the best DRM system available (due
>mostly to its fault tolerance characteristics). _However_ LSF is
>amazingly expensive and it would generally be overkill and a waste of
>money for a basic BLAST farm, especially in an academic setting or
>proof-of-concept project.
>
>Figuring out how you are going to distribute your BLAST jobs across the
>Linux systems is a good way to get started.
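
One hedged way to start on the distribution step is to write one small shell script per query chunk and hand each script to whatever DRM is installed. The Python sketch below only writes the scripts; it does not submit anything. It assumes the legacy NCBI `blastall` binary, with its `-p`, `-d`, `-i` and `-o` options, is on each node's PATH. The helper names, the paths and the default program `blastn` are hypothetical.

```python
import os
import shlex


def blast_command(program, database, query_path, output_path):
    """Build the argument list for one standalone blastall run."""
    return ["blastall", "-p", program, "-d", database,
            "-i", query_path, "-o", output_path]


def write_job_scripts(chunk_paths, database, job_dir, program="blastn"):
    """Write one executable shell script per query chunk; return their paths."""
    os.makedirs(job_dir, exist_ok=True)
    scripts = []
    for chunk in chunk_paths:
        base = os.path.splitext(os.path.basename(chunk))[0]
        output_path = os.path.join(job_dir, base + ".blast.out")
        script_path = os.path.join(job_dir, base + ".sh")
        command = blast_command(program, database, chunk, output_path)
        with open(script_path, "w") as handle:
            handle.write("#!/bin/sh\n")
            # Quote every argument so unusual paths survive the shell.
            handle.write(" ".join(shlex.quote(arg) for arg in command) + "\n")
        os.chmod(script_path, 0o755)
        scripts.append(script_path)
    return scripts
```

With OpenPBS or GridEngine each generated script would typically be submitted with `qsub`; queue names and resource options are site-specific and are deliberately left out here.
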
>
>Other things to keep in mind: BLAST is often I/O bound rather than CPU
>bound, so you need to pay attention to where the data resides and how the
>Linux systems gain access to it. Locally caching data to cheap IDE disks
>in each node is a great way to avoid NFS-related bottlenecks. Besides
>tuning your I/O subsystems, maxing out on physical memory will also help
>quite a bit.
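
The local-caching advice can be sketched as a small sync step that each node runs before its BLAST jobs start. This Python sketch assumes the formatted database files sit in one flat directory on the shared (for example NFS) mount, and that a difference in size or modification time is a good enough staleness check. The function names and directory layout are hypothetical.

```python
import os
import shutil


def needs_refresh(src, dst):
    """True if dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst):
        return True
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    return (src_stat.st_size != dst_stat.st_size
            or src_stat.st_mtime > dst_stat.st_mtime)


def sync_database(shared_dir, local_dir):
    """Copy new or changed database files to local disk; return their names."""
    os.makedirs(local_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(shared_dir)):
        src = os.path.join(shared_dir, name)
        dst = os.path.join(local_dir, name)
        if os.path.isfile(src) and needs_refresh(src, dst):
            # copy2 preserves the modification time, so an unchanged file
            # is not copied again on the next run.
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

The BLAST jobs on that node would then point at the local directory instead of the shared mount, so the repeated database reads stay off the network.
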
>
>my $0.02
>
>jie hu wrote:
>>I am trying to learn how to run BLAST on a Linux cluster.  Could someone
>>give me some advice on where I should start?  Is there any free software
>>available to do this job?  Thank you very much.
