[Bioclusters] how to run BLAST on a linux cluster

Chris Dagdigian dagdigian@blackstonecomputing.com
Tue, 30 Oct 2001 14:21:48 -0500


It's hard to be helpful with so little information...is your cluster 
already built? Will it be only used for BLAST? What result are you going 
for: quick response to ad-hoc BLAST queries submitted by researchers or 
efficient batch processing of thousands of pipelined searches?

If you are at the very beginning of the BLAST-on-Linux-cluster process 
then the first thing you need to understand is that BLAST itself is not 
a parallel application. The way that BLAST is generally run on linux 
"clusters" is by distributing and running many standalone instances of 
the program on multiple hosts/CPUs. The BLAST program falls into a 
category that people refer to as "embarrassingly parallel".
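
To make the "many standalone instances" idea concrete, here is a minimal 
Python sketch of the usual first step: splitting one multi-sequence FASTA 
query file into several smaller chunks so that each chunk can be handed to 
its own BLAST process. The function names and the round-robin policy are 
my own illustrative choices, not part of BLAST or any DRM package.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_round_robin(records, n_chunks):
    """Deal records into at most n_chunks groups; empty groups are dropped."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each chunk would then be written to its own file and searched 
independently; the per-chunk results are simply concatenated afterwards, 
which is why no communication between the BLAST processes is needed.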

There are a number of ways to distribute and remotely execute BLAST on 
multiple Linux nodes. Some people roll their own solution and others use 
existing distributed resource management (DRM) software to handle the 
task. The most common DRM suites that I have seen in the life sciences 
are: OpenPBS, PBSPro, GridEngine and Platform LSF.

OpenPBS and GridEngine are free and open-source. LSF and PBSPro are 
commercial. You can get the source to PBSPro if you are a paying 
customer. I've spent time with all of them and I still feel that right 
now Platform LSF is technically the best DRM system available (due 
mostly to its fault tolerance characteristics). _However_, LSF is 
amazingly expensive and it would generally be overkill and a waste of 
money for a basic blast farm, especially in an academic setting or 
proof-of-concept project.

Figuring out how you are going to distribute your blast jobs across the 
linux systems is a good way to get started.
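
As a rough illustration of the roll-your-own route, the Python sketch 
below assigns a list of query files to hosts in round-robin order and 
builds one ssh command per file. Everything specific here is an 
assumption: the host names and file names are hypothetical, passwordless 
ssh is assumed, and the flags are those of the legacy NCBI blastall 
program (-p program, -d database, -i input, -o output); newer BLAST 
releases use different command names and options. A real DRM suite adds 
queueing, load awareness and failure handling that this does not attempt.

```python
import shlex


def build_dispatch_commands(chunk_files, hosts, database="nt", program="blastn"):
    """Return one ssh argument list per query file, cycling through hosts.

    Nothing is executed here; the caller decides how to launch the commands.
    """
    commands = []
    for i, chunk in enumerate(chunk_files):
        host = hosts[i % len(hosts)]  # simple round-robin assignment
        remote = [
            "blastall",
            "-p", program,         # search program
            "-d", database,        # database name as seen on the node
            "-i", chunk,           # query file
            "-o", chunk + ".out",  # per-chunk result file
        ]
        # ssh takes the remote command as a single string, so quote each part.
        remote_cmd = " ".join(shlex.quote(arg) for arg in remote)
        commands.append(["ssh", host, remote_cmd])
    return commands
```

Each returned list could be passed to subprocess.Popen to start the 
searches concurrently, then waited on before collecting the .out files.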

Other things to keep in mind: BLAST is often I/O bound rather than CPU 
bound, so you need to pay attention to where the data resides and how the 
Linux systems gain access to it. Locally caching data to cheap IDE disks 
in each node is a great way to avoid NFS-related bottlenecks. Besides 
tuning your I/O subsystems, maxing out on physical memory will also help 
quite a bit.
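
One simple way to do the local caching is to have each node refresh its 
own copy of the database files before it starts searching. The Python 
sketch below copies a file from a shared directory only when the local 
copy is missing, differs in size, or is older. The directory layout and 
the function name are hypothetical, and the size/mtime check is a 
deliberately cheap stand-in for a real checksum comparison.

```python
import os
import shutil


def sync_to_local_cache(shared_dir, cache_dir):
    """Copy new or changed files from shared_dir to cache_dir.

    Returns the names of the files that were actually copied.
    """
    os.makedirs(cache_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(shared_dir)):
        src = os.path.join(shared_dir, name)
        if not os.path.isfile(src):
            continue  # subdirectories are ignored in this sketch
        dst = os.path.join(cache_dir, name)
        src_stat = os.stat(src)
        if os.path.exists(dst):
            dst_stat = os.stat(dst)
            same_size = dst_stat.st_size == src_stat.st_size
            not_older = int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            if same_size and not_older:
                continue  # local copy looks current, skip the transfer
        shutil.copy2(src, dst)  # copy2 preserves the modification time
        copied.append(name)
    return copied
```

With this in place, the NFS server is only hit when the database is 
updated, and every search reads from the node's own disk.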

my $0.02



-Chris



jie hu wrote:

> I am trying to learn how to run BLAST on a Linux cluster.  Could someone 
> give me some advice on where I should start?  Is there any free software 
> available to do this job?  Thank you very much.
> 
> Jie