[Bioclusters] how to run BLAST on a linux cluster
Chris Dagdigian
dagdigian@blackstonecomputing.com
Tue, 30 Oct 2001 14:21:48 -0500
It's hard to be helpful with so little information. Is your cluster
already built? Will it be used only for BLAST? What result are you going
for: quick response to ad-hoc BLAST queries submitted by researchers or
efficient batch processing of thousands of pipelined searches?
If you are at the very beginning of the BLAST-on-Linux-cluster process
then the first thing you need to understand is that BLAST itself is not
a parallel application. The way that BLAST is generally run on Linux
"clusters" is by distributing and running many standalone instances of
the program on multiple hosts/CPUs. Each search is independent of the
others, so the instances never need to talk to each other. The BLAST
program falls into a category that people refer to as "embarrassingly
parallel".
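To make the "many standalone instances" idea concrete, here is a minimal
sketch in Python of splitting a multi-sequence FASTA query set into
chunks, one per BLAST instance. The function names (read_fasta,
split_round_robin) are made up for illustration and are not part of
BLAST or any DRM package; treat this as one possible approach, not the
standard one.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_round_robin(records, n_chunks):
    """Deal records into at most n_chunks lists; empty chunks are dropped."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return [c for c in chunks if c]
```

Each chunk would then be written to its own file and searched by a
separate BLAST process. Round-robin dealing is only the simplest choice;
balancing chunks by total sequence length may spread the work more
evenly.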
There are a number of ways to distribute and remotely execute BLAST on
multiple Linux nodes. Some people roll their own solution and others use
existing distributed resource management (DRM) software to handle the
task. The most common DRM suites that I have seen in the life sciences
are: OpenPBS, PBSPro, GridEngine & Platform LSF.
OpenPBS and GridEngine are free and open-source. LSF and PBSPro are
commercial. You can get the source to PBSPro if you are a paying
customer. I've spent time with all of them and I still feel that right
now Platform LSF is technically the best DRM system available (due
mostly to its fault tolerance characteristics). _However_, LSF is
amazingly expensive and it would generally be overkill and a waste of
money for a basic BLAST farm, especially in an academic setting or for
a proof-of-concept project.
Figuring out how you are going to distribute your BLAST jobs across the
Linux systems is a good way to get started.
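For the roll-your-own route, a rough sketch of the distribution step
might look like the following. It only builds the command lines; it does
not run anything. The host names, chunk file names and the choice of ssh
as the remote execution method are all assumptions made for the example,
and it presumes the chunk files are visible at the same path on every
node. The blastall flags used (-p program, -d database, -i input,
-o output) are the usual ones for NCBI's standalone blastall, but check
them against your installed version. With a DRM suite, the ssh part
would be replaced by that system's own job submission command.

```python
from itertools import cycle


def build_blast_commands(chunk_files, hosts, program="blastp", database="nr"):
    """Assign query chunks to hosts round-robin and build one ssh command each.

    Returns a list of argument lists suitable for subprocess.run().
    Host names and file names are supplied by the caller; nothing here
    is executed.
    """
    commands = []
    for chunk, host in zip(chunk_files, cycle(hosts)):
        out = chunk + ".out"
        # Assumed blastall invocation: program, database, input, output.
        blast = ["blastall", "-p", program, "-d", database,
                 "-i", chunk, "-o", out]
        commands.append(["ssh", host] + blast)
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and chunk files, for illustration only.
    hosts = ["node01", "node02"]
    chunks = ["query.0.fa", "query.1.fa", "query.2.fa"]
    for cmd in build_blast_commands(chunks, hosts):
        print(" ".join(cmd))
```

A static round-robin assignment like this ignores node load and failed
jobs, which is exactly the bookkeeping that the DRM packages take care
of for you.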
Other things to keep in mind: BLAST is often I/O bound rather than CPU
bound so you need to pay attention to where the data resides and how the
Linux systems gain access to it. Locally caching data to cheap IDE disks
in each node is a great way to avoid NFS-related bottlenecks. Besides
tuning your I/O subsystems, maxing out on physical memory will also help
quite a bit, since it lets the OS keep more of the database in its file
cache.
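As an illustration of the local caching idea, here is a small Python
sketch that copies a formatted database from a shared directory to local
disk on a node, skipping files that are already up to date. The
directory paths are whatever you pass in, the function name
stage_database is made up, and the "db_name." prefix match assumes the
database files are named like nr.phr, nr.pin, nr.psq; adjust for your
own layout.

```python
import os
import shutil


def stage_database(shared_dir, local_dir, db_name):
    """Copy files named '<db_name>.*' from shared_dir to local_dir.

    A file is skipped when a local copy with the same size and an equal
    or newer modification time already exists. Returns the list of file
    names that were actually copied.
    """
    os.makedirs(local_dir, exist_ok=True)
    copied = []
    for fname in sorted(os.listdir(shared_dir)):
        if not fname.startswith(db_name + "."):
            continue
        src = os.path.join(shared_dir, fname)
        dst = os.path.join(local_dir, fname)
        if os.path.exists(dst):
            s, d = os.stat(src), os.stat(dst)
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue  # local copy looks current
        # copy2 preserves the modification time, so the check above
        # succeeds on the next run.
        shutil.copy2(src, dst)
        copied.append(fname)
    return copied
```

Running something like this on each node before the searches start
means the BLAST processes read the database from local disk instead of
all hitting the NFS server at once.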
my $0.02
-Chris
jie hu wrote:
> I am trying to learn how to run BLAST on a Linux cluster. Could someone
> give me some advice on where I should start? Is there any free software
> available to do this job? Thank you very much.
>
> Jie