[Bioclusters] Local blast server, beowulf vs mosix

Eric Engelhard bioclusters@bioinformatics.org
Fri, 01 Mar 2002 11:23:54 -0800


I agree with jfreeman that this howto is a good place to start, but you
may not want to bother with RedHat 5.2. I set a personal speed record
building out a small (8-node) cluster last week using RedHat 7.2. I used
the kickstart GUI to configure boot disks for the slave nodes. This is an
embarrassingly parallel BLAST cluster (NFS, postgres, NCBI blastall,
rexec/rsh, and Perl).

Performance hint: What you really want with this kind of cluster is
enough local RAM on each node, relative to the size of the reference
database, to prevent disk I/O churning. If you can run a whole batch
with only an initial read, then the next bottleneck will be the CPU/bus
speed, which is a fairly high bar. I haven't pushed the performance of
this little cluster, but my work cluster (18 nodes, 2 GB RAM/node) cuts
through >1500 queries/minute against nr.
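
The RAM-to-refdb check above can be sketched as simple arithmetic. This
is only a rough illustration: the sub name, the even split across nodes,
the 25% of RAM held back for the OS and BLAST itself, and the sizes in
the example call are all assumptions, not measurements from either
cluster.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough check: does each node's share of the reference db fit in RAM?
# All arguments are hypothetical inputs; measure your own formatted db.
sub chunk_fits_in_ram {
    my ($db_bytes, $nodes, $ram_bytes, $reserve_fraction) = @_;
    my $chunk_bytes  = $db_bytes / $nodes;                    # even split assumed
    my $usable_bytes = $ram_bytes * (1 - $reserve_fraction);  # RAM left for the db
    return $chunk_bytes <= $usable_bytes ? 1 : 0;
}

my $GB = 1024 ** 3;
# Made-up example: a 4 GB db over 8 nodes with 1 GB RAM each.
printf "fits: %s\n",
    chunk_fits_in_ram(4 * $GB, 8, 1 * $GB, 0.25) ? "yes" : "no";
```

If the check fails, the options are more nodes (smaller chunks) or more
RAM per node; otherwise each query batch re-reads its chunk from disk.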

In addition to BLAST, this type of system is also ideal for standalone
InterPro.

I split the reference databases with this (baby-Perl freebie :-) ):


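#!/usr/bin/perl
#
#       refdb_splitter.pl - Splits a ref fasta db into $N gzipped chunks
#       for distribution to cluster
#
#       Usage: zcat ref_fasta_db(.Z or .gz) | refdb_splitter.pl
#
use strict;
use warnings;

my $N = 8;      # your number of nodes here
my $fasta = ""; # fasta record currently being accumulated
my $split = 1;  # chunk that receives the next complete record

# To parallelize the splitter, run one copy per node and have each copy
# write only the records whose chunk number ($split) matches its own node.

# Append one complete fasta record to the current chunk (round-robin).
sub write_record {
        my ($record) = @_;
        return if $record eq "";        # nothing read before the first header
        if ($split > $N){$split = 1;}   # wrap around after the last chunk
        open (my $pipe, "|gzip >>ref_fasta_db_$split.gz")
                or die "cannot start gzip: $!";
        print $pipe $record;
        close $pipe or die "gzip failed for chunk $split";
        $split++;
}

while (my $line = <STDIN>) {
        if ($line =~ /^>/) {            # a new header ends the previous record
                write_record($fasta);
                $fasta = "";
        }
        $fasta .= $line;
}
write_record($fasta);   # flush the last record; no later header triggers it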
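
Once the chunks are on the nodes, the per-node work could be driven
along these lines. This is a sketch only: it builds and prints the rsh
command lines rather than running them, and the host names (node1 to
node8), the query file name, the output naming, and the assumption that
the query file is visible on every node (e.g. over NFS) are all
hypothetical. It uses the standard formatdb (-i, -p) and blastall
(-p, -d, -i, -o) options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build (but do not run) the shell commands for one node.
# Host, query file and output names are hypothetical placeholders.
sub node_commands {
    my ($chunk, $host, $query, $program) = @_;
    my $db = "ref_fasta_db_$chunk";   # chunk name written by the splitter
    # blastp and blastx search protein databases; the others nucleotide
    my $is_protein = ($program eq "blastp" || $program eq "blastx") ? "T" : "F";
    return (
        # unpack the chunk and index it on the node
        "rsh $host 'gunzip -c $db.gz > $db && formatdb -i $db -p $is_protein'",
        # search the node's own chunk with the shared query file
        "rsh $host 'blastall -p $program -d $db -i $query -o $query.$chunk.out'",
    );
}

my @hosts = map { "node$_" } 1 .. 8;   # assumed host naming
for my $chunk (1 .. scalar(@hosts)) {
    print "$_\n"
        for node_commands($chunk, $hosts[$chunk - 1], "queries.fa", "blastp");
}
```

One thing to keep in mind with split databases: E-values are computed
against the chunk a node searched, not the whole database. The blastall
-z option (effective database length) is one way to compensate.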

--
Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com


jfreeman wrote:
> 
> Start Here...
> http://www.beowulf-underground.org/doc_project/BIAA-HOWTO/Beowulf-Installation-and-Administration-HOWTO-5.html
> 
> Once you have a small 2-node master/slave cluster running, with the
> slave node booting through tftpboot, you are ready for the next level
> of complexity...
> 
> Danny Navarro wrote:
> >
> > Hi all,
> >
> > I would like to set up a linux cluster with some PCs to run blast
> > searches against the human EST database. First I will try to blast
> > locally on the master node, but I would also like to make a blast
> > server available to the intranet.
> >
> > I have a lot to learn about linux clusters, and right now I don't
> > know exactly how to start. Should I use beowulf or mosix, or are
> > there better alternatives? What do you think is the best system for
> > this task?
> >
> > Thanks
> >
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > http://bioinformatics.org/mailman/listinfo/bioclusters