[Bioclusters] Questions on mpiBLAST

Thu Feb 3 13:55:18 EST 2005

Q: can someone point me to the results obtained by Joe Landman?  (web
site, or..?)

Many thanks,  -- Kent C. Brodie, Medical College of Wisconsin

> -----Original Message-----
> From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
> [mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
> Of Chris Dagdigian
> Sent: Thursday, February 03, 2005 12:28 PM
> To: Hrishikesh Deshmukh; Clustering, compute farming & distributed
> computing in life science informatics
> Subject: Re: [Bioclusters] Questions on mpiBLAST
> 
> 
> "parallelizing" blast across cluster nodes only results in significant
> speed gains if you are trying to solve a large problem set or have a
> massive target database that in no way shape or form can squeeze into
> physical memory on one node.
> 
> The performance of BLAST is rate-limited first by how much RAM you have
> and then by how fast your disk I/O system is.
> 
> I think Joe Landman has also seen incredible variations in BLAST
> performance by experimenting with non-GNU, architecture-optimized
> compilers like those from IBM, Intel and the Portland Group.
> 
> 16 machines with 2GB of RAM reading database files off of
> Ethernet-based NFS is a "normal" compute farm config.
> 
> Outside of mpiblast you could be seeing performance lags caused by your
> network (if you are reading/writing via NFS or AFP) or by physical
> memory.
> 
> I'm not an expert on mpiblast but hope to start a personal project soon
> to integrate it with Grid Engine, mostly to satisfy my own curiosity.
> 
> I agree with what Hrishikesh said about your times -- you are searching
> with a very small query set and you did not mention your target
> database.
> 
> You may see better performance using one machine -- the first query
> will be slow but the other queries will come back faster since most, or
> at least part, of the target database will still be mmapped or
> otherwise cached in RAM.
> 
> If you really want to test mpiblast out you need to pick a much larger
> query and target DB set.
> 
> -Chris
> 
> 
> 
> 
> Hrishikesh Deshmukh wrote:
> 
> > Hi,
> > I am no authority on BLAST, but I guess you see a linear speedup
> > only when the problem is huge. For 20-odd sequences mpiBLAST doesn't
> > come into play; your NCBI BLAST is good enough! Just curious: do the
> > results from NCBI BLAST and mpiBLAST match exactly for the same
> > dataset (input)?
> >
> > I am trying to get BLAST and mpiBLAST running on Sun Grid. Right now
> > BLAST works in serial mode and mpiBLAST is kind of stuck!
> >
> > Cheers,
> > Hrishi
> >
> >
> > On Thu, 03 Feb 2005 11:45:45 -0500, Xiaowu Gai
> > <xgai at genome.chop.edu> wrote:
> >
> >>Hi Everyone:
> >>
> >>We have a 16-node Xserve cluster, with 2GB memory on each node and
> >>dual processors.  I was able to install mpiBLAST on it, along with
> >>LAM/MPI.  However, the performance that I saw with some test runs has
> >>not been that good and is quite confusing.  Here is what I did:
> >>
> >>1.) I formatted the nt database:
> >>
> >>mpiformatdb -N 16 -i nt
> >>
> >>2.) I ran mpiblast on one, two, five, ten, twenty, and more
> >>sequences (about 500bp each) with the command:
> >>
> >>time mpirun N mpiblast -p blastn -d nt -i single.fa -o blast_results.
> >>
> >>Here are the numbers:
> >>
> >>Single: 1m39.054s
> >>Two: 0m11.009s
> >>Five: 0m16.021s
> >>Ten: 0m46.591s
> >>Twenty: 3m7.541s
> >>..
> >>
> >>I am all confused.  First of all, the performance is not that
> >>impressive.  Secondly, the numbers are very confusing to me.  Why is
> >>it that a single-sequence query takes so much more time than a
> >>two-sequence one (BTW, I reran the query of a single sequence right
> >>after the query of two and got similar results)?  And the query of
> >>five takes only 5 seconds more than the query of two, and so on..
> >>
> >>I am afraid that I have done something wrong and would really
> >>appreciate any thoughts.
> >>
> >>Thanks
> >>
> >>Xiaowu
> >>
> >>_______________________________________________
> >>Bioclusters maillist  -  Bioclusters at bioinformatics.org
> >>https://bioinformatics.org/mailman/listinfo/bioclusters
> >>
> >
> 
> --
> Chris Dagdigian, <dag at sonsorol.org>
> BioTeam  - Independent life science IT & informatics consulting
> Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
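
For anyone comparing the timings Xiaowu posted, it may help to normalize
them to seconds per query sequence. The following is only a minimal
sketch, not something from the original posts: the dictionary simply
restates the five reported wall-clock times, and the helper names
(to_seconds, per_query) are made up for illustration.

```python
import re

# Wall-clock times as reported in the thread, keyed by the number of
# query sequences in the run.
reported = {
    1: "1m39.054s",
    2: "0m11.009s",
    5: "0m16.021s",
    10: "0m46.591s",
    20: "3m7.541s",
}


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m39.054s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def per_query(times):
    """Return seconds per query sequence for each run (hypothetical helper)."""
    return {n: to_seconds(stamp) / n for n, stamp in times.items()}


if __name__ == "__main__":
    for n, secs in per_query(reported).items():
        total = to_seconds(reported[n])
        print(f"{n:>2} queries: {total:8.3f} s total, {secs:7.3f} s per query")
```

Normalized this way, the single-query run (about 99 s) stands out as the
outlier, which would be consistent with Chris's point that the first
query is slow while the database is being read into memory. The cost per
query is lowest for the five-query run and rises again at ten and twenty,
so these five data points do not show a steady speedup.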