Q: Can someone point me to the results obtained by Joe Landman? (web site, or..?)

Many thanks,

-- Kent C. Brodie, Medical College of Wisconsin

> -----Original Message-----
> From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
> [mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
> Of Chris Dagdigian
> Sent: Thursday, February 03, 2005 12:28 PM
> To: Hrishikesh Deshmukh; Clustering, compute farming & distributed
> computing in life science informatics
> Subject: Re: [Bioclusters] Questions on mpiBLAST
>
> "Parallelizing" blast across cluster nodes only results in significant
> speed gains if you are trying to solve a large problem set or have a
> massive target database that in no way, shape or form can squeeze into
> physical memory on one node.
>
> The performance of BLAST is rate-limited first by how much RAM you have
> and then by how fast your disk I/O system is.
>
> I think Joe Landman has also seen incredible variations in blast
> performance by experimenting with non-GNU, architecture-optimized
> compilers like those from IBM, Intel and the Portland Group.
>
> 16 machines with 2 GB of RAM reading database files off of ethernet-based
> NFS is a "normal" compute farm config.
>
> Outside of mpiblast you could be seeing performance lags caused by your
> network (if you are reading/writing via NFS or AFP) or by physical memory.
>
> I'm not an expert on mpiblast but hope to soon start a personal project
> to integrate it with grid engine, mostly to satisfy my own curiosity.
>
> I agree with what Hrishikesh said about your times -- you are searching
> with a very small query set and you did not mention your target database.
>
> You may see better performance using one machine -- the first query will
> be slow but the other queries will come back faster since most or part
> of the target database will still be mmapped or whatever in RAM.
>
> If you really want to test mpiblast out you need to pick a much larger
> query and target DB set.
>
> -Chris
>
>
> Hrishikesh Deshmukh wrote:
>
> > Hi,
> > I am no authority on BLAST, but I guess you see a linear speedup
> > only when the problem is huge; for 20-odd sequences mpiblast doesn't
> > play, and your ncbi blast is good enough! Just curious: do the results
> > from ncbi blast and mpiblast for the same dataset (input) match exactly?
> >
> > I am trying to get BLAST and mpiBLAST running on Sun Grid; right now
> > BLAST works in serial mode and mpiBLAST is kind of stuck!
> >
> > Cheers,
> > Hrishi
> >
> >
> > On Thu, 03 Feb 2005 11:45:45 -0500, Xiaowu Gai <xgai at genome.chop.edu>
> > wrote:
> >
> >>Hi Everyone:
> >>
> >>We have a 16-node Xserve cluster, with 2GB memory on each node and dual
> >>processors. I was able to install mpiBLAST on it, along with LAM/MPI.
> >>However, the performance that I saw with some test runs has not been
> >>that good and is quite confusing. Here is what I did:
> >>
> >>1.) I formatted the nt database:
> >>
> >>mpiformatdb -N 16 -i nt
> >>
> >>2.) I ran mpiblast on one, two, five, ten, twenty, and more sequences
> >>(about 500bp each) with the command:
> >>
> >>time mpirun N mpiblast -p blastn -d nt -i single.fa -o blast_results.
> >>
> >>Here are the numbers:
> >>
> >>Single: 1m39.054s
> >>Two:    0m11.009s
> >>Five:   0m16.021s
> >>Ten:    0m46.591s
> >>Twenty: 3m7.541s
> >>..
> >>
> >>I am all confused. First of all, the performance is not that impressive.
> >>Secondly, the numbers are very confusing to me. Why is it that a
> >>single-sequence query takes so much more time than a query of two (BTW,
> >>I reran the query of a single sequence right after the query of two and
> >>got similar results)? And a query of five takes only 5 seconds more than
> >>the query of two, and so on..
> >>
> >>I am afraid that I have done something wrong and would really appreciate
> >>any thoughts.
> >>
> >>Thanks
> >>
> >>Xiaowu
> >>
> >>_______________________________________________
> >>Bioclusters maillist - Bioclusters at bioinformatics.org
> >>https://bioinformatics.org/mailman/listinfo/bioclusters
> >>
>
> --
> Chris Dagdigian, <dag at sonsorol.org>
> BioTeam - Independent life science IT & informatics consulting
> Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net
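
For anyone reading this thread later: the timings Xiaowu posted are easier to compare once they are converted to seconds and divided by the number of query sequences. The short Python sketch below does only that arithmetic. The names `TIMINGS`, `to_seconds` and `per_query` are made up for this illustration and are not part of mpiBLAST or NCBI BLAST; the figures are copied from the message quoted above, and the script says nothing about why they look the way they do.

```python
import re

# Wall-clock times copied from the quoted message (output of `time`),
# keyed by the number of query sequences in each run.
TIMINGS = {
    1: "1m39.054s",
    2: "0m11.009s",
    5: "0m16.021s",
    10: "0m46.591s",
    20: "3m7.541s",
}


def to_seconds(stamp):
    """Convert a `time`-style string such as '3m7.541s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def per_query(timings):
    """Return {number of queries: seconds per query}, in ascending order."""
    return {n: to_seconds(t) / n for n, t in sorted(timings.items())}


if __name__ == "__main__":
    for n, secs in per_query(TIMINGS).items():
        total = to_seconds(TIMINGS[n])
        print(f"{n:>2} queries: {total:8.3f} s total, {secs:7.3f} s per query")
```

On these numbers the single-sequence run is the clear outlier, and the cost per query is lowest for the five-sequence run and rises again for ten and twenty. That is a small sample, so, as Chris suggests, a much larger query set and a named target database would be needed before drawing conclusions about mpiblast scaling.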