Hrishikesh Deshmukh wrote:
> While this debate is on, could somebody answer my question:
> Say I have thousands of sequences in my input file and want to run
> mpiBLAST, will mpiBLAST split sequences and allot them to nodes and
> then get back results to say one file?

Yes

> Would it help if say DB (Human)
> is installed on every machine on the node?

No, mpiblast will handle this for you.

>
> Thanks,
> Hrishi
>
>
> On Thu, 03 Feb 2005 14:05:32 -0500, Joe Landman
> <landman at scalableinformatics.com> wrote:
>
>>You rang .... :)
>>
>>Brodie, Kent wrote:
>>
>>>Q: can someone point me to the results obtained by Joe Landman? (web
>>>site, or..?)
>>>
>>>Many thanks, -- Kent C. Brodie, Medical College of Wisconsin
>>>
>>>>-----Original Message-----
>>>>From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
>>>>[mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
>>>>Of Chris Dagdigian
>>>>Sent: Thursday, February 03, 2005 12:28 PM
>>>>To: Hrishikesh Deshmukh; Clustering, compute farming & distributed
>>>>computing in life science informatics
>>>>Subject: Re: [Bioclusters] Questions on mpiBLAST
>>>>
>>>>
>>>>"parallelizing" blast across cluster nodes only results in significant
>>>>speed gains if you are trying to solve a large problem set or have a
>>>>massive target database that in no way shape or form can squeeze into
>>>>physical memory on one node.
>>>>
>>>>The performance of BLAST is rate-limited first by how much RAM you have
>>>>and then by how fast your disk I/O system is.
>>>>
>>>>I think Joe Landman has also seen incredible variations in blast
>>>>performance by experimenting with non-GNU architecture optimized
>>>>compilers like those from IBM, Intel and the Portland Group.
>>>>
>>>>16 machines with 2GB of RAM reading database files off of ethernet based
>>>>NFS is a "normal" compute farm config.
>>>>
>>>>Outside of mpiblast you could be seeing performance lags caused by your
>>>>network (if you are reading/writing via NFS or AFP) or by physical memory.
>>>>
>>>>I'm not an expert on mpiblast but hope to start soon a personal project
>>>>to integrate it with grid engine mostly to satisfy my own curiosity.
>>>>
>>>>I agree with what Hrishikesh said about your times -- you are searching with
>>>>a very small query set and you did not mention your target database.
>>>>
>>>>You may see better performance using one machine -- the first query will
>>>>be slow but the other queries will come back faster since most or part
>>>>of the target database will still be mmapped or whatever in RAM.
>>>>
>>>>If you really want to test mpiblast out you need to pick a much larger
>>>>query and target DB set.
>>>>
>>>>-Chris
>>>>
>>>>
>>>>Hrishikesh Deshmukh wrote:
>>>>
>>>>>Hi,
>>>>>I am no authority on BLAST, I guess you see a linear speedup increase
>>>>>only when the problem is huge, for 20 odd sequences mpiblast doesn't
>>>>>play, your ncbi blast is good enough! Just curious, do the results for
>>>>>ncbi and mpiblast for the same dataset (input) match exactly?!
>>>>>
>>>>>I am trying to get BLAST and mpiBLAST running on Sun Grid, right now
>>>>>BLAST works in serial mode and mpiBLAST is kind of stuck!
>>>>>
>>>>>Cheers,
>>>>>Hrishi
>>>>>
>>>>>
>>>>>On Thu, 03 Feb 2005 11:45:45 -0500, Xiaowu Gai <xgai at genome.chop.edu> wrote:
>>>>>
>>>>>>Hi Everyone:
>>>>>>
>>>>>>We have a 16-node Xserve cluster, with 2GB memory on each node and dual
>>>>>>processors. I was able to install mpiBLAST on it, along with LAM/MPI.
>>>>>>However, the performance that I saw with some test runs has not been that
>>>>>>good and is quite confusing. Here is what I did:
>>>>>>
>>>>>>1.) I formatted the nt database:
>>>>>>
>>>>>>mpiformatdb -N 16 -i nt
>>>>>>
>>>>>>2.) I ran the mpiblast on one, two, five, ten, twenty, and more sequences
>>>>>>(about 500bp each) and with the command:
>>>>>>
>>>>>>time mpirun N mpiblast -p blastn -d nt -i single.fa -o blast_results.
>>>>>>
>>>>>>Here are the numbers:
>>>>>>
>>>>>>Single: 1m39.054s
>>>>>>Two: 0m11.009s
>>>>>>Five: 0m16.021s
>>>>>>Ten: 0m46.591s
>>>>>>twenty: 3m7.541s
>>>>>>..
>>>>>>
>>>>>>I am all confused. First of all, the performance is not that impressive.
>>>>>>Secondly, the numbers are very confusing to me. Why is it that a single
>>>>>>sequence query takes so much more time than a two (BTW, I reran the query of
>>>>>>a single sequence right after the query of two and got similar results)? And
>>>>>>query of five takes only 5 seconds more than the query of two and so on..
>>>>>>I am afraid that I have done something wrong and would really appreciate any
>>>>>>thoughts.
>>>>>>
>>>>>>Thanks
>>>>>>
>>>>>>Xiaowu
>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioclusters maillist - Bioclusters at bioinformatics.org
>>>>>>https://bioinformatics.org/mailman/listinfo/bioclusters
>>>>
>>>>--
>>>>Chris Dagdigian, <dag at sonsorol.org>
>>>>BioTeam - Independent life science IT & informatics consulting
>>>>Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
>>>>PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net
>>
>>--
>>Joseph Landman, Ph.D
>>Founder and CEO
>>Scalable Informatics LLC,
>>email: landman at scalableinformatics.com
>>web : http://www.scalableinformatics.com
>>phone: +1 734 786 8423
>>fax : +1 734 786 8452
>>cell : +1 734 612 4615

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
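
P.S. For anyone comparing the timings quoted in this thread, a rough Python sketch that converts the `time` output into seconds and divides by the number of query sequences may help. It only redoes the arithmetic on the numbers Xiaowu posted and does not explain them; the helper names (`to_seconds`, `per_query`) are made up for illustration and are not part of mpiBLAST or any BLAST tool.

```python
import re

# Wall-clock times copied from the quoted post, keyed by number of queries.
TIMINGS = {
    1: "1m39.054s",
    2: "0m11.009s",
    5: "0m16.021s",
    10: "0m46.591s",
    20: "3m7.541s",
}


def to_seconds(stamp):
    """Convert a `time`-style string such as '1m39.054s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def per_query(timings):
    """Map number of queries -> (total seconds, seconds per query)."""
    result = {}
    for count, stamp in timings.items():
        total = to_seconds(stamp)
        result[count] = (total, total / count)
    return result


if __name__ == "__main__":
    for count, (total, each) in per_query(TIMINGS).items():
        print(f"{count:>2} queries: {total:8.3f} s total, {each:7.3f} s per query")
```

Normalising this way makes the single-query run (about 99 s in total) stand out against the other runs, which is the oddity Xiaowu was asking about.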