Thanks for the feedback, I am aware of United Devices. They seem to have a
very good solution, and it is multi-platform too. I am interested in any
pain points you have with their technology or with similar offerings. Are
there features or functionality you would like to see that are missing,
specifically geared towards bioinformatics research?

If I understand you correctly, you are saying that this is a good example
of grid technology delivering on the promise. It is pretty clear that when
the problem is 'embarrassingly parallel', enterprise grid (cycle-stealing)
solutions like UD's can be an intelligent way to use one's existing IT
infrastructure investment. I guess what I am looking for is what could be
made better.

Also, what do you mean by "standards folk"? Are you talking about the
Globus Toolkit and its ever-evolving architecture (now mainly web-service
based)? What do you see as the threat here?

IMHO, MPI is just fine for clusters with good links where security is of
little or no concern; however, it really isn't made for loosely coupled
networks and cycle-stealing scenarios, and it has never had security in
mind. These are the kinds of things that, if not baked into the
technology, can turn your Grid nodes into a security risk and ultimately a
bunch of zombies waiting to be used for a DDoS attack or worse. Once that
happens, trying to maintain trust in your Grid -- or even to keep it at
all -- could be a tough political battle.

-----Original Message-----
From: bioclusters-admin@bioinformatics.org
[mailto:bioclusters-admin@bioinformatics.org] On Behalf Of
bioclusters-request@bioinformatics.org
Sent: Monday, May 10, 2004 11:01 AM
To: bioclusters@bioinformatics.org
Subject: Bioclusters digest, Vol 1 #483 - 4 msgs

When replying, PLEASE edit your Subject line so it is more specific than
"Re: Bioclusters digest, Vol..." And, PLEASE delete any unrelated text
from the body.

Today's Topics:

  1. non linear scale-up issues? (David Gayler)
  2. Re: MPI clustalw (Tim Cutts)
  3. Re: non linear scale-up issues? (Rayson Ho)
  4. Re: non linear scale-up issues? (James Cuff)

--__--__--

Message: 1
From: "David Gayler" <dag_project@sbcglobal.net>
To: <bioclusters@bioinformatics.org>
Date: Sun, 9 May 2004 12:22:59 -0500
Subject: [Bioclusters] non linear scale-up issues?
Reply-To: bioclusters@bioinformatics.org

Hi, I am a newbie and not a bio guy, but rather a computer science guy. I
am interested in why you are seeing these kinds of horrible non-linear
scale-out numbers. Please excuse my ignorance, but is the problem that
there are dependencies between jobs, and that this doesn't scale? What
exactly is the relationship of this problem to MPI? I just have to wonder
whether figuring out a better way to divide and conquer has any merit. I
am interested in y'all's feedback, as my company is working on a Windows
.NET based grid solution and we want to focus on the bioinformatics
community.
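One standard way to frame those horrible numbers: if some fraction s of a
job is inherently serial, or is spent communicating between the pieces,
Amdahl's law caps the achievable speedup at 1/s no matter how many CPUs
you throw at it. A minimal Python sketch; the 30% serial fraction below is
purely illustrative, not anything measured from a real code:

    def amdahl_speedup(n_cpus, serial_fraction):
        """Predicted speedup on n_cpus when serial_fraction of the
        work cannot be parallelised (Amdahl's law)."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

    # Purely illustrative: with 30% of the runtime serial, 128 CPUs buy
    # barely a 3x speedup, and per-CPU efficiency collapses much like
    # the clustalw figures quoted later in this digest.
    for n in (2, 4, 8, 16, 32, 64, 128):
        s = amdahl_speedup(n, 0.30)
        print(f"{n:4d} CPUs: speedup {s:5.2f}x, efficiency {s / n:.2f}")

Embarrassingly parallel workloads sit at the s near 0 end of this curve,
which is why the cycle-stealing grids discussed in this thread scale so
much better than tightly coupled MPI jobs.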
It seems to me that a lot of researchers spend their time worrying about
getting faster results (understandably), yet there doesn't seem to be much
in the way of cycle-stealing grid software that is flexible, secure, and
easy to use. I want to know what is currently missing for getting these
faster results reliably, despite hardware faults, etc. I am aware of
Condor (free), DataSynapse, Platform Computing, and others. I am
interested in knowing what, if anything, is lacking in these solutions.

Thanks in advance.

-----Original Message-----
From: bioclusters-admin@bioinformatics.org
[mailto:bioclusters-admin@bioinformatics.org] On Behalf Of
bioclusters-request@bioinformatics.org
Sent: Sunday, May 09, 2004 11:01 AM
To: bioclusters@bioinformatics.org
Subject: Bioclusters digest, Vol 1 #482 - 1 msg

When replying, PLEASE edit your Subject line so it is more specific than
"Re: Bioclusters digest, Vol..." And, PLEASE delete any unrelated text
from the body.

Today's Topics:

  1. Re: MPI clustalw (Guy Coates)

--__--__--

Message: 1
Date: Sun, 9 May 2004 11:17:16 +0100 (BST)
From: Guy Coates <gmpc@sanger.ac.uk>
To: bioclusters@bioinformatics.org
Subject: [Bioclusters] Re: MPI clustalw
Reply-To: bioclusters@bioinformatics.org

> example web servers and services where you need rapid response for
> single, or small numbers of jobs.

We (well, the ensembl-ites) do run a small amount of mpi-clustalw. The
algorithm scales OK for small alignments (but they run quickly, so why
bother?) and is horrible for large alignments.

These are figures for an alignment of a set of 9658 sequences, running on
dual 2.8 GHz PIV machines with gigabit:

    Ncpus    Runtime     Efficiency
    -----    --------    ----------
        2    28:21:33       1.00
        4    19:49:05       0.72
        8    14:49:02       0.48
       10    14:09:41       0.40
       16    13:37:36       0.26
       24    13:00:30       0.18
       32    12:48:39       0.14
       48    12:48:39       0.09
       64    11:19:40       0.08
       96    11:30:09       0.05
      128    11:13:28       0.04

However, although the scaling is horrible, it does at least bring the
runtime down to something more manageable. MPI clustalw only gets run for
the alignments that the single-CPU version chokes on. It may not be
pretty, but at least you do get an answer, eventually. Horses for courses
and all that.
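For anyone wanting to check or extend those figures: the Efficiency column
is consistent with speedup measured against the 2-CPU baseline, i.e.
(T_2 x 2) / (T_n x n). A quick Python sketch that reproduces a few rows of
the table:

    def seconds(hms):
        """Convert an 'H:MM:SS' runtime string to seconds."""
        h, m, s = (int(x) for x in hms.split(":"))
        return h * 3600 + m * 60 + s

    runs = [(2, "28:21:33"), (4, "19:49:05"), (8, "14:49:02"),
            (32, "12:48:39"), (128, "11:13:28")]

    t2 = seconds(runs[0][1])  # the 2-CPU run is the baseline
    for n, hms in runs:
        eff = (t2 * 2) / (seconds(hms) * n)  # speedup / (n / 2)
        print(f"{n:4d} CPUs: efficiency {eff:.2f}")

This prints 1.00, 0.72, 0.48, 0.14, and 0.04, matching the table above.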
> Guy/Tim - did you ever deploy that HMMer PVM cluster we talked about
> for the Pfam web site?

It's on the ever-expanding list of things to do. So, does anyone here have
any opinions on, or experience with, the PVM version of HMMer?

Guy

--
Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199

--__--__--

_______________________________________________
Bioclusters maillist - Bioclusters@bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters

End of Bioclusters Digest

--__--__--

Message: 2
From: Tim Cutts <tjrc@sanger.ac.uk>
Subject: Re: [Bioclusters] MPI clustalw
Date: Sun, 9 May 2004 19:10:45 -0300
To: bioclusters@bioinformatics.org
Reply-To: bioclusters@bioinformatics.org

On 6 May 2004, at 10:30 pm, James Cuff wrote:

> Guy/Tim - did you ever deploy that HMMer PVM cluster we talked about
> for the Pfam web site?

That's still a "watch this space". The PFAMers are still talking about
doing some MPI/PVM stuff. Like you, I am still a sceptic. Having said
that, some of the EBI chaps are running MPI clustalw -- not sure which
one -- on the IBM blades, and claim to be getting decent performance out
of it. That's GBit interconnect, as you know, and we limit them to 10 CPUs
(i.e. 5 machines) per job. Haven't got as far as getting it to schedule
sensibly with the network topology; LSF has a useful-looking
sc_topology.so mbschd plugin, but I've not found any docs for it yet.

Tim

--
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute
Hinxton, Cambridge, CB10 1SA, UK

--__--__--

Message: 3
Date: Sun, 9 May 2004 17:00:50 -0700 (PDT)
From: Rayson Ho <raysonlogin@yahoo.com>
Subject: Re: [Bioclusters] non linear scale-up issues?
To: bioclusters@bioinformatics.org
Reply-To: bioclusters@bioinformatics.org

UD: http://www.grid.org/stats/

325,033 years of CPU time collected

BOINC: http://boinc.berkeley.edu

Will be used by the next generation of SETI@home and other research
projects.

Rayson

--- David Gayler <dag_project@sbcglobal.net> wrote:
> however it doesn't seem like there is much in the way of
> cycle-stealing grid software solutions that are flexible, secure, and
> easy to use. I want to know what is missing currently to get these
> faster results reliably, despite hardware faults, etc.
>
> I am aware of Condor (free), DataSynapse, Platform Computing, and
> others. I am interested in knowing what is, if anything, lacking in
> these solutions.
>
> Thanks in advance.
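It is worth noting how projects like these cope with the trust problem
raised elsewhere in the thread: BOINC, for example, ships each work unit
to several independent hosts and only accepts a result once a quorum of
replicas agree, so a single faulty or malicious volunteer cannot poison
the science. A toy Python sketch of that idea -- the function name and
quorum policy are illustrative, not BOINC's actual API:

    from collections import Counter

    def accept_result(replica_results, quorum=2):
        """Majority-vote validation of redundant results for one work
        unit, each computed by a different (untrusted) host. Returns
        the agreed result, or None if no quorum has been reached."""
        if not replica_results:
            return None
        result, votes = Counter(replica_results).most_common(1)[0]
        return result if votes >= quorum else None

    # Two hosts agree; a third (buggy, overclocked, or malicious) does not:
    print(accept_result(["0xdeadbeef", "0xdeadbeef", "0x0badf00d"]))

The price is paying for each work unit two or three times over, which is
only tolerable because the donated cycles are effectively free.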
--__--__--

Message: 4
Date: Sun, 9 May 2004 22:16:13 -0400 (EDT)
From: James Cuff <jcuff@broad.mit.edu>
To: bioclusters@bioinformatics.org
Subject: Re: [Bioclusters] non linear scale-up issues?
Reply-To: bioclusters@bioinformatics.org

On Sun, 9 May 2004, Rayson Ho wrote:

> UD: http://www.grid.org/stats/
>
> 325,033 years of CPU time collected

he he :-) Rayson knows his stuff, as do United Devices. You will not see
MPI anywhere near this 300k+ CPU-years. Good point.

> BOINC: http://boinc.berkeley.edu

This looks great. Classic 'grid hype' this certainly is not. Good stuff.
Thanks for sending on the link; I really hope that the standards folk
keep the hell away from this. If they do, it may have a real chance...

Best regards,

J.

--
James Cuff, D. Phil.
Group Leader, Applied Production Systems
The Broad Institute
320 Charles Street, Cambridge, MA 02141-2023
Tel: 617-252-1925  Fax: 617-258-0903

--__--__--

_______________________________________________
Bioclusters maillist - Bioclusters@bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters

End of Bioclusters Digest
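A footnote on the security point raised at the top of the thread: one
small ingredient of "baking security in" is having clients refuse any work
unit that is not authenticated by the project, so a spoofed dispatcher
cannot turn the node pool into a botnet. A minimal Python sketch using a
shared-secret HMAC; the key and payloads are hypothetical, and a real
deployment would use per-project public-key signing (as BOINC does) rather
than a shared key:

    import hashlib
    import hmac

    SHARED_KEY = b"example-project-key"  # illustrative; real grids use PKI

    def sign_workunit(payload: bytes) -> bytes:
        """Server side: attach an authentication tag to a work unit."""
        return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

    def verify_workunit(payload: bytes, tag: bytes) -> bool:
        """Client side: run the payload only if the tag checks out."""
        expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    wu = b"blastall -p blastp -d nr -i chunk_042.fa"  # hypothetical unit
    tag = sign_workunit(wu)
    assert verify_workunit(wu, tag)                   # genuine unit runs
    assert not verify_workunit(b"run-ddos-tool", tag) # tampered one won't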