[Bioclusters] LSF vs SGE -- follow-up

Ron Chen bioclusters@bioinformatics.org
Sun, 18 May 2003 10:31:54 -0700 (PDT)


The presentation Platform puts on the web:

Performance, Scalability, Robustness

                         LSF 5              Closest Competitor

Clusters                 100+               1

CPUs                     200000+            300

Jobs (active             500000+            ~10000+
across clusters)

Fairshare Utilization    ~100%              ~50%

Query Time               20% better than    40% slower
                         LSF 4.2            than LSF 5

Scheduler Usage          4K/job             28K/job


My first question is, how did Platform come up with
the numbers? Did they measure their "Closest
Competitor" in their _own_ lab, with their _own_
engineers?

M$ pays other companies to do the job. Granted,
Platform is not as rich as M$, but maybe they should
provide more detail about the setup?

(And maybe the presentations they show at customer
sites are more interesting, but we just don't know!)

I mentioned some of this before:

Clusters: SGE and PBS can use Globus, Silver, and other
meta-schedulers. Silver can scale to over 100 clusters
too.

CPUs: a university runs SGE on a 1300-host cluster,
and the FSL runs SGE on their 768-node, 1536-CPU cluster.

Scheduler usage: by my measurement, SGE schedd uses only
about 4.5K/job, and around 9K/job at peak.
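
To put the per-job figures in perspective, here is a rough sketch that turns them into total scheduler memory for a given number of jobs. It assumes "K" means kilobytes and that memory grows linearly with the job count; neither is stated on the slide. The 10,000-job load and all names in the code are made up for illustration.

```python
# Rough sketch: total scheduler memory implied by a per-job figure.
# Assumptions (not from the slide): "K" = kilobytes, memory scales linearly.

PER_JOB_K = {
    "LSF 5 (slide)": 4.0,
    "Closest competitor (slide)": 28.0,
    "SGE schedd (4.5K figure)": 4.5,
    "SGE schedd (peak figure)": 9.0,
}

JOBS = 10_000  # hypothetical load, chosen only for illustration


def scheduler_mib(per_job_k: float, jobs: int) -> float:
    """Return total scheduler memory in MiB for `jobs` jobs at `per_job_k` K each."""
    return per_job_k * jobs / 1024


for label, per_job_k in PER_JOB_K.items():
    print(f"{label}: {scheduler_mib(per_job_k, JOBS):.1f} MiB")
```

Whatever the absolute numbers turn out to be, the ratio between the 28K/job claim and the measured figures is what matters here.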

Jobs (active across clusters):
LSF:  500000+ (and they said they have 100+ clusters)

So, SGE with a meta-scheduler can get:
     ~10000+ * 100 = ~1,000,000+

Compare to LSF 5:
                      500,000+
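
The same arithmetic as a small sketch, using only the figures quoted from the slide. The "~" and "+" qualifiers are dropped, so these are lower bounds. It also assumes every cluster carries the same load and that the meta-scheduler adds no overhead, which is optimistic. The names are made up for illustration.

```python
# Sketch of the job-count comparison, using the slide's own figures.
# Assumption: equal load on every cluster, no meta-scheduler overhead.

SGE_JOBS_PER_CLUSTER = 10_000  # slide's "Closest Competitor" figure (1 cluster)
CLUSTERS = 100                 # cluster count the slide claims for LSF 5
LSF_JOBS_TOTAL = 500_000       # slide's LSF 5 figure, across all clusters


def aggregate_jobs(jobs_per_cluster: int, clusters: int) -> int:
    """Total active jobs if each cluster runs `jobs_per_cluster` jobs."""
    return jobs_per_cluster * clusters


sge_total = aggregate_jobs(SGE_JOBS_PER_CLUSTER, CLUSTERS)
lsf_per_cluster = LSF_JOBS_TOTAL // CLUSTERS

print(sge_total)        # 1000000
print(lsf_per_cluster)  # 5000
```

Read the other way around, the slide's own numbers work out to about 5,000 jobs per LSF cluster, against the ~10,000+ it grants the single-cluster competitor.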

What has happened?
Maybe I am an idiot, Chris :) Please help me do the
math, thanks.

I don't have a large cluster to do the other
measurements, but it would be interesting to find out
how Platform got the numbers.

 -Ron

--- chris dagdigian <dag@sonsorol.org> wrote:
> 
> Your advocacy efforts have the potential to do the
> same harm to Grid 
> Engine as the rabid "if it's not GNU/Linux it's
> shit!" crowd.  You 
> obviously know what you are talking about and your
> effort is really 
> important but in recent times some sort of other
> personal agenda or 
> vendetta has been creeping into your posts and
> tainting your message.
> 
> Platform Computing is not "the enemy".
> 
> Commercial software is not "the enemy".
> 
> Choosing a DRM layer is one of the single most
> important decisions made 
> for any given clustering project. The decision
> should be made after 
> carefully evaluating the options and choosing the
> one that works best 
> for the project, budget, use cases and situation. As
> I said in a 
> previous post I use both SGE and LSF in my daily
> work.
> 
> More comments below...
> 
> Ron Chen wrote:
> > IMO, Platform is not that friendly. They like to
> > spread FUD about other products, e.g. They said that
> > PBS can't handle over 5000 jobs, and someone told me
> > that they said similar things about SGE.
> > 
> 
> Platform techies, engineers, developers and support
> staff have always 
> been friendly and professional. I've had developers
> in Beijing and 
> Toronto go _way_ out of their way to help me out
> whenever I've needed a 
> hand. Same with the SGE developers and people on the
> sge mailing lists. 
> Techies in general are Good People.
> 
> Platform salespeople on the other hand have
> sometimes been arrogant and 
> complacent. For a long time they had a lock on what
> was unquestionably 
> the best product in the space and they charged and
> acted accordingly.
> 
> This has changed now that there is way more
> competition especially in 
> the Linux clustering space. The salespeople are
> hungry and responsive 
> now. Trust me :)
> 
> Platform salespeople have never been as aggressive
> or as creepy as EMC 
> salespeople though. Wow those dudes were like a
> cult.
> 
> Regarding FUD
> 
> Much of the FUD comes from salespeople who have been
> given competitive 
> briefing information on other products that is
> usually not up-to-date.
> 
> OpenPBS _did_ have problems handling more than 5,000
> jobs at a time.
> 
> OpenPBS _did_ have documented problems with
> submitted jobs just 
> *vanishing* from the system -- a critical flaw in my
> mind.
> 
> GridEngine did have its growing pains; even in the
> biocluster space.
> 
> The real problem (I think) was that by the time this
> news filtered out 
> from the community and into the hands of the
> Platform salespeople the 
> initial bugs/issues had been fixed.
> 
> So the problem really with the salespeople is that
> they know their own 
> product in its current incarnation extremely well
> but have been briefed 
> on "old" products put out by the competition. This
> makes them look bad 
> when talking to super-educated customers.
> 
> The best way to fight FUD is with the truth. If
> anyone has recent FUD 
> stories to share from a recent Platform sales pitch
> I'd love to hear 
> about it in this forum.
> 
> 
> > (In fact, since LSF is not free, they should compare
> > LSF with PBSPro. And also, I started using SGE from
> > the very beginning, and I know that even early
> > versions of SGE can handle many more jobs than that!)
> 
> The reason everyone has to compare/contrast against
> OpenPBS is that 
> every single lame-ass wannabe cluster vendor seems
> to have just lazily 
> stuck an OpenPBS RPM into their cluster image and
> started calling it a 
> "turnkey solution".
> 
> SGE has also had its growing pains and bad
> deployment stories which is 
> where I'm sure some of the SGE FUD is coming from. I
> do think however 
> that SGE is improving and adding new functionality
> at a totally amazing 
> rate.
> 
> > 
> > http://www.beowulf.org/pipermail/beowulf/2001-October/001486.html
> > http://bioinformatics.org/pipermail/bioclusters/2001-October/000035.html
> > 
> 
> > Also, both PBS and SGE follow DRMAA, but Platform
> > pushes NPI -- the LSF API as the "standard".
> > 
> 
> Both DRMAA and NPI seem to be coming from the Global
> Grid Forum (GGF) 
> these days. I know Platform is involved at some
> level with GGF.
> 
> Anyone from Platform or GGF who can chime in on what
> the current 
> situation is? What's up with NPI vs DRMAA?
> 
> 
> > Do you think we should support these kinds of
> > companies?
> > 
> >  -Ron
> 
> Platform is not evil. Selling & supporting good
> software is not wrong.
> 
> 
=== message truncated ===

