{ bias alert: I get paid to work on both LSF and SGE and I was once paid by SunEd to work on their N1GE training materials. I am a wannabe Grid Engine developer. My company does lots of custom integration and training work with both products. Neither the LSF nor the SGE people tend to be totally happy when I write these sorts of emails, so take everything I say with the proper amount of caution and always do your own testing, research and due diligence ... }

I use both LSF and SGE in my work and have done so for many years now; my specific choice depends on the project, the workflow and the end-user requirements.

Both LSF and SGE are excellent for typical life science use cases and workflows. So good that they are the only 2 products that I'll use professionally. It's not worth dealing with the 2nd tier products any more when SGE and LSF do what I need exceptionally well.

If forced to give an explanation of the differences between the two I'd have to say this ...

Platform LSF still has the "best" product among any and all competition. It is hands down the best overall product if you survey all the competition with an eye towards advanced features and functions.

*However* things are at a point now in 2005 where the "stuff" that gives LSF the edge has nothing to do with the core product (batch scheduling & policy based resource allocation within a compute farm or cluster...).

When it comes to the core work of distributing jobs and doing resource allocation among a distributed set of heterogeneous hardware resources then ALL the current products do a good/excellent job (SGE in particular but also Torque/PBSPro etc. etc.) In fact I tend to assume that I'll be using SGE on any new project, with LSF held in the background as an option should the project demands dictate it.

So if you are looking for a good cluster resource allocation mechanism then SGE and LSF compare very favorably. SGE is improving at an incredibly rapid rate and it just keeps getting better and better.

When you compare on price then SGE is the hands-down winner since the open source product is free to download and use. I love SGE and am trying to be a better and more frequent contributor to the user community. I'm not sure what Sun charges for the official N1GE 6 product and have no idea how that compares to Platform's LSF pricing.

The "stuff" that tends to tip the evaluation equation over towards the LSF camp is generally layered features and advanced capabilities that people are willing to pay for once they realize they actually need them. These features are not *core* things that everyone needs to care about or use.

Things like:

- Platform LSF ships "for free" a web portal interface that provides both end-user and LSF-admin functions. Last time I used it, they were running it as a Java/Tomcat application server but this may have changed. The web portal you get with LSF is far better than any open source or commercial web front end for Grid Engine.

- Platform LSF has exposed APIs for Java, C and web services programmers who want to write cluster-aware code and workflows. It is a bit harder to dig one's claws into the SGE internals (despite having the source) and the DRMAA stuff is still under heavy development.

- Platform LSF ships with layered features that supply things that one would typically configure (and support) personally within Grid Engine. Doing this within Grid Engine is certainly possible but requires a certain level of SGE expertise and comfort. These include things like (a) tight integration with parallel environments (MPICH etc.) and high-speed, low-latency interconnects like Myrinet and Infiniband, (b) tight integration with FlexLM license servers. The layered products cost extra money but Platform will formally support them and "make them work", which can be important in some enterprise environments. There are many Platform layered products that add extra features/functions to the core or base LSF product.
  This is probably the main differentiator between LSF and SGE.

- Platform LSF currently has a better reliability/resiliency/failover framework than Grid Engine, which is still in the midst of sorting out its transition to Berkeley DB based spooling mechanisms. In an LSF cluster the nodes will automatically "elect" a new master should the current master go down. In SGE you have to configure qmaster failover hosts and live with some fairly significant filesystem and RPC server constraints should you want to have failover while using Berkeley spooling. If you skip Berkeley spooling you can use "classic" spooling and achieve simple failover between master hosts that share an NFS filesystem. To be fair though, the "reliability" risk with SGE has more to do with the reliability of the hardware you use on the qmaster host, as the actual SGE software is pretty darn solid and robust. Neither LSF nor SGE crashes on me, so my "failover" efforts concentrate more on making sure the Linux/Solaris/Apple OS X server is reliable/available.

- If you want a WAN-scale "real grid" deployment, Platform will happily sell you (and support) the LSF Multicluster product. To do this with SGE you'd have to hire consultants or otherwise follow in the footsteps of the groups that are seriously doing hardcore SGE/Globus integration. It is not trivial and not something I'd recommend for new SGE admins or users. SGE is fantastic within a LAN or subnet but things get really complicated as soon as you bring in other grids, firewalls and remote network links.

- Platform used to have much better documentation but that has changed. The SGE 6.x documentation collection is actually very good now.

- Platform provides official support on the widest variety of OS platforms. If they sell it for a platform, they support it on that platform. Sun may only "officially" support the use of their N1GE version on Solaris/Linux/Windows.
  If you need support for SGE on your OS X system or your SGI Altix box then you need to either do it yourself or hire one of the third party people/companies that specialize in this. The SGE mailing list is a fantastic first-pass resource and there are several companies that can contract SGE support for you on any platform you can think of.

  * Warning: I may be wrong about the scope of Sun's official N1GE support...

- Configuring and managing Platform LSF feels to me as if it requires "less work" than a similar Grid Engine setup. Advanced SGE administration and configuration is still relatively undocumented, and even though I've been using it seriously for years now I still learn new things every week from the masters who converse on the sge-users mailing list. Many of the techniques and tips they talk about on the mailing list have never been formally documented or written about except perhaps as a basic HOWTO or a simple mailing list thread. Someone still needs to write the "Advanced Grid Engine Administration & Tuning" book. On the plus side, as someone who does SGE support, training and integration I tend to get some interesting work out of this discrepancy!

etc etc.

The main point I'm trying to make is that now in 2005, SGE is hands-down a serious and equal competitor to LSF. The main reason one would choose LSF tends to be the extra layered products and features that your organization may need, which SGE either can't provide in commercial/supported form or which you yourself no longer want to be personally responsible for managing and maintaining.

If SGE works fine for you then there is no real cause to switch over unless during your eval you learn about some layered feature that you decide you can't live without any more. Also, LSF and SGE can coexist on the same cluster if you want to run them both side-by-side for a while.
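
To make the failover point from the list above a bit more concrete, here is a toy Python sketch of the "first healthy candidate wins" idea behind automatic master election. It is only an illustration under my own assumptions: the host names, the `elect_master` function and the `is_alive` check are all hypothetical, and this is not how LSF or SGE actually implements failover.

```python
from typing import Callable, Optional, Sequence


def elect_master(
    candidates: Sequence[str],
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate host that responds, or None if none do.

    `candidates` is an ordered preference list; `is_alive` is a caller-supplied
    health check (hypothetical here, e.g. a ping or heartbeat test).
    """
    for host in candidates:
        if is_alive(host):
            return host
    return None


# Hypothetical host names, listed in order of preference.
candidates = ["master1", "master2", "node07"]
down = {"master1"}  # pretend the current master has just failed

new_master = elect_master(candidates, lambda host: host not in down)
print(new_master)  # master2
```

The ordered list is the design choice that matters here: every surviving node that runs the same check against the same list arrives at the same answer, so no extra coordination is needed in this simplified model.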

-Chris

On Aug 16, 2005, at 8:19 AM, Richard Wonka wrote:

> Hi list,
>
> I am currently running sybyl, glide, flexX/flexS, FeatureTrees, LigPrep and moe using the SGE on a couple of dedicated machines and some workstations during the off_hours. (all of which are running debian sarge)
>
> Now Platform wants me to testdrive LSF and I'm wondering if I should put the extra work into a test setup.
>
> I feel that SGE works well for my needs, but then, maybe LSF has some major advantage that I'm not aware of?
>
> So:
>
> * What are the major differences between the two and why might I want to use LSF instead of SGE?
> * Have any of You experience with either or both systems?
> * If so, What are they and what are You using now?
>
> with Greetings,
>
> Richard
>
> _______________________________________________
> Bioclusters maillist - Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters