> Excuse my ignorance, I've never used Condor but what's a pull based
> system? What would be the advantages over PBS/SGE (which I assume are push
> type systems)?

In a "push" system, a central scheduler assigns work to the compute nodes. Nodes sit idle until work is assigned to them. In a "pull" system, each node is responsible for deciding when to make itself available for work, for requesting an appropriate job, and for keeping track of the administrative details associated with the job.

In my experience, push systems (including all centrally scheduled clusters using PBS, SGE, NQE, LSF, and the like) are easier to debug, since you only have to deal with one scheduler. They are sometimes less efficient: if the scheduler is too busy to assign work to a node (or doesn't know that the node exists, or plans resource use poorly, or any of a number of other cases), then the node in question may go unused.

Pull systems (including SETI@home and friends, as well as Platform's grid offering) are best suited for cycle stealing and ad-hoc clustering. They can be really troublesome to debug, since what gets lost in the case of errors is usually job state rather than cycles on nodes. Instead of the cluster operating at less than peak efficiency, it loses jobs. This can be frustrating for both users and admins.

Condor (at least when I used it a year or so ago) is really a hybrid system. There is a central "matchmaker" which is responsible for pairing "classads" from compute resources with jobs needing to be done.

Condor's biggest strength is the thinking that Miron Livny and his group put into the social side of distributed computing. Condor has great facilities for managing the 'owner comfort' problem: Grid computing is great, but only if I get first crack at machines that I own, my competitors never get to use them, and I get to have my machine back the instant I move my mouse.

-Chris Dwan
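
To make the push/pull distinction above a bit more concrete, here is a minimal toy sketch in Python. Everything in it (the Node class, push_schedule, pull_round, the "available" flag) is a hypothetical name invented for illustration; none of it is the API of Condor, PBS, SGE, or any other real scheduler. It only models who makes the decision: in the push case the central scheduler hands jobs to the nodes it knows about, and in the pull case each node checks its own availability and asks for work.

```python
from collections import deque


class Node:
    """Hypothetical compute node. 'available' stands in for the owner's
    policy (for example, nobody is using the desktop right now)."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.completed = []

    def run(self, job):
        # Toy stand-in for actually executing a job.
        self.completed.append(job)

    def pull_from(self, queue):
        """Pull model: the node itself decides whether to request work."""
        if self.available and queue:
            self.run(queue.popleft())
            return True
        return False


def push_schedule(queue, known_nodes):
    """Push model: a central scheduler assigns jobs round-robin to the
    nodes it knows about. Nodes it does not know about stay idle."""
    if not known_nodes:
        return
    i = 0
    while queue:
        known_nodes[i % len(known_nodes)].run(queue.popleft())
        i += 1


def pull_round(queue, nodes):
    """One polling round in the pull model. Returns how many nodes
    took a job during this round."""
    return sum(node.pull_from(queue) for node in nodes)


if __name__ == "__main__":
    # Push: the scheduler never learned about node c, so c goes unused.
    a, b, c = Node("a"), Node("b"), Node("c")
    push_schedule(deque(["j0", "j1", "j2", "j3"]), [a, b])
    print("push:", a.completed, b.completed, c.completed)

    # Pull: node y's owner is at the keyboard, so y never asks for work.
    x, y, z = Node("x"), Node("y", available=False), Node("z")
    jobs = deque(["j0", "j1", "j2", "j3"])
    while pull_round(jobs, [x, y, z]):
        pass
    print("pull:", x.completed, y.completed, z.completed)
```

Note what the toy leaves out: in a real pull system the job's bookkeeping travels with the node, so a node that vanishes mid-job takes that state with it. That is the "loses jobs" failure mode described above, and this sketch does not attempt to model it.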