Pull-based job scheduling (was: [Bioclusters] Best ways to tackle migration to dedicated cluster/Farm)

Fri, 26 Mar 2004 14:30:04 -0800

I have also successfully implemented an RDBMS-based pull system that worked
very well.  I did this at Perlegen as part of their SNP Discovery project
where we processed over 100 TB of microarray imagery in a bit over a year.  
The implementation there was for a ~60 node Windows compute cluster.  I used
Oracle as the "scheduling" engine, and employed a thin "pull/dispatch"
process on each client.  This thin pull/dispatch process was also deployable
on Windows desktops to provide cycle stealing during off-hours.
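
To make the pull/dispatch idea concrete, here is a minimal sketch of what a
thin client's "claim a task" step might look like. It is not the Perlegen
implementation: it assumes SQLite (via Python's sqlite3 module) as a stand-in
for Oracle, and the table name, columns, state values, and function names are
all hypothetical.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None gives autocommit mode, so transactions are
    # started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return conn


def submit(conn, command):
    """Queue a task and return its id."""
    cur = conn.execute("INSERT INTO tasks (command) VALUES (?)", (command,))
    return cur.lastrowid


def pull_task(conn, worker):
    """Atomically claim the oldest pending task; return None if none is pending."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, command FROM tasks"
            " WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET state = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def finish_task(conn, task_id, ok=True):
    """Record the final state of a task."""
    conn.execute(
        "UPDATE tasks SET state = ? WHERE id = ?",
        ("done" if ok else "failed", task_id),
    )
```

A client would call pull_task() in a loop, run whatever command it gets back,
report the result with finish_task(), and sleep for a while when the call
returns None. Because the client initiates every exchange, the same loop can
run on a cluster node or on a desktop that is idle during off-hours.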

We were able to monitor and schedule tens of thousands of tasks per day using
this model; I believe the system could have scaled to many hundreds of
thousands, and would have supported a cluster of hundreds of nodes.

From my perspective, a database-centric pull model has a number of benefits,
including:

* the fact that SQL is used as the task dispatching and reporting protocol
(cross-platform portable)
* immediate access (via Excel and query reporting tools) to scheduling and
execution information for exception handling, utilization reporting,
trending, etc.
* job state can be recorded with a quick DB transaction, which makes it
simple to track job progress and identify hung or dead jobs
* the matchmaking policy can be implemented with stored procedure logic that
can be augmented to include things like operational priorities, which can
all be table-driven
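
The last two points can be sketched roughly as follows. This is only an
illustration under assumptions: SQLite again stands in for Oracle, plain SQL
issued from Python stands in for stored procedures, and the priorities and
jobs tables, the heartbeat column, and the function names are all
hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE priorities (project TEXT PRIMARY KEY, weight INTEGER NOT NULL);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'pending',
    worker TEXT,
    heartbeat REAL
);
"""


def next_job(conn, worker, now):
    """Claim the pending job whose project has the highest weight.

    Ties go to the oldest job. Returns None if nothing is pending or if
    another worker claimed the job first; the caller just polls again.
    """
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT j.id, j.project FROM jobs j"
            " LEFT JOIN priorities p ON p.project = j.project"
            " WHERE j.state = 'pending'"
            " ORDER BY COALESCE(p.weight, 0) DESC, j.id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET state = 'running', worker = ?, heartbeat = ?"
            " WHERE id = ? AND state = 'pending'",
            (worker, now, row[0]),
        )
        if cur.rowcount != 1:
            return None
    return row


def heartbeat(conn, job_id, now):
    """Record that a running job is still alive."""
    with conn:
        conn.execute(
            "UPDATE jobs SET heartbeat = ? WHERE id = ? AND state = 'running'",
            (now, job_id),
        )


def stale_jobs(conn, now, timeout):
    """Return (id, worker) for running jobs silent for more than `timeout` seconds."""
    return conn.execute(
        "SELECT id, worker FROM jobs"
        " WHERE state = 'running' AND heartbeat < ? ORDER BY id",
        (now - timeout,),
    ).fetchall()
```

Changing operational priorities is then just an UPDATE on the priorities
table, with no change to client code, and the stale_jobs() query is the kind
of thing that can equally be run from a reporting tool to spot hung or dead
jobs.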

Bruce

Bruce Moxon
Chief Solutions Architect, Panasas Inc.
Delivering the premier storage system for scalable Linux clusters

www.panasas.com
bmoxon@panasas.com
510-608-7778