[Bioclusters] Newbie question: simple low-admin non-threaded Debian-based cluster solution?

Speakman, John H./Epidemiology-Biostatistics speakmaj at MSKCC.ORG
Thu Jan 20 19:04:08 EST 2005

Hi Glen


Thanks - I agree - Rocks looks great - but I agreed with the users not
to consider a non-Debian-based solution unless it simply will not work
any other way...  that'll teach me to make pledges with users.





From: bioclusters-bounces at bioinformatics.org
[mailto:bioclusters-bounces at bioinformatics.org] On Behalf Of Glen Otero
Sent: Thursday, January 20, 2005 6:56 PM
To: Clustering, compute farming & distributed computing in life science
Subject: Re: [Bioclusters] Newbie question: simple low-admin
non-threaded Debian-based cluster solution?


Check out Rocks (http://www.rocksclusters.org). IMHO it is much better
than FAI and SIS. It also includes SGE. 


On Jan 20, 2005, at 3:47 PM, Speakman, John
H./Epidemiology-Biostatistics wrote: 




	If anyone can review the below and suggest a way to go, or, even
better, point out something I have gotten completely wrong, it would be much
appreciated.







	Ten HP ProLiant nodes, one DL380 and nine DL140.  Each node has
two 3.2 GHz Xeon processors.  They do not have a dedicated switch; the
infrastructure folks say they want to implement this using a VLAN.  We
have some performance concerns here but have agreed to give it a try. 


	User characteristics: 


	The users are biostatisticians who typically program in R; they
often use plug-in R modules like Bioconductor.  They always want the
newest version of R right away.  They may also write programs in C
or Fortran.  Data files are usually small.  Nothing fancy like BLAST,


	User concerns: 


	Users require a Linux clustering environment which enables them
to interact with the cluster as though it were a single system (via ssh
or X) but which will distribute compute-intensive jobs across nodes.  As
the code is by and large not multithreaded, it is expected that each job
will be farmed out to an idle compute node and probably stay there until
it is done.   That's fine.  In other words, to use all twenty CPUs we
will need twenty concurrent jobs. 
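
	The arithmetic behind that last point can be sketched as follows.
This is only an illustration under the assumption that every job uses
exactly one CPU; the function name busy_cpus is made up for the example
and is not part of any scheduler.

```python
# Hypothetical illustration of the slot arithmetic for the hardware above.
NODES = 10          # one DL380 plus nine DL140
CPUS_PER_NODE = 2   # two Xeon processors per node


def busy_cpus(concurrent_jobs: int, threads_per_job: int = 1) -> int:
    """CPUs kept busy, assuming each job uses exactly threads_per_job CPUs."""
    total = NODES * CPUS_PER_NODE
    return min(total, concurrent_jobs * threads_per_job)


total_cpus = NODES * CPUS_PER_NODE
for jobs in (1, 5, 20, 30):
    print(f"{jobs:2d} jobs -> {busy_cpus(jobs):2d} of {total_cpus} CPUs busy")
```

With single-threaded code, anything fewer than twenty concurrent jobs
leaves CPUs idle, and anything more has to wait in a queue.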


	Administration concerns: 


	The cluster must require the absolute minimum of configuration
and maintenance, because I've got to do it and I'm hardly ever around
these days. 


	Other concerns: 


	Users and administrators alike have a preference for Debian
Linux over other distributions.  Users also have an aversion to non-free
software.  Either or both of these considerations could be overridden if
the reasons were pressing. 


	Cluster software requirements: 




	1. The cluster must have a means of deploying Linux to the nodes and
keeping their configurations (including updates to the operating system
and applications, lists of users, printers, etc.) in synchronization. 



	2. The cluster must have a means of transparently distributing jobs
to idle CPUs.  It is not necessary to actively rebalance this once a
job has started - it's okay if, once tied to a node, it stays there. 
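
	To make the placement policy concrete, here is a minimal sketch of
"send each job to whichever CPU becomes idle first and leave it there".
It is a toy simulation, not how any of the packages discussed here work
internally; the function name dispatch and its parameters are assumptions
for illustration, and it assumes all jobs are submitted at time zero.

```python
import heapq


def dispatch(job_durations, nodes=10, cpus_per_node=2):
    """Assign each job, in submission order, to the CPU slot that frees up first.

    Returns a list of (job_index, node, start_time, end_time) tuples.
    A job never moves once placed: no rebalancing or migration is modelled.
    """
    # Each heap entry is (time the slot becomes free, node number, cpu number).
    slots = [(0.0, n, c) for n in range(nodes) for c in range(cpus_per_node)]
    heapq.heapify(slots)

    schedule = []
    for job, duration in enumerate(job_durations):
        free_at, node, cpu = heapq.heappop(slots)   # earliest idle CPU
        end = free_at + duration
        schedule.append((job, node, free_at, end))
        heapq.heappush(slots, (end, node, cpu))     # slot is busy until `end`
    return schedule


# Example: 25 equal-length jobs on the 20 CPUs described above.
schedule = dispatch([1.0] * 25)
started_immediately = sum(1 for _, _, start, _ in schedule if start == 0.0)
makespan = max(end for _, _, _, end in schedule)
print(started_immediately, "jobs start at once; all done at t =", makespan)
```

In this toy model the first twenty jobs start immediately and the other
five wait for a CPU to free up, which is all the "transparency" we are
asking for.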


	Potential solutions: 


	We like the look of NPACI Rocks but its non-Debian-ness makes it
a last resort only.  What we would really like to try is a Debian
version of NPACI Rocks; in its absence we will probably have to use two
separate packages to fulfil the requirements of #1 and #2 above. 


	Sensible options for #1 seem to be: 



	SystemImager (www.systemimager.org) 



	FAI (http://www.informatik.uni-koeln.de/fai/), maybe also
involving the use of cfengine2 (http://www.iu.hio.no/cfengine/) 


	SystemImager is the better-established product and looks to be
simpler to set up than FAI and/or cfengine2, in both of which the
learning curve looks steep.  However, FAI seems more elegant and more
like the idea of "NPACI Rocks Debian" that we're looking for, implying
that once set up FAI/cfengine2 will require less ongoing maintenance. 


	Sensible options for #2 seem to be: 










	Sun GridEngine N1 


	Note: all of the above have commercial versions; we'd be
reluctant to consider them unless it means big savings in administration
time and effort.  We get the impression OpenMosix (and, to a lesser
extent, OpenPBS) have question marks over how much time and resources
the people maintaining these products have, suggesting bugs, instability
and not keeping up with kernel/library updates, etc.  Sun GridEngine
seems more robust but does not seem to have a big Debian user base. 


	What do you all think we should try first? 







	John Speakman 


	Manager, Clinical Research Systems 


	Memorial Sloan-Kettering Cancer Center 


	307 East 63rd Street, New York NY 10021 USA 


	+1 646 735 8187 - SpeakmaJ at mskcc.org 













Glen Otero Ph.D. 

Linux Prophet 

