[Bioclusters] Best ways to tackle migration to dedicated cluster/Farm

Chris Dagdigian bioclusters@bioinformatics.org
Wed, 24 Mar 2004 16:22:29 -0500


Hi Ross,

Your "node pull" system sounds fast, stable and well suited for the 
workflow that you need to manage/operate. I certainly would not throw it 
away in favor of a more general purpose cluster queuing system like SGE 
  if you go the dedicated resource route.

You may want to keep your existing workflow system while also planning
to deploy a distributed resource management suite such as Platform LSF
or Sun Grid Engine onto the same hardware.

There is no reason why your 'pull' system cannot be used alongside a
more general-purpose resource allocation and scheduling framework. This
lets you use 'what works' for your most important workflow while also
providing a more general-purpose framework that others in your
institution could perhaps use as well.
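
As a trivial illustration of the two coexisting: below is a quick
Python sketch of handing an ad-hoc BLAST run to SGE via qsub while your
pull daemons keep running on the same nodes. The qsub and blastall
flags are standard; the paths, database location and job name are made
up for the example.

#!/usr/bin/env python
"""Sketch: submit a one-off BLAST run through SGE on the same nodes
that also run the 'node pull' daemons.  Paths, database location and
job name are hypothetical."""

import subprocess

def submit_blast(query_file, db="/data/blastdb/nr", jobname="adhoc_blast"):
    # Standard SGE submission: -b y runs a binary directly,
    # -cwd keeps the output files relative to the submit directory.
    cmd = [
        "qsub", "-N", jobname, "-cwd", "-b", "y",
        "-o", jobname + ".out", "-e", jobname + ".err",
        "blastall", "-p", "blastx", "-d", db, "-i", query_file,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    submit_blast("queries.fasta")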

Providing a shared NFS resource to the dedicated hardware will make the
use and management of the queuing system easier, but it is not an
absolute requirement for either SGE or LSF.

Hardware selection is something I can't really help with -- you are in
the best position to make an informed decision and seem to have the
benchmarking process well in hand.

If pressed, I'd say that the most "common" hardware configuration I see
is a dual-CPU box with 4GB of physical memory. There are plenty of
exceptions, though, in both CPU count and memory size.

The apps you mention are bound by both memory and disk I/O speed. You
will probably find that you get a significant speed boost by
benchmarking on nodes that have two drives mirrored or striped in a
software RAID set; the I/O gains from multiple drives and software RAID
can be substantial.
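
Before committing to a node spec, a crude way to sanity-check the
difference between a single drive and a RAID0/RAID1 set is to time a
big sequential read on each configuration. The little Python sketch
below does just that; the file path is hypothetical, and it is no
substitute for benchmarking your real BLAST/InterProScan workload, but
it will catch gross differences.

#!/usr/bin/env python
"""Rough sequential-read timing for comparing disk configurations
(single drive vs. software RAID set).  Use a file larger than RAM so
you measure the disks rather than the page cache; the path here is
hypothetical."""

import time

def read_throughput(path, blocksize=1024 * 1024):
    total_bytes = 0
    start = time.time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(blocksize)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.time() - start
    return total_bytes / (1024.0 * 1024.0) / elapsed   # MB/s

if __name__ == "__main__":
    print("%.1f MB/s" % read_throughput("/scratch/bigfile.dat"))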

-Chris





Ross Crowhurst wrote:
> Also, our existing farm uses "node pull". That is, as nodes come
> online a process on each node requests from a mysql configuration
> database the type of jobs that the node is capable of undertaking,
> then requests a chunk of jobs from a mysql database functioning as a
> jobs queue. The nodes process their chunk of jobs and post parsed
> results directly back to the appropriate mysql database. All blasts
> are performed by piping from the control script to blast then piping
> results back in for parsing. No physical sequence/report files are
> read or written to local disk (except for interproscan). I used to
> use NFS and have the nodes send results files back to an NFS server
> where they were parsed to database but that is incredibly slow
> compared to the system I now operate. The "node pull" system seems
> ideal for our current environment but if we move to a farm/cluster
> that is available 24x7 there may be a better way to do it (use SGE,
> standard cluster queuing systems etc). If I move to splitting
> databases then it seems I am back to using NFS, generation of
> physical reports and parsing these on one or more servers (parsing
> itself could be a new job type and merged blast reports redistributed
> to the cluster to parse?). Is there a consensus on the best or most
> appropriate way to tackle this in a dedicated cluster environment? I
> would welcome input on this as well.
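
PS: for anyone else on the list following along, here is roughly what
the "node pull" worker loop Ross describes looks like as a Python
sketch. The MySQL table and column names are hypothetical and the
result parsing is elided; the blastall flags and reading the query from
stdin are standard blastall behaviour.

#!/usr/bin/env python
"""Sketch of a 'node pull' worker: claim a chunk of jobs from a MySQL
jobs table, pipe each sequence through blastall, and post the report
straight back to the database (parsing elided for brevity).  Table
and column names are hypothetical."""

import subprocess
import MySQLdb   # provided by the MySQL-Python / mysqlclient bindings

def claim_chunk(conn, job_type, chunk=50):
    # A real worker would also mark these rows as claimed so two nodes
    # never grab the same chunk; omitted to keep the sketch short.
    cur = conn.cursor()
    cur.execute(
        "SELECT job_id, sequence FROM job_queue "
        "WHERE status = 'pending' AND job_type = %s LIMIT %s",
        (job_type, chunk))
    return cur.fetchall()

def run_blast(sequence, db="/data/blastdb/nr"):
    # blastall reads the query from stdin when -i is not given and
    # writes the report to stdout, so nothing touches local disk.
    proc = subprocess.run(
        ["blastall", "-p", "blastx", "-d", db],
        input=sequence, capture_output=True, text=True, check=True)
    return proc.stdout

def post_result(conn, job_id, report):
    cur = conn.cursor()
    cur.execute(
        "UPDATE job_queue SET status = 'done', report = %s "
        "WHERE job_id = %s", (report, job_id))
    conn.commit()

if __name__ == "__main__":
    conn = MySQLdb.connect(host="dbhost", user="worker",
                           passwd="secret", db="farm")
    for job_id, sequence in claim_chunk(conn, job_type="blastx"):
        post_result(conn, job_id, run_blast(sequence))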



-- 
Chris Dagdigian, <dag@sonsorol.org>
BioTeam  - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net