[Bioclusters] Re:Advice on getting started with clustering, LSF, Xserve

Elia Stupka bioclusters@bioinformatics.org
Thu, 7 Nov 2002 09:36:28 +0800 (SGT)

> I'd love to find some sort of 'bioclustering for dummies' that
> outlines the usual solutions and approaches, also on the software side
> something that describes the fundamentals of writing perl and java to
> exploit clusters and even some simple examples/test packages that I
> could play with to get my feet wet.

Unfortunately there is no such thing as a "bioclusters for dummies",
or else consultants would be out of business and mailing lists would
be dead ;)

My expertise in this area is with bioinformatics pipelines: I have
worked with the Ensembl pipeline, and my group has since developed its
own flexible open source pipeline in Perl, BioPipe. You are welcome to
read the docs at www.biopipe.org. As a general note, it is absolutely
worthwhile having something like LSF for your load sharing. Even though
BioPipe does a lot of job management and tracking, we still rely on
load-sharing software underneath it. We tend to use LSF because we could
afford it. If you can afford it, in our experience it is by far the most
robust and sophisticated load-sharing software. Bear in mind that prices
have been going down, and also that they (don't quote me) usually change
depending on the weather and the LSF salesman... If you can't get LSF,
SGE (Sun Grid Engine) is a good alternative, and so is PBSPro. We have
wrappers for LSF and PBS for BioPipe. We never got around to writing one
for SGE, but it's very straightforward: just an extra module that issues
the right commands...
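
To give a feel for what "an extra module that issues the right commands"
could look like, here is a minimal sketch in Python. It is not BioPipe's
actual code or API (BioPipe itself is written in Perl): the class and
function names are hypothetical, and the only scheduler options assumed
are bsub's -q/-o and SGE qsub's -q/-o/-b.

```python
import shlex


class BatchSystem:
    """Hypothetical base class: one subclass per load-sharing system."""

    def submit_command(self, job_cmd, queue, log_file):
        raise NotImplementedError


class LSF(BatchSystem):
    def submit_command(self, job_cmd, queue, log_file):
        # bsub takes the job's command line as trailing arguments.
        return ["bsub", "-q", queue, "-o", log_file] + shlex.split(job_cmd)


class SGE(BatchSystem):
    def submit_command(self, job_cmd, queue, log_file):
        # "-b y" asks SGE's qsub to treat the job as a command, not a script.
        return ["qsub", "-q", queue, "-o", log_file, "-b", "y"] + shlex.split(job_cmd)


def build_submission(system_name, job_cmd, queue, log_file):
    """Return the argument list that would submit job_cmd to the named system."""
    systems = {"lsf": LSF, "sge": SGE}
    return systems[system_name.lower()]().submit_command(job_cmd, queue, log_file)
```

The pipeline only ever calls build_submission(), so supporting another
scheduler means adding one small class; the resulting list could be
handed to something like subprocess.run() to actually submit the job.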

> ease of administration seems to be another pro for LSF which is a big  
> thing as we just want it to work, we dont really want to babysit this  
> stuff - what sort of sysadmin commitment is needed to make this work?

LSF will save you great job management headaches, as long as the initial
setup is done well. Sysadmin commitment will then shift to more routine
requests like installing programs, etc. Bear in mind that you need a
brilliant sysadmin in the first month or so, when you are building the
cluster, deciding how to spread the BLAST databases, optimising LSF, etc.

BioPipe will save you the second layer of headaches, i.e. automating
analysis workflows, reproducing them easily, etc.

> any thoughts/experiences with using Xserve in the mix with other
> platforms and Xserve vs intel solutions?

We are currently experimenting with Xserves that we have on loan from
Apple, and are incredibly pleased with some of the performance we get
out of them. Hybrid clusters are never a major issue *as long as* you
have a good sysadmin who can deal with endianness and file systems, and
as long as you give the sysadmin a good picture of where the possible
bottlenecks will be.
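
As a small illustration of the endianness point: PowerPC machines such
as the Xserve are big-endian while Intel machines are little-endian, so
binary files written in "native" byte order on one may be misread on the
other. The sketch below (Python, with hypothetical function names) shows
the usual workaround of fixing the byte order explicitly when writing
and reading.

```python
import struct


def encode_count(value):
    # ">I" = big-endian unsigned 32-bit integer, whatever the host CPU is.
    return struct.pack(">I", value)


def decode_count(data):
    # Read back with the same explicit byte order used for writing.
    return struct.unpack(">I", data)[0]


record = encode_count(1000000)

# Round-trips correctly on any node of a mixed cluster.
same_value = decode_count(record)

# What a reader assuming little-endian layout would see instead.
misread_value = struct.unpack("<I", record)[0]
```

Text formats avoid the problem entirely; it mostly bites with binary
indexes and files shared over a common file system.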

Hope it helped, 


