[Bioclusters] Sequest on Linux

Wed Aug 17 12:55:13 EDT 2005

Unfortunately, the interconnect integrated with our bladecenter is an
internal 10/100 switch (or switches).  I expect that gig versions are now
available.  But we also only have eight (8) blades, so the overall
network environment isn't as critical for us as it would be for, say,
the 200-node cluster you mentioned below.

We have used GigE copper for many of our OTHER higher-end systems, and
it works rather well.  It's certainly cost effective, and you just
don't have to worry about special cables, adapters, and so on.
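
As a rough back-of-envelope sketch of what the 10/100 vs. GigE difference
could mean here: the quoted message below mentions about 1.6G of raw data
files per node, copied out ahead of time.  The snippet assumes decimal
units and theoretical line rate with no protocol overhead (both are
assumptions, not measurements), so real copies would be slower.

```python
# Back-of-envelope transfer times at theoretical line rate.
# Assumptions (not from the thread): decimal units (1 GB = 10**9 bytes)
# and zero protocol overhead, so real transfers will take longer.

def transfer_seconds(data_bytes: int, link_mbit_per_s: int) -> float:
    """Seconds needed to move data_bytes over a link of the given speed."""
    return (data_bytes * 8) / (link_mbit_per_s * 1_000_000)

RAW_DATA_BYTES = 1_600_000_000  # ~1.6 GB of raw data files per node

for name, speed in (("10/100 (at 100 Mbit/s)", 100), ("GigE", 1000)):
    print(f"{name}: {transfer_seconds(RAW_DATA_BYTES, speed):.1f} s")
```

Under those assumptions that is on the order of two minutes per node at
100 Mbit/s versus under fifteen seconds on GigE, which only matters when
the raw files have to be re-pushed.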

And yes, we seem to have had our share of internal mini-drive
failures in those blades.   In a year's time we've lost three drives
(out of 16).  My personal guess is that those drives are susceptible to
internal heat... Our blades are VERY early models (pre-production).
Fortunately, as I mentioned earlier, our nodes are essentially clones of
each other, so it is not difficult to recover from a blown drive.   But
it is a tad time-consuming.
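
For what it's worth, the drive numbers above (three failures out of 16
drives in about a year) work out as follows.  This is only a minimal
sketch; the function name is made up for illustration, and with a sample
this small the result is a rough indication at best.

```python
def annualized_failure_rate(failures: int, drives: int, years: float) -> float:
    """Fraction of the drive population lost per year of service."""
    if drives <= 0 or years <= 0:
        raise ValueError("drives and years must be positive")
    return failures / (drives * years)

# Figures from the paragraph above: 3 failed drives, 16 drives, ~1 year.
rate = annualized_failure_rate(failures=3, drives=16, years=1.0)
print(f"{rate:.2%} of drives per year")
```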

I am not sure about our overall utilization with regard to jobs; I have
not set up any detailed monitoring yet.  Our analysts do, however, appear
to be quite pleased with the overall performance.

No, we don't have any NAS here - we just haven't needed that much
storage.   Our environment here is primarily compute-intensive, and
most servers we have use simple locally-attached scsi drives.

--kcb

> -----Original Message-----
> From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
> [mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
> Of Botka, Christopher
> Sent: Wednesday, August 17, 2005 11:11 AM
> To: Clustering, compute farming & distributed computing in life science
> informatics
> Subject: RE: [Bioclusters] Sequest on Linux
> 
> Hi,
> 
> Thanks for all the info.  From what I have heard, the win2k server head
> node is still a requirement.  What interconnect do you guys have on your
> bladecenter?  I was at the Bauer Center at Harvard prior to coming here
> and we had about 200 Intel IBM Blades and other than the 40GB IDE drive
> failures on the early ones (lots of them failed), I liked the blades a
> lot.  We have Dell 1Us and are still trying to decide what to use: GigE
> CU, GigF Fiber, or HBA->SAN.  If I can get away with GbitE Copper, I'd
> be thrilled.  Do you figure PVM is causing the network bandwidth usage
> for you guys?  How many CPUs/job on average?
> 
> Do you guys have any network attached storage?  I am planning to
> NAS-attach as much as I can and hopefully just have OS/scratch on the
> nodes.  I'm still hopeful of being able, in the middle to long term, to
> integrate the Sequest nodes with LSF.  I've done some work with MPI and
> LSF.  The only problem is that in a high-use cluster LSF grabs one CPU at
> a time and holds them until the total number of requested CPUs becomes
> available.  This can tie up a CPU for quite a while as it waits for
> others to get free.  I have not had any experience with PVM and a
> scheduler, though I can't imagine it's too much different.
> 
> Thanks again for your time.
> 
> Chris
> 
> 
> 
> -----Original Message-----
> From:
> bioclusters-bounces+christopher.botka=joslin.harvard.edu at bioinformatics.org
> [mailto:bioclusters-bounces+christopher.botka=joslin.harvard.edu at bioinformatics.org]
> On Behalf Of Brodie, Kent
> Sent: Tuesday, August 16, 2005 11:44 AM
> To: Clustering, compute farming & distributed computing in life science
> informatics
> Subject: RE: [Bioclusters] Sequest on Linux
> 
> Hi.
> 
> As Simon introduced, we're running Sequest on Linux.   The installation
> happens to be Suse 8; that's what was chosen because that provided the
> best powerpc support at that time.   Ultimately, we'll likely switch the
> nodes to RedHat later this year to match up with our other Linux
> servers.   For Sequest, the particular flavor does not matter too much.
> The head node of our Sequest environment is a Windows 2000 Server.  That
> was a requirement from what I know.  (The JS20/Sequest install here
> pre-dated my arrival...).
> 
> From what I have seen, there's not really all that much I/O required to
> make the Sequest animal work.   Most of the bandwidth needs are really
> going to be on the network side, I believe.  That's where our JS20
> bladecenter excels, because of the common network backplane.   The
> chunks of data being analyzed are really nothing more than bits of
> text..
> 
> The Sequest head node basically blasts stuff to the remote worker nodes
> via RSH/etc.   The raw files (FASTA and so on) used for comparative
> analysis are all copied to the worker nodes "ahead of time", and the
> data file chunks being analyzed just really aren't that large.    In our
> environment, the JS20's have two little 40-GB 2.5" internal SCSI drives
> on the blades.    The primary drive only has 3G used (operating system,
> apps, sequest), and the secondary drive only has 1.6G used (raw data
> files).  I do not suspect you're going to notice huge Sequest time
> differences based on the drives.....   (I could be wrong?).
> 
> For backups, we really don't care much, since each worker node is more
> or less a clone of the other.   A dead node is easily replicated.   We
> do keep weekly backups of the first node "just in case".
> 
> For integration, my understanding is that a Sequest-installed series of
> systems can co-exist with other job scheduler environments on the same
> cluster.    As I mentioned earlier, RSH plays a huge part in keeping
> Sequest talking.    The technical communication between the nodes just
> really isn't that complicated, and my assumption is something like
> LSF/SGE/PBS/etc could peacefully co-exist.
> 
> If you have further scientific-like queries, Simon will tackle those,
> and I'll be happy to address any other sysadmin-like questions you may
> have.
> 
> --Kent C. Brodie, MS
>   Department of Physiology
>   Human & Molecular Genetics Center
>   Medical College of Wisconsin
> 
> 
> 
> 
> > -----Original Message-----
> > From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
> > [mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
> > Of Botka, Christopher
> > Sent: Monday, August 15, 2005 11:46 PM
> > To: bioclusters at bioinformatics.org
> > Subject: [Bioclusters] Sequest on Linux
> >
> >
> > Is anyone out there running Sequest on Linux for MS analysis?  We are in
> > the process of setting up a modest-sized cluster to run Sequest and would
> > be interested in sharing info and experiences with anyone out there who
> > might be doing the same.
> >
> > Some issues:
> >
> >    1. I/O requirements - what's the minimum throughput needed to run
> > Sequest?  We are going to test both SATA and FC drives with multiple
> > types of interconnects, as well as local SCSI drives.
> >    2. Integration of the Thermo queuing system with other job management
> > systems (LSF/SGE etc) - Can Sequest be integrated into a general-purpose
> > cluster?
> >    3. Middle to long term storage requirements and back up strategies.
> >
> > Thanks,
> >
> > Chris
> >
> > botka at joslin.harvard.edu
> >
> >
> >
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters