[Bioclusters] Re: use of public PCs
Bony De Kumar
bioclusters@bioinformatics.org
Tue, 17 Feb 2004 11:08:02 +0530
Dear friend,
You can use SGE (Sun Grid Engine); it may solve your problem.
Best wishes,
Bony De Kumar
National facility for transgenic and gene knock out mice
Lab#N301
CCMB, Uppal Road
Hyderabad
500 007
Phone: 91-040-27192899
91-040-27160222-41(20 lines)
Bony@ccmb.res.in
Inbo@rediffmail.com
On Mon, 16 Feb 2004 bioclusters-request@bioinformatics.org wrote:
> When replying, PLEASE edit your Subject line so it is more specific
> than "Re: Bioclusters digest, Vol..." And, PLEASE delete any
> unrelated text from the body.
>
>
> Today's Topics:
>
> 1. Using semi-public PCs for heavy computation jobs (Arnon Klein)
> 2. Re: Using semi-public PCs for heavy computation jobs (Chris Dwan (CCGB))
> 3. Re: Using semi-public PCs for heavy computation jobs (Ron Chen)
> 4. Re: Using semi-public PCs for heavy computation jobs (Arnon Klein)
> 5. RE: Using semi-public PCs for heavy computation jobs (John Van Workum)
> 6. Re: Using semi-public PCs for heavy computation jobs (Dan Bolser)
>
> --__--__--
>
> Message: 1
> Date: Sun, 15 Feb 2004 20:49:26 +0200
> From: Arnon Klein <klein@pob.huji.ac.il>
> To: Bioclusters <bioclusters@bioinformatics.org>
> Subject: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> As part of my graduate research, I need to run a job of a genome-wide
> scale. Using all of the computers available to me at my lab, this can
> take about 6 months. We don't have a cluster...
> I am already making use of a students' computer lab, after hours. Those
> computers run Linux, and it was a no-brainer: I just hacked some scripts
> to rshell into the machines, activated by crond. It's not enough,
> though.
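A minimal sketch of that kind of cron-driven fan-out might look like the script below. Everything in it is invented for illustration: the host names, the worker binary, the log layout, and the dry-run stand-in for rsh/ssh.

```shell
#!/bin/sh
# After-hours fan-out sketch.  Host list, worker path, and the dry-run
# stand-in are all hypothetical; the original scripts used rsh.
HOSTS="lab-pc01 lab-pc02 lab-pc03"   # idle student-lab machines
WORKER="./genome_worker"             # hypothetical per-chunk binary
REMOTE="echo"                        # dry run; set REMOTE="ssh" (or rsh) for real

chunk=0
for host in $HOSTS; do
    # One work unit per host; nice -19 keeps us polite if anyone logs in.
    $REMOTE "$host" "nice -19 $WORKER --chunk $chunk" > "log.$host" 2>&1 &
    chunk=$((chunk + 1))
done
wait   # crond would fire this script each evening
```

Run from cron (e.g. a `crontab` entry like `0 22 * * * /path/to/fanout.sh`) so it only fires after hours.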
> While I'm looking at the option of getting or buying CPU time on a
> cluster, I am also tempted to make use of other public PCs at the
> campus. The ideal thing here is to have something like SETI@home or
> Fold@home, but I would go for anything that will allow me to have my
> jobs running on as many PCs as possible here, while not making me the
> enemy of the system admins...
> We're talking about Windows-based PCs (mostly 2000 or XP); at least some
> of them are managed using a central image.
> Right now it looks like the simplest option is to install an sshd or telnet
> service on them, and have a script that logs in after hours and
> executes some binary. However, I'm not sure this would go over well with
> the sys-admins (security implications?).
> I think it would be best if I approached the authorities with a
> sensible plan - first impressions are very important...
>
> I would like to hear anything about this subject: configuration
> suggestions, past experience, encouragements, discouragements, etc.
>
> Arnon
>
> --__--__--
>
> Message: 2
> Date: Sun, 15 Feb 2004 14:10:13 -0600 (CST)
> From: "Chris Dwan (CCGB)" <cdwan@mail.ahc.umn.edu>
> To: Bioclusters <bioclusters@bioinformatics.org>
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
>
> > As part of my graduate research, I need to run a job of a genome-wide
> > scale. Using all of the computers available to me at my lab, this can
> > take about 6 months. We don't have a cluster...
>
> That must be a pretty impressive job. I'd love to hear more about your
> research that takes so much CPU time. That, however, is probably a
> different thread. I have some experience with the situation you describe.
>
> I've built up, by hook and crook, semi-integrated access to a moderate
> number of compute resources (a few hundred CPUs) spread over several
> campuses. I suspect we're in something of the same boat, resource wise.
> You could call this hodgepodge a "grid," but that lends a dignity and
> maturity to a system whose only really positive attribute is that it
> functions to get real work done.
>
> Below is a list, easiest to hardest, of systems I've hooked in:
>
> * My very own cluster that I admin
> * The cluster maintained by our local supercomputing center
> * Clusters maintained by collaborators at other institutions
> * Lab workstations that I admin, running Linux or OS X
> * Lab workstations maintained by someone else, running Linux or OS X
> * Lab workstations which usually run Windows, maintained by someone else,
> which can be rebooted into Linux or OS X at night
>
> Then, of course, there are the systems which I decided would be too much
> trouble, particularly given the number of CPUs in question:
>
> * Lab workstations running OS 9 or Windows, which I can't get rebooted
> into Linux.
>
> > I am already making use of a students computer lab, after-hours. Those
> > computers run linux, and it was a no-brainer : just hacked some scripts
> > to rshell into the machines, activated by the crond. It's not enough,
> > though.
>
> The major queuing systems for clusters (LSF, PBS-Pro (Torque?), and SGE)
> each have facilities for cycle-stealing from workstations. The very
> best way to approach this situation is to convince the lab admin to set up
> a queuing system that runs jobs on those machines only during certain
> hours, or (better) only when nobody is logged in at the terminal, the load
> is below the number of CPUs, and the mouse hasn't moved and no key has
> been struck in 15 minutes - or whatever they're most comfortable with.
>
> Many folks on this list (myself included) have written our very own rsh /
> cron based remote job execution system. Unless there's a really good
> reason to do it (infinite CPUs available, but no chance of getting a
> queuing system installed, so you have to hack), experience says it's
> better to use an established package with a user support base and code
> maintained by somebody else.
>
> A system which has been around a long time, is quite mature, and doesn't
> get nearly enough credit is Condor (from the University of Wisconsin).
> It's explicitly designed as a cycle scavenger. I know some cluster admins
> who run Condor to "backfill" their tightly scheduled clusters.
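For flavor, Condor's scavenging policy lives in each desktop's condor_config, and submitting work takes only a few lines in a submit description file. The thresholds and the worker name below are illustrative, not taken from any real installation:

```
# condor_config on each desktop (illustrative thresholds):
# start jobs only on an idle, lightly loaded machine; suspend
# them the moment the owner comes back.
START   = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
SUSPEND = KeyboardIdle < 60

# submit description file (hypothetical worker binary):
universe   = vanilla
executable = genome_worker
arguments  = --chunk $(Process)
queue 500
```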
>
> Integrating a set of queuing systems across domains remains tricky. I've
> found that, despite unlimited hype, grid software (including the Globus
> package) remains best suited to a single administrative domain with a
> single administrator. Of course, I haven't tried all of the offerings,
> and I haven't installed Globus this week... so things might have changed.
> I encourage you to try all the options available and see what
> works for you. Exclude Sun's "Grid Engine" from the above statement, as
> they have a slightly different definition of a "grid" than we're talking
> about here.
>
> Anyway, I've got a horrible, hacked-together "metascheduler" that has
> nothing going for it except the fact that it works. I would happily throw
> it away if someone came out with a product or tool that did the same
> thing. On a user-by-user basis, it maintains a list of the resources to
> which that user can connect. It loops over a queue of jobs, checking to
> see which resource is not overloaded, and sends jobs out as appropriate.
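The dispatch loop described above reduces to something like this toy sketch. The resource names, CPU counts, and canned load numbers are all invented; a real probe would ssh out and read the load average, and the echo would be a qsub or remote execution call.

```shell
#!/bin/sh
# Toy metascheduler loop: for each queued job, find the first resource
# whose load is below its CPU count and "dispatch" there.  Decisions are
# logged to dispatch.log; all names and numbers are made up.
probe_load() {                  # stand-in for ssh + uptime
    case "$1" in
        clusterA) echo 8 ;;     # fully loaded
        labB)     echo 1 ;;
        labC)     echo 0 ;;
    esac
}
cpus_of() {
    case "$1" in
        clusterA) echo 8 ;;
        *)        echo 2 ;;
    esac
}

: > dispatch.log
for job in job1 job2 job3; do
    for res in clusterA labB labC; do
        if [ "$(probe_load "$res")" -lt "$(cpus_of "$res")" ]; then
            echo "$job -> $res" >> dispatch.log   # real code: qsub/ssh here
            break
        fi
    done
done
```

A real version would also track what it just dispatched, so one idle lab doesn't soak up every job in the queue.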
>
> I've used both PBS's and SGE's facility for "sloughing off" jobs from one
> queue to another. These are neat, unless you need to hand off jobs for
> only some users but not others, to move jobs between SGE and PBS, or to
> really keep track of errors. They're probably best suited to a setup with
> several queues maintained by a single admin or a single team.
>
> > While I'm looking at the option of getting or buying CPU time on a
> > cluster, I am also tempted to make use of other public PCs at the
> > campus. The ideal thing here is to have something like SETI@home or
> > Fold@home, but I would go for anything that will allow me to have my
> > jobs running on as many PCs as possible here, while not making me the
> > enemy of the system admins...
>
> United Devices sells a software package to do this, and some very large
> corporate installations (2000+ CPUs) have been brought online. The trick
> with systems like this is getting enough systems to offset the high
> latencies and (generally) low-performance CPUs. If you can convince your
> university IT department to build a campus-wide resource of this sort, it
> will be terrific. On the other hand, you'll probably have to share it.
>
> In any solution you build, data motion and error detection / correction
> will be the biggest time-sinks.
>
> I've found that, for jobs requiring a moderate-size dataset (my core set
> of BLAST targets is around 14GB), data motion should be decoupled from
> CPU finding. That is, I have one process that pushes data out to compute
> resources on a regular basis, and jobs are only scheduled onto nodes that
> have the needed data. This means that I have to ask my partners not just
> for access to their CPUs, but for a bit of storage dedicated to me, as
> close to the compute nodes as possible.
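That decoupling can be as plain as a scheduled rsync push that runs independently of job dispatch. The destinations below are invented, and the `RUN=echo` stand-in makes this a dry run; drop it to actually transfer.

```shell
#!/bin/sh
# Data-staging sketch: mirror the target set to each partner's scratch
# area so jobs are only scheduled where the data already lives.
# Destinations are hypothetical; RUN=echo makes this a dry run.
DATA="./blast_targets"                        # ~14GB in practice
DESTS="siteA:/scratch/mirror siteB:/scratch/mirror"
RUN="echo"                                    # set RUN= to really transfer

: > staging.log
for dest in $DESTS; do
    # --delete keeps each mirror exact; -z compresses over slow links
    $RUN rsync -az --delete "$DATA/" "$dest/" >> staging.log
done
```

Fired nightly from cron, this keeps the mirrors fresh without the scheduler ever waiting on a 14GB copy.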
>
> > I think it would be best if I can approach the authorities with a
> > sensible plan - first impressions are very important...
>
> The social aspects of this sort of distributed computation are, by far,
> the most important. If there is trust between the administrative domains,
> the rest is really just technical work. Without the trust, it's nearly
> impossible to make even the best plan succeed.
>
> > I would like to hear anything about this subject: configuration
> > suggestions, past experience, encouragements, discouragements, etc.
>
> Me too. I've got my experiences and opinions, and I'm always interested
> in other takes on similar problems. On a totally selfish note, if anyone
> wants to share CPUs with me, we can expand each other's grids.
>
> Any takers?
>
> -Chris Dwan
> The University of Minnesota
>
> --__--__--
>
> Message: 3
> Date: Sun, 15 Feb 2004 16:53:29 -0800 (PST)
> From: Ron Chen <ron_chen_123@yahoo.com>
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> To: bioclusters@bioinformatics.org
> Reply-To: bioclusters@bioinformatics.org
>
> GridEngine (SGE) 6.0 will integrate with JXTA,
> offering JxGrid, to provide P2P workload management
> like SETI@home.
>
> http://gridengine.sunsource.net/project/gridengine/workshop22-24.09.03/proceedings.html
> "Resource Discovery in Sun Grid Engine using JXTA"
>
> However, SGE 6.0 will not be available until May 2004, so for now I
> suggest another package called BOINC.
>
> http://boinc.berkeley.edu
>
> BOINC is free and open source, and supports multiple platforms
> (Windows, Linux, Solaris, Mac OS X).
>
> Your approach of installing sshd/telnetd is OK, but
> the sysadmins will not like opening a port, since
> hackers can get in more easily. BOINC does not leave a
> port open, and it uses HTTP to fetch the workload (so it
> passes through firewalls more easily). Moreover, it suspends
> work when users access the machine, and
> allows better scheduling. Further, it has better file
> transfer than home-made solutions.
>
> I suggest you look at the link above, as I do
> not know all the features myself!
>
> -Ron
>
>
>
> --__--__--
>
> Message: 4
> Date: Mon, 16 Feb 2004 17:36:30 +0200
> From: Arnon Klein <klein@pob.huji.ac.il>
> To: bioclusters@bioinformatics.org
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> Thanks, Chris and Ron, for the responses. I've found BOINC and Condor to
> be very interesting and possible solutions to my problem. Since my
> software is built around Java RMI (using the master/worker paradigm),
> they also feel like the most natural transitions. (The heavy calculations
> are done in C, if anyone is worried about optimization...)
> I'll try to pull this off, and I'll come back to this list with the
> story of how it went.
>
> Arnon
>
>
> --__--__--
>
> Message: 5
> From: "John Van Workum" <jdvw@tticluster.com>
> To: <bioclusters@bioinformatics.org>
> Subject: RE: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Date: Mon, 16 Feb 2004 11:10:36 -0500
> Reply-To: bioclusters@bioinformatics.org
>
> Arnon,
>
> You may want to look at GreenTea. It is a pure-Java "grid" platform that
> may mesh well with your Java RMI code.
> http://www.greenteatech.com/
>
> Regards,
>
> John
> TTI
>
>
>
> --__--__--
>
> Message: 6
> Date: Mon, 16 Feb 2004 16:12:37 +0000 (GMT)
> From: Dan Bolser <dmb@mrc-dunn.cam.ac.uk>
> To: bioclusters@bioinformatics.org
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> On Mon, 16 Feb 2004, Arnon Klein wrote:
>
> > Thanks Chris and Ron for the responses. I've found BOINC and Condor as
> > very interesting and possible solutions for my problem. Since my
> > software is built around java RMI (using Master/Worker paradigm), they
> > also feel the most natural transitions (The heavy calculations are done
> > in C, if anyone is worried about optimization...) .
>
> What kind of calculation are you doing?
> Cheers,
> Dan.
>
>
>
>
> --__--__--
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
>
> End of Bioclusters Digest
>