[Bioclusters] Re: use of public PCs
Bony De Kumar
bioclusters@bioinformatics.org
Tue, 17 Feb 2004 11:08:02 +0530
Dear friend,
You can use SGE (Sun Grid Engine); it may solve your problem.
Best wishes,
Bony De Kumar
National facility for transgenic and gene knock out mice
Lab#N301
CCMB, Uppal Road
Hyderabad
500 007
Phone: 91-040-27192899
91-040-27160222-41(20 lines)
Bony@ccmb.res.in
Inbo@rediffmail.com
On Mon, 16 Feb 2004 bioclusters-request@bioinformatics.org wrote:
> When replying, PLEASE edit your Subject line so it is more specific
> than "Re: Bioclusters digest, Vol..." And, PLEASE delete any
> unrelated text from the body.
>
>
> Today's Topics:
>
> 1. Using semi-public PCs for heavy computation jobs (Arnon Klein)
> 2. Re: Using semi-public PCs for heavy computation jobs (Chris Dwan (CCGB))
> 3. Re: Using semi-public PCs for heavy computation jobs (Ron Chen)
> 4. Re: Using semi-public PCs for heavy computation jobs (Arnon Klein)
> 5. RE: Using semi-public PCs for heavy computation jobs (John Van Workum)
> 6. Re: Using semi-public PCs for heavy computation jobs (Dan Bolser)
>
> --__--__--
>
> Message: 1
> Date: Sun, 15 Feb 2004 20:49:26 +0200
> From: Arnon Klein <klein@pob.huji.ac.il>
> To: Bioclusters <bioclusters@bioinformatics.org>
> Subject: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> As part of my graduate research, I need to run a job of a genome-wide
> scale. Using all of the computers available to me at my lab, this can
> take about 6 months. We don't have a cluster...
> I am already making use of a students' computer lab, after hours. Those
> computers run Linux, and it was a no-brainer: I just hacked some scripts
> to rshell into the machines, activated by crond. It's not enough,
> though.
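A minimal sketch of that kind of cron-driven fan-out might look like the script below. Everything in it is invented for illustration: the host names, the worker binary, the log layout, and the dry-run stand-in for rsh/ssh.

```shell
#!/bin/sh
# After-hours fan-out sketch.  Host list, worker path, and the dry-run
# stand-in are all hypothetical; the original scripts used rsh.
HOSTS="lab-pc01 lab-pc02 lab-pc03"   # idle student-lab machines
WORKER="./genome_worker"             # hypothetical per-chunk binary
REMOTE="echo"                        # dry run; set REMOTE="ssh" (or rsh) for real

chunk=0
for host in $HOSTS; do
    # One work unit per host; nice -19 keeps us polite if anyone logs in.
    $REMOTE "$host" "nice -19 $WORKER --chunk $chunk" > "log.$host" 2>&1 &
    chunk=$((chunk + 1))
done
wait   # crond would fire this script each evening
```

Run from cron (e.g. a `crontab` entry like `0 22 * * * /path/to/fanout.sh`) so it only fires after hours.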
> While I'm looking at the option of getting or buying CPU time on a
> cluster, I am also tempted to make use of other public PCs at the
> campus. The ideal thing here is to have something like SETI@home or
> Fold@home, but I would go for anything that will allow me to have my
> jobs running on as many PCs as possible here, while not making me the
> enemy of the system admins...
> We're talking about Windows-based PCs (mostly 2000 or XP); at least some
> of them are managed using a central image.
> Right now it looks like the simplest option is to install an sshd or telnet
> service on them, and have a script that logs in after hours and
> executes some binary. However, I'm not sure this would go over well with
> the sys-admins (security implications?).
> I think it would be best if I approached the authorities with a
> sensible plan - first impressions are very important...
>
> I would like to hear anything about this subject: configuration
> suggestions, past experience, encouragements, discouragements, etc.
>
> Arnon
>
> --__--__--
>
> Message: 2
> Date: Sun, 15 Feb 2004 14:10:13 -0600 (CST)
> From: "Chris Dwan (CCGB)" <cdwan@mail.ahc.umn.edu>
> To: Bioclusters <bioclusters@bioinformatics.org>
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
>
> > As part of my graduate research, I need to run a job of a genome-wide
> > scale. Using all of the computers available to me at my lab, this can
> > take about 6 months. We don't have a cluster...
>
> That must be a pretty impressive job. I'd love to hear more about your
> research that takes so much CPU time. That, however, is probably a
> different thread. I have some experience with the situation you describe.
>
> I've built up, by hook and crook, semi-integrated access to a moderate
> number of compute resources (a few hundred CPUs) spread over several
> campuses. I suspect we're in something of the same boat, resource wise.
> You could call this hodgepodge a "grid," but that lends a dignity and
> maturity to a system whose only really positive attribute is that it
> functions to get real work done.
>
> Below is a list, easiest to hardest, of systems I've hooked in:
>
> * My very own cluster that I admin
> * The cluster maintained by our local supercomputing center
> * Clusters maintained by collaborators at other institutions
> * Lab workstations that I admin, running Linux or OS X
> * Lab workstations maintained by someone else, running Linux or OS X
> * Lab workstations which usually run Windows, maintained by someone else,
> which can be rebooted into Linux or OS X at night
>
> Then, of course, there are the systems which I decided would be too much
> trouble, particularly given the number of CPUs in question:
>
> * Lab workstations running OS 9 or Windows, which I can't get rebooted
> into Linux.
>
> > I am already making use of a students computer lab, after-hours. Those
> > computers run linux, and it was a no-brainer : just hacked some scripts
> > to rshell into the machines, activated by the crond. It's not enough,
> > though.
>
> The major queuing systems for clusters (LSF, PBS-Pro (Torque?), and SGE)
> each have facilities for cycle-stealing from workstations. The very
> best way to approach this situation is to convince the lab admin to set up
> a queuing system that runs jobs on those machines only during certain
> hours, or (better) only when nobody is logged in at the terminal, the load
> is below the number of CPUs, and the mouse hasn't moved and no key has
> been struck in 15 minutes - or whatever they're most comfortable with.
>
> Many folks on this list (myself included) have written our very own rsh /
> cron based remote job execution system. Unless there's a really good
> reason to do it (infinite CPUs available, but no chance of getting a
> queuing system installed, so you have to hack), experience says it's
> better to use an established package with a user support base and code
> maintained by somebody else.
>
> A system which has been around a long time, is quite mature, and doesn't
> get nearly enough credit is Condor (from the University of Wisconsin).
> It's explicitly designed as a cycle scavenger. I know some cluster admins
> who run Condor to "backfill" their tightly scheduled clusters.
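For flavor, Condor's scavenging policy lives in each desktop's condor_config, and submitting work takes only a few lines in a submit description file. The thresholds and the worker name below are illustrative, not taken from any real installation:

```
# condor_config on each desktop (illustrative thresholds):
# start jobs only on an idle, lightly loaded machine; suspend
# them the moment the owner comes back.
START   = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
SUSPEND = KeyboardIdle < 60

# submit description file (hypothetical worker binary):
universe   = vanilla
executable = genome_worker
arguments  = --chunk $(Process)
queue 500
```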
>
> Integrating a set of queuing systems across domains remains tricky. I've
> found that, despite unlimited hype, grid software (including the Globus
> package) remains best suited to a single administrative domain with a
> single administrator. Of course, I haven't tried all of the offerings,
> and I haven't installed Globus this week... so things might have changed.
> I encourage you to try all the options available and see what
> works for you. Exclude Sun's "Grid Engine" from the above statement, as
> they have a slightly different definition of a "grid" than we're talking
> about here.
>
> Anyway, I've got a horrible, hacked-together "metascheduler" that has
> nothing going for it except the fact that it works. I would happily throw
> it away if someone came out with a product or tool that did the same
> thing. On a user-by-user basis, it maintains a list of the resources to
> which that user can connect. It loops over a queue of jobs, checking to
> see which resource is not overloaded, and sends jobs out as appropriate.
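The dispatch loop described above reduces to something like this toy sketch. The resource names, CPU counts, and canned load numbers are all invented; a real probe would ssh out and read the load average, and the echo would be a qsub or remote execution call.

```shell
#!/bin/sh
# Toy metascheduler loop: for each queued job, find the first resource
# whose load is below its CPU count and "dispatch" there.  Decisions are
# logged to dispatch.log; all names and numbers are made up.
probe_load() {                  # stand-in for ssh + uptime
    case "$1" in
        clusterA) echo 8 ;;     # fully loaded
        labB)     echo 1 ;;
        labC)     echo 0 ;;
    esac
}
cpus_of() {
    case "$1" in
        clusterA) echo 8 ;;
        *)        echo 2 ;;
    esac
}

: > dispatch.log
for job in job1 job2 job3; do
    for res in clusterA labB labC; do
        if [ "$(probe_load "$res")" -lt "$(cpus_of "$res")" ]; then
            echo "$job -> $res" >> dispatch.log   # real code: qsub/ssh here
            break
        fi
    done
done
```

A real version would also track what it just dispatched, so one idle lab doesn't soak up every job in the queue.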
>
> I've used both PBS's and SGE's facility for "sloughing off" jobs from one
> queue to another. These are neat, unless you need to hand off jobs for
> only some users but not others, to move jobs between SGE and PBS, or to
> really keep track of errors. They're probably best suited to a setup with
> several queues maintained by a single admin or a single team.
>
> > While I'm looking at the option of getting or buying CPU time on a
> > cluster, I am also tempted to make use of other public PCs at the
> > campus. The ideal thing here is to have something like SETI@home or
> > Fold@home, but I would go for anything that will allow me to have my
> > jobs running on as many PCs as possible here, while not making me the
> > enemy of the system admins...
>
> United Devices sells a software package to do this, and some very large
> corporate installations (2000+ CPUs) have been brought online. The trick
> with systems like this is getting enough systems to offset the high
> latencies and (generally) low-performance CPUs. If you can convince your
> university IT department to build a campus-wide resource of this sort, it
> will be terrific. On the other hand, you'll probably have to share it.
>
> In any solution you build, data motion and error detection / correction
> will be the biggest time-sinks.
>
> I've found that, for jobs requiring a moderate-size dataset (my core set
> of BLAST targets is around 14GB), data motion should be decoupled from
> CPU finding. That is, I have one process that pushes data out to compute
> resources on a regular basis, and jobs are only scheduled onto nodes that
> have the needed data. This means that I have to ask my partners not just
> for access to their CPUs, but for a bit of storage dedicated to me, as
> close to the compute nodes as possible.
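That decoupling can be as plain as a scheduled rsync push that runs independently of job dispatch. The destinations below are invented, and the `RUN=echo` stand-in makes this a dry run; drop it to actually transfer.

```shell
#!/bin/sh
# Data-staging sketch: mirror the target set to each partner's scratch
# area so jobs are only scheduled where the data already lives.
# Destinations are hypothetical; RUN=echo makes this a dry run.
DATA="./blast_targets"                        # ~14GB in practice
DESTS="siteA:/scratch/mirror siteB:/scratch/mirror"
RUN="echo"                                    # set RUN= to really transfer

: > staging.log
for dest in $DESTS; do
    # --delete keeps each mirror exact; -z compresses over slow links
    $RUN rsync -az --delete "$DATA/" "$dest/" >> staging.log
done
```

Fired nightly from cron, this keeps the mirrors fresh without the scheduler ever waiting on a 14GB copy.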
>
> > I think it would be best if I can approach the authorities with a
> > sensible plan - first impressions are very important...
>
> The social aspects of this sort of distributed computation are, by far,
> the most important. If there is trust between the administrative domains,
> the rest is really just technical work. Without the trust, it's nearly
> impossible to make even the best plan succeed.
>
> > I would like to hear anything about this subject: configuration
> > suggestions, past experience, encouragements, discouragements, etc.
>
> Me too. I've got my experiences and opinions, and I'm always interested
> in other takes on similar problems. On a totally selfish note, if anyone
> wants to share CPUs with me, we can expand each other's grids.
>
> Any takers?
>
> -Chris Dwan
> The University of Minnesota
>
> --__--__--
>
> Message: 3
> Date: Sun, 15 Feb 2004 16:53:29 -0800 (PST)
> From: Ron Chen <ron_chen_123@yahoo.com>
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> To: bioclusters@bioinformatics.org
> Reply-To: bioclusters@bioinformatics.org
>
> GridEngine (SGE) 6.0 will integrate with JXTA,
> offering JxGrid, to provide P2P workload management
> like SETI@home.
>
> http://gridengine.sunsource.net/project/gridengine/workshop22-24.09.03/proceedings.html
> "Resource Discovery in Sun Grid Engine using JXTA"
>
> However, SGE 6.0 will not be available until May 2004, so for now I
> suggest another package called BOINC.
>
> http://boinc.berkeley.edu
>
> BOINC is free and open source, and supports multiple platforms
> (Windows, Linux, Solaris, Mac OS X).
>
> Your approach of installing sshd/telnetd is OK, but
> the sysadmins will not like opening a port, since
> hackers can get in more easily. BOINC does not leave a
> port open, and it uses HTTP to fetch the workload (so it
> passes through firewalls more easily). Moreover, it suspends
> work when users access the machine, and
> allows better scheduling. Further, it has better file
> transfer than home-made solutions.
>
> I suggest you look at the link above, as I do
> not know all the features myself!
>
> -Ron
>
>
>
> --__--__--
>
> Message: 4
> Date: Mon, 16 Feb 2004 17:36:30 +0200
> From: Arnon Klein <klein@pob.huji.ac.il>
> To: bioclusters@bioinformatics.org
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> Thanks, Chris and Ron, for the responses. I've found BOINC and Condor to
> be very interesting and possible solutions to my problem. Since my
> software is built around Java RMI (using the master/worker paradigm),
> they also feel like the most natural transitions. (The heavy calculations
> are done in C, if anyone is worried about optimization...)
> I'll try to pull this off, and I'll come back to this list with the
> story of how it went.
>
> Arnon
>
>
> --__--__--
>
> Message: 5
> From: "John Van Workum" <jdvw@tticluster.com>
> To: <bioclusters@bioinformatics.org>
> Subject: RE: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Date: Mon, 16 Feb 2004 11:10:36 -0500
> Reply-To: bioclusters@bioinformatics.org
>
> Arnon,
>
> You may want to look at GreenTea. It is a pure-Java "grid" platform that
> may mesh well with your Java RMI code.
> http://www.greenteatech.com/
>
> Regards,
>
> John
> TTI
>
>
>
> --__--__--
>
> Message: 6
> Date: Mon, 16 Feb 2004 16:12:37 +0000 (GMT)
> From: Dan Bolser <dmb@mrc-dunn.cam.ac.uk>
> To: bioclusters@bioinformatics.org
> Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs
> Reply-To: bioclusters@bioinformatics.org
>
> On Mon, 16 Feb 2004, Arnon Klein wrote:
>
> > Thanks Chris and Ron for the responses. I've found BOINC and Condor as
> > very interesting and possible solutions for my problem. Since my
> > software is built around java RMI (using Master/Worker paradigm), they
> > also feel the most natural transitions (The heavy calculations are done
> > in C, if anyone is worried about optimization...) .
>
> What kind of calculation are you doing?
> Cheers,
> Dan.
>
>
>
>
> --__--__--
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
>
> End of Bioclusters Digest
>