[Bioclusters] Question about grid
Arnon Klein
bioclusters@bioinformatics.org
Thu, 13 May 2004 17:18:33 +0300
If you settle on using Java, then executing complete chunks of code can
be restricted into a "sand box" where you can control (almost?) every
aspect of security that you wish.
But, like Tim Cutts said - it's Java, and has certain issues with
performance when it comes to "pure" CPU utilization. However, I'm
willing to suffer a 2-fold hit in speed, if it lets me execute on 10x more
CPUs than I have at my lab. And if it can also save me the politics
(actually - begging) involved with having sys-admins let me execute on
their machines...
4 months ago I embarked on a quest to get my hands on as much CPU time
as possible, and did some research, also on this list. One of the
interesting links I got is this:
http://www.cs.may.ie/~tkeane/distributed/index.html , and it looks like
it's a pretty good system for Java-based calculation that can be grown
into something quite good in comparison with existing GRID solutions.
BTW - I ended up getting the local PC farm administrator to let me run
my stuff as is, without any further infrastructure (got me ~200
desktops, idle or close to idle most of the time). The data distribution
is less of a problem, because it is a molecular-dynamics-like
calculation - high CPU/IO ratio.
Dan Bolser wrote:
>On Thu, 13 May 2004, Arnon Klein wrote:
>
>
>
>>That's a really exciting concept, but why stop at dedicated cpu-service
>>servers? Why not harness P2P into this, where each client has the
>>ability both to donate cycles and use other people's cycles.
>>I can see this benefitting organisations such as universities, where
>>there is a lot of unused desktop power, and many people who need that
>>power, but for short periods only, so didn't get dedicated facilities.
>>
>>
>
>Right! I was struggling with the problem of how to include non-webserver
>machines in the system (behind a firewall, not running a webserver, on the
>same LAN as the webserver). If the webserver could privately talk to
>'internal' nodes (via intranet), it could then expose this via its
>internet connection.
>
>Cool idea!
>
>
>
>>So imagine running these servers/client hybrids, that accept code (in
>>binary, bytecode, or source code format), on anywhere between thousands
>>to millions of computers...
>>If the code is self-contained, then the bandwidth and latency issues are
>>not as big as they would be for small chunks of instructions. I think
>>that even with very fast networks, latencies will kill the benefits when
>>scaling into something larger than a single LAN segment, so to avoid it,
>>you have to batch the instructions together (i.e. send complete functions).
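To put rough numbers on the batching argument above, here is a small
back-of-envelope model. Both constants are assumptions chosen only for
illustration (a LAN round trip of about 100 microseconds and a local
operation of about 1 nanosecond); they are not measurements from this
thread, and the class and method names are made up.

```java
// Back-of-envelope model; both constants are assumed, not measured.
final class LatencyModel {
    // Assumed LAN round-trip time per request, in nanoseconds (100 microseconds).
    static final double ROUND_TRIP_NS = 100_000.0;
    // Assumed cost of one local operation, in nanoseconds.
    static final double LOCAL_OP_NS = 1.0;

    /** Slowdown versus local execution when opsPerRequest operations share one round trip. */
    static double slowdown(long opsPerRequest) {
        double remote = ROUND_TRIP_NS + opsPerRequest * LOCAL_OP_NS;
        double local = opsPerRequest * LOCAL_OP_NS;
        return remote / local;
    }

    public static void main(String[] args) {
        // Compare one operation per request against progressively larger batches.
        for (long ops : new long[] {1L, 1_000L, 1_000_000L, 1_000_000_000L}) {
            System.out.printf("%,d ops per request -> %.4fx slower%n", ops, slowdown(ops));
        }
    }
}
```

Under these assumed figures, one operation per request is about five
orders of magnitude slower than local execution, while a batch of a
billion operations per request costs almost nothing extra.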
>>
>>
>
>I see, so the 'calculator' as I describe it could be an actual
>programming language? Sending code is all well and good, but it adds the
>complexity which I want to remove.
>
>However, if you could 'install' your code 'on-the-web' (i.e. standard
>packages exposed on a server), then we could all use the same code /
>distribute code (packages) in P2P environment.
>
>Hmm...
>
>
>
>
>>Doing this in Java is actually pretty easy, since RMI lets you transport
>>an object containing both code and data over the network.
>>You put up a server, exposing a method such as:
>>
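>>interface Computable extends java.io.Serializable {
>>    Object compute() throws Exception;
>>}
>>
>>// Remote interface exposed by the server (the name is illustrative).
>>interface ComputeServer extends java.rmi.Remote {
>>    Object compute(Computable job) throws java.rmi.RemoteException;
>>}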
>>
>>and using the Java RMI facilities, call this method on the server with
>>an object that implements a method called "compute" that does the
>>computation.
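As a rough illustration of the Computable idea above, the sketch below
ships a job object as bytes and runs it on the "receiving" side. It is
not a full RMI setup: the network hop is replaced by an in-memory
serialization round trip, which is the same mechanism RMI uses to pass
a non-remote argument by value. Computable is declared Serializable,
which passing it by value requires. Every name other than Computable
and compute (JobShippingSketch, SumOfSquares, ship, runShipped) is
hypothetical. Note that serialization carries the object's data, so the
job's class must still be loadable on the receiving side; RMI can fetch
it from a codebase URL, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch: all names except Computable/compute are made up here.
final class JobShippingSketch {

    /** A job carries its own data and must be serializable to cross the wire. */
    interface Computable extends Serializable {
        Object compute() throws Exception;
    }

    /** Hypothetical example job: 1^2 + 2^2 + ... + n^2. */
    static final class SumOfSquares implements Computable {
        private static final long serialVersionUID = 1L;
        private final int n;

        SumOfSquares(int n) {
            this.n = n;
        }

        @Override
        public Object compute() {
            long total = 0;
            for (int i = 1; i <= n; i++) {
                total += (long) i * i;
            }
            return total;
        }
    }

    /** "Client" side: turn the job into bytes, as happens when an argument is marshalled. */
    static byte[] ship(Computable job) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(job);
        }
        return buffer.toByteArray();
    }

    /** "Server" side: rebuild the job from bytes and run it. */
    static Object runShipped(byte[] wireBytes) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wireBytes))) {
            Computable job = (Computable) in.readObject();
            return job.compute();
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = runShipped(ship(new SumOfSquares(10)));
        System.out.println("result = " + result);
    }
}
```

Running main prints "result = 385". In a real deployment the
runShipped step would sit behind a java.rmi.Remote interface, and the
server would want to restrict what a received job is allowed to do.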
>>
>>Of course, like you said, security and accounting issues will pose
>>problems for a wide-spread installation.
>>
>>
>
>
>This is the problem with sending code again, and why existing grid
>projects are quite complex (as I understand them).
>
>If you send low level code, security and accounting are not a problem. You
>just have to deal with load balancing, client selection, etc.
>
>Thanks for your comments,
>
>Dan.
>
>
>
>
>>Arnon
>>
>>Dan Bolser wrote:
>>
>>
>>
>>>Hello,
>>>
>>>I had an idea to do with grid computing, but it may be total garbage.
>>>
>>>I heard about some clever people who started to 'steal' computation from
>>>unsuspecting web sites by hijacking the normal function of the site and
>>>co-opting its computations into a different program.
>>>
>>>If these stories are true, surely we could do this with a bit more
>>>civility, and set up a bunch of generic 'calculators' through the web
>>>which could then be used for grid computing.
>>>
>>>The way I imagine the system is this...
>>>
>>>Program starts by searching the web for calculators, the code is compiled
>>>for the 'web-engine' so every single instruction is encoded as an HTTP /
>>>CGI / XML request, and all instructions are performed over the web on a
>>>shifting number of calculators.
>>>
>>>Actually, I found something similar here...
>>>
>>>http://ausweb.scu.edu.au/aw02/papers/refereed/kelly/paper.html
>>>
>>>I wanted to ask about the feasibility of such an idea.
>>>
>>>For example if one machine sent all its instructions to another over a
>>>gigabit intranet, how much slower would this be than local computation?
>>>
>>>Is a gigabit LAN 1/2/3/10/100/1000 orders of magnitude slower than
>>>internal CPU communication channels?
>>>
>>>The power of an open source system like this would be if someone like
>>>Apache would take the idea on board and release it as part of its standard
>>>distribution. However, even if every web server on the web were running
>>>such a calculator (why not be ambitious), could the system be fast enough?
>>>
>>>
>>>Naturally there are a lot of issues regarding distribution / allocation /
>>>scheduling etc. but before we get into nasty details, is the idea remotely
>>>worth consideration?
>>>
>>>How difficult would it be to make a Java compiler accommodate such a
>>>web-engine?
>>>
>>>Thanks very much for any feedback,
>>>
>>>Dan.
>>>
>>>_______________________________________________
>>>Bioclusters maillist - Bioclusters@bioinformatics.org
>>>https://bioinformatics.org/mailman/listinfo/bioclusters