[Bioclusters] Question about grid

Arnon Klein bioclusters@bioinformatics.org
Thu, 13 May 2004 17:18:33 +0300


If you settle on Java, then execution of complete chunks of code can 
be confined to a "sandbox" where you can control (almost?) every 
aspect of security that you wish.

But, as Tim Cutts said, it's Java, and it has certain performance 
issues when it comes to "pure" CPU utilization. However, I'm 
willing to suffer a 2-fold hit in speed if it lets me execute on 10x more 
CPUs than I have at my lab, and if it also saves me the politics 
(actually, begging) involved in getting sys-admins to let me execute on 
their machines...
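
As a back-of-the-envelope check on that trade-off, here is a minimal sketch. It assumes the work splits evenly across CPUs with no communication overhead, which is an idealization; the class and method names are made up for illustration.

```java
public class SpeedupEstimate {
    /**
     * Net speedup relative to the baseline, assuming the work divides
     * evenly across CPUs with no communication overhead (an idealization).
     */
    public static double netSpeedup(double cpuMultiplier, double perCpuSlowdown) {
        return cpuMultiplier / perCpuSlowdown;
    }

    public static void main(String[] args) {
        // 10x the CPUs, each running at half speed because of Java overhead.
        System.out.println(netSpeedup(10.0, 2.0)); // prints 5.0
    }
}
```

Under those assumptions the 2-fold penalty still leaves a 5-fold net gain.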

4 months ago I embarked on a quest to get my hands on as much CPU time 
as possible, and did some research, including on this list. One of the 
interesting links I got is this: 
http://www.cs.may.ie/~tkeane/distributed/index.html . It looks like 
a pretty good system for Java-based calculation, one that could grow 
into something quite good in comparison with existing GRID solutions.

BTW, I ended up getting the local PC farm administrator to let me run 
my stuff as is, without any further infrastructure (that got me ~200 
desktops, idle or close to idle most of the time). Data distribution 
is less of a problem, because it is a molecular-dynamics-like 
calculation with a high CPU/IO ratio.

Dan Bolser wrote:

>On Thu, 13 May 2004, Arnon Klein wrote:
>
>  
>
>>That's a really exciting concept, but why stop at dedicated cpu-service 
>>servers? Why not harness P2P into this, where each client has the 
>>ability both to donate cycles and to use other people's cycles?
>>I can see this benefiting organisations such as universities, where 
>>there is a lot of unused desktop power, and many people who need that 
>>power, but for short periods only, so didn't get dedicated facilities.
>>    
>>
>
>Right! I was struggling with the problem of how to include non web server
>machines in the system (behind a firewall, not running a webserver, on the
>same LAN as the webserver). If the webserver could privately talk to
>'internal' nodes (via intranet), it could then expose this via its
>internet connection.
>
>Cool idea!
>
>  
>
>>So imagine running these servers/client hybrids, that accept code (in 
>>binary, bytecode, or source code format), on anywhere between thousands 
>>to millions of computers...
>>If the code is self-contained, then the bandwidth and latency issues are 
>>not as big as they would be for small chunks of instructions. I think 
>>that even with very fast networks, latencies will kill the benefits when 
>>scaling into something larger than a single LAN segment, so to avoid it, 
>>you have to batch the instructions together (i.e. send complete functions).
>>    
>>
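To put rough numbers on the batching argument above, here is a minimal sketch of the cost model. The 100 microsecond round trip and the 1 nanosecond per instruction are illustrative assumptions, not measurements, and the class and method names are made up.

```java
public class LatencyEstimate {
    /** Total wall time in seconds when the work is shipped in batches. */
    public static double totalSeconds(long instructions, long batchSize,
                                      double roundTripSec, double secPerInstruction) {
        long batches = (instructions + batchSize - 1) / batchSize; // ceiling division
        // One network round trip per batch, plus the compute time itself.
        return batches * roundTripSec + instructions * secPerInstruction;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;    // instructions to run
        double rtt = 100e-6;    // ASSUMED LAN round trip: 100 microseconds
        double perInstr = 1e-9; // ASSUMED cost per instruction: 1 nanosecond
        System.out.println("one per request: " + totalSeconds(n, 1, rtt, perInstr) + " s");
        System.out.println("single batch:    " + totalSeconds(n, n, rtt, perInstr) + " s");
    }
}
```

With these assumed figures, one request per instruction costs about 100 seconds against about a millisecond for a single batch, a gap of roughly five orders of magnitude. That is why complete functions, not single instructions, have to be the unit of work.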
>
>I see, so the 'calculator' as I describe it could be an actual
>programming language? Sending code is all well and good, but it adds the
>complexity which I want to remove. 
>
>However, if you could 'install' your code 'on-the-web' (i.e. standard
>packages exposed on a server), then we could all use the same code /
>distribute code (packages) in P2P environment.
>
>Hmm...
>
>
>  
>
>>Doing this in Java is actually pretty easy, since RMI lets you transport 
>>an object containing both code and data over the network.
>>You put up a server, exposing a method such as:
>>
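>>import java.io.Serializable;
>>import java.rmi.Remote;
>>import java.rmi.RemoteException;
>>
>>interface Computable extends Serializable {	// so RMI can marshal the job
>>	public Object compute() throws Exception;
>>}
>>interface ComputeServer extends Remote {	// illustrative name
>>	public Object compute(Computable job) throws RemoteException;
>>}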
>>
>>and using the Java RMI facilities, call this method on the server with 
>>an object that implements a method called "compute" that does the 
>>computation.
>>
>>Of course, like you said, security and accounting issues will pose 
>>problems for a wide-spread installation.
>>    
>>
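For anyone who wants to try the RMI idea quoted above, here is a minimal, self-contained sketch. It makes several assumptions: the names ComputeServer, ComputeServerImpl, SumJob and runOnce are made up for illustration; client and server share one JVM and talk over loopback, with no registry; and the job class is already on the server's classpath. Shipping the bytecode itself needs RMI dynamic class loading, which is not shown.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class RmiComputeSketch {
    /** A job: Serializable so that RMI can marshal its state to the server. */
    public interface Computable extends Serializable {
        Object compute() throws Exception;
    }

    /** Remote interface; the name ComputeServer is illustrative. */
    public interface ComputeServer extends Remote {
        Object compute(Computable job) throws RemoteException;
    }

    /** Server side: runs whatever job it receives and returns the result. */
    public static class ComputeServerImpl implements ComputeServer {
        @Override
        public Object compute(Computable job) throws RemoteException {
            try {
                return job.compute();
            } catch (Exception e) {
                throw new RemoteException("job failed", e);
            }
        }
    }

    /** Example job (hypothetical): sums the integers 1..n. */
    public static class SumJob implements Computable {
        private static final long serialVersionUID = 1L;
        private final int n;

        public SumJob(int n) { this.n = n; }

        @Override
        public Object compute() {
            long sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum; // boxed to Long
        }
    }

    /** Exports a server on loopback, submits one job through its stub, cleans up. */
    public static Object runOnce(Computable job) throws Exception {
        // Make the stub point at loopback regardless of the machine's hostname.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        ComputeServerImpl impl = new ComputeServerImpl();
        // Port 0 lets the RMI runtime pick a free port.
        ComputeServer stub = (ComputeServer) UnicastRemoteObject.exportObject(impl, 0);
        try {
            return stub.compute(job); // the call goes through the RMI stub
        } finally {
            UnicastRemoteObject.unexportObject(impl, true); // lets the JVM exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(new SumJob(100))); // prints 5050
    }
}
```

A real deployment would bind the stub in an RMI registry on the server and look it up from the client. It would also need the sandboxing and accounting discussed in this thread before accepting jobs from strangers.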
>
>
>This is the problem with sending code again, and why existing grid
>projects are quite complex (as I understand them).
>
>If you send low-level code, security and accounting are not a problem. You
>just have to deal with load balancing, client selection etc.
>
>Thanks for your comments,
>
>Dan.
>
>
>  
>
>>Arnon
>>
>>Dan Bolser wrote:
>>
>>    
>>
>>>Hello,
>>>
>>>I had an idea to do with grid computing, but it may be total garbage.
>>>
>>>I heard about some clever people who started to 'steal' computation from
>>>unsuspecting web sites by hijacking the normal function of the site and
>>>co-opting its computations into a different program. 
>>>
>>>If these stories are true, surely we could do this with a bit more
>>>civility, and set up a bunch of generic 'calculators' through the web
>>>which could then be used for grid computing.
>>>
>>>The way I imagine the system is this... 
>>>
>>>The program starts by searching the web for calculators. The code is
>>>compiled for the 'web-engine', so every single instruction is encoded as
>>>an HTTP / CGI / XML request, and all instructions are performed over the
>>>web on a shifting number of calculators.
>>>
>>>Actually, I found something similar here...
>>>
>>>http://ausweb.scu.edu.au/aw02/papers/refereed/kelly/paper.html
>>>
>>>I wanted to ask about the feasibility of such an idea. 
>>>
>>>For example, if one machine sent all its instructions to another over a
>>>gigabit intranet, how much slower would this be than local computation?
>>>
>>>Is a gigabit LAN 1/2/3/10/100/1000 orders of magnitude slower than
>>>internal CPU communication channels?
>>>
>>>The power of an open source system like this would be if someone like
>>>Apache would take the idea on board and release it as part of its standard
>>>distribution. However, even if every web server on the web were running
>>>such a calculator (why not be ambitious), could the system be fast enough?
>>>
>>>
>>>Naturally there are a lot of issues regarding distribution / allocation /
>>>scheduling etc. but before we get into nasty details, is the idea remotely
>>>worth consideration?
>>>
>>>How difficult would it be to make a Java compiler accommodate such a 
>>>web-engine?
>>>
>>>Thanks very much for any feedback,
>>>
>>>Dan.
>>>
>>>_______________________________________________
>>>Bioclusters maillist  -  Bioclusters@bioinformatics.org
>>>https://bioinformatics.org/mailman/listinfo/bioclusters