[Bioclusters] Question about grid
Chris Dwan
bioclusters@bioinformatics.org
Thu, 13 May 2004 08:41:32 -0500
> I heard about some clever people who started to 'steal' computation
> from unsuspecting web sites by hijacking the normal function of the
> site and co-opting its computations into a different program.
This is certainly an interesting idea, but I doubt that it will have
practical application in terms of solving real scientific computing
issues. The problems are that the computing power available is rather
small, the security and trust issues are rather large, and the overall
gain (vs. buying a computer of one's own) is almost certainly negative.
I file this in the same bin with the people who want to store data for
half an hour by bouncing radio waves off of Mars, people who do
arithmetic using the sign bits on ethernet cards, people who store data
for seconds at a time by modulating the signal on their long haul FC
loop, and the like. Fascinating games, but you get more juice out of
one of those $900 desktops we were talking about earlier.
> If these stories are true, surely we could do this with a bit more
> civility, and set up a bunch of generic 'calculators' through the web
> which could then be used for grid computing.
The recent merge between the web services and the grid folks gives me
hope that something similar to this will be possible, but at the level
of workflows which integrate applications, rather than applications
which integrate instructions.
I don't know anyone who is willing to open up their servers to a truly
anonymous user, for truly arbitrary computation. Keep in mind the
security concerns surrounding simple things like open network relays
and anonymous ftp. The real power will be as we begin to converge on
some standards for Remote Method Invocation (RMI) (web services are an
example), authentication and authorization (SSL certs seem to be the
order of the day), resource specification and discovery (the GGF and
bioMOBY folks have put a lot of thought into this), and the like.
I believe that many of us would be willing to, and in fact want to,
offer our special algorithm, tool, or dataset via "the grid" rather
than just as a web page. We already have "federating" solutions to
database interoperability, but with current tools it's difficult to
specify things like expiration dates and update frequency, or to
distinguish an API change from a transient error. Many people are
already offering very generic
SOAP APIs to their applications. This is a start, and there will be
much more.
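
To make the metadata point concrete, here is a minimal Python sketch of
the sort of descriptor I have in mind. Everything in it is hypothetical:
the ResourceDescriptor fields, the Failure categories, and the rule that
a differing major version number signals an API change are assumptions
for illustration, not part of any existing grid or web services
standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Failure(Enum):
    TRANSIENT = "transient"    # worth retrying later
    API_CHANGE = "api_change"  # the interface changed; retrying will not help


@dataclass
class ResourceDescriptor:
    """Hypothetical metadata a provider might publish next to a dataset or tool."""
    name: str
    api_version: str            # assumed "major.minor" format
    last_updated: datetime
    update_interval: timedelta  # how often the provider refreshes the data
    expires: datetime           # after this, clients should not trust their copy

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires

    def next_update(self) -> datetime:
        return self.last_updated + self.update_interval

    def classify_failure(self, reported_version: str) -> Failure:
        # Assumed rule: a different major version means the API itself changed;
        # anything else is treated as a transient error.
        ours = self.api_version.split(".")[0]
        theirs = reported_version.split(".")[0]
        return Failure.API_CHANGE if ours != theirs else Failure.TRANSIENT


desc = ResourceDescriptor(
    name="example-db",  # placeholder name
    api_version="1.2",
    last_updated=datetime(2004, 5, 1),
    update_interval=timedelta(days=7),
    expires=datetime(2004, 6, 1),
)
```

A client holding such a descriptor could decide on its own whether to
retry a failed call, refresh its local copy, or stop and tell a human
that the interface has moved.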
In terms of anonymous vs. authorized users: As it becomes more and
more costly for me to allow someone else to use my resources, I would
like to be able to throttle their usage (or at least their priority)
according to my relationship with them. It's a lot easier to form
collaborations once we have a standard way of allowing our software to
work together. This is the "virtual organization" idea spoken of by
the grid folks. Of course, this is far from a new idea: Organizations
realized thousands of years ago that standardizing communication
channels and components made them much more efficient.
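
The throttling idea can be sketched in a few lines of Python. The
relationship tiers, the numbers, and the admit / schedule helpers below
are all made up for illustration; a real site would hang this off
whatever authentication and queueing system it already runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    priority: int  # lower number runs first
    max_jobs: int  # concurrent jobs allowed for this tier


# Hypothetical tiers: the closer the relationship, the more generous the limit.
LIMITS = {
    "collaborator": Limit(priority=0, max_jobs=50),
    "known": Limit(priority=5, max_jobs=10),
    "anonymous": Limit(priority=10, max_jobs=1),
}


def limit_for(relationship: str) -> Limit:
    # Anyone we do not recognize is treated as anonymous.
    return LIMITS.get(relationship, LIMITS["anonymous"])


def admit(relationship: str, running_jobs: int) -> bool:
    """Accept a new job only if the requester is under their tier's job cap."""
    return running_jobs < limit_for(relationship).max_jobs


def schedule(requests):
    """Order (relationship, job_name) pairs by tier priority, keeping ties in arrival order."""
    return sorted(requests, key=lambda r: limit_for(r[0]).priority)
```

For example, schedule([("anonymous", "a"), ("collaborator", "b")]) puts
the collaborator's job first, and admit("anonymous", 1) refuses a second
concurrent job from an anonymous user.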
I think that our sharing will need to be deliberate and mindful of the
human issues involved. It's much easier to justify (at the level of a
CIO) that we're making a technology decision because it will make it
easier for us to collaborate with our peers, than to justify installing
a piece of software that will make our servers available for the entire
planet to use anonymously.
All that said: I'm quite excited to share the resources at my
disposal. I have two small clusters, a pile of software, and a bunch
of databases. If we define a few applications / data resources, I
would be happy to try to get a web service / OGSA / SOAP / RMI version
of them up. The thing I won't do again is a "hello world" into the
void. We need to focus on the useful and the needed.
Are people sitting on their hands, wishing for some particular
resource? We've got bio-mirror and the like. We've got net-blast.
What's the next step, and how can I help?
-Chris Dwan