[BiO BB] Re: [Bioclusters] General question on time consuming problems
Tim Cutts
tjrc at sanger.ac.uk
Fri Apr 22 05:34:32 EDT 2005
On 20 Apr 2005, at 7:22 pm, George White wrote:
> The other problem is that many of the real-world clusters are lucky to
> get 50% uptime. The one down the hall was fried when the A/C died. They
> fixed all that, took a couple weeks to get a new A/C installed, and
> then a cable to the RAID stopped working, so now they have to get the
> cable and hope the files weren't damaged. You hear the success stories
> from people who have been lucky with A/C hardware, etc., but there are
> also lots of cluster owners who are swamped by the upkeep and/or poorly
> maintained physical plant (power problems, A/C, etc.).
But then, as you say, if your problem is really embarrassingly
parallel, and you code it right, losing a few nodes here and there
isn't a problem. One of the nice things about embarrassingly parallel
problems is that they tend to allow for gradual loss of capacity. It's
quite useful for us; it allows us to wait for a number of nodes to fail
before we batch them up and send them back for repair. This saves a
lot of money in support costs, as well as effort on our part.
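
To illustrate what "coding it right" might look like, here is a minimal
sketch in Python. It is only a single-process simulation under stated
assumptions, not a description of any real scheduler or of the setup at
Sanger: the names run_tasks, failed_nodes and retired are hypothetical,
node failure is simulated by a set of node names, and tasks are assumed
to be independent and hashable. The point is that a task from a dead
node simply goes back on the queue, and dead nodes are set aside to be
repaired in a batch later.

```python
from collections import deque


def run_tasks(tasks, nodes, failed_nodes, work):
    """Run independent tasks over a pool of nodes, tolerating node loss.

    tasks        -- list of independent, hashable task inputs
    nodes        -- list of node names (hypothetical)
    failed_nodes -- set of node names that fail when first used (simulated)
    work         -- function applied to each task
    """
    pending = deque(tasks)   # tasks still to be done
    alive = deque(nodes)     # nodes believed to be usable
    results = {}
    retired = []             # dead nodes, batched up for later repair

    while pending:
        if not alive:
            raise RuntimeError("no working nodes left")
        task = pending.popleft()
        node = alive.popleft()
        if node in failed_nodes:
            # Node lost: set it aside and put the task back on the queue.
            # No other task depends on this one, so nothing else is affected.
            retired.append(node)
            pending.append(task)
            continue
        results[task] = work(task)
        alive.append(node)   # node goes back into the round-robin pool

    return results, retired


if __name__ == "__main__":
    results, retired = run_tasks(
        tasks=list(range(10)),
        nodes=["n1", "n2", "n3", "n4"],
        failed_nodes={"n2", "n4"},
        work=lambda x: x * x,
    )
    print(len(results), "tasks completed;", "nodes awaiting repair:", retired)
```

With half the nodes gone, every task still completes; the job just takes
longer, which is the gradual loss of capacity described above.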
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233