[Bioclusters] gmond and high loads under Suse w/ 2.6 kernel?

Tim Cutts bioclusters@bioinformatics.org
Thu, 4 Nov 2004 16:23:43 +0000

On 4 Nov 2004, at 4:02 pm, Alan Kilian wrote:

>   Chris,
>     I don't have any answers, but I think restating this might help
>     people spot the problem.
>> When I turn off GANGLIA's gmond daemon, the load drops down to ordinary
>> rest states (0.1-ish).  After some debugging to isolate the behavior,
>> there's clearly a causal link between gmond on the portal and these
>> high loads.

We saw load explode on our cluster at one point, with gmond processes
using 99% CPU.  Network performance was awful.  Some judicious use of
tcpdump revealed that a farm node had crashed into a strange state in
which it was spewing multicast packets onto the network at a rate of
one per millisecond.

Unsurprisingly, the gmond processes on all the other nodes had a hard 
time coping with this.

So:  check what's happening on your network as well...
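
If tcpdump isn't to hand, a rough sketch along the following lines
could do a similar job: join the gmond multicast group for a few
seconds, count datagrams per source address, and list any sender above
a threshold.  This is not what we ran (we used tcpdump).  The group
239.2.11.71 and port 8649 are assumed to be the usual Ganglia defaults,
so check mcast_join and port in your own gmond.conf; the function names
and the 50 packets/s threshold are made up for illustration.

```python
import socket
import struct
import time
from collections import Counter

# Assumed values: the usual Ganglia default channel. Check the
# mcast_join and port settings in your own gmond.conf.
MCAST_GROUP = "239.2.11.71"
MCAST_PORT = 8649


def noisy_senders(counts, seconds, max_pps=50.0):
    """Return {source: packets_per_second} for sources above max_pps.

    max_pps is an arbitrary illustrative threshold, not a Ganglia value.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {src: n / seconds
            for src, n in counts.items()
            if n / seconds > max_pps}


def count_packets(seconds=5.0, group=MCAST_GROUP, port=MCAST_PORT):
    """Join the multicast group and count datagrams per source IP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: multicast group address, then local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(0.5)
    counts = Counter()
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                _, (src, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                continue  # quiet interval; keep waiting until the deadline
            counts[src] += 1
    finally:
        sock.close()
    return counts


if __name__ == "__main__":
    window = 5.0
    seen = count_packets(window)
    for src, pps in sorted(noisy_senders(seen, window).items()):
        print(f"{src}: {pps:.0f} packets/s")
```

A node sending one packet a millisecond would show up at around 1000
packets/s, far above anything a healthy gmond should produce.  If the
bind fails because gmond already holds the port, it may be easier to
run this on a machine on the same segment that isn't running gmond.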


Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233