[Bioclusters] gmond and high loads under Suse w/ 2.6 kernel?
Tim Cutts
bioclusters@bioinformatics.org
Thu, 4 Nov 2004 16:23:43 +0000
On 4 Nov 2004, at 4:02 pm, Alan Kilian wrote:
>
> Chris,
>
> I don't have any answers, but I think restating this might help
> people spot the problem.
>
>> When I turn off GANGLIA's gmond daemon, the load drops down to ordinary
>> rest states (0.1-ish). After some debugging to isolate the behavior,
>> there's clearly a causal link between gmond on the portal and these
>> high loads.
We saw load explode on our cluster at one point, with gmond processes
using 99% CPU. Network performance was awful. Some judicious use of
tcpdump revealed that a farm node had crashed into a strange state in
which it was spewing multicast packets onto the network at a rate of
one per millisecond.
Unsurprisingly, the gmond processes on all the other nodes had a hard
time coping with this.
So: check what's going on on your network as well...
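If you'd rather count than eyeball tcpdump output, something like the
following rough sketch could do the per-sender tally. It is not what we
ran at the time. It assumes gmond is on the stock multicast channel
(239.2.11.71, port 8649; check your gmond.conf), and the function names
and the threshold are made up for illustration.

```python
import socket
import struct
import time
from collections import Counter

# Assumed values: the multicast channel from a stock gmond.conf.
# Check your own gmond.conf before relying on them.
GMOND_GROUP = "239.2.11.71"
GMOND_PORT = 8649


def find_flooders(sources, window_seconds, max_packets_per_second):
    """Return {source: packets_per_second} for senders above the threshold.

    sources is an iterable of sender addresses, one entry per packet seen
    during a sampling window lasting window_seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(sources)
    return {
        src: n / window_seconds
        for src, n in counts.items()
        if n / window_seconds > max_packets_per_second
    }


def sample_gmond_senders(window_seconds=5.0, group=GMOND_GROUP, port=GMOND_PORT):
    """Join the multicast group and record the sender of every packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Lets us share the port with a local gmond, provided gmond also
    # sets this option (an assumption; otherwise stop gmond first).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: group address followed by the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(0.5)
    senders = []
    deadline = time.monotonic() + window_seconds
    try:
        while time.monotonic() < deadline:
            try:
                _data, (addr, _port) = sock.recvfrom(65535)
            except socket.timeout:
                continue
            senders.append(addr)
    finally:
        sock.close()
    return senders
```

Calling find_flooders(sample_gmond_senders(5.0), 5.0, 100) on any node
in the multicast domain should return a dictionary of the hosts sending
more than 100 packets a second. A healthy gmond sends far less than
that, so a node in the state I described would stand out immediately.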
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233