[Bioclusters] gmond and high loads under Suse w/ 2.6 kernel?

Joe Landman bioclusters@bioinformatics.org
Thu, 04 Nov 2004 10:40:57 -0500

Hi Chris:

  Usually you see loads climb when either

a) processes start fighting for shared resources
b) something is blocking file reads/writes

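Most of the problems I have seen with daemons have been the latter.  
Linux counts processes in uninterruptible sleep toward the load average, 
so a daemon stuck on IO can push the load up without ever showing up as 
a CPU hog in top.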
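
To tell case (a) from case (b) on a live box, here is a minimal sketch that tallies process states from /proc.  It assumes a Linux /proc layout, and the function names are mine, not part of any tool.  Processes in state R are runnable and those in state D are in uninterruptible sleep, which is roughly what the "r" and "b" columns of vmstat report.  It only looks at processes, not individual threads, so treat the numbers as a hint rather than an exact match for the load average.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' and take the field that follows.
    return stat_line.rsplit(")", 1)[1].split()[0]


def tally_states(stat_lines):
    """Count how many processes are in each state (R, S, D, ...)."""
    return Counter(parse_state(line) for line in stat_lines)


def read_stat_lines(proc="/proc"):
    """Yield the stat line of every process currently visible under /proc."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                yield fh.read()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue


if __name__ == "__main__":
    counts = tally_states(read_stat_lines())
    print("runnable (R):", counts["R"])
    print("blocked in uninterruptible sleep (D):", counts["D"])
```

If the D count stays above zero on an otherwise idle machine, something is blocking, and that is where I would point strace next.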

Run vmstat 1 and watch the "b" column, which counts processes blocked 
in uninterruptible sleep (usually waiting on IO).  I might also suggest 
strace.  Remember this mantra: "strace is your friend".  Attach strace 
to the gmond process with the timing options turned on (-tt for 
timestamps, -T for the time spent in each system call).  It is amazing 
what you will find a process getting stuck on.  Usually after an strace 
session I have narrowed down the problem tremendously.
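
Once you have a trace, the slow calls are what you want to see first.  Below is a rough sketch that pulls the slowest system calls out of a saved trace.  It assumes the trace was captured with something like "strace -tt -T -p <pid> -o gmond.trace" (the file name is just a placeholder) and that each completed call ends with the <seconds> field that -T appends.  Calls split across "unfinished" and "resumed" lines are simply skipped, so this is a first-pass filter, not a full strace parser.

```python
import re
import sys

# Matches one completed call as printed by `strace -T`, optionally preceded
# by a "[pid N]" tag or bare PID, and by a -t/-tt/-ttt timestamp, e.g.
#   10:40:57.123456 read(3, "", 4096) = 0 <0.000123>
CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"  # optional thread/process tag
    r"(?:[\d:.]+\s+)?"                # optional timestamp
    r"(\w+)\("                        # system call name
    r".*<(\d+\.\d+)>\s*$"             # seconds spent inside the call
)


def slowest_calls(lines, top=10):
    """Return up to `top` (seconds, name, line) tuples, slowest first."""
    timed = []
    for line in lines:
        m = CALL.match(line)
        if m:
            timed.append((float(m.group(2)), m.group(1), line.rstrip()))
    timed.sort(key=lambda t: t[0], reverse=True)
    return timed[:top]


if __name__ == "__main__":
    # Usage: python slowest.py gmond.trace
    with open(sys.argv[1]) as fh:
        for seconds, name, line in slowest_calls(fh):
            print(f"{seconds:10.6f}  {name:<12} {line}")
```

A call that sits at the top of this list with a time near a full second or more is a good candidate for whatever the daemon is stuck on.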

Strace is your friend.  Not sure if I mentioned this.


ps: strace is your friend ...

Chris Dwan wrote:

> I'm working with a cluster which has unexplained high load values 
> (hovering between 1 and 2 with the system sitting idle) on the 
> portal.  It's a 32-node, 64-CPU Opteron cluster, running SUSE, with 
> the 2.6 kernel.
> When I turn off GANGLIA's gmond daemon, the load drops down to ordinary 
> rest states (0.1-ish).  After some debugging to isolate the behavior, 
> there's clearly a causal link between gmond on the portal and these 
> high loads.
> Gmond does not appear to be taking very much cpu time, doesn't hang 
> out in "top", and otherwise doesn't seem to be the real problem.  The 
> cluster is relatively small (32 nodes).  If I turn off all of the 
> cluster gmond processes, the load drops some, but not all the way to a 
> rest state.
> The system is sluggish when the load reports high, but not as sluggish 
> as I might expect.
> Has anyone seen this before?  It's more annoying than anything else.  
> I'm tempted to blame "something in the kernel" and "multicast," but I 
> would love to have a more robust explanation.
> -Chris Dwan
>  The BioTeam

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615