From: Silver, J. <jon...@un...> - 2014-05-30 17:42:40
I’m monitoring a set of different machines and most things look pretty good. However, on the topmost Ganglia page, one machine is reporting very high load in the summary section on the left:

    host (physical view)
    CPUs Total: 1
    Hosts up: 1
    Hosts down: 0
    Current load Avg (15, 5, 1m): 144% 141% 155%
    Avg Utilization (last hour): 211%

The graphs don’t really show this at all. The lower graphs still have these wacky numbers on the left, but all of the charts look OK. I did an rrddump and all of the numbers look low:

    <!-- 2014-05-30 16:00:00 UTC / 1401465600 --> <row><v>1.7556666667e-01</v></row>
    <!-- 2014-05-30 16:10:00 UTC / 1401466200 --> <row><v>1.7168333333e-01</v></row>
    <!-- 2014-05-30 16:20:00 UTC / 1401466800 --> <row><v>2.7051666667e-01</v></row>
    <!-- 2014-05-30 16:30:00 UTC / 1401467400 --> <row><v>1.9386666667e-01</v></row>
    <!-- 2014-05-30 16:40:00 UTC / 1401468000 --> <row><v>3.1000000000e-01</v></row>
    <!-- 2014-05-30 16:50:00 UTC / 1401468600 --> <row><v>3.0245000000e-01</v></row>

1. What is the exact definition of “load”? The only thing that I could find was the number of processes per the number of CPUs. Is that the definition? That does not really make sense to me. Or is it the number of processes waiting for execution? I’m trying to understand what this means: why do the details on the left show that the machine is overutilized, when the detailed graphs and the dumped values do not?

2. The graphs for CPU Nice show 100 m – 300 m (& %) on the left legend. What is the meaning of this range? I would have expected 0 – 100 %.

Thanks,
jonathan
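For anyone who wants to repeat the comparison between the dump and the summary panel, here is a minimal Python sketch. It parses the `<row><v>…</v></row>` lines quoted in the message above and converts them to percentages. Two things are assumptions and not confirmed by the message: that the dumped values are load averages, and that the summary percentage is computed as load divided by CPU count times 100. The helper names `dump_values` and `load_percent` are made up for illustration.

```python
import re

# Rows copied from the rrddump excerpt quoted in the message above.
DUMP = """\
<!-- 2014-05-30 16:00:00 UTC / 1401465600 --> <row><v>1.7556666667e-01</v></row>
<!-- 2014-05-30 16:10:00 UTC / 1401466200 --> <row><v>1.7168333333e-01</v></row>
<!-- 2014-05-30 16:20:00 UTC / 1401466800 --> <row><v>2.7051666667e-01</v></row>
<!-- 2014-05-30 16:30:00 UTC / 1401467400 --> <row><v>1.9386666667e-01</v></row>
<!-- 2014-05-30 16:40:00 UTC / 1401468000 --> <row><v>3.1000000000e-01</v></row>
<!-- 2014-05-30 16:50:00 UTC / 1401468600 --> <row><v>3.0245000000e-01</v></row>
"""

ROW_RE = re.compile(r"<row><v>([^<]+)</v></row>")


def dump_values(text):
    """Return the <v> values found in an rrdtool dump excerpt as floats."""
    return [float(v) for v in ROW_RE.findall(text)]


def load_percent(load, cpus):
    """ASSUMPTION: percentage = load average / CPU count * 100."""
    return 100.0 * load / cpus


values = dump_values(DUMP)
mean_load = sum(values) / len(values)
peak_load = max(values)

CPUS = 1  # "CPUs Total: 1" from the summary panel
print(f"rows parsed:  {len(values)}")
print(f"mean value:   {mean_load:.4f} ({load_percent(mean_load, CPUS):.1f}% under the assumption)")
print(f"peak value:   {peak_load:.4f} ({load_percent(peak_load, CPUS):.1f}% under the assumption)")
```

Under those assumptions, none of the six dumped samples comes anywhere near the 141% to 155% shown in the summary, which is the discrepancy described above.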