From: Caleb E. <cal...@gm...> - 2006-10-26 14:07:38
I'm using Ganglia to keep tabs on a cluster of Linux machines that do consistent heavy network I/O, among other things. Yesterday I had cause to look closely at the network graphs, notably the "Network last hour" graph, and noticed that the Bytes/sec graphs occasionally dropped to zero during periods of otherwise heavy utilization. At the same time the "Packets last hour" graph would not drop to zero, indicating that data was still being sent/received.

So I did some digging, and it turns out that these local minima coincide with the 32-bit counters in the /proc/net/dev file rolling over. The code in libmetrics appears to detect this situation, but the result is that it gets a zero sample when this occurs (from libmetrics/linux/metrics.c):

   diff = bytes_in - last_bytes_in;
   val.f = 0.;
   if ( diff >= 1. )
      {
      t = proc_net_dev.last_read - stamp;
      val.f = diff / t;
      }

So val.f will be zero when bytes_in < last_bytes_in. Since these counters are known to be 32-bit unsigned integers, can't this code do a better job of calculating diff? Something like:

   diff = bytes_in - last_bytes_in;
   if (diff < 0. && last_bytes_in > 0.)
      diff += UINT_MAX;
   val.f = 0.;
   /* the rest as above */

Thoughts?

--
Caleb Epstein
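P.S. For illustration, here is a rough, self-contained sketch of the wrap-around correction I have in mind (counter_rate and the sample values are mine, not code from metrics.c); note that strictly a 32-bit counter wraps modulo UINT_MAX + 1:

   #include <limits.h>
   #include <stdio.h>

   /* Sketch only: compute a rate from two successive readings of a
      32-bit counter, correcting for a single wrap-around.  The counter
      wraps modulo 2^32, i.e. UINT_MAX + 1. */
   static double counter_rate(double last, double now, double seconds)
   {
      double diff = now - last;
      if (diff < 0. && last > 0.)
         diff += (double)UINT_MAX + 1.;   /* counter rolled over once */
      if (diff < 1. || seconds <= 0.)
         return 0.;
      return diff / seconds;
   }

   int main(void)
   {
      /* e.g. the counter was near the top of its range, then wrapped */
      double last = 4294900000., now = 100000., interval = 10.;
      printf("%.1f bytes/sec\n", counter_rate(last, now, interval));
      return 0;
   }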