From: <lor...@pn...> - 2010-04-28 09:49:30
Ok,
choosing just some interfaces for the switches (instead of all of them, more than
200 on the Cisco 4500s) has stopped the crashing issue.
Now, though, I have a few chronic purples, all of which used to be green
earlier on (when devmon was crashing at every cycle).
             cpu     disk  fans    if_stat  info   memory  msgs
DUBBBRIIA    yellow  -     green   green    green  green   -
DUBBBRIIB    yellow  -     green   purple   green  green   -
DUBVG01      green   -     green   green    green  green   -
DUBVG02      purple  -     green   green    green  purple  -
dubdmz_sw01  green   -     purple  green    green  green   -
Any pointers on how to get this addressed?
Thanks
Loris
Loris Serena/PFPC/DUB/PNC
26/04/2010 14:26
To: dev...@li...
cc:
Subject: Devmon consistently crashing
Hi,
I am running devmon, version 0.3.1-beta1 on an ESX VM CentOS 5.4 - talking
to a BB 1.9i server on the same box.
I am (trying to) monitor a few routers and switches and one NetApp filer:
[bb@odyssey devmon]$ grep DEVMON ~bb/bb/etc/bb-hosts
x.y.z.1 compton # DEVMON:tests(if_load,status,volume),except(volume;dfFileSys;na:.*.snapshot)
x.y.z.2 DUBBBRIIA # DEVMON:model(cisco;4500)
x.y.z.3 DUBBBRIIB # DEVMON:model(cisco;4500)
x.y.z.4 dubdmz_sw01 # DEVMON
x.y.z.5 I-DUBL-C01 # DEVMON:tests(cpu,if_col,if_dsc,if_err,if_load,if_stat,power,serial)
x.y.z.6 I-DUBL-C02 # DEVMON
x.y.z.7 DUBVG01 # DEVMON
x.y.z.8 DUBVG02 # DEVMON
[bb@odyssey devmon]$
They all got discovered ok with the --readbbhosts option, except DUBVG01
and DUBVG02 (both 2811 routers).
Then, by running /usr/local/devmon/devmon, it all seems to start ok,
[bb@odyssey devmon]$ ps -ef | grep devmon
bb 8910 1 5 14:15 ? 00:00:01 devmon[master]
bb 8912 8910 0 14:15 ? 00:00:00 devmon
bb 8913 8910 2 14:15 ? 00:00:00 devmon
bb 8914 8910 0 14:15 ? 00:00:00 devmon
bb 8915 8910 0 14:15 ? 00:00:00 devmon
bb 8916 8910 0 14:15 ? 00:00:00 devmon
bb 8917 8910 2 14:15 ? 00:00:00 devmon
bb 8918 8910 0 14:15 ? 00:00:00 devmon
bb 8919 8910 0 14:15 ? 00:00:00 devmon
bb 8920 8910 0 14:15 ? 00:00:00 devmon
bb 8921 8910 0 14:15 ? 00:00:00 devmon
but all I see in the log file is
[10-04-26@13:40:32] ---Initilizing devmon...
[10-04-26@13:40:32] Node 0 reporting to localhost
[10-04-26@13:40:32] Running under process id: 19360
[10-04-26@13:40:32] Entering poll loop
[10-04-26@13:50:32] ---Initilizing devmon...
[10-04-26@13:50:32] Node 0 reporting to localhost
[10-04-26@13:50:32] Running under process id: 25080
[10-04-26@13:50:32] Entering poll loop
[10-04-26@13:55:31] ---Initilizing devmon...
[10-04-26@13:55:31] Node 0 reporting to localhost
[10-04-26@13:55:31] Running under process id: 28265
[10-04-26@13:55:31] Entering poll loop
[10-04-26@14:00:31] ---Initilizing devmon...
[10-04-26@14:00:31] Node 0 reporting to localhost
[10-04-26@14:00:31] Running under process id: 31631
[10-04-26@14:00:31] Entering poll loop
[10-04-26@14:05:32] ---Initilizing devmon...
[10-04-26@14:05:32] Node 0 reporting to localhost
[10-04-26@14:05:32] Running under process id: 1873
[10-04-26@14:05:32] Entering poll loop
[10-04-26@14:10:32] ---Initilizing devmon...
[10-04-26@14:10:32] Node 0 reporting to localhost
[10-04-26@14:10:32] Running under process id: 5703
[10-04-26@14:10:32] Entering poll loop
[10-04-26@14:15:31] ---Initilizing devmon...
[10-04-26@14:15:31] Node 0 reporting to localhost
[10-04-26@14:15:31] Running under process id: 8910
[10-04-26@14:15:31] Entering poll loop
[10-04-26@14:20:31] ---Initilizing devmon...
[10-04-26@14:20:31] Node 0 reporting to localhost
[10-04-26@14:20:31] Running under process id: 11400
[10-04-26@14:20:31] Entering poll loop
[bb@odyssey devmon]$
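The log above shows devmon starting up again every five to ten minutes. A minimal sketch for measuring those gaps from the log file follows. It assumes the timestamp format is `[YY-MM-DD@HH:MM:SS]` and that each start-up line contains the text `---Initilizing devmon` exactly as logged; the function name `restart_gaps` is hypothetical and not part of devmon.

```python
import re
from datetime import datetime

# Matches a devmon start-up line and captures its timestamp.
# The spelling "Initilizing" is copied from the log output as-is.
STAMP = re.compile(r"^\[(\d{2}-\d{2}-\d{2}@\d{2}:\d{2}:\d{2})\] ---Initilizing devmon")


def restart_gaps(lines):
    """Return the seconds elapsed between consecutive devmon start-up lines."""
    starts = [
        # Assumption: the date part is year-month-day with a two-digit year.
        datetime.strptime(m.group(1), "%y-%m-%d@%H:%M:%S")
        for m in map(STAMP.match, lines)
        if m
    ]
    return [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]


sample = [
    "[10-04-26@13:40:32] ---Initilizing devmon...",
    "[10-04-26@13:40:32] Entering poll loop",
    "[10-04-26@13:50:32] ---Initilizing devmon...",
    "[10-04-26@13:55:31] ---Initilizing devmon...",
]
print(restart_gaps(sample))  # [600, 299]
```

Gaps that sit close to a multiple of 300 seconds, as here, suggest the restarts follow a fixed schedule rather than happening at random points in the poll cycle.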
After a while, devmon crashes, leaving just one orphan child process,
which takes a hell of a lot of CPU time.
[bb@odyssey devmon]$ ps -ef | grep devmon
bb 11411 1 56 14:20 ? 00:01:37 devmon
bb 13086 24440 0 14:23 pts/1 00:00:00 grep devmon
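The orphan in the listing above can be recognised by its parent PID of 1 and a command name of plain `devmon` (the healthy master shows up as `devmon[master]`). A minimal sketch for picking such processes out of `ps -ef` output follows. It assumes the eight-column layout shown in this message (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name `orphaned_devmon` is hypothetical.

```python
def orphaned_devmon(ps_lines):
    """Return PIDs of devmon children that have been re-parented to init."""
    orphans = []
    for line in ps_lines:
        # Split into at most 8 fields so the command keeps its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, ppid, cmd = fields[1], fields[2], fields[7].strip()
        # A child whose parent is PID 1 has lost its devmon[master] parent.
        if ppid == "1" and cmd == "devmon":
            orphans.append(int(pid))
    return orphans


sample = [
    "bb 11411 1 56 14:20 ? 00:01:37 devmon",
    "bb 13086 24440 0 14:23 pts/1 00:00:00 grep devmon",
]
print(orphaned_devmon(sample))  # [11411]
```

Run against the earlier healthy listing, the same check returns nothing, because the only process with parent PID 1 there is `devmon[master]`.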
[bb@odyssey devmon]$ top
last pid: 13189; load avg: 0.99, 0.63, 0.46; up 0+16:15:43 14:23:45
108 processes: 2 running, 106 sleeping
CPU states: 37.6% user, 0.0% nice, 13.4% system, 48.9% idle, 0.0% iowait
Kernel: 22 ctxsw, 952 intr, 1 newproc
Memory: 776M used, 1235M free, 158M buffers, 394M cached
Swap: 5760M free
This terminal can only display 25 processes
  PID USERNAME THR PRI NICE  SIZE   RES   SHR STATE   TIME    CPU COMMAND
11411 bb         1  25    0   48M   14M  464K run     1:57 94.00% devmon
 2018 bb         1  15    0 3988K  464K  300K sleep   1:06  1.00% bbd
18656 bb         3  15    0   26M 2608K  776K sleep   1:13  0.00% routermon
    6 root       1  10   -5    0K    0K    0K sleep   0:54  0.00% events/0
  398 root       1  10   -5    0K    0K    0K sleep   0:38  0.00% kjournald
 1880 mysql     13  15    0  191M   26M 4444K sleep   0:25  0.00% mysqld
  167 root       1  15    0    0K    0K    0K sleep   0:13  0.00% pdflush
15959 pg10326    1  15    0   88M 1968K 1200K sleep   0:04  0.00% sshd
    2 root       1 -99   -5    0K    0K    0K sleep   0:02  0.00% migration/0
    4 root       1 -99   -5    0K    0K    0K sleep   0:02  0.00% migration/1
12372 apache     1  15    0  177M 3104K 1208K sleep   0:01  0.00% httpd
 8160 apache     1  15    0  177M 3096K 1212K sleep   0:01  0.00% httpd
18553 pg10326    1  15    0   89M 1960K 1200K sleep   0:01  0.00% sshd
 1552 root       1  18    0   11M  384K  256K sleep   0:01  0.00% irqbalance
 3336 apache     1  15    0  177M 3092K 1204K sleep   0:01  0.00% httpd
    3 root       1  34   19    0K    0K    0K sleep   0:01  0.00% ksoftirqd/0
 1782 root       1  15    0   61M 2332K  812K sleep   0:00  0.00% sendmail
 7794 apache     1  15    0  177M 3088K 1204K sleep   0:00  0.00% httpd
    1 root       1  15    0   10M  704K  592K sleep   0:00  0.00% init
11092 apache     1  15    0  177M 3076K 1204K sleep   0:00  0.00% httpd
  456 root       1  15   -4   12M  784K  400K sleep   0:00  0.00% udevd
 1913 root       1  16    0   62M 1212K  652K sleep   0:00  0.00% sshd
   31 root       1  10   -5    0K    0K    0K sleep   0:00  0.00% kblockd/1
   30 root       1  10   -5    0K    0K    0K sleep   0:00  0.00% kblockd/0
Any ideas about how to get this addressed?
Loris