From: <lor...@pn...> - 2010-04-28 09:49:30
OK, selecting only some of the interfaces on the switches (instead of all of them, which is more than 200 on the Cisco 4500s) has stopped the crashes. Now, though, I have a few chronic purple statuses, all of which used to be green earlier on (when devmon was crashing on every cycle):

             cpu     disk  fans    if_stat  info   memory  msgs
DUBBBRIIA    yellow  -     green   green    green  green   -
DUBBBRIIB    yellow  -     green   purple   green  green   -
DUBVG01      green   -     green   green    green  green   -
DUBVG02      purple  -     green   green    green  purple  -
dubdmz_sw01  green   -     purple  green    green  green   -

Any pointers on how to get this addressed?

Thanks,
Loris


From: Loris Serena/PFPC/DUB/PNC
Date: 26/04/2010 14:26
To: dev...@li...
Subject: Devmon consistently crashing

Hi,

I am running devmon version 0.3.1-beta1 on a CentOS 5.4 VM under ESX, talking to a BB 1.9i server on the same box. I am (trying to) monitor a few routers and switches, plus one NetApp filer:

[bb@odyssey devmon]$ grep DEVMON ~bb/bb/etc/bb-hosts
x.y.z.1 compton      # DEVMON:tests(if_load,status,volume),except(volume;dfFileSys;na:.*.snapshot)
x.y.z.2 DUBBBRIIA    # DEVMON:model(cisco;4500)
x.y.z.3 DUBBBRIIB    # DEVMON:model(cisco;4500)
x.y.z.4 dubdmz_sw01  # DEVMON
x.y.z.5 I-DUBL-C01   # DEVMON:tests(cpu,if_col,if_dsc,if_err,if_load,if_stat,power,serial)
x.y.z.6 I-DUBL-C02   # DEVMON
x.y.z.7 DUBVG01      # DEVMON
x.y.z.8 DUBVG02      # DEVMON
[bb@odyssey devmon]$

They were all discovered OK with the --readbbhosts option, except DUBVG01 and DUBVG02 (both 2811 routers). When I then run /usr/local/devmon/devmon, everything seems to start OK:

[bb@odyssey devmon]$ ps -ef | grep devmon
bb  8910     1  5 14:15 ?  00:00:01 devmon[master]
bb  8912  8910  0 14:15 ?  00:00:00 devmon
bb  8913  8910  2 14:15 ?  00:00:00 devmon
bb  8914  8910  0 14:15 ?  00:00:00 devmon
bb  8915  8910  0 14:15 ?  00:00:00 devmon
bb  8916  8910  0 14:15 ?  00:00:00 devmon
bb  8917  8910  2 14:15 ?  00:00:00 devmon
bb  8918  8910  0 14:15 ?  00:00:00 devmon
bb  8919  8910  0 14:15 ?  00:00:00 devmon
bb  8920  8910  0 14:15 ?  00:00:00 devmon
bb  8921  8910  0 14:15 ?  00:00:00 devmon

but all I see in the log file is this:

[10-04-26@13:40:32] ---Initilizing devmon...
[10-04-26@13:40:32] Node 0 reporting to localhost
[10-04-26@13:40:32] Running under process id: 19360
[10-04-26@13:40:32] Entering poll loop
[10-04-26@13:50:32] ---Initilizing devmon...
[10-04-26@13:50:32] Node 0 reporting to localhost
[10-04-26@13:50:32] Running under process id: 25080
[10-04-26@13:50:32] Entering poll loop
[10-04-26@13:55:31] ---Initilizing devmon...
[10-04-26@13:55:31] Node 0 reporting to localhost
[10-04-26@13:55:31] Running under process id: 28265
[10-04-26@13:55:31] Entering poll loop
[10-04-26@14:00:31] ---Initilizing devmon...
[10-04-26@14:00:31] Node 0 reporting to localhost
[10-04-26@14:00:31] Running under process id: 31631
[10-04-26@14:00:31] Entering poll loop
[10-04-26@14:05:32] ---Initilizing devmon...
[10-04-26@14:05:32] Node 0 reporting to localhost
[10-04-26@14:05:32] Running under process id: 1873
[10-04-26@14:05:32] Entering poll loop
[10-04-26@14:10:32] ---Initilizing devmon...
[10-04-26@14:10:32] Node 0 reporting to localhost
[10-04-26@14:10:32] Running under process id: 5703
[10-04-26@14:10:32] Entering poll loop
[10-04-26@14:15:31] ---Initilizing devmon...
[10-04-26@14:15:31] Node 0 reporting to localhost
[10-04-26@14:15:31] Running under process id: 8910
[10-04-26@14:15:31] Entering poll loop
[10-04-26@14:20:31] ---Initilizing devmon...
[10-04-26@14:20:31] Node 0 reporting to localhost
[10-04-26@14:20:31] Running under process id: 11400
[10-04-26@14:20:31] Entering poll loop
[bb@odyssey devmon]$

After a while, devmon crashes. It leaves behind a single orphaned child process, which uses a huge amount of CPU time:

[bb@odyssey devmon]$ ps -ef | grep devmon
bb 11411     1 56 14:20 ?      00:01:37 devmon
bb 13086 24440  0 14:23 pts/1  00:00:00 grep devmon

[bb@odyssey devmon]$ top
last pid: 13189;  load avg: 0.99, 0.63, 0.46;  up 0+16:15:43  14:23:45
108 processes: 2 running, 106 sleeping
CPU states: 37.6% user, 0.0% nice, 13.4% system, 48.9% idle, 0.0% iowait
Kernel: 22 ctxsw, 952 intr, 1 newproc
Memory: 776M used, 1235M free, 158M buffers, 394M cached
Swap: 5760M free

This terminal can only display 25 processes

  PID USERNAME THR PRI NICE  SIZE   RES   SHR STATE  TIME    CPU COMMAND
11411 bb         1  25    0   48M   14M  464K run    1:57 94.00% devmon
 2018 bb         1  15    0 3988K  464K  300K sleep  1:06  1.00% bbd
18656 bb         3  15    0   26M 2608K  776K sleep  1:13  0.00% routermon
    6 root       1  10   -5    0K    0K    0K sleep  0:54  0.00% events/0
  398 root       1  10   -5    0K    0K    0K sleep  0:38  0.00% kjournald
 1880 mysql     13  15    0  191M   26M 4444K sleep  0:25  0.00% mysqld
  167 root       1  15    0    0K    0K    0K sleep  0:13  0.00% pdflush
15959 pg10326    1  15    0   88M 1968K 1200K sleep  0:04  0.00% sshd
    2 root       1 -99   -5    0K    0K    0K sleep  0:02  0.00% migration/0
    4 root       1 -99   -5    0K    0K    0K sleep  0:02  0.00% migration/1
12372 apache     1  15    0  177M 3104K 1208K sleep  0:01  0.00% httpd
 8160 apache     1  15    0  177M 3096K 1212K sleep  0:01  0.00% httpd
18553 pg10326    1  15    0   89M 1960K 1200K sleep  0:01  0.00% sshd
 1552 root       1  18    0   11M  384K  256K sleep  0:01  0.00% irqbalance
 3336 apache     1  15    0  177M 3092K 1204K sleep  0:01  0.00% httpd
    3 root       1  34   19    0K    0K    0K sleep  0:01  0.00% ksoftirqd/0
 1782 root       1  15    0   61M 2332K  812K sleep  0:00  0.00% sendmail
 7794 apache     1  15    0  177M 3088K 1204K sleep  0:00  0.00% httpd
    1 root       1  15    0   10M  704K  592K sleep  0:00  0.00% init
11092 apache     1  15    0  177M 3076K 1204K sleep  0:00  0.00% httpd
  456 root       1  15   -4   12M  784K  400K sleep  0:00  0.00% udevd
 1913 root       1  16    0   62M 1212K  652K sleep  0:00  0.00% sshd
   31 root       1  10   -5    0K    0K    0K sleep  0:00  0.00% kblockd/1
   30 root       1  10   -5    0K    0K    0K sleep  0:00  0.00% kblockd/0

Any ideas on how to get this addressed?

Loris
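The quoted log and process listings can be checked mechanically for two things. The first is how often devmon is started afresh: each "Running under process id" line carries a new PID. The second is whether a plain `devmon` worker has been re-parented to init (PPID 1) after its master died. Below is a minimal Python sketch of those checks. It assumes the log line format shown in the excerpt (timestamps as YY-MM-DD@HH:MM:SS) and the standard `ps -ef` column order. The functions `restarts`, `restart_gaps` and `orphaned_workers` are hypothetical helpers, not part of devmon.

```python
import re
from datetime import datetime

# Hypothetical helpers for reading the evidence in this thread; not part of devmon.
# Assumes log lines shaped like the excerpt above, e.g.
#   [10-04-26@13:40:32] Running under process id: 19360
# where the timestamp is YY-MM-DD@HH:MM:SS.
START_LINE = re.compile(
    r"^\[(\d{2}-\d{2}-\d{2}@\d{2}:\d{2}:\d{2})\] Running under process id: (\d+)"
)


def restarts(log_text):
    """Return a list of (timestamp, pid) pairs, one per devmon start in the log."""
    found = []
    for line in log_text.splitlines():
        m = START_LINE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%y-%m-%d@%H:%M:%S")
            found.append((ts, int(m.group(2))))
    return found


def restart_gaps(starts):
    """Seconds elapsed between consecutive starts."""
    return [int((later[0] - earlier[0]).total_seconds())
            for earlier, later in zip(starts, starts[1:])]


def orphaned_workers(ps_text, name="devmon"):
    """PIDs of processes whose command is exactly `name` and whose parent is PID 1.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    The master appears as 'devmon[master]' and is deliberately not matched,
    so only plain worker processes left without a master are reported.
    """
    orphans = []
    for line in ps_text.splitlines():
        cols = line.split(None, 7)
        if len(cols) < 8 or not (cols[1].isdigit() and cols[2].isdigit()):
            continue  # header, blank or malformed line
        pid, ppid, cmd = int(cols[1]), int(cols[2]), cols[7].strip()
        if ppid == 1 and cmd == name:
            orphans.append(pid)
    return orphans


if __name__ == "__main__":
    # Sample lines copied from the log and the post-crash ps output quoted above.
    log = """[10-04-26@13:50:32] Running under process id: 25080
[10-04-26@13:55:31] Running under process id: 28265
[10-04-26@14:00:31] Running under process id: 31631"""
    ps = """bb 11411     1 56 14:20 ?      00:01:37 devmon
bb 13086 24440  0 14:23 pts/1  00:00:00 grep devmon"""
    starts = restarts(log)
    print("start PIDs:", [pid for _, pid in starts])
    print("gaps (s):", restart_gaps(starts))
    print("orphaned workers:", orphaned_workers(ps))
```

Run against the full log above, the gaps come out as 600 s followed by values between 299 and 301 s. This suggests that something is starting devmon on a five-minute schedule rather than devmon staying in a single poll loop. Run against the post-crash `ps` listing, the orphan check reports PID 11411.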