From: Peter W. <pet...@gm...> - 2017-01-17 14:13:54
Hi Wim,

2017-01-16 15:42 GMT+01:00 W.J.M. Nelis <Wim...@nl...>:

> Hello Peter,
>
> > There is one network object that is queried by two vm's (xymon-dev,
> > xymon-acc) and one hw box.
> > The two vm's are preps for a new xymon server with the latest and greatest;
> > the hw box is our current xymon prod box.
> >
> > Only the hw box works as expected. The two vm's are similar build and
> > produce spikes at different times.
>
> So the spikes seem to result from a timing problem.
> Do you see messages containing "duplicate RRD data with same timestamp" in
> /var/log/xymon/rrd-status.log?

No "duplicate RRD" messages found. I'm trying to see if a hardware box (the new Xymon server to be) produces the same behaviour.

> > What the three hosts have in common is the Xymon-html-output for this
> > network device. Which shows normal output. The hidden devmon code is also
> > normal. A batch job that queries the network device also produces normal
> > output (your previous suggestion, as I recall).
>
> You've shown the list of names of the interfaces, retrieved using snmpwalk.
> Some interface names appear two times. This implies that two updates are
> sent at the same time for some interfaces to RRD. From what I've seen, the
> value at the second occurrence of an interface name is always zero. If that
> one is handled before the other one, you'll get a spike.

That would be a good assumption.

> > So I have rrd files that when I look into these files show absurd values.
> > But that could be perhaps an error in the processing of the hidden devmon
> > code? Or an rrd anomaly? Or just my own 'unknown error'?
>
> The problem does not occur at xymon running on bare metal. My guess is that
> if you retrieve the list of interfaces from that machine, there are no
> duplicate interface names in it. Thus, probably your problem will be gone
> if you move from a VM to bare metal.
>

The same devmon if_load test is performed on bare metal, and it produces the same hidden devices output with a 0 value for the duplicate devices. And yes, on this server, there are no spikes. That's what makes it so weird to me. But I am going to implement the same test on the new hardware, which is waiting for deployment ;-)

> > What I just did:
> > I tried to minimize the shown devices. So I managed to do so, by re-using a
> > field shown (Errors in) from the If_Err-test.
>
> This I do not understand.

Sorry, I tried to exclude the duplicate (hidden) output by trying to restrict devmon, because the if_err output does not show the duplicate output. I stole some code and tried to apply it to the if_load test. It did not work, so I reverted the test. :-/

> > But underneath all the
> > 64-bits interfaces are queried, so devmon output is still shown for all the
> > devices.
> >
> > <!--DEVMON RRD: if_load 0 0
> > DS:ds0:COUNTER:600:0:U DS:ds1:COUNTER:600:0:U
> > mgmt0 8879632404:16047447934
> > loopback0 15094564810:145416650
> > Ethernet3_1 5145707654:5008405830
> > Ethernet3_2 275179849444513:598883108402034
> > Ethernet3_3 1100302946652053:951076190448071
> > Ethernet3_4 51407929508676:69081629766292
> > Ethernet3_5 823099309481:5352820011173
> > Ethernet3_6 199285632:199285632
> > Ethernet3_7 199285632:199285632
> > Ethernet3_8 199285632:199285632
> > Ethernet4_1 3118632864693475:491943590772051
> > Ethernet4_2 414769440784827:649329384941863
> > Ethernet4_3 103430757126278:415216974406768
> > Ethernet4_4 156480646041793:681504646556060
> > Ethernet4_5 299330487390116:1361606088437009
> > Ethernet4_6 45595170267863:313429801053492
> > Ethernet4_7 199285632:199285632
> > Ethernet4_8 199285632:199285632
> > Ethernet3_1 0:0
> > Ethernet3_2 0:0
> > Ethernet3_3 0:0
> > Ethernet4_1 0:0
> > Ethernet4_2 0:0
> > Ethernet4_3 0:0
> > Ethernet4_4 0:0
> > Ethernet4_5 0:0
> > Ethernet4_6 0:0
> > Ethernet3_4 0:0
> > Ethernet3_5 0:0
> > -->
> >
> > And now I will try your suggestion using exceptions file:
> >
> > [root@uhu-o if_load]# cat exceptions
> >
> > #ifName : alarm : .+
> >
> > ifName : ignore : Nu.+|Vl.+|VLAN.+
> >
> > ifHCInOctets : only : 0
> >
> > Unfortunately the output remains the same as above (for all counters).
> > Perhaps I'm missing something here...?
>
> Sorry, I've made an error in this configuration snippet. The idea is to
> ignore those interfaces which show an octet counter with value 0, as those
> are (almost always) the duplicate interfaces. The snippet should read:
>
> ifHCInOctets : ignore : 0

Done, but it did not work. I have to investigate more to make sure I'm using the correct syntax and name. Something for later this week...

> Which version of devmon are you using?

The old 0.3.1-beta1 (on my old hobbit/xymon server and on the development server) and the one from the recent tree (which works fine with Xymon with the new hosts.d/ subdirectories). Both produce the same spiky result.

> In the version at sourceforge, devmon-0.3.1-beta1.tar.gz, a parameter
> 'all' is used in the rrd directive of the TABLE command in file message. In
> a later patch, available via URL
> https://sourceforge.net/p/devmon/code/HEAD/tree/trunk/, this has been
> changed a bit. What does your message file look like?
>
> Regards,
> Wim Nelis.

The message file:

<b>Interface error rates:</b>
Input load: yellow={ifInLoad_T.thresh:yellow}%, red={ifInLoad_T.thresh:red}%
Output load: yellow={ifOutLoad_T.thresh:yellow}%, red={ifOutLoad_T.thresh:red}%
TABLE: noalarmsmsg,rrd(DS:ds0:ifHCInOctets:COUNTER; DS:ds1:ifHCOutOctets:COUNTER)
Interface Name|Interface Speed|Rate in (load %)|Rate out (load %)
{ifName}{ifAliasBox}|{ifHighSpeed_T}|{ifInLoad_T.color}{ifInSpeed_T} ({ifInLoad_T}%){ifInLoad_T.errors}|{ifOutLoad_T.color}{ifOutSpeed_T} ({ifOutLoad_T}%){ifOutLoad_T.errors}

Waiting for the hardware deployment for the next step ;-)

Peter
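
For checking the duplicate-name theory discussed in this thread, here is a minimal Python sketch that lists interface names occurring more than once in a hidden DEVMON RRD block. It assumes the block has one "name in:out" record per line (the archive copy above has lost its line breaks, so that layout is an assumption), and the function names `parse_devmon_rrd` and `duplicates` are made up for illustration; they are not part of devmon or Xymon. The sample uses a few records taken from the output quoted earlier.

```python
import re
from collections import defaultdict

# Assumed record layout: "<ifName> <inOctets>:<outOctets>", one per line.
RECORD = re.compile(r"^(\S+)\s+(\d+):(\d+)$")

def parse_devmon_rrd(block):
    """Map each interface name to the list of (in, out) counter pairs seen for it."""
    rows = defaultdict(list)
    for line in block.splitlines():
        m = RECORD.match(line.strip())
        if m:  # header, DS definition and comment delimiters do not match
            rows[m.group(1)].append((int(m.group(2)), int(m.group(3))))
    return rows

def duplicates(rows):
    """Return only the interfaces that were reported more than once."""
    return {name: vals for name, vals in rows.items() if len(vals) > 1}

sample = """<!--DEVMON RRD: if_load 0 0
DS:ds0:COUNTER:600:0:U DS:ds1:COUNTER:600:0:U
mgmt0 8879632404:16047447934
Ethernet3_1 5145707654:5008405830
Ethernet3_6 199285632:199285632
Ethernet3_1 0:0
-->"""

if __name__ == "__main__":
    for name, vals in duplicates(parse_devmon_rrd(sample)).items():
        print(name, vals)
```

Running the same check against the hidden output saved from each of the three hosts would show whether the duplicate 0:0 records really are identical on the VMs and on the bare-metal box.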