So, I changed all my hosts to "v2" SNMP polls yesterday
and it ran all night. This morning the graphs were beautiful.
I then wanted to see if I could recreate the problem,
so I changed the hosts back to "v1", and the gaps in the
graphs returned.
(Note that there could still be an unidentified third cause.)
Anyway, the really odd thing is what follows. Here is the output
from my rrd.log for one disk monitored on one host:
[root@... scripts]# grep remedy01 ../log/rrd.log | grep \
hdd | egrep "9.1.7.1|9.1.8.1"
...
08/20/2004 09:10 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:20 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:20 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:25 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:25 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:30 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:30 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:35 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:35 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:40 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:40 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:45 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:45 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:50 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:50 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 09:55 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 09:55 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 10:00 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 10:00 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 10:05 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 10:05 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 10:10 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 10:10 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 10:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
08/20/2004 10:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
08/20/2004 10:20 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216588
08/20/2004 10:20 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524524
08/20/2004 10:25 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216604
08/20/2004 10:25 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524508
08/20/2004 10:30 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216620
08/20/2004 10:30 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524492
08/20/2004 10:35 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216636
08/20/2004 10:35 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524476
08/20/2004 10:40 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216656
08/20/2004 10:40 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524456
08/20/2004 10:45 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216660
08/20/2004 10:45 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524452
08/20/2004 10:50 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_used, oid:
.1.3.6.1.4.1.2021.9.1.8.1, output: 1216660
08/20/2004 10:50 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
remedy01.reston.tnsi.com, dsname: hdd_free, oid:
.1.3.6.1.4.1.2021.9.1.7.1, output: 4524452
As you can see from the rrd.log the OID was polled successfully every time.
The output looks beautiful.
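For anyone who wants to check this without eyeballing the log, here is a rough Python sketch that looks for 5-minute slots where a data source was not logged. It assumes the log entries look exactly like the ones pasted above; the function names and the 5-minute step are my own choices for illustration, not anything that ships with Cacti.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches one poller entry in the format shown in the log excerpt above.
LOG_ENTRY = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2} [AP]M) - CMDPHP: Poller\[\d+\] "
    r"Host\[\d+\] SNMP: v\d+: ([^,\s]+), dsname: (\w+), "
    r"oid: ([.\d]+), output: (\S+)"
)


def polls_by_time(log_text):
    """Return {timestamp: {dsname: output}} for every entry found."""
    # Collapse all whitespace so entries wrapped over several lines still match.
    flat = " ".join(log_text.split())
    polls = defaultdict(dict)
    for stamp, _host, dsname, _oid, output in LOG_ENTRY.findall(flat):
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M %p")
        polls[when][dsname] = output
    return dict(polls)


def missing_polls(polls, expected, step=timedelta(minutes=5)):
    """List (timestamp, [missing dsnames]) between the first and last poll."""
    if not polls:
        return []
    problems = []
    when, last = min(polls), max(polls)
    while when <= last:
        absent = [ds for ds in expected if ds not in polls.get(when, {})]
        if absent:
            problems.append((when, absent))
        when += step
    return problems


if __name__ == "__main__":
    sample = """
    08/20/2004 09:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
    remedy01.reston.tnsi.com, dsname: hdd_used, oid:
    .1.3.6.1.4.1.2021.9.1.8.1, output: 1216684
    08/20/2004 09:15 AM - CMDPHP: Poller[0] Host[5] SNMP: v1:
    remedy01.reston.tnsi.com, dsname: hdd_free, oid:
    .1.3.6.1.4.1.2021.9.1.7.1, output: 4524428
    """
    print(missing_polls(polls_by_time(sample), ("hdd_free", "hdd_used")))
```

An empty list means every slot between the first and last logged poll has a value for each expected data source, which is what the excerpt above appears to show from 09:15 onward.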
However... here is the rrdtool dump of the corresponding RRD file:
[root@... rra]# rrdtool dump \
remedy01_primary_remedy_app_server_hdd_free_96.rrd | grep "2004-08-20 "
...
<!-- 2004-08-20 09:10:00 EDT / 1093007400 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:15:00 EDT / 1093007700 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:20:00 EDT / 1093008000 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:25:00 EDT / 1093008300 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:30:00 EDT / 1093008600 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:35:00 EDT / 1093008900 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:40:00 EDT / 1093009200 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:45:00 EDT / 1093009500 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:50:00 EDT / 1093009800 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 09:55:00 EDT / 1093010100 --> <row><v> 4.5244280000e+06
</v><v> 1.2166840000e+06 </v></row>
<!-- 2004-08-20 10:00:00 EDT / 1093010400 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:05:00 EDT / 1093010700 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:10:00 EDT / 1093011000 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:15:00 EDT / 1093011300 --> <row><v> 4.5245240000e+06
</v><v> 1.2165880000e+06 </v></row>
<!-- 2004-08-20 10:20:00 EDT / 1093011600 --> <row><v> 4.5245240000e+06
</v><v> 1.2165880000e+06 </v></row>
<!-- 2004-08-20 10:25:00 EDT / 1093011900 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:30:00 EDT / 1093012200 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:35:00 EDT / 1093012500 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:40:00 EDT / 1093012800 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:45:00 EDT / 1093013100 --> <row><v> NaN </v><v> NaN
</v></row>
<!-- 2004-08-20 10:50:00 EDT / 1093013400 --> <row><v> NaN </v><v> NaN
</v></row>
...
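To pull the gap timestamps out of a dump like the one above so they can be lined up against the poller log, something along these lines might help. It is only a sketch that assumes the dump rows have the same comment-plus-row layout as my output (other rrdtool versions may format them differently), and the nan_rows name is made up for this example.

```python
import re

# One dump row: "<!-- date time TZ / epoch --> <row><v> ... </v>...</row>"
ROW = re.compile(
    r"<!-- (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ / (\d+) --> "
    r"<row>(.*?)</row>"
)
VALUE = re.compile(r"<v>\s*(\S+?)\s*</v>")


def nan_rows(dump_text):
    """Return [(epoch, 'YYYY-MM-DD HH:MM:SS')] for rows holding any NaN."""
    # Collapse whitespace so rows wrapped across lines are matched as one.
    flat = " ".join(dump_text.split())
    gaps = []
    for stamp, epoch, body in ROW.findall(flat):
        values = VALUE.findall(body)
        if any(value.lower() == "nan" for value in values):
            gaps.append((int(epoch), stamp))
    return gaps


if __name__ == "__main__":
    sample = """
    <!-- 2004-08-20 09:55:00 EDT / 1093010100 --> <row><v> 4.5244280000e+06
    </v><v> 1.2166840000e+06 </v></row>
    <!-- 2004-08-20 10:00:00 EDT / 1093010400 --> <row><v> NaN </v><v> NaN
    </v></row>
    """
    for epoch, stamp in nan_rows(sample):
        print(epoch, stamp)
```

Feeding it the output of "rrdtool dump" for the file gives the list of unknown rows, which can then be compared by hand with the timestamps in rrd.log.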
Right now I am going to change the poller back to "v2" for all the hosts
and see if the graph data problem disappears again. Update: after 20-30
minutes on "v2", the gaps in the graphs are gone.