Re: [Mon-devel] netappdiskfail.monitor and friends
From: Ed R. <er...@pa...> - 2008-03-11 00:27:05
On Mon, Mar 10, 2008 at 03:15:53PM -0800, Augie Schwer wrote:
> I created a new monitor to check for failed disks in NetApps; it's
> based on the netappfree.monitor.
>
> I meant to just check it into the devel branch, but I ended up putting
> it in both stable and devel.; hopefully not a big deal.

Since it's a new program, it won't destabilize any existing code, which
to me is the prime definition of something "stable".

While we're on the subject of NetApps, let's talk about who's polling
what. My shop's mon.m4 runs several tests against our NetApps:

    ping
    rpc.monitor
    netappfree.monitor
    netappraidstat.monitor
    snmpvar.monitor - tests these vars:

        Variable     NETAPP_FAILED_FAN_COUNT
        OID          .1.3.6.1.4.1.789.1.2.4.2.0
        Description  Number of Netapp Failed Fans is

        Variable     NETAPP_FAILED_POWER_SUPPLY_COUNT
        OID          .1.3.6.1.4.1.789.1.2.4.4.0
        Description  Number of Netapp Failed Power Supplies is

        Variable     NETAPP_TEMPERATURE_WARNING
        OID          .1.3.6.1.4.1.789.1.2.4.1.0
        Description  Netapp Temperature Warning
        Decode       1 no
        Decode       2 yes

        Variable     NETAPP_RAID_SPARES
        OID          .1.3.6.1.4.1.789.1.6.4.8.0
        Description  Number of Available Spare Disks is

1) Did we ever check in all of Todd's fixes to netappraidstat.monitor?
   I have a version he emailed me on 21 Dec 2005 after we got it working
   with the elderly NetApps at my site.

2) Augie, reading over your new script, it looks like we could do the
   same thing by using snmpvar.monitor on
   NETWORK-APPLIANCE-MIB::diskFailedCount with an entry like this in
   snmpvar.def:

        Variable     NETAPP_DISKS_FAILED
        OID          .1.3.6.1.4.1.789.1.6.4.7.0
        Description  Number of Failed Disks is

It would be nice if we could roll all this stuff up and make up a NetApp
"howto" page for Mon. I would volunteer, except that, as mentioned above,
the NetApps I have access to are rather ancient, and we're phasing them
out...
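For anyone collecting these definitions into a howto, here is a rough Python sketch of how the Variable / OID / Description / Decode stanzas quoted above could be read into a table and turned into a status line. It is not how snmpvar.monitor itself works (that is a Perl script with its own parser). The function names `parse_snmpvar_defs` and `format_reading` are hypothetical, and the one-keyword-per-line layout, the `#` comment handling, and the idea that the value is appended after the description are all assumptions drawn from the entries in this message.

```python
# Sketch only: a stand-alone reader for snmpvar.def-style stanzas.
# Assumes one "Keyword value" pair per line, as in the entries above.

SAMPLE_DEFS = """
Variable     NETAPP_TEMPERATURE_WARNING
OID          .1.3.6.1.4.1.789.1.2.4.1.0
Description  Netapp Temperature Warning
Decode       1 no
Decode       2 yes

Variable     NETAPP_DISKS_FAILED
OID          .1.3.6.1.4.1.789.1.6.4.7.0
Description  Number of Failed Disks is
"""


def parse_snmpvar_defs(text):
    """Return a dict mapping each variable name to its OID, description, and decode table."""
    entries = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skipping '#' comments is an assumption, not taken from the message.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "Variable":
            current = {"oid": None, "description": "", "decode": {}}
            entries[value] = current
        elif current is None:
            raise ValueError(f"{keyword!r} appears before any Variable line")
        elif keyword == "OID":
            current["oid"] = value
        elif keyword == "Description":
            current["description"] = value
        elif keyword == "Decode":
            # "Decode 1 no" maps the raw SNMP value "1" to the label "no".
            fields = value.split(None, 1)
            current["decode"][fields[0]] = fields[1] if len(fields) > 1 else ""
        else:
            raise ValueError(f"unrecognised keyword {keyword!r}")
    return entries


def format_reading(entry, raw_value):
    """Append the (decoded, if possible) value to the entry's description."""
    shown = entry["decode"].get(str(raw_value), str(raw_value))
    return f"{entry['description']} {shown}"


if __name__ == "__main__":
    defs = parse_snmpvar_defs(SAMPLE_DEFS)
    print(format_reading(defs["NETAPP_DISKS_FAILED"], 0))
    print(format_reading(defs["NETAPP_TEMPERATURE_WARNING"], 2))
```

The raw values passed to `format_reading` would come from an SNMP query against the filer; that part is left out here because it depends on which SNMP library or command-line tool is on hand.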