Thread: Re: [Mon-devel] [Mon-commit] mon/mon.d netappdiskfail.monitor, 1.1, 1.2
From: Augie S. <aug...@gm...> - 2008-03-10 23:15:55
I created a new monitor to check for failed disks in NetApps; it's
based on the netappfree.monitor.

I meant to just check it into the devel branch, but I ended up putting
it in both stable and devel; hopefully not a big deal.

--Augie

On Mon, Mar 10, 2008 at 3:12 PM, Augie Schwer <as...@us...> wrote:
> Update of /cvsroot/mon/mon/mon.d
> In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv21048
>
> Added Files:
>         netappdiskfail.monitor
> Log Message:
> Created a new monitor to check for failed NetApp disks.
>
> --- NEW FILE: netappdiskfail.monitor ---
> #!/usr/bin/perl
> #
> # Use SNMP to get disk failures from a NetApp.
> #
> # USAGE
> # [--community=<SNMP COMMUNITY>] [--timeout=<seconds>] [--retries=<count>] host1 host2 ...
>
> # sample invocation in mon.cf, with local MIB directory for the Netapp MIB
> # NETWORK-APPLIANCE-MIB.txt (copy from /etc/mib/netapp.mib on filer):
> #     service diskfree
> #         description test disk failure on Netapp filers
> #         depend SELF:ping
> #         MIBDIRS=/usr/local/share/snmp/mibs
> #         interval 7m
> #         monitor netappdiskfail.monitor
>
> #
> # This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
> # module.
> #
>
> use strict;
>
> use SNMP;
> use Getopt::Long;
>
> $ENV{"MIBS"} = 'RFC1213-MIB:NETWORK-APPLIANCE-MIB';
>
> my %opt = ();
>
> GetOptions (\%opt, "community=s", "timeout=i", "retries=i");
>
> die "no host arguments\n" if (@ARGV == 0);
>
> my $RET = 0;
> my @ERRS = ();
> my %HOSTS = ();
>
> my $COMM = $opt{"community"} || "public";
> my $TIMEOUT = $opt{"timeout"} || 2; $TIMEOUT *= 1000 * 1000;
> my $RETRIES = $opt{"retries"} || 5;
>
> my $sess;
>
> foreach my $host (@ARGV) {
>     if (!defined($sess = new SNMP::Session (DestHost => $host,
>                                 Timeout => $TIMEOUT, Community => $COMM,
>                                 Retries => $RETRIES,
>                                 Version => 1))) {
>         $RET = 1;
>         $HOSTS{$host} ++;
>         push (@ERRS, "could not create session to $host: " . $SNMP::Session::ErrorStr);
>         next;
>     }
>
>     if ($sess->get('diskFailedCount.0') > 0) {
>         $HOSTS{$host} ++;
>         push (@ERRS, "Disk failure - " . $sess->get('diskFailedCount.0') . " failed disk(s).");
>         push (@ERRS, $sess->get('diskFailedMessage.0'));
>         $RET = 1;
>     }
> }
>
> if ($RET) {
>     print join(" ", sort keys %HOSTS), "\n\n", join("\n", @ERRS), "\n";
> }
>
> exit $RET;

--
Augie Schwer - Augie@Schwer.us - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072
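For readers who want to try the monitor, the sample invocation in the script's header comment could be expanded into a fuller mon.cf fragment along the following lines. This is only a sketch: the hostgroup name, the filer host names, the alert address, and the service name "diskfail" are assumptions for illustration (the header comment itself says "service diskfree", which appears to be carried over from netappfree.monitor), and the ping service is included only because the sample depends on SELF:ping.

```
# Hypothetical mon.cf fragment; host names, group name and alert
# address are placeholders, not taken from the thread.
hostgroup netapps filer1.example.com filer2.example.com

watch netapps
    service ping
        interval 5m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
    service diskfail
        description test disk failure on Netapp filers
        depend SELF:ping
        MIBDIRS=/usr/local/share/snmp/mibs
        interval 7m
        monitor netappdiskfail.monitor --community=public
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
```

The --community option is the one the script's usage comment lists; --timeout and --retries could be added to the same monitor line in the same way.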
From: Ed R. <er...@pa...> - 2008-03-11 00:27:05
On Mon, Mar 10, 2008 at 03:15:53PM -0800, Augie Schwer wrote:
> I created a new monitor to check for failed disks in NetApps; it's
> based on the netappfree.monitor.
>
> I meant to just check it into the devel branch, but I ended up putting
> it in both stable and devel.; hopefully not a big deal.

Since it's a new program, it won't destabilize any existing code, which
to me is the prime definition for something "stable".

While we're on the subject of netapps, let's talk about who's polling
what. My shop's mon.m4 runs several tests against our NetApps:

   ping
   rpc.monitor
   netappfree.monitor
   netappraidstat.monitor
   snmpvar.monitor - tests these vars:

      Variable NETAPP_FAILED_FAN_COUNT
      OID .1.3.6.1.4.1.789.1.2.4.2.0
      Description Number of Netapp Failed Fans is

      Variable NETAPP_FAILED_POWER_SUPPLY_COUNT
      OID .1.3.6.1.4.1.789.1.2.4.4.0
      Description Number of Netapp Failed Power Supplies is

      Variable NETAPP_TEMPERATURE_WARNING
      OID .1.3.6.1.4.1.789.1.2.4.1.0
      Description Netapp Temperature Warning
      Decode 1 no
      Decode 2 yes

      Variable NETAPP_RAID_SPARES
      OID .1.3.6.1.4.1.789.1.6.4.8.0
      Description Number of Available Spare Disks is

1) Did we ever check in all of Todd's fixes to netappraidstat.monitor?
   I have a version he emailed me on 21 Dec 2005 after we got it working
   with the elderly NetApps at my site.

2) Augie, reading over your new script, it looks like we could do
   the same thing by using snmpvar.monitor on
   NETWORK-APPLIANCE-MIB::diskFailedCount with an entry like this in
   snmpvar.def:

      Variable NETAPP_DISKS_FAILED
      OID .1.3.6.1.4.1.789.1.6.4.7.0
      Description Number of Failed Disks is

It would be nice if we could roll all this stuff up and make up a NetApp
"howto" page for Mon. I would volunteer, except as mentioned above the
NetApps I have access to are rather ancient, and we're phasing them out...
From: Augie S. <aug...@gm...> - 2008-03-11 16:35:58
On Mon, Mar 10, 2008 at 4:27 PM, Ed Ravin <er...@pa...> wrote:
> 2) Augie, reading over your new script, it looks like we could do
> the same thing by using snmpvar.monitor on
> NETWORK-APPLIANCE-MIB::diskFailedCount with an entry like this in
> snmpvar.def:

Yeah, I thought about that, but shied away because I wanted to get the
diskFailedMessage too; even though that may end up being totally
useless.

> It would be nice if we could roll all this stuff up and make up a NetApp
> "howto" page for Mon. I would volunteer, except as mentioned above the
> NetApps I have access to are rather ancient, and we're phasing them out...

Well maybe you could write up the structure and I could fill it out; we
have a handful of F760s and a couple FAS3020s.

--
Augie Schwer - Augie@Schwer.us - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072
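The trade-off discussed above (a dedicated monitor so that diskFailedMessage can be reported next to diskFailedCount) might be sketched roughly as below. This is an illustration, not the committed script: check_filer(), run_checks(), the stub data and the host names are all hypothetical, and the SNMP lookup is replaced by a code reference so the sketch runs without a filer. In a real monitor that code reference would presumably wrap $sess->get(). Unlike the committed script, the sketch fetches the count once and prefixes each detail line with the host name.

```perl
#!/usr/bin/perl
# Hypothetical sketch; none of these helper names exist in mon or in
# the SNMP module. SNMP access is stubbed out with a code reference.
use strict;
use warnings;

# $get takes an object name such as 'diskFailedCount.0' and returns its
# value. Returns a list of detail lines, empty if no disks have failed.
sub check_filer {
    my ($host, $get) = @_;
    my $count = $get->('diskFailedCount.0');    # fetched only once
    return () unless defined $count && $count > 0;

    my @errs = ("$host: Disk failure - $count failed disk(s).");
    my $msg  = $get->('diskFailedMessage.0');
    push @errs, "$host: $msg" if defined $msg && length $msg;
    return @errs;
}

# Run check_filer() over a hash of host => { object => value } stubs.
# Returns (\%failed_hosts, \@detail_lines).
sub run_checks {
    my (%data) = @_;
    my (%failed, @errs);
    foreach my $host (sort keys %data) {
        my @e = check_filer($host, sub { $data{$host}{ $_[0] } });
        next unless @e;
        $failed{$host}++;
        push @errs, @e;
    }
    return (\%failed, \@errs);
}

# Made-up values standing in for two filers.
my %fake = (
    'filer1' => { 'diskFailedCount.0' => 0, 'diskFailedMessage.0' => '' },
    'filer2' => { 'diskFailedCount.0' => 1,
                  'diskFailedMessage.0' => 'example failure text' },
);

my ($failed, $errs) = run_checks(%fake);
my $rc = %$failed ? 1 : 0;
if ($rc) {
    # Same output layout as the committed script: failed hosts on the
    # first line, a blank line, then the detail lines.
    print join(" ", sort keys %$failed), "\n\n", join("\n", @$errs), "\n";
}
# A real monitor would end with "exit $rc;" so mon sees the failure.
print "exit status would be $rc\n";
```

Whether the extra message is worth a separate monitor, compared with a single snmpvar.def entry, depends on how informative diskFailedMessage turns out to be on a given filer.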