From: Jim P. <ji...@ya...> - 2007-04-16 19:12:31
Every day for the past few months, I've been getting Munin server notifications about smartd and sensor readings being in CRITICAL state (all zeros). I don't think this is a hardware problem, so I suspect something odd is going on with Munin or the plugins. Here are the details.

Over the past several days I have received both CRITICAL and OK notices at the following times:

1220 Monday
1010 Monday
1820 Sunday
1010 Sunday
0620 Sunday
2220 Saturday
1010 Saturday
0020 Saturday
1820 Friday
1220 Friday
1010 Friday
1820 Thursday
1220 Thursday
1010 Thursday

The CRITICAL emails look like this:

<domain> :: <hostname> :: S.M.A.R.T values for drive hda
CRITICALs: Multi_Zone_Error_Rate is 0.00 (outside range [051:]), Calibration_Retry_Count is 0.00 (outside range [051:]), Raw_Read_Error_Rate is 0.00 (outside range [051:]), Spin_Up_Time is 0.00 (outside range [021:]), Seek_Error_Rate is 0.00 (outside range [051:]), Reallocated_Sector_Ct is 0.00 (outside range [140:]), Spin_Retry_Count is 0.00 (outside range [051:]).

<domain> :: <hostname> :: Fans
CRITICALs: CPU Fan is 0.00 (outside range [2500:]).

<domain> :: <hostname> :: Voltages
CRITICALs: +5V is 0.00 (outside range [4.75:5.25]), -12V is 0.00 (outside range [-15.00:-14.80]), +12V is 0.00 (outside range [10.82:13.19]), +3.3V is 0.00 (outside range [3.14:3.50]), VCore is 0.00 (outside range [1.25:1.50]), V5SB is 0.00 (outside range [4.76:5.24]).

The OK emails look like this:

<domain> :: <hostname> :: S.M.A.R.T values for drive hda
OKs: Spin_Retry_Count is 100.00, Spin_Up_Time is 185.00, Raw_Read_Error_Rate is 200.00, Multi_Zone_Error_Rate is 200.00, Seek_Error_Rate is 200.00, Calibration_Retry_Count is 100.00, Reallocated_Sector_Ct is 200.00.

<domain> :: <hostname> :: Fans
OKs: CPU Fan is 3082.00.

<domain> :: <hostname> :: Voltages
OKs: +12V is 12.04, +5V is 5.04, +3.3V is 3.36, V5SB is 4.95, -12V is -14.91, VCore is 1.40.
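To make the pattern in the notification times easier to see, they can be tallied by minute-of-hour and by exact time. This is only a sketch: the times are copied by hand from the list above, and the names `NOTICES` and `tally` are illustrative, not part of Munin.

```python
from collections import Counter

# Notification times copied from the list above (day, HHMM).
NOTICES = [
    ("Monday", "1220"), ("Monday", "1010"),
    ("Sunday", "1820"), ("Sunday", "1010"), ("Sunday", "0620"),
    ("Saturday", "2220"), ("Saturday", "1010"), ("Saturday", "0020"),
    ("Friday", "1820"), ("Friday", "1220"), ("Friday", "1010"),
    ("Thursday", "1820"), ("Thursday", "1220"), ("Thursday", "1010"),
]


def tally(notices):
    """Count notices by minute-of-hour and by exact HHMM time."""
    by_minute = Counter(hhmm[2:] for _, hhmm in notices)
    by_time = Counter(hhmm for _, hhmm in notices)
    return by_minute, by_time


by_minute, by_time = tally(NOTICES)

# Print the most frequent times first.
for hhmm, count in by_time.most_common():
    print(hhmm, count)
```

Every notice falls at either :10 or :20 past the hour, and 1010 is the only time that recurs on all five days.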
Additionally, in the same timeframes, I get these cron errors from munin-cron:

Lock already exists: /var/run/munin/munin-update.lock. Dying.

There are no significant crontab entries that fire at 10 or 20 minutes past the hour. Only munin-cron would be executing, at */5 minutes.

Also, smartd.conf contains only this entry (i.e. tests are off):

/dev/hda -H -p -l error -o off -S off -r 194 -I 7

Any ideas on how to resolve this?

-Jim P.
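One thing that may be worth checking when the "Lock already exists" error appears is whether the process that created the lock is still running (a slow or hung update) or has gone away (a stale lock). The sketch below ASSUMES the lock file holds the owning process ID as plain text; that is an assumption about the file's contents, not something stated in the message. The function name `lock_holder_alive` is hypothetical.

```python
import os

# Path taken from the cron error quoted above.
LOCK_PATH = "/var/run/munin/munin-update.lock"


def lock_holder_alive(path=LOCK_PATH):
    """Return True if the PID in the lock file is running, False if it is
    not, and None if the file is missing or does not contain a PID.

    ASSUMPTION: the lock file contains the owner's PID as plain text.
    """
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return None
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False  # no such process: the lock looks stale
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


# Usage: run at the time the error is reported, e.g.
#   print(lock_holder_alive())
```

A result of True at :10 or :20 would suggest an earlier update run is still in progress when the next one starts; False would point at a leftover lock file.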