From: Andres S. <dil...@at...> - 2005-05-06 20:27:35
Apologies if people get this twice; gmane is misbehaving (I posted this
yesterday, and it still hasn't shown up yet).

On Thu, 05 May 2005 18:12:23 -0400, Andres Salomon wrote:

> Hi,
>
> I have a 3ware raid controller with 4 (hotswap) drives attached.  I would
> like to be notified when one of those drives die, or are about to die.
> I have the following in smartd.conf:
>
> /dev/hda -a -m dil...@at... -s S/../.././18
> /dev/sda -d 3ware,0 -a -d removable -m dil...@at... -s S/../.././18
> /dev/sda -d 3ware,1 -a -d removable -m dil...@at... -s S/../.././18
> /dev/sda -d 3ware,2 -a -d removable -m dil...@at... -s S/../.././18
> /dev/sda -d 3ware,3 -a -d removable -m dil...@at... -s S/../.././18
>
> I was forced to put the '-d removable' in there so that smartd would
> actually start up if a device is removed.  The ideal behavior, at least
> for me, would be for smartd to send a notification if the device that it's
> supposed to be monitoring doesn't exist.
>
> When I start up smartd, it sees all 5 drives.  However, when I pull one of
> the drives out in order to test it, smartd doesn't send a notification out
> about a drive disappearing.  I figured that it would at least notify me
> when it issued a test command (at 18:00:00), but it doesn't even do that.
> Am I missing something?
>
> Here's some logs from smartd:
> May 5 17:58:02 fs0 smartd[480]: Configuration file /etc/smartd.conf parsed.
> May 5 17:58:02 fs0 smartd[480]: Device: /dev/hda, opened
> May 5 17:58:02 fs0 smartd[480]: Device: /dev/hda, found in smartd database.
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/sda [3ware_disk_00], opened
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/sda [3ware_disk_00], not found in smartd database.
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/sda [3ware_disk_00], is SMART capable. Adding to "monitor" list.
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/sda [3ware_disk_01], opened
> May 5 17:58:03 fs0 smartd[480]: Device: /dev/sda [3ware_disk_01], not found in smartd database.
> May 5 17:58:04 fs0 smartd[480]: Device: /dev/sda [3ware_disk_01], is SMART capable. Adding to "monitor" list.
> May 5 17:58:04 fs0 smartd[480]: Device: /dev/sda [3ware_disk_02], opened
> May 5 17:58:04 fs0 smartd[480]: Device: /dev/sda [3ware_disk_02], not found in smartd database.
> May 5 17:58:05 fs0 smartd[480]: Device: /dev/sda [3ware_disk_02], is SMART capable. Adding to "monitor" list.
> May 5 17:58:05 fs0 smartd[480]: Device: /dev/sda [3ware_disk_03], opened
> May 5 17:58:05 fs0 smartd[480]: Device: /dev/sda [3ware_disk_03], not found in smartd database.
> May 5 17:58:06 fs0 smartd[480]: Device: /dev/sda [3ware_disk_03], is SMART capable. Adding to "monitor" list.
> May 5 17:58:06 fs0 smartd[480]: Monitoring 5 ATA and 0 SCSI devices
> May 5 17:58:09 fs0 smartd[482]: smartd has fork()ed into background mode. New PID=482.
> May 5 17:58:09 fs0 smartd[482]: file /var/run/smartd.pid written containing PID 482
> May 5 17:58:19 fs0 smartd[482]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
> May 5 17:58:19 fs0 smartd[482]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 117 to 116
> May 5 18:00:00 fs0 smartd[482]: Device: /dev/hda, starting scheduled Short Self-Test.
> May 5 18:00:04 fs0 smartd[482]: Device: /dev/sda [3ware_disk_00], starting scheduled Short Self-Test.
> May 5 18:00:04 fs0 smartd[482]: Device: /dev/sda [3ware_disk_01], not capable of Offline or Self-Testing.
> May 5 18:00:09 fs0 smartd[482]: Device: /dev/sda [3ware_disk_02], starting scheduled Short Self-Test.
> May 5 18:00:13 fs0 smartd[482]: Device: /dev/sda [3ware_disk_03], starting scheduled Short Self-Test.
> May 5 18:00:40 fs0 smartd[482]: Device: /dev/sda [3ware_disk_00], SMART Prefail
>
> And so on.  Here's what it logs when I start smartd w/ one of the drives
> offline:
>
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_00], opened
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_00], not found in smartd database.
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_00], is SMART capable. Adding to "monitor" list.
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_01], opened
> May 5 17:57:08 fs0 smartd[599]: WARNING - NO DEVICE FOUND ON 3WARE CONTROLLER (disk 1)
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_01], not ATA, no IDENTIFY DEVICE Structure
> May 5 17:57:08 fs0 smartd[599]: Unable to register ATA device /dev/sda [3ware_disk_01] at line 85 of file /etc/smartd.conf
> May 5 17:57:08 fs0 smartd[599]: Device /dev/sda [3ware_disk_01] not available
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_02], opened
> May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_02], not found in smartd database.
> May 5 17:57:09 fs0 smartd[599]: Device: /dev/sda [3ware_disk_02], is SMART capable. Adding to "monitor" list.
>
>
> This happens w/ smartmontools 5.33 and 5.32.  Oh, and starting it w/
> '-M test' emails me, so I know that notifications work.
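
In the meantime, the startup case could probably be caught outside smartd by
scanning its log for the "NO DEVICE FOUND ON 3WARE CONTROLLER" warning quoted
above. What follows is only a rough Python sketch: the function name
missing_3ware_ports and the sample lines are made up for illustration, and it
assumes the warning text is exactly what 5.32/5.33 logged above. It does not
help with a drive pulled while smartd is already running, because the logs
above show no message at all for that case.

```python
import re

# Assumption: the warning text matches the smartd 5.32/5.33 log line quoted
# above; other versions may word it differently.
MISSING_RE = re.compile(
    r"WARNING - NO DEVICE FOUND ON 3WARE CONTROLLER \(disk (\d+)\)"
)


def missing_3ware_ports(log_lines):
    """Return the sorted 3ware port numbers that smartd reported as empty.

    Hypothetical helper for illustration; it is not part of smartmontools.
    """
    ports = set()
    for line in log_lines:
        match = MISSING_RE.search(line)
        if match:
            ports.add(int(match.group(1)))
    return sorted(ports)


# Sample input copied from the startup log above (one drive offline).
sample = [
    "May 5 17:57:08 fs0 smartd[599]: Device: /dev/sda [3ware_disk_01], opened",
    "May 5 17:57:08 fs0 smartd[599]: WARNING - NO DEVICE FOUND ON 3WARE CONTROLLER (disk 1)",
    "May 5 17:57:08 fs0 smartd[599]: Device /dev/sda [3ware_disk_01] not available",
]

for port in missing_3ware_ports(sample):
    print("no device on 3ware port", port)
```

Feeding it the real syslog instead of the sample list, and mailing the result
from cron, would be the obvious next step, but that part is left out here.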