Thread: [Smartmontools-support]smartmontools failure reporting problem


Smartmontools has trouble reporting the failed status of this disk.
The disk isn't making it easy by the looks of it, but IMHO
smartmontools needs to pick up on failed self-tests.
The disk is throwing read errors left, right and center, the filesystem
is corrupted, some data is lost while other data is still retrievable,
and that s**t-disk is still reporting
  SMART overall-health self-assessment test result: PASSED
Yeaaaa... right. I don't think so.

http://smartmontools.sourceforge.net/
version 5.1-4
SuSE 8.2, kernel 2.4.20

Device Model:     Maxtor 51536H2                          
Serial Number:    F2119J0C            
Firmware Version: JAC61HU0
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 0

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Off-line data collection status: (0x02)	Offline data collection activity 
					completed without error.
Self-test execution status:      ( 118)	The previous self-test completed having
					the read element of the test failed.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   253   252   000    Old_age      -       32
  3 Spin_Up_Time            0x0027   228   228   063    Pre-fail     -       7753
  4 Start_Stop_Count        0x0032   253   253   000    Old_age      -       327
  5 Reallocated_Sector_Ct   0x0033   212   212   063    Pre-fail     -       105
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail     -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age      -       0
  8 Seek_Time_Performance   0x0027   251   246   187    Pre-fail     -       40371
  9 Power_On_Hours          0x0032   247   247   000    Old_age      -       3639
 10 Spin_Retry_Count        0x002b   253   252   223    Pre-fail     -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail     -       0
 12 Power_Cycle_Count       0x0032   251   251   000    Old_age      -       1101
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age      -       0
197 Current_Pending_Sector  0x0008   237   237   000    Old_age      -       16
198 Offline_Uncorrectable   0x0008   252   252   000    Old_age      -       1
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age      -       0
201 Unknown_Attribute       0x000a   253   252   000    Old_age      -       1
202 Unknown_Attribute       0x000a   253   252   000    Old_age      -       0
203 Unknown_Attribute       0x000b   253   252   180    Pre-fail     -       3
204 Unknown_Attribute       0x000a   253   252   000    Old_age      -       0
205 Unknown_Attribute       0x000a   252   171   000    Old_age      -       2
207 Unknown_Attribute       0x002a   253   252   000    Old_age      -       0
208 Unknown_Attribute       0x002a   253   252   000    Old_age      -       0
209 Unknown_Attribute       0x0024   193   190   000    Old_age      -       0
 96 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0
 97 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0
 98 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age      -       0

SMART Self-test log, version number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed: read failure       60%      2104         0x00007d94
# 2  Extended off-line   Completed: read failure       40%      2104         0x00007d94
# 3  Extended off-line   Completed: read failure       40%      2103         0x00007d94
# 4  Short off-line      Completed: read failure       60%      2103         0x00007d94
# 5  Short off-line      Completed: read failure       60%      2103         0x00007d94
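
For what it's worth, failed entries in this log are easy to spot mechanically. Below is a minimal Python sketch, assuming the column layout shown above with columns separated by two or more spaces. The function name failed_self_tests is made up for illustration and is not part of smartmontools.

```python
import re

# One row of the "SMART Self-test log" table as printed by smartctl -a.
# Assumption: columns are separated by two or more spaces, as in the log above.
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)

def failed_self_tests(log_text):
    """Return (num, description, status, lba) for every failed log entry."""
    failures = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header line or anything that is not a log row
        num, desc, status, _remaining, _hours, lba = m.groups()
        if "failure" in status.lower():
            failures.append((int(num), desc, status, int(lba, 16)))
    return failures

sample = """\
# 1  Short off-line      Completed: read failure       60%      2104         0x00007d94
# 2  Extended off-line   Completed: read failure       40%      2104         0x00007d94
"""

for entry in failed_self_tests(sample):
    print(entry)
```

Any non-empty result from a check like this is the kind of condition I would expect to be reported as a failing disk.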

Ok, let's see whether smartd would actually ring the alarm bells:

# SMART config is: 
# /dev/hda -d ata -S on -o on -a 
Sep 19 20:06:56 Rescue smartd[744]: smartd version 5.1-4: S.M.A.R.T. Monitoring Daemon 
Sep 19 20:06:56 Rescue smartd[744]: Home page is http://smartmontools.sourceforge.net/  
Sep 19 20:06:56 Rescue smartd[744]: Using configuration file /etc/smartd.conf 
Sep 19 20:06:56 Rescue smartd[746]: Device: /dev/hda, opened 
Sep 19 20:06:56 Rescue smartd[746]: Device: /dev/hda, enabled SMART Attribute Autosave. 
Sep 19 20:06:56 Rescue smartd[746]: Device: /dev/hda, enabled SMART Automatic Offline Testing. 
Sep 19 20:06:57 Rescue smartd[746]: Device: /dev/hda, is SMART capable. Adding to "monitor" list. 
Sep 19 20:06:57 Rescue smartd[746]: Started monitoring 1 ATA and 0 SCSI devices 
#
# running smartctl -t short here, which fails after 60% with read error
#
Sep 19 20:21:37 Rescue smartd[746]: Signal USR1 - checking devices now rather than in 929 seconds. 
Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250 
Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, SMART Usage Attribute: 209 Unknown_Attribute changed from 193 to 192 
Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, Self-Test Log error count increased from 5 to 6 

Not really. smartd doesn't tell me that this disk is essentially already
dead. In particular, it should pick up on

  Self-test execution status:    ( 118)	The previous self-test completed having
					the read element of the test failed.
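
The status byte quoted above already carries everything needed to raise an alarm. As far as I understand the ATA layout, the upper four bits hold the result code and the lower four bits the remaining percentage in tens, so 118 (0x76) means result 7 (read element failed) with 60% remaining, which matches the self-test log. A minimal Python sketch of that decoding follows; the function names are hypothetical, and treating codes 3 through 7 as failures is my assumption.

```python
# Result codes from the upper nibble of the self-test execution status byte.
SELF_TEST_STATUS = {
    0: "completed without error, or no self-test has been run",
    1: "aborted by the host",
    2: "interrupted by the host with a reset",
    3: "fatal or unknown error, test could not complete",
    4: "completed: unknown element failed",
    5: "completed: electrical element failed",
    6: "completed: servo/seek element failed",
    7: "completed: read element failed",
    15: "self-test in progress",
}

def decode_self_test_status(byte):
    """Split the status byte into (code, percent_remaining, description)."""
    code = (byte >> 4) & 0x0F
    remaining = (byte & 0x0F) * 10
    return code, remaining, SELF_TEST_STATUS.get(code, "reserved")

def is_self_test_failure(byte):
    # Assumption: codes 3..7 are the ones worth an alarm.
    return 3 <= ((byte >> 4) & 0x0F) <= 7

code, remaining, text = decode_self_test_status(118)
print(code, remaining, text)
```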

This leaves me with a question: smartd doesn't run any self-tests. Am I
supposed to set up cron jobs for that? It would be more sensible for
smartd to take care of it.

Thanks much for the software,

Volker
