Thanks. Fujitsu has some weird way of encoding this attribute (Reallocated_Sector_Ct), but we have an NDA with them, so we will eventually find out the best way to decode it.

What I am most interested in is which other parameters can hint at a failing condition. This drive in particular isn't doing anything useful right now, but the eventual goal is to predict a failing drive and copy its data before it fails.
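Below is a minimal Python sketch of the kind of check discussed in this thread. It runs against unwrapped smartctl attribute-table output like the report quoted below. It flags three things: an attribute whose normalized VALUE (or WORST) has fallen to its THRESH, which is the idea behind smartctl's WHEN_FAILED column; nonzero raw counts for Current_Pending_Sector (197) and Offline_Uncorrectable (198); and a lifetime maximum temperature above a limit. The 55 C limit, the choice of attributes 197/198, and the names parse_attributes and warnings are assumptions for illustration, not Fujitsu or smartmontools guidance. Raw values such as attribute 5 are vendor-encoded, so the sketch deliberately does not compare them against a number.

```python
import re
from dataclasses import dataclass

# One row of the smartctl attribute table, optionally prefixed by email quote marks.
ROW = re.compile(
    r"^[>\s]*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>Pre-fail|Old_age)\s+\S+\s+\S+\s+(?P<raw>\d+)"
)

MAX_TEMP_C = 55  # hypothetical limit; pick one from the drive's datasheet


@dataclass
class Attribute:
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_attributes(report: str) -> dict[int, Attribute]:
    """Collect the attribute rows of a smartctl report, keyed by attribute ID."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m["id"])] = Attribute(
                int(m["id"]), m["name"], int(m["value"]), int(m["worst"]),
                int(m["thresh"]), int(m["raw"]),
            )
    return attrs


def warnings(report: str) -> list[str]:
    """Return human-readable warning signs found in a smartctl report."""
    attrs = parse_attributes(report)
    found = []
    for a in attrs.values():
        # A threshold of 0 means the attribute is never treated as failed.
        if a.thresh and a.value <= a.thresh:
            found.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
        elif a.thresh and a.worst <= a.thresh:
            found.append(f"{a.name}: worst {a.worst} <= threshold {a.thresh}")
    # Sectors waiting to be remapped, or found unreadable during offline scans.
    for attr_id in (197, 198):
        if attr_id in attrs and attrs[attr_id].raw > 0:
            found.append(f"{attrs[attr_id].name}: raw count {attrs[attr_id].raw}")
    m = re.search(r"Lifetime Min/Max \d+/(\d+)", report)
    if m and int(m.group(1)) > MAX_TEMP_C:
        found.append(f"lifetime max temperature {m.group(1)} C > {MAX_TEMP_C} C")
    return found
```

Fed the attribute table from the report below, warnings() returns an empty list: every normalized value is well above its threshold, 197 and 198 are 0, and the lifetime maximum is 49 C.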

Thanks


On 12/5/06, Bruce Allen <ballen@gravity.phys.uwm.edu> wrote:
I think your drive is OK.  The reallocated sector count value looks very
high (7D000000000 in base 16) but I think that the leading 7D can be
ignored.
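Bruce's conversion can be checked in a few lines of Python. This is a sketch, not Fujitsu's decoding. It assumes, as Bruce suggests, that the leading 7D is packed vendor data and that the reallocation count lives in the low-order bits. Splitting the 48-bit raw field into 16-bit words shows that the 7D sits entirely in the top word (as 0x07D0) and that the low 32 bits are zero. raw_words is a hypothetical helper name.

```python
RAW = 8589934592000  # RAW_VALUE of attribute 5 (Reallocated_Sector_Ct) in the report


def raw_words(raw: int) -> list[int]:
    """Split a 48-bit SMART raw value into three 16-bit words, most significant first."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("SMART raw values are 48 bits wide")
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]


print(hex(RAW))                          # 0x7d000000000
print([hex(w) for w in raw_words(RAW)])  # ['0x7d0', '0x0', '0x0']
print(RAW & 0xFFFFFFFF)                  # 0 once the leading 7D is dropped
```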

Phil, do you agree?

Cheers,
        Bruce


On Tue, 5 Dec 2006, Saqib bin Sohail wrote:

> Hi Guys
>
> I would like to know when the health of my drive is declining. From the
> archives I have found that if the max temperature is beyond a certain
> threshold and Reallocated_Sector_Count is beyond a certain threshold, then
> the drive should be replaced. Below is the SMART report of one of the
> drives. I would like to know if there are other attributes which could
> indicate the bad health of the drive.
>
> Thanks
>
>
>
> smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model:     FUJITSU MHV2040BH
> Serial Number:    NW96T6425AGY
> Firmware Version: 00000029
> User Capacity:    40,007,761,920 bytes
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
> Local Time is:    Tue Dec  5 12:07:45 2006 MST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                       was never started.
>                                       Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                       without error or no self-test has ever
>                                       been run.
> Total time to complete Offline
> data collection:                 ( 240) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                       Auto Offline data collection on/off support.
>                                       Suspend Offline collection upon new
>                                       command.
>                                       Offline surface scan supported.
>                                       Self-test supported.
>                                       Conveyance Self-test supported.
>                                       Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                       power-saving mode.
>                                       Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                       General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  28) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       113258
>   2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       12058624
>   3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       1
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       165
>   5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       8589934592000
>   7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       3965
>   8 Seek_Time_Performance   0x0005   100   100   019    Pre-fail  Offline      -       0
>   9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       11667700
>  10 Spin_Retry_Count        0x0013   100   100   020    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       165
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       29
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1225
> 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       30 (Lifetime Min/Max 20/49)
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       850
> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       453967872
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       0
> 203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       1529030378498
> 240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0
>
> SMART Error Log Version: 1
> ATA Error Count: 3866 (device log contains only the most recent five errors)
>       CR = Command Register [HEX]
>       FR = Features Register [HEX]
>       SC = Sector Count Register [HEX]
>       SN = Sector Number Register [HEX]
>       CL = Cylinder Low Register [HEX]
>       CH = Cylinder High Register [HEX]
>       DH = Device/Head Register [HEX]
>       DC = Device Command Register [HEX]
>       ER = Error register [HEX]
>       ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 3866 occurred at disk power-on lifetime: 2569 hours (107 days + 1 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 60 44 71 55 40  Error: ICRC, ABRT 96 sectors at LBA = 0x00557144 = 5599556
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> -- -- -- -- -- -- -- --  ----------------  --------------------
> 25 00 00 a4 6f 55 40 00   1d+04:01:50.324  READ DMA EXT
> 25 00 00 54 58 09 40 00   1d+04:01:50.288  READ DMA EXT
> 25 00 00 02 da cf 40 00   1d+04:01:50.239  READ DMA EXT
> 25 00 00 4e 70 27 40 00   1d+04:01:50.184  READ DMA EXT
> 25 00 00 cc 1c 3f 40 00   1d+04:01:50.142  READ DMA EXT
>
> Error 3865 occurred at disk power-on lifetime: 2441 hours (101 days + 17 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 20 34 7b cd 40  Error: ICRC, ABRT 32 sectors at LBA = 0x00cd7b34 = 13466420
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> -- -- -- -- -- -- -- --  ----------------  --------------------
> 25 00 00 54 79 cd 40 00  11d+11:50:33.694  READ DMA EXT
> 25 00 00 30 da 24 40 00  11d+11:50:33.657  READ DMA EXT
> 25 00 00 5c 15 fb 40 00  11d+11:50:33.613  READ DMA EXT
> 25 00 00 52 38 2c 40 00  11d+11:50:33.568  READ DMA EXT
> 25 00 00 fc 4a b7 40 00  11d+11:50:33.534  READ DMA EXT
>
> Error 3864 occurred at disk power-on lifetime: 2387 hours (99 days + 11 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 10 69 91 05 40  Error: ICRC, ABRT 16 sectors at LBA = 0x00059169 = 364905
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> -- -- -- -- -- -- -- --  ----------------  --------------------
> 25 00 00 79 8f 05 40 00   9d+05:52:17.716  READ DMA EXT
> 25 00 00 35 aa 6c 40 00   9d+05:52:17.662  READ DMA EXT
> 25 00 00 d4 ea 08 40 00   9d+05:52:17.636  READ DMA EXT
> 25 00 00 c2 43 c4 40 00   9d+05:52:17.606  READ DMA EXT
> 25 00 00 b7 2e 05 40 00   9d+05:52:17.568  READ DMA EXT
>
> Error 3863 occurred at disk power-on lifetime: 2367 hours (98 days + 15 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 50 09 ae 86 40  Error: ICRC, ABRT 80 sectors at LBA = 0x0086ae09 = 8826377
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> -- -- -- -- -- -- -- --  ----------------  --------------------
> 25 00 00 59 ad 86 40 00   8d+10:12:50.647  READ DMA EXT
> 25 00 00 1c 1f 2c 40 00   8d+10:12:50.597  READ DMA EXT
> 25 00 00 e0 6b 34 40 00   8d+10:12:50.547  READ DMA EXT
> 25 00 00 e4 30 2f 40 00   8d+10:12:50.519  READ DMA EXT
> 25 00 00 3c 51 fc 40 00   8d+10:12:50.486  READ DMA EXT
>
> Error 3862 occurred at disk power-on lifetime: 2367 hours (98 days + 15 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 50 72 ff 81 40  Error: ICRC, ABRT 80 sectors at LBA = 0x0081ff72 = 8519538
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> -- -- -- -- -- -- -- --  ----------------  --------------------
> 25 00 00 c2 fe 81 40 00   8d+09:15:31.070  READ DMA EXT
> 25 00 00 a2 1a 2c 40 00   8d+09:15:31.027  READ DMA EXT
> 25 00 00 e8 67 21 40 00   8d+09:15:30.987  READ DMA EXT
> 25 00 00 eb d2 2f 40 00   8d+09:15:30.945  READ DMA EXT
> 25 00 00 28 d9 18 40 00   8d+09:15:30.915  READ DMA EXT
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>   1        0        0  Not_testing
>   2        0        0  Not_testing
>   3        0        0  Not_testing
>   4        0        0  Not_testing
>   5        0        0  Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
>
>
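The register rows in the error log above can be decoded by hand. This Python sketch reproduces the arithmetic behind lines such as "LBA = 0x00557144 = 5599556". It assumes the 28-bit LBA register layout, in which the low nibble of DH holds LBA bits 24-27, and it names only a subset of the ATA Error-register bits. It also shows why Powered_Up_Time wraps after 49.710 days: the timestamp is a 32-bit count of milliseconds. decode_row, uptime_ms, and WRAP_DAYS are hypothetical names.

```python
import re

# Error register (ER) bits from the ATA/ATAPI specification (subset only).
ER_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Powered_Up_Time is a 32-bit millisecond counter, so it wraps after 2**32 ms.
WRAP_DAYS = 2**32 / (1000 * 60 * 60 * 24)


def decode_row(row: str) -> dict:
    """Decode an 'ER ST SC SN CL CH DH' row from the SMART error log."""
    er, _st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    # 28-bit addressing: DH bits 0-3 are LBA bits 24-27; CH, CL, SN supply the rest.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    errors = [name for bit, name in ER_BITS.items() if er & bit]
    return {"errors": errors, "sectors": sc, "lba": lba}


def uptime_ms(stamp: str) -> int:
    """Convert a Powered_Up_Time such as '1d+04:01:50.324' to milliseconds."""
    m = re.fullmatch(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})", stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    days, hours, minutes, seconds, millis = map(int, m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


print(decode_row("84 51 60 44 71 55 40"))
# {'errors': ['ICRC', 'ABRT'], 'sectors': 96, 'lba': 5599556}
print(round(WRAP_DAYS, 3))  # 49.71
```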



--
Saqib bin Sohail
University of Colorado at Boulder
(303) 786 0636
http://ucsu.colorado.edu/~sohail/index.html