SnapRAID / Discussion / Help: SMART logfail and End-to-End

I recently had my system lock up, and when I rebooted I got a BIOS southbridge overheat warning (I was testing a new overclock and have since gone back to stock). On that boot, one of my data drives did not show up in Windows 10. I rebooted again to check the BIOS, where everything seemed fine, so I loaded Windows again and the drive was back.

I ran snapraid smart, which gave me the following:

   Temp  Power   Error   FP Size
      C OnDays   Count        TB  Serial           Device    Disk
 -----------------------------------------------------------------------
     29    171       0   6%  6.0  WD-WX11DB5NEY67  /dev/pd1  d0
     27   1119 logfail  35%  4.0  Z300HKZ4         /dev/pd5  d1
     26   1530       0  57%  2.0  WD-WCAZA5600902  /dev/pd2  d2
     28    164       0   5%  6.0  WD-WX31D9584RLT  /dev/pd0  d3
     30     47       0   4%  8.0  VKKVA7VY         /dev/pd3  parity
     21   1342       0  60%  0.8  S13UJ1KQ336842   /dev/pd4  -
      -      -       -  n/a    -  -                /dev/pd6  -

So I ran smartctl -H /dev/pd5, which gave me the following:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       188763976
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   084   084   020    Old_age   Always       -       16686
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       8984969
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26872
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       155
183 Runtime_Bad_Block       0x0032   001   001   000    Old_age   Always       -       587
184 End-to-End_Error        0x0032   099   099   099    Old_age   Always   FAILING_NOW 1
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       8 8 13
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   073   054   045    Old_age   Always       -       27 (Min/Max 22/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       101
193 Load_Cycle_Count        0x0032   074   074   000    Old_age   Always       -       53531
194 Temperature_Celsius     0x0022   027   046   000    Old_age   Always       -       27 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   193   000    Old_age   Always       -       10011
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2929h+09m+01.793s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       38427343299
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       172175676171
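
For anyone reading a table like this, here is a rough Python sketch of how the WHEN_FAILED column relates to the VALUE and THRESH columns. It assumes the usual SMART convention that an attribute counts as failing when its normalized VALUE is at or below a non-zero THRESH. The names SAMPLE, ROW and failing_attributes are made up for this illustration, and the two sample rows are copied from the output above.

```python
import re

# Two rows copied from the smartctl output above (attributes 5 and 184).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0032   099   099   099    Old_age   Always   FAILING_NOW 1
"""

# ID, name, flag, then the three normalized columns: VALUE, WORST, THRESH.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s")


def failing_attributes(text):
    """Return (id, name, value, thresh) for rows whose VALUE <= a non-zero THRESH."""
    failing = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and anything that is not an attribute row
        attr_id, name = int(m.group(1)), m.group(2)
        value, _worst, thresh = (int(m.group(i)) for i in (3, 4, 5))
        # Assumption: a threshold of 0 means the attribute can never fail.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing


if __name__ == "__main__":
    for attr in failing_attributes(SAMPLE):
        print(attr)
```

On these two rows only attribute 184 is flagged: its VALUE of 099 has reached its THRESH of 099, while Reallocated_Sector_Ct at 100 is still well above its threshold of 010.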

How concerned should I be about this End-to-End_Error? Based on the link below, I know it is a sign of a serious issue. However, I believe I know the cause (the overclocked PC), so maybe I should plan to replace the drive without needing to do it urgently. What does everyone think?

https://kb.acronis.com/content/9119

Last edit: mrmessyau 2017-01-04

In your situation I would run snapraid scrub -p 100 -o 0 (this scrubs 100% of the array, regardless of how recently each block was last checked).

If the scrub completes without errors, it means that at least the data is not corrupted when it is read from the disk.

After that I would copy some big files from another disk in the array to pd5.
Then run snapraid diff to confirm that snapraid identifies the new files as copies (rather than as added files), and run snapraid sync.

If the sync also completes without errors, I would write the entire thing off as a false positive caused by the overclocking.

If, however, either of these tests results in checksum errors reported by snapraid, I would consider the disk completely unreliable and replace it as soon as possible.
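
If you want to script the sequence above, a minimal Python sketch could look like the following. The snapraid commands are the ones quoted in this reply. Everything else (verify_disk, run, the wait_for_copy hook) is hypothetical naming for this example. It also assumes that a non-zero exit code from scrub or sync means something went wrong, so check that against the manual for your SnapRAID version. The exit code of diff is deliberately ignored, because its output is meant to be read by a person.

```python
import subprocess

# Commands quoted from the reply above. The copy step is manual and not scripted.
SCRUB = ["snapraid", "scrub", "-p", "100", "-o", "0"]
DIFF = ["snapraid", "diff"]
SYNC = ["snapraid", "sync"]


def run(cmd):
    """Default runner: execute one command and return its exit code."""
    return subprocess.run(cmd).returncode


def verify_disk(runner=run, wait_for_copy=lambda: None):
    """Run scrub, then diff and sync; return 'ok' or the name of the failing step.

    Assumption: a non-zero exit code from scrub or sync signals a problem.
    """
    if runner(SCRUB) != 0:
        return "scrub failed"
    # Hypothetical hook: copy some big files onto the suspect disk at this point.
    wait_for_copy()
    # Output is for manual review (copies vs. added files); exit code is ignored.
    runner(DIFF)
    if runner(SYNC) != 0:
        return "sync failed"
    return "ok"
```

Calling verify_disk() with no arguments runs the real commands. Passing your own runner lets you dry-run the order of the steps first without touching the array.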
