SMART logfail and End-to-End_Error

mrmessyau
2017-01-04
2017-01-04
  • mrmessyau

    mrmessyau - 2017-01-04

    I recently had my system lock up, and when I rebooted I got a BIOS southbridge overheat warning (I was testing a new overclock and have since gone back to stock). After that reboot, one of my data drives was not showing up in Windows 10. I rebooted again to check the BIOS, where everything seemed fine, so I loaded Windows again and the drive was back.

    I ran snapraid smart, which gave me the output below:

       Temp  Power   Error   FP Size
          C OnDays   Count        TB  Serial           Device    Disk
     -----------------------------------------------------------------------
         29    171       0   6%  6.0  WD-WX11DB5NEY67  /dev/pd1  d0
         27   1119 logfail  35%  4.0  Z300HKZ4         /dev/pd5  d1
         26   1530       0  57%  2.0  WD-WCAZA5600902  /dev/pd2  d2
         28    164       0   5%  6.0  WD-WX31D9584RLT  /dev/pd0  d3
         30     47       0   4%  8.0  VKKVA7VY         /dev/pd3  parity
         21   1342       0  60%  0.8  S13UJ1KQ336842   /dev/pd4  -
          -      -       -  n/a    -  -                /dev/pd6  -
    

    So I ran smartctl -H /dev/pd5, which gave me the output below:

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       188763976
      3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   084   084   020    Old_age   Always       -       16686
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       8984969
      9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26872
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       155
    183 Runtime_Bad_Block       0x0032   001   001   000    Old_age   Always       -       587
    184 End-to-End_Error        0x0032   099   099   099    Old_age   Always   FAILING_NOW 1
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       8 8 13
    189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
    190 Airflow_Temperature_Cel 0x0022   073   054   045    Old_age   Always       -       27 (Min/Max 22/31)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       101
    193 Load_Cycle_Count        0x0032   074   074   000    Old_age   Always       -       53531
    194 Temperature_Celsius     0x0022   027   046   000    Old_age   Always       -       27 (0 13 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   193   000    Old_age   Always       -       10011
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2929h+09m+01.793s
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       38427343299
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       172175676171
    

    How concerned should I be about this End-to-End_Error? Based on the link below, I know it is a sign of a serious issue. However, I think I know the cause (the overclocked PC), so maybe I should plan to replace the drive without needing to do it urgently. What does everyone think?

    https://kb.acronis.com/content/9119
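
For anyone reading the attribute table above: as far as I understand the smartctl documentation, an attribute is reported as failing when its normalized VALUE is less than or equal to THRESH, which is why 184 End-to-End_Error (099 against a threshold of 099) shows FAILING_NOW. Below is a minimal Python sketch of that check. The helper name `failing_attributes` and the `SAMPLE` text are illustrative only, and treating a threshold of 0 as "never fails" is an assumption on my part.

```python
import re

# One attribute row: ID, name, flag, then the VALUE, WORST and THRESH columns.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d{3})\s+(\d{3})\s+(\d{3})\s"
)


def failing_attributes(report):
    """Return (id, name, value, worst, thresh) for rows where VALUE <= THRESH."""
    failing = []
    for line in report.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines, blank lines, anything that is not a row
        attr_id, name = int(match.group(1)), match.group(2)
        value, worst, thresh = (int(match.group(i)) for i in (3, 4, 5))
        # Assumption: a threshold of 0 means the attribute can never trip.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, worst, thresh))
    return failing


# A few rows copied from the output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   001   001   000    Old_age   Always       -       587
184 End-to-End_Error        0x0032   099   099   099    Old_age   Always   FAILING_NOW 1
199 UDMA_CRC_Error_Count    0x003e   200   193   000    Old_age   Always       -       10011
"""

print(failing_attributes(SAMPLE))
# -> [(184, 'End-to-End_Error', 99, 99, 99)]
```

Only attribute 184 is returned for these rows, because it is the only one whose normalized value has reached a non-zero threshold.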

     

    Last edit: mrmessyau 2017-01-04
  • Leifi Plomeros

    Leifi Plomeros - 2017-01-04

    In your situation I would run snapraid scrub -p 100 -o 0

    If the scrub completes without errors, it means that at least your data is not being corrupted when it is read from disk.

    After that I would copy some big files from another disk in the array to pd5.
    Then run snapraid diff to confirm that snapraid identifies the copied files as copies (instead of added files), and run snapraid sync.

    If the sync also completes without errors, I would write the entire thing off as a false positive caused by the overclocking.

    If, however, either of these tests results in checksum errors reported by snapraid, I would consider the disk completely unreliable and replace it as soon as possible.
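
The steps above could be scripted roughly as follows. This is only a sketch: the names `run_checks` and `SNAPRAID_STEPS` are made up for illustration, and it assumes that snapraid is on the PATH and that scrub and sync exit with a non-zero status when they hit errors. The exit status of diff is deliberately not checked, because its output is meant to be read by eye to confirm the files show up as copies.

```python
import shutil
import subprocess

# The three snapraid invocations suggested above, as argument lists.
SNAPRAID_STEPS = {
    "scrub": ["snapraid", "scrub", "-p", "100", "-o", "0"],
    "diff": ["snapraid", "diff"],
    "sync": ["snapraid", "sync"],
}


def run_checks(big_file, suspect_dir, run=subprocess.run, copy=shutil.copy2):
    """Scrub, copy a file onto the suspect disk, then diff and sync.

    Returns the name of the first step that failed, or None if all passed.
    `run` and `copy` are injectable so the flow can be dry-run without snapraid.
    """
    # Step 1: scrub the whole array to test reads from the suspect disk.
    if run(SNAPRAID_STEPS["scrub"]).returncode != 0:
        return "scrub"

    # Step 2: write new data to the suspect disk. copy2 also copies the
    # modification time along with the file contents.
    copy(big_file, suspect_dir)

    # Step 3: show what snapraid thinks changed (check the output manually).
    run(SNAPRAID_STEPS["diff"])

    # Step 4: sync, which reads the freshly written file back from the disk.
    if run(SNAPRAID_STEPS["sync"]).returncode != 0:
        return "sync"
    return None
```

A call such as `run_checks(r"D:\big.iso", "E:\\")` (both paths hypothetical, with E: standing in for the mount point of pd5) would return `None` if both scrub and sync pass, or the name of the failing step otherwise.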

     
    • mrmessyau

      mrmessyau - 2017-01-04

      Thanks Leifi,

      I'll give what you've suggested a try.

       

      Last edit: mrmessyau 2017-01-04
