RELEASE CANDIDATE for 8.0

2015-04-05
2015-04-21
  • Andrea Mazzoleni

    Hi,

    I prepared a release candidate for 8.0 at: http://snapraid.sourceforge.net/rc/

    The full list of changes is: https://github.com/amadvance/snapraid/blob/master/HISTORY

    I'm mainly interested in comments on the new "up", "down" and "smart" commands. They are intended to spin-up, spin-down, and print a SMART report of the array.

    To have them working in Linux, you must have smartctl and hdparm already installed. In Windows, they are provided in the SnapRAID package. In both cases, to get full functionality you must run as root/Administrator.

    There is also a new "test-devices" command that prints the disk mapping that SnapRAID sees, with the low-level devices used by each disk in the array.

    These new commands make no changes to the array, so you can test them even if you are still using SnapRAID 7.1.

    Ciao,
    Andrea

     
  • Leifi Plomeros

    Leifi Plomeros - 2015-04-05

    Works great on the motherboard SATA ports.

    D700, and the parity disks are correctly represented as different physical devices on low level with correct size! :)

    All other values also seem correct, including correctly identifying the system disk SSD.

    C:\Snapraid>snapraid smart
    SnapRAID SMART report:
    
       Temp  Power   Error   FP Size
          C OnDays   Count        TB  Serial           Device     Disk
     -----------------------------------------------------------------------
    
          -      -       -   0%    -  -                /dev/pd9   d100
         34    110       0   5%  6.0  WD-WXL1H644XT4T  /dev/pd4   d200
         33    253       0   5%  4.0  WD-WCC4E0900103  /dev/pd2   d300
          -      -       -   0%    -  -                /dev/pd7   d400
         35    260       0   5%  4.0  WD-WCC4E0876247  /dev/pd5   d500
         32    262       0   5%  4.0  WD-WCC4E0883103  /dev/pd3   d600
          -      -       -   0%    -  -                /dev/pd10  d700
          -      -       -   0%    -  -                /dev/pd8   d700
          -      -       -   0%    -  -                /dev/pd6   d800
         35    956      19  42%  2.0  S1UYJ1RZ515380   /dev/pd0   parity
          -      -       -   0%    -  -                /dev/pd11  parity
          -      -       -   0%    -  -                /dev/pd13  2-parity
          -      -       -   0%    -  -                /dev/pd12  2-parity
         42    216       0  SSD  0.5  S1DHNSAF405907K  /dev/pd1   -
    
    The FP column is the estimated probability (in percentage) that the disk
    is going to fail in the next year.
    
    Probability that at least one disk is going to fail in the next year is 52%.
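
As a rough illustration of how a summary figure like the one above could be derived from the per-disk FP values, the sketch below assumes the disks fail independently and combines the estimates as 1 minus the product of the survival chances. That independence model is an assumption for illustration and not necessarily the formula SnapRAID uses; the function name `any_failure_probability` is hypothetical. Because the report prints rounded percentages, the result need not match the printed total exactly.

```python
from math import prod


def any_failure_probability(per_disk):
    """Probability that at least one disk fails, assuming independent failures.

    per_disk: iterable of annual failure probabilities in the range 0.0-1.0.
    """
    # The chance that every disk survives is the product of the survival chances.
    all_survive = prod(1.0 - p for p in per_disk)
    return 1.0 - all_survive


if __name__ == "__main__":
    # FP values of the disks that reported SMART data in the table above
    # (the SSD row has no percentage and is left out).
    reported = [0.05, 0.05, 0.05, 0.05, 0.42]
    print(f"{any_failure_probability(reported):.1%}")
```

Under this model a single disk with a high FP dominates the total, which is why one bad drive pushes the array-wide figure up so sharply.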
    

    The other disks are connected to an LSI 9211-8i, which requires the smartctl parameter: -d sat

    Any chance that you could allow passing that parameter? Or even try both alternatives, with and without the parameter, and only present the successful results in the table?

     
    • Andrea Mazzoleni

      Hi Leifi,

      I think that I can add the possibility to specify a manual "-d" option that should be applied to some specific disks.

      But to better understand the issue, could you please try the following commands and report their output?

      smartctl --scan-open -d pd
      smartctl --scan-open -d ata,pd
      smartctl --scan-open -d scsi,pd
      smartctl --scan-open -d usb,pd

      Thanks,
      Andrea

       
      • Leifi Plomeros

        Leifi Plomeros - 2015-04-06

        C:\Snapraid>smartctl --scan-open -d pd
        /dev/pd0 -d ata # /dev/pd0, ATA device
        /dev/pd1 -d ata # /dev/pd1, ATA device
        /dev/pd2 -d ata # /dev/pd2, ATA device
        /dev/pd3 -d ata # /dev/pd3, ATA device
        /dev/pd4 -d ata # /dev/pd4, ATA device
        /dev/pd5 -d ata # /dev/pd5, ATA device
        /dev/pd6 -d scsi # /dev/pd6, SCSI device
        /dev/pd7 -d scsi # /dev/pd7, SCSI device
        /dev/pd8 -d scsi # /dev/pd8, SCSI device
        /dev/pd9 -d scsi # /dev/pd9, SCSI device
        /dev/pd10 -d scsi # /dev/pd10, SCSI device
        /dev/pd11 -d scsi # /dev/pd11, SCSI device
        /dev/pd12 -d scsi # /dev/pd12, SCSI device
        /dev/pd13 -d scsi # /dev/pd13, SCSI device

        C:\Snapraid>smartctl --scan-open -d ata,pd
        /dev/pd0 -d ata # /dev/pd0, ATA device
        /dev/pd1 -d ata # /dev/pd1, ATA device
        /dev/pd2 -d ata # /dev/pd2, ATA device
        /dev/pd3 -d ata # /dev/pd3, ATA device
        /dev/pd4 -d ata # /dev/pd4, ATA device
        /dev/pd5 -d ata # /dev/pd5, ATA device

        C:\Snapraid>smartctl --scan-open -d scsi,pd
        /dev/pd6 -d scsi # /dev/pd6, SCSI device
        /dev/pd7 -d scsi # /dev/pd7, SCSI device
        /dev/pd8 -d scsi # /dev/pd8, SCSI device
        /dev/pd9 -d scsi # /dev/pd9, SCSI device
        /dev/pd10 -d scsi # /dev/pd10, SCSI device
        /dev/pd11 -d scsi # /dev/pd11, SCSI device
        /dev/pd12 -d scsi # /dev/pd12, SCSI device
        /dev/pd13 -d scsi # /dev/pd13, SCSI device

        C:\Snapraid>smartctl --scan-open -d usb,pd

        C:\Snapraid>smartctl -a -d sat /dev/pd6
        smartctl 6.3 2014-07-26 r3976 [i686-w64-mingw32-win7(64)-sp1] (sf-6.3-1)
        Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

        === START OF INFORMATION SECTION ===
        Model Family: Western Digital Red (AF)
        Device Model: WDC WD40EFRX-68WT0N0
        ...

         
        • Andrea Mazzoleni

          Hi Leifi,

          One more test, please. Report the full output of these two commands. Note that the second one prints the exit code of the first one, so you need to run it immediately afterwards.

          smartctl -a /dev/pd6 -r ioctl
          echo %errorlevel%

          Anyway, I'm implementing an automatic retry with "-d sat" that should work most of the time.

          Thanks,
          Andrea

           
          • Leifi Plomeros

            Leifi Plomeros - 2015-04-07

            Hi,

            That would have been too easy... :/

            C:\Snapraid>smartctl -a /dev/pd6 -r ioctl
            smartctl 6.3 2014-07-26 r3976 [i686-w64-mingw32-win7(64)-sp1] (sf-6.3-1)
            Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

            [inquiry: 12 01 00 00 fc 00 ]
            [inquiry: 12 00 00 00 24 00 ]

            Probable ATA device behind a SAT layer
            Try an additional '-d ata' or '-d sat' argument.

            C:\Snapraid>echo %errorlevel%
            0

             
            • Andrea Mazzoleni

              Hi Leifi,

              Please redownload and retry now. It should work.

              Now, when smartctl exits with code 0 or 2 and no info at all is present, the "-d sat" alternative is automatically retried.
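
The retry rule described here can be sketched roughly as follows. This is only an illustration in Python, not SnapRAID's own code. The `smart_report` function, the injectable `run` parameter, and the use of the "START OF INFORMATION SECTION" banner as the sign that some info came back are all assumptions made for the example; only the `smartctl -a` and `-d sat` arguments come from the thread.

```python
import subprocess

# Assumed indicator that smartctl returned at least some device information.
INFO_MARKER = "=== START OF INFORMATION SECTION ==="


def smart_report(device, run=None):
    """Return (args_used, output) for `smartctl -a`, retrying once with `-d sat`."""
    if run is None:
        def run(args):
            # Real invocation; needs smartctl on PATH and usually root/Administrator.
            done = subprocess.run(["smartctl", *args], capture_output=True, text=True)
            return done.returncode, done.stdout

    args = ["-a", device]
    code, out = run(args)
    # Exit codes 0 and 2 with no information at all trigger the "-d sat" retry.
    if code in (0, 2) and INFO_MARKER not in out:
        args = ["-a", "-d", "sat", device]
        code, out = run(args)
    return args, out
```

Passing a fake `run` callable makes it possible to exercise the decision logic without touching real disks.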

              Thanks,
              Andrea

               
              • Leifi Plomeros

                Leifi Plomeros - 2015-04-08

                It works!


               Temp  Power   Error   FP Size
                  C OnDays   Count        TB  Serial           Device     Disk
             -----------------------------------------------------------------------

                 38    321       0  58%  4.0  WD-WCC4E0266136  /dev/pd9   d100
                 35    113       0   5%  6.0  WD-WXL1H644XT4T  /dev/pd2   d200
                 37    256       0   5%  4.0  WD-WCC4E0900103  /dev/pd4   d300
                 37    160       0   5%  4.0  WD-WCC4EE7P2ZC0  /dev/pd7   d400
                 36    263       0   5%  4.0  WD-WCC4E0876247  /dev/pd3   d500
                 34    266       0   5%  4.0  WD-WCC4E0883103  /dev/pd5   d600
                 46    615       0   5%  2.0  MN1270FA0WSL1D   /dev/pd10  d700
                 35    972      21   5%  2.0  S1UYJ1RZ515272   /dev/pd8   d700
                 37    261       0   5%  4.0  WD-WCC4E0871186  /dev/pd6   d800
                 35    959      19   5%  2.0  S1UYJ1RZ515380   /dev/pd0   parity
                 36    357       0  n/k  2.0  WD-WCC1T0573778  /dev/pd11  parity
                 42    941   12017   5%  2.0  ML2220F31351EE   /dev/pd13  2-parity
                 39    944       0   5%  2.0  ML2220F30YL2SE   /dev/pd12  2-parity
                 44    220       0  SSD  0.5  S1DHNSAF405907K  /dev/pd1   -
                

                The FP column is the estimated probability (in percentage) that the disk
                is going to fail in the next year.

                Probability that at least one disk is going to fail in the next year is 75%.


                Thank you!

                Looks like it may be fan filter cleaning time... :)

                 
              • Leifi Plomeros

                Leifi Plomeros - 2015-04-08

                Is it possible to use similar logic for up and down?
                Failing to spin down with "-d sat" returns an error code.
                Failing to spin down without "-d sat" does not return an error code.
                Success always returns the "Device placed in STANDBY mode" text.

                In the examples below, /dev/pd1 is connected to motherboard SATA
                and /dev/pd6 is connected to the LSI 9211-8i.

                C:\Snapraid>smartctl -d sat -s standby,now /dev/pd1
                Read Device Identity failed: IOCTL_SCSI_PASS_THROUGH_DIRECT failed, Error=1
                A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
                C:\Snapraid>echo %errorlevel%
                2

                C:\Snapraid>smartctl -s standby,now /dev/pd1
                Device placed in STANDBY mode
                C:\Snapraid>echo %errorlevel%
                0

                C:\Snapraid>smartctl -s standby,now /dev/pd6
                Probable ATA device behind a SAT layer
                Try an additional '-d ata' or '-d sat' argument.
                C:\Snapraid>echo %errorlevel%
                0

                C:\Snapraid>smartctl -d sat -s standby,now /dev/pd6
                Device placed in STANDBY mode
                C:\Snapraid>echo %errorlevel%
                0
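
The four runs above suggest that the exit code alone cannot tell success from failure, while the "Device placed in STANDBY mode" message can. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not how SnapRAID implements it: `spin_down`, `_smartctl`, and the injectable `run` parameter are hypothetical names, and only the `smartctl -s standby,now` and `-d sat` arguments are taken from the commands shown.

```python
import subprocess

STANDBY_OK = "Device placed in STANDBY mode"


def _smartctl(args):
    """Run smartctl and return its combined text output (needs smartctl on PATH)."""
    done = subprocess.run(["smartctl", *args], capture_output=True, text=True)
    return done.stdout + done.stderr


def spin_down(device, run=_smartctl):
    """Try a plain standby request first, then fall back to `-d sat`.

    Returns the argument list that succeeded, or None if both attempts failed.
    """
    for extra in ([], ["-d", "sat"]):
        args = [*extra, "-s", "standby,now", device]
        # The exit code is not reliable here, so success is judged by the message.
        if STANDBY_OK in run(args):
            return args
    return None
```

Trying the plain form first keeps motherboard SATA disks on the path that already works for them, and only disks that print no success message get the second attempt.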

                 
                • Andrea Mazzoleni

                  Hi Leifi,

                  Previously I was using "hdparm" to spin down. But yep, you are correct. Using smartctl is likely a better option.

                  Just implemented it.

                  I've also added a new "smartctl" option to the snapraid.conf file that allows configuring special smartctl options for each disk.
                  So, if you like, you can set -d sat for the disks where you know it's needed, without SnapRAID having to try the command twice.

                  Note that you can see the exact commands used by generating a log with "-l test.log". This may be useful in testing.

                  Thanks!
                  Andrea

                   
  • Leifi Plomeros

    Leifi Plomeros - 2015-04-05

    SnapRAID up and down seem to be working for all motherboard SATA ports as well.

    C:\Snapraid>snapraid down
    Spindown...
    Spundown device '/dev/pd11' for disk 'parity' in 32 ms.
    Spundown device '/dev/pd9' for disk 'd100' in 47 ms.
    Spundown device '/dev/pd13' for disk '2-parity' in 47 ms.
    Spundown device '/dev/pd8' for disk 'd700' in 47 ms.
    Spundown device '/dev/pd6' for disk 'd800' in 47 ms.
    Spundown device '/dev/pd7' for disk 'd400' in 47 ms.
    Spundown device '/dev/pd12' for disk '2-parity' in 47 ms.
    Spundown device '/dev/pd10' for disk 'd700' in 47 ms.
    Spundown device '/dev/pd5' for disk 'd500' in 453 ms.
    Spundown device '/dev/pd3' for disk 'd600' in 453 ms.
    Spundown device '/dev/pd2' for disk 'd300' in 453 ms.
    Spundown device '/dev/pd4' for disk 'd200' in 640 ms.
    Spundown device '/dev/pd0' for disk 'parity' in 1186 ms.

    C:\Snapraid>snapraid up
    Spinup...
    Spunup device '/dev/volb4d94a7b-0a6d-45d9-a6fb-330825f1e449' for disk 'd300' in 15 ms.
    Spunup device '/dev/volb52f9f52-c942-460d-a572-25bf397b1347' for disk 'd400' in 31 ms.
    Spunup device '/dev/vol81c1c2fb-10d9-11e4-bccb-240a645537ee' for disk 'd700' in 31 ms.
    Spunup device '/dev/volf67a1f0f-9f08-11e3-8825-50e549ef3a2e' for disk '2-parity' in 78 ms.
    Spunup device '/dev/vol5f8a6c3a-9da8-45f3-88e6-b87ba7fba7e7' for disk 'd100' in 826 ms.
    Spunup device '/dev/vol6b374911-77c6-4d9f-802c-45ee11b28c80' for disk 'd800' in 826 ms.
    Spunup device '/dev/vol0adaea1e-57a5-4f98-a4c0-70ac2fd12fad' for disk 'd600' in 8346 ms.
    Spunup device '/dev/vol3a0a63f8-4f47-4253-a26a-be4f9114101c' for disk 'd500' in 8845 ms.
    Spunup device '/dev/vole34a0780-4043-4be3-a353-af5ac3b1f637' for disk 'd200' in 9750 ms.
    Spunup device '/dev/vol61c5833b-2886-11e4-9855-240a645537ee' for disk 'parity' in 9859 ms.

    I guess the device name on up command could be polished :)

     
  • rubylaser

    rubylaser - 2015-04-05

    Hello Andrea, this is working well, but how are the failure percentages calculated? /dev/sdk is at 100% to fail this year, and its SMART values (other than age) are all good.

    ~~~~~~
    root@backups:~# snapraid smart
    SnapRAID SMART report:

       Temp  Power   Error   FP Size
          C OnDays   Count        TB  Serial          Device    Disk
     -----------------------------------------------------------------------


     29    632       0   5%  3.0  MJ0351YNG7XKZZ  /dev/sdg  d1
     23    443       0  12%  4.0  Z30031ZZ        /dev/sdc  d2
     24    443       0  13%  4.0  Z3002CZZ        /dev/sdj  d5
     24    631       0   6%  3.0  Z3100AN3        /dev/sdh  d6
     26    437       0 100%  2.0  6YD1R3YF        /dev/sdk  d8
     27    281       2  56%  3.0  44LY9ENGS       /dev/sde  d10
     24    264       0   6%  3.0  Z7P0027C        /dev/sdf  d11
     24    755       1  42%  3.0  MJ1313YNG1LMJC  /dev/sdd  d12
     25    199       0   5%  4.0  PL2331LAG9056J  /dev/sdb  parity
     27    190       0   5%  4.0  PK1334PCGXGYYS  /dev/sdi  2-parity
    
    /dev/sdk's SMART info....
    

    root@fileserver:~# smartctl -a /dev/sdk
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.18.6-aufs] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family: Seagate Barracuda Green (AF)
    Device Model: ST2000DL003-9VT166
    Serial Number: 6YD1R3YF
    LU WWN Device Id: 5 000c50 0465861eb
    Firmware Version: CC3C
    User Capacity: 2,000,398,934,016 bytes [2.00 TB]
    Sector Sizes: 512 bytes logical, 4096 bytes physical
    Rotation Rate: 5900 rpm
    Device is: In smartctl database [for details use: -P show]
    ATA Version is: ATA8-ACS T13/1699-D revision 4
    SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is: Sun Apr 5 17:49:25 2015 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status: (0x82) Offline data collection activity
    was completed without error.
    Auto Offline Data Collection: Enabled.
    Self-test execution status: ( 0) The previous self-test routine completed
    without error or no self-test has ever
    been run.
    Total time to complete Offline
    data collection: ( 623) seconds.
    Offline data collection
    capabilities: (0x7b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new
    command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
    SMART capabilities: (0x0003) Saves SMART data before entering
    power-saving mode.
    Supports SMART auto save timer.
    Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
    Short self-test routine
    recommended polling time: ( 1) minutes.
    Extended self-test routine
    recommended polling time: ( 355) minutes.
    Conveyance self-test routine
    recommended polling time: ( 2) minutes.
    SCT capabilities: (0x30b7) SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   116   099   006    Pre-fail  Always       -       106740472
      3 Spin_Up_Time            0x0003   090   082   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   095   095   020    Old_age   Always       -       5636
      5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       43007647350
      9 Power_On_Hours          0x0032   089   011   000    Old_age   Always       -       10489
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       146
    183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   093   000    Old_age   Always       -       8590065690
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   074   060   045    Old_age   Always       -       26 (Min/Max 21/28)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       108
    193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6467
    194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (0 13 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   021   006   000    Old_age   Always       -       106740472
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       113266877544912
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1243065368
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3276093544

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%      9474         -
    # 2  Short offline       Completed without error       00%      9466         -
    # 3  Short offline       Completed without error       00%      9442         -
    # 4  Short offline       Completed without error       00%      9418         -
    # 5  Short offline       Completed without error       00%      9394         -
    # 6  Short offline       Completed without error       00%      9370         -
    # 7  Short offline       Completed without error       00%      9346         -
    # 8  Short offline       Completed without error       00%      9322         -
    # 9  Extended offline    Completed without error       00%      9306         -
    #10  Short offline       Completed without error       00%      9298         -
    #11  Short offline       Completed without error       00%      9274         -
    #12  Short offline       Completed without error       00%      9250         -
    #13  Short offline       Completed without error       00%      9209         -
    #14  Short offline       Completed without error       00%      9175         -
    #15  Short offline       Completed without error       00%      9151         -
    #16  Short offline       Completed without error       00%      9127         -
    #17  Extended offline    Completed without error       00%      9111         -
    #18  Short offline       Completed without error       00%      9103         -
    #19  Short offline       Completed without error       00%      9079         -
    #20  Short offline       Completed without error       00%      9055         -
    #21  Short offline       Completed without error       00%      9031         -

    SMART Selective self-test log data structure revision number 1
    SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
    Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    ~~~~~

    Also, up and down both work well. Finally, I also have an IBM M1015 (flashed to IT mode, so it's a 9211-8i) connected to an Intel SAS expander on this box, and snapraid smart had no problem getting values without the -d ata option.

     

    Last edit: rubylaser 2015-04-05
    • Andrea Mazzoleni

      Hi rubylaser,

      The problem is attribute 188. SnapRAID misread the raw value 8590065690; in truth, it should be masked to its low 16 bits, resulting in a value of 26.

      I've uploaded a new RC version that interprets the value in the correct way. Could you please retry?

      The failure probability should now be less than 100%, but still high, as these command timeout errors are serious ones.
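
The masking step can be checked with a few lines of Python. This is just a worked example of the arithmetic; `low_16_bits` is a hypothetical helper name, and what the upper bits of the raw value encode is not stated in the thread, so the sketch makes no claim about them.

```python
def low_16_bits(raw_value):
    """Keep only the low 16 bits of a SMART raw value."""
    return raw_value & 0xFFFF


if __name__ == "__main__":
    raw = 8590065690  # attribute 188 (Command_Timeout) from the smartctl output above
    print(hex(raw))          # 0x20002001a
    print(low_16_bits(raw))  # 26
```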

      Ciao,
      Andrea

       
      • rubylaser

        rubylaser - 2015-04-06

        Thanks Andrea. I tried the new RC, and as you said, the results are still terrible.

        root@backups:~# snapraid smart
        SnapRAID SMART report:
        
           Temp  Power   Error   FP Size
              C OnDays   Count        TB  Serial          Device    Disk
         -----------------------------------------------------------------------
             28    633       0   5%  3.0  MJ0351YNG7XK9A  /dev/sdg  d1
             18    444       0  12%  4.0  Z30031NY        /dev/sdc  d2
             18    444       0  13%  4.0  Z3002C2P        /dev/sdj  d5
             17    632       0   6%  3.0  Z3100AN3        /dev/sdh  d6
             24    438       0  87%  2.0  6YD1R3YF        /dev/sdk  d8
             21    282       2  56%  3.0  44LY9ENGS       /dev/sde  d10
             20    265       0   6%  3.0  Z7P0027C        /dev/sdf  d11
             22    756       1  42%  3.0  MJ1313YNG1LMJC  /dev/sdd  d12
             21    200       0   5%  4.0  PL2331LAG9056J  /dev/sdb  parity
             20    190       0   5%  4.0  PK1334PCGXGYYS  /dev/sdi  2-parity
        
        The FP column is the estimated probability (in percentage) that the disk
        is going to fail in the next year.
        
        Probability that at least one disk is going to fail in the next year is 98%.
        

        Looks like it's time to replace the old 2TB drive with a new 4TB one. Thanks again for your great work!

         

        Last edit: rubylaser 2015-04-06
        • Kevin

          Kevin - 2015-04-10

          This is slightly off the main topic, but will your script run SnapRAID 8.0 out of the gate, or will it need an update?

           
  • John

    John - 2015-04-06

    I've been using the 8 betas for quite a while, all fine.

    Thank you again for the "negative wasted" in snapraid status, that's very useful for full disks.

    test-devices could be helped by including the path from snapraid.conf

    I've never used up/down. I can only imagine what they do, since I leave the disks to time out and go to sleep on their own.

    It is very good to include the SMART data, even if I don't know what triggers it precisely (I'm sure it is straightforward but I've no time to go through the source now). I do have one drive showing 100% failure next year :-)

     
  • Taishan Lin

    Taishan Lin - 2015-04-06

    The 8.0 RC also seems to have changed the fix command:

    ................................................
    C:\snapraidXU>snapraid -e fix
    Self test...
    Loading state from C:/cab/m42/SnapRAID.content...
    Scanning disk d0...
    ...................
    Scanning disk d20...
    Filtering...
    Using 6766 MiB of memory.
    Initializing...
    Fixing...
    100% completed, 12 MB processed in 0:08
    Everything OK
    .................................................

    Great!!! Only 12MB processed.
    After fixing for 8 minutes, it shows:
    100% completed, 12 MB processed in 0:08 ...
    It would be nice to have some progress indicator shown during those 8 minutes.
    I think the 8 minutes may have something to do with the size (45 GB) of that particular file, which has one bad 512 KB block.

     
  • Taishan Lin

    Taishan Lin - 2015-04-06

      -      -       -    -    -  -                /dev/pd37  d0
      -      -       -    -    -  -                /dev/pd25  d1
      -      -       -    -    -  -                /dev/pd33  d2
      -      -       -    -    -  -                /dev/pd16  d3
      -      -       -    -    -  -                /dev/pd20  d4
      -      -       -    -    -  -                /dev/pd45  d5
      -      -       -    -    -  -                /dev/pd41  d6
      -      -       -    -    -  -                /dev/pd29  d7
      -      -       -    -    -  -                /dev/pd38  d8
      -      -       -    -    -  -                /dev/pd26  d9
      -      -       -    -    -  -                /dev/pd34  d10
      -      -       -    -    -  -                /dev/pd17  d11
      -      -       -    -    -  -                /dev/pd21  d12
      -      -       -    -    -  -                /dev/pd46  d13
      -      -       -    -    -  -                /dev/pd42  d14
      -      -       -    -    -  -                /dev/pd30  d15
     42     76       0   5%  4.0  WD-WCC4E1UVKZ58  /dev/pd4   d16
     43     39       0   5%  4.0  WD-WCC4E4ZANDTS  /dev/pd3   d17
     41     61       0 100%  4.0  Z3032VA6         /dev/pd0   d18
     40     63       0   6%  4.0  Z3032X8R         /dev/pd2   d19
     33    132       0   5%  4.0  WD-WCC4E3HPKNHU  /dev/pd5   d20
     30    362       -  42%    -  WD-WCC4E0084809  /dev/pd11  parity
     38    175       -   5%    -  WD-WCC4EFSNC4D2  /dev/pd12  2-parity
     36      4       -   6%    -  Z303802E         /dev/pd10  3-parity
     30    736       -  SSD  0.5  201210230052     /dev/pd1   -
      -      -       -   0% 28.0  -                /dev/pd6   -
      -      -       -   0% 28.0  -                /dev/pd7   -
      -      -       -   0% 21.0  -                /dev/pd8   -
      -      -       -    - 28.0  -                /dev/pd9   -
      -      -       -    -    -  -                /dev/pd13  -
     36    655       0  32%  3.0  W1F0P8A8         /dev/pd19  -
     41    919       0 100%  3.0  WD-WCAWZ1828327  /dev/pd24  -
     37     69       0   6%  3.0  W7300ZA2         /dev/pd32  -
     39    600       0  58%  3.0  W1F0CZM0         /dev/pd44  -
      -      -       -    -    -  -                /dev/pd47  -
      -      -       -    -    -  -                /dev/pd48  -
      -      -       -    -    -  -                /dev/pd49  -
      -      -       -    -    -  -                /dev/pd50  -
     40      3       - 100%    -  Z3037ZWK         /dev/pd51  -
    

    The FP column is the estimated probability (in percentage) that the disk
    is going to fail in the next year.

    Probability that at least one disk is going to fail in the next year is 100%.


    Some problems with smartctl:

    1. Disks housed in USB cages are not shown.
    2. Seagate NAS HDD 4 TB ST4000VN000-1H4168: pd51 is 3 days old and shows FP 100%, and its size (TB) is missing,
      but another one that is 4 days old, /dev/pd10, shows FP 6%.
    3. Also a Seagate NAS HDD 4 TB: pd0 is 61 days old and shows FP 100%.
     
    • Taishan Lin

      Taishan Lin - 2015-04-08

      new 8.0 RC:
      C:\snapraidXU>snapraid smart

        -      -       -  n/a    -  -                /dev/pd25  d0
        -      -       -  n/a    -  -                /dev/pd33  d1
        -      -       -  n/a    -  -                /dev/pd41  d2
        -      -       -  n/a    -  -                /dev/pd17  d3
        -      -       -  n/a    -  -                /dev/pd21  d4
        -      -       -  n/a    -  -                /dev/pd44  d5
        -      -       -  n/a    -  -                /dev/pd29  d6
        -      -       -  n/a    -  -                /dev/pd37  d7
        -      -       -  n/a    -  -                /dev/pd26  d8
        -      -       -  n/a    -  -                /dev/pd34  d9
        -      -       -  n/a    -  -                /dev/pd42  d10
        -      -       -  n/a    -  -                /dev/pd18  d11
        -      -       -  n/a    -  -                /dev/pd22  d12
        -      -       -  n/a    -  -                /dev/pd45  d13
        -      -       -  n/a    -  -                /dev/pd30  d14
        -      -       -  n/a    -  -                /dev/pd38  d15
       38     79       0   5%  4.0  WD-WCC4E1UVKZ58  /dev/pd4   d16
       39     41       0   5%  4.0  WD-WCC4E4ZANDTS  /dev/pd3   d17
       38     63       0  10%  4.0  Z3032VA6         /dev/pd0   d18
       37     65       0   6%  4.0  Z3032X8R         /dev/pd1   d19
       28    135       0   5%  4.0  WD-WCC4E3HPKNHU  /dev/pd5   d20
       24    364       -  n/k    -  WD-WCC4E0084809  /dev/pd46  parity
       38    178       -  n/k    -  WD-WCC4EFSNC4D2  /dev/pd13  2-parity
       35      6       -  n/k    -  Z303802E         /dev/pd10  3-parity
       30    738       -  SSD  0.5  201210230052     /dev/pd2   -
        -      -       -  n/k 28.0  -                /dev/pd6   -
        -      -       -  n/k 28.0  -                /dev/pd7   -
        -      -       -  n/k 21.0  -                /dev/pd8   -
        -      -       -  n/a 28.0  -                /dev/pd9   -
       30      5       -  n/k    -  Z3037ZWK         /dev/pd11  -
        -      -       -  n/a    -  -                /dev/pd14  -
       35    657       0  26%  3.0  W1F0P8A8         /dev/pd20  -
       33    922       0  n/k  3.0  WD-WCAWZ1828327  /dev/pd32  -
       34     72       0   6%  3.0  W7300ZA2         /dev/pd40  -
       37    603       0  24%  3.0  W1F0CZM0         /dev/pd43  -
        -      -       -  n/a    -  -                /dev/pd49  -
        -      -       -  n/a    -  -                /dev/pd50  -
        -      -       -  n/a    -  -                /dev/pd51  -
        -      -       -  n/a    -  -                /dev/pd52  -
      

      The FP column is the estimated probability (in percentage) that the disk
      is going to fail in the next year.

      Probability that at least one disk is going to fail in the next year is 26%.

       
      • Andrea Mazzoleni

        Hi Taishan,

        I just added a new "smartctl" option that allows passing special configuration options to smartctl.

        You can first try to make smartctl work manually with the USB controller. See: https://www.smartmontools.org/wiki/Supported_USB-Devices

        Something like:

        smartctl -a -d usbjmicron /dev/pd8
        

        (note that usbjmicron is only an example, I don't know which enclosure you have)

        Then you can add the options in snapraid.conf. Like:

        smartctl d1 -d usbjmicron %s
        

        See the new manual about this new "smartctl" option.

        Ciao,
        Andrea

         
        • Taishan Lin

          Taishan Lin - 2015-04-10

          RC0409 64bit:

          configuration 1: added in conf file:
          smartctl d0 -d usbjmicron,0 %s
          smartctl d1 -d usbjmicron,0 %s
          smartctl d2 -d usbjmicron,0 %s
          smartctl d3 -d usbjmicron,0 %s
          smartctl d4 -d usbjmicron,0 %s
          smartctl d5 -d usbjmicron,0 %s
          smartctl d6 -d usbjmicron,0 %s
          smartctl d7 -d usbjmicron,0 %s
          smartctl d8 -d usbjmicron,0 %s
          smartctl d9 -d usbjmicron,0 %s
          smartctl d10 -d usbjmicron,0 %s
          smartctl d11 -d usbjmicron,0 %s
          smartctl d12 -d usbjmicron,0 %s
          smartctl parity -d usbjmicron,0 %s
          smartctl 2-parity -d usbjmicron,0 %s

          snapraid smart:


           34     81       0  10%  3.0  Z1F31CQD         /dev/pd24  d0
           33    924       0  26%  3.0  WD-WCAWZ1828327  /dev/pd32  d1
           36     28       0   6%  3.0  Z1F5PX1W         /dev/pd16  d2
           34     75       0   6%  3.0  W7300ZA2         /dev/pd40  d3
           35    660       0  26%  3.0  W1F0P8A8         /dev/pd20  d4
           36    605       0  24%  3.0  W1F0CZM0         /dev/pd44  d5
           33    537       0   5%  3.0  WD-WMC1T3957622  /dev/pd31  d6
           35    279       0   9%  3.0  Z1F55TF6         /dev/pd39  d7
           34    554       0   5%  3.0  WD-WMC1T4225948  /dev/pd19  d8
           36    555       0   5%  3.0  WD-WMC1T3955681  /dev/pd43  d9
           36    555       0   5%  3.0  WD-WMC1T4141933  /dev/pd27  d10
          
            -      -       -    -  3.0  WD-WMC1T3958547  /dev/pd35  d11
           36    553       0   5%  3.0  WD-WMC1T4318974  /dev/pd15  d12
           33    487       0  99%  3.0  WD-WMC1T4141687  /dev/pd23  parity
           37    553       0   5%  3.0  WD-WMC1T3958963  /dev/pd28  2-parity
           40     66       0  10%  4.0  Z3032VA6         /dev/pd0   -
           30    741       -  SSD  0.5  201210230052     /dev/pd1   -
           43     81       0   5%  4.0  WD-WCC4E1UVKZ58  /dev/pd2   -
           39     68       0   6%  4.0  Z3032X8R         /dev/pd3   -
           44     44       0   5%  4.0  WD-WCC4E4ZANDTS  /dev/pd4   -
           32    137       0   5%  4.0  WD-WCC4E3HPKNHU  /dev/pd5   -
            -      -       -    - 28.0  -                /dev/pd6   -
            -      -       -    - 28.0  -                /dev/pd7   -
            -      -       -    - 21.0  -                /dev/pd8   -
            -      -       -  n/a 28.0  -                /dev/pd9   -
           35      9       -   6%    -  Z303802E         /dev/pd10  -
           29    366       -  32%    -  WD-WCC4E0084809  /dev/pd11  -
           37    180       -   5%    -  WD-WCC4EFSNC4D2  /dev/pd13  -
            -      -       -  n/a    -  -                /dev/pd14  -
            -      -       -  n/a    -  -                /dev/pd49  -
            -      -       -  n/a    -  -                /dev/pd50  -
            -      -       -  n/a    -  -                /dev/pd51  -
            -      -       -  n/a    -  -                /dev/pd52  -
          

          The FP column is the estimated probability (in percentage) that the disk
          is going to fail in the next year.

          Probability that at least one disk is going to fail in the next year is 100%.

          The only strange thing is disk d11 (/dev/pd35): it should report the same as d0 through d12 and the 2 parity disks, but some data is missing.

          (There are 4 HW RAID and 5 DrivePool virtual drives, shown with device name only.)
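
One way to spot rows like d11 automatically is to parse the report and list array disks whose SMART columns are empty. This is only a sketch: it assumes each data row has the eight whitespace-separated fields seen in the output above (temperature, power-on days, error count, FP, size, serial, device, disk), and the names `SmartRow`, `parse_rows` and `array_disks_without_smart` are made up for illustration.

```python
from typing import NamedTuple


class SmartRow(NamedTuple):
    temp: str
    power_on_days: str
    error: str
    fp: str
    size_tb: str
    serial: str
    device: str
    disk: str


def parse_rows(report):
    """Keep only lines that look like data rows: 8 fields, 7th is a /dev/ path."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[6].startswith("/dev/"):
            rows.append(SmartRow(*fields))
    return rows


def array_disks_without_smart(rows):
    """Disks that belong to the array (disk name is not '-') but show no temperature."""
    return [r.disk for r in rows if r.disk != "-" and r.temp == "-"]


# A few rows copied from the report above.
SAMPLE = """\
 36    555       0   5%  3.0  WD-WMC1T4141933  /dev/pd27  d10
  -      -       -    -  3.0  WD-WMC1T3958547  /dev/pd35  d11
 36    553       0   5%  3.0  WD-WMC1T4318974  /dev/pd15  d12
 30    741       -  SSD  0.5  201210230052     /dev/pd1   -
"""

if __name__ == "__main__":
    print(array_disks_without_smart(parse_rows(SAMPLE)))
```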

           
          • Taishan Lin

            Taishan Lin - 2015-04-10

            configuration 2: added in conf file:

            smartctl d0 -d usbjmicron,0 %s
            smartctl d1 -d usbjmicron,0 %s
            smartctl d2 -d usbjmicron,0 %s
            smartctl d3 -d usbjmicron,0 %s
            smartctl d4 -d usbjmicron,0 %s
            smartctl d5 -d usbjmicron,0 %s
            smartctl d6 -d usbjmicron,0 %s
            smartctl d7 -d usbjmicron,0 %s
            smartctl d8 -d usbjmicron,0 %s
            smartctl d9 -d usbjmicron,0 %s
            smartctl d10 -d usbjmicron,0 %s
            smartctl d11 -d usbjmicron,0 %s
            smartctl d12 -d usbjmicron,0 %s
            smartctl d13 -d usbjmicron,0 %s
            smartctl d14 -d usbjmicron,0 %s
            smartctl d15 -d usbjmicron,0 %s
            smartctl d16 -d ata %s
            smartctl d17 -d ata %s
            smartctl d18 -d ata %s
            smartctl d19 -d ata %s
            smartctl d20 -d ata %s
            smartctl parity -d sat %s
            smartctl 2-parity -d sat %s
            smartctl 3-parity -d sat %s

            snapraid smart
            SnapRAID SMART report:


             32    542       0  26%  4.0  W300CGLD         /dev/pd25  d0
             32    337       0   8%  4.0  Z301FDFK         /dev/pd33  d1
             33    336       0   8%  4.0  Z301FE0W         /dev/pd41  d2
             35    337 PREFAIL   7%  4.0  Z301FDRY         /dev/pd17  d3
             32    337 PREFAIL   7%  4.0  Z301G1SA         /dev/pd21  d4
             32    336 PREFAIL   8%  4.0  Z301G0AT         /dev/pd45  d5
             33    336 PREFAIL  13%  4.0  Z301FDC6         /dev/pd29  d6
             34    479       0  27%  4.0  Z300RS4D         /dev/pd37  d7
             30    288 PREFAIL  55%  4.0  Z301FE0J         /dev/pd26  d8
             32    391 PREFAIL  10%  4.0  Z300RSQX         /dev/pd34  d9
             31    397       0  13%  4.0  Z300MJ8X         /dev/pd42  d10
             36    486       0  33%  4.0  W300D6V8         /dev/pd18  d11
             36    139       0   5%  4.0  WD-WCC4E0ZY6933  /dev/pd22  d12
             35    182       0   5%  4.0  WD-WCC4ERZFHLXP  /dev/pd46  d13
             33     81       0   5%  4.0  WD-WCC4E1UVKEF5  /dev/pd30  d14
             33     47       0   5%  4.0  WD-WCC4E0NVZL5U  /dev/pd38  d15
             43     81       0   5%  4.0  WD-WCC4E1UVKZ58  /dev/pd2   d16
             42     44       0   5%  4.0  WD-WCC4E4ZANDTS  /dev/pd4   d17
             39     66       0  10%  4.0  Z3032VA6         /dev/pd0   d18
             38     68       0   6%  4.0  Z3032X8R         /dev/pd3   d19
             32    137       0   5%  4.0  WD-WCC4E3HPKNHU  /dev/pd5   d20
             29    366       0  32%  4.0  WD-WCC4E0084809  /dev/pd11  parity
             37    180       0   5%  4.0  WD-WCC4EFSNC4D2  /dev/pd13  2-parity
             35      9       0   6%  4.0  Z303802E         /dev/pd10  3-parity
             30    741       -  SSD  0.5  201210230052     /dev/pd1   -
            
              -      -       -    - 28.0  -                /dev/pd6   -
              -      -       -    - 28.0  -                /dev/pd7   -
              -      -       -    - 21.0  -                /dev/pd8   -
              -      -       -  n/a 28.0  -                /dev/pd9   -
              -      -       -  n/a    -  -                /dev/pd14  -
             35    660       0  26%  3.0  W1F0P8A8         /dev/pd20  -
             35    924       0  26%  3.0  WD-WCAWZ1828327  /dev/pd32  -
             34     75       0   6%  3.0  W7300ZA2         /dev/pd40  -
             37    605       0  24%  3.0  W1F0CZM0         /dev/pd44  -
              -      -       -  n/a    -  -                /dev/pd49  -
              -      -       -  n/a    -  -                /dev/pd50  -
              -      -       -  n/a    -  -                /dev/pd51  -
              -      -       -  n/a    -  -                /dev/pd52  -
            

            The FP column is the estimated probability (in percentage) that the disk
            is going to fail in the next year.

            Probability that at least one disk is going to fail in the next year is 97%.

            DANGER! SMART is reporting that one or more disks are FAILING!
            Please take immediate action!
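
For reference, the "at least one disk" figure can be approximated from the per-disk FP values if the disks are assumed to fail independently: it is one minus the product of the individual survival probabilities. Whether SnapRAID computes it exactly this way is an assumption here, but applied to the rounded FP values of the 24 array disks in the report above (d0 to d20 plus the three parity disks) the formula gives about 97%, which agrees with the printed figure.

```python
import math

# FP column (percent) for d0..d20, parity, 2-parity, 3-parity, as printed above.
FP_PERCENT = [26, 8, 8, 7, 7, 8, 13, 27, 55, 10, 13, 33,
              5, 5, 5, 5, 5, 5, 10, 6, 5, 32, 5, 6]


def prob_at_least_one_failure(fp_percent):
    """1 - product of per-disk survival probabilities (independence assumed)."""
    survive_all = math.prod(1 - p / 100 for p in fp_percent)
    return 1 - survive_all


if __name__ == "__main__":
    p = prob_at_least_one_failure(FP_PERCENT)
    print(f"{p * 100:.0f}%")
```

With this many disks the combined figure climbs quickly even when every individual value is small, so a high total does not by itself point to one bad drive.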

             
  • Jessie Taylor

    Jessie Taylor - 2015-04-06

    I am not aware of any SMART parameters that allow one to calculate or even guesstimate the probability that a drive will die in the next year. There must be some misunderstanding about what the SMART data represents.

    The column about failure probability should be removed, since the numbers it reports are never going to be accurate.

     
  • Andrea Mazzoleni

    Hi Jessie,

    The failure probability is an estimate obtained by correlating the SMART attributes with the data for 40000 disks that Backblaze recently released.

    These are the Backblaze data files:
    https://www.backblaze.com/blog/hard-drive-data-feb2015/

    And here are some easier-to-read graphs for each attribute:
    https://www.backblaze.com/blog-smart-stats-2014-8.html

    Obviously, it's only an estimate that could be more or less accurate, but it's a simple way to keep an eye on all the SMART attributes by checking a single value.

    Ciao,
    Andrea

     
  • MQMan

    MQMan - 2015-04-07

    Not too sure I believe this:

    SnapRAID SMART report:

       Temp  Power   Error   FP Size
          C OnDays   Count        TB  Serial           Device    Disk

         26   2042       0 100%  2.0  5YD282NZ         /dev/sdh  disk1
         28   2478       0 100%  2.0  5YD26ZTT         /dev/sdc  disk2
         27   2267       0 100%  2.0  5YD1Y35Y         /dev/sdd  disk3
         25   2580       0 100%  2.0  5YD2468B         /dev/sde  disk4
         24     37       0  99%  2.0  W1H2Z5X6         /dev/sdf  disk5
         25     36       0  99%  2.0  W1H2YHVH         /dev/sdb  disk6
         26   2377       0 100%  2.0  5YD24HZP         /dev/sdg  parity
         25   1247   ERROR   5%  0.5  WD-WCAWF0473556  /dev/sda  -
    

    The FP column is the estimated probability (in percentage) that the disk
    is going to fail in the next year.

    Probability that at least one disk is going to fail in the next year is 100%.

    disk5 and disk6 are both brand new, only added in the last month.

    Pasting the full SMART output from these would be way TMI. What snapshots of data can I provide here?

    Cheers.

     