Hi,
I prepared a release candidate for 8.0 at: http://snapraid.sourceforge.net/rc/
The full list of changes is: https://github.com/amadvance/snapraid/blob/master/HISTORY
I'm mainly interested in comments on the new "up", "down" and "smart" commands. They are intended to spin-up, spin-down, and print a SMART report of the array.
To have them working in Linux, you must have smartctl and hdparm already installed. In Windows, they are provided in the SnapRAID package. In both cases, to get full functionality you must run as root/Administrator.
There is also a new "test-devices" command, that prints the disk mapping that SnapRAID sees, with the low-level devices used by each disk in the array.
These new commands don't make any changes, so you can test them even while still using SnapRAID 7.1.
Ciao,
Andrea
Works great on the motherboard SATA ports.
D700 and the parity disks are correctly represented as different physical devices at the low level, with the correct size! :)
All other values also seem correct, including correctly identifying the system disk SSD.
The other disks are connected to an LSI 9211-8i, which requires the smartctl parameter: -d sat
Any chance that you could allow passing that parameter? Or even try both alternatives, with and without the parameter, and only present the successful results in the table?
Hi Leifi,
I think I can add the possibility to specify a manual "-d" option to be applied to specific disks.
But to better understand the issue, could you please try the following commands and report their output?
smartctl --scan-open -d pd
smartctl --scan-open -d ata,pd
smartctl --scan-open -d scsi,pd
smartctl --scan-open -d usb,pd
Thanks,
Andrea
C:\Snapraid>smartctl --scan-open -d pd
/dev/pd0 -d ata # /dev/pd0, ATA device
/dev/pd1 -d ata # /dev/pd1, ATA device
/dev/pd2 -d ata # /dev/pd2, ATA device
/dev/pd3 -d ata # /dev/pd3, ATA device
/dev/pd4 -d ata # /dev/pd4, ATA device
/dev/pd5 -d ata # /dev/pd5, ATA device
/dev/pd6 -d scsi # /dev/pd6, SCSI device
/dev/pd7 -d scsi # /dev/pd7, SCSI device
/dev/pd8 -d scsi # /dev/pd8, SCSI device
/dev/pd9 -d scsi # /dev/pd9, SCSI device
/dev/pd10 -d scsi # /dev/pd10, SCSI device
/dev/pd11 -d scsi # /dev/pd11, SCSI device
/dev/pd12 -d scsi # /dev/pd12, SCSI device
/dev/pd13 -d scsi # /dev/pd13, SCSI device
C:\Snapraid>smartctl --scan-open -d ata,pd
/dev/pd0 -d ata # /dev/pd0, ATA device
/dev/pd1 -d ata # /dev/pd1, ATA device
/dev/pd2 -d ata # /dev/pd2, ATA device
/dev/pd3 -d ata # /dev/pd3, ATA device
/dev/pd4 -d ata # /dev/pd4, ATA device
/dev/pd5 -d ata # /dev/pd5, ATA device
C:\Snapraid>smartctl --scan-open -d scsi,pd
/dev/pd6 -d scsi # /dev/pd6, SCSI device
/dev/pd7 -d scsi # /dev/pd7, SCSI device
/dev/pd8 -d scsi # /dev/pd8, SCSI device
/dev/pd9 -d scsi # /dev/pd9, SCSI device
/dev/pd10 -d scsi # /dev/pd10, SCSI device
/dev/pd11 -d scsi # /dev/pd11, SCSI device
/dev/pd12 -d scsi # /dev/pd12, SCSI device
/dev/pd13 -d scsi # /dev/pd13, SCSI device
C:\Snapraid>smartctl --scan-open -d usb,pd
C:\Snapraid>smartctl -a -d sat /dev/pd6
smartctl 6.3 2014-07-26 r3976 [i686-w64-mingw32-win7(64)-sp1] (sf-6.3-1)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD40EFRX-68WT0N0
...
Hi Leifi,
One more test, please. Please report the full output of these two commands. Note that the second one is expected to print the error code of the first one, so you need to run it right after.
smartctl -a /dev/pd6 -r ioctl
echo %errorlevel%
Anyway, I'm implementing an auto retry with "-d sat" that should work most of the time.
Thanks,
Andrea
Hi,
That would have been too easy... :/
C:\Snapraid>smartctl -a /dev/pd6 -r ioctl
smartctl 6.3 2014-07-26 r3976 [i686-w64-mingw32-win7(64)-sp1] (sf-6.3-1)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
[inquiry: 12 01 00 00 fc 00 ]
[inquiry: 12 00 00 00 24 00 ]
Probable ATA device behind a SAT layer
Try an additional '-d ata' or '-d sat' argument.
C:\Snapraid>echo %errorlevel%
0
Hi Leifi,
Please redownload and retry now. It should work.
Now, with error codes 0 and 2, if no info at all is present, the "-d sat" alternative is automatically retried.
Thanks,
Andrea
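The auto retry Andrea describes can be sketched roughly like this. This is a minimal illustration, not SnapRAID's actual code: the function names and the "no usable info" marker string are my assumptions based on the transcripts in this thread.

```python
import subprocess

def needs_sat_retry(returncode, output):
    # Per the thread, exit codes 0 and 2 can both mean "no usable SMART
    # info" on these controllers, so check the output for the SMART data
    # section instead of trusting the exit code alone.
    no_info = "START OF READ SMART DATA SECTION" not in output
    return returncode in (0, 2) and no_info

def smart_report(device):
    # First attempt without "-d sat"; retry with it if nothing came back.
    proc = subprocess.run(["smartctl", "-a", device],
                          capture_output=True, text=True)
    if needs_sat_retry(proc.returncode, proc.stdout):
        proc = subprocess.run(["smartctl", "-a", "-d", "sat", device],
                              capture_output=True, text=True)
    return proc.stdout
```

Applied to Leifi's transcripts: the /dev/pd6 run exited 0 with only the "Probable ATA device behind a SAT layer" hint and no data section, so the sketch would retry it with "-d sat".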
It works!
Temp Power Error FP Size
C OnDays Count TB Serial Device Disk
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 75%.
Thank you!
Looks like it may be fan filter cleaning time... :)
Is it possible to use similar logic for up and down?
Failing to spin down with "-d sat" returns an error code.
Failing to spin down without "-d sat" does not return an error code.
Success always returns the "Device placed in STANDBY mode" text.
In the examples below, /dev/pd1 is connected to motherboard SATA
and /dev/pd6 is connected to the LSI 9211-8i.
C:\Snapraid>smartctl -d sat -s standby,now /dev/pd1
Read Device Identity failed: IOCTL_SCSI_PASS_THROUGH_DIRECT failed, Error=1
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
C:\Snapraid>echo %errorlevel%
2
C:\Snapraid>smartctl -s standby,now /dev/pd1
Device placed in STANDBY mode
C:\Snapraid>echo %errorlevel%
0
C:\Snapraid>smartctl -s standby,now /dev/pd6
Probable ATA device behind a SAT layer
Try an additional '-d ata' or '-d sat' argument.
C:\Snapraid>echo %errorlevel%
0
C:\Snapraid>smartctl -d sat -s standby,now /dev/pd6
Device placed in STANDBY mode
C:\Snapraid>echo %errorlevel%
0
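Leifi's observation suggests keying on the output text rather than the exit code. A hedged sketch of that check (the function names are mine, not SnapRAID's; the matched strings come from the transcripts above):

```python
def standby_succeeded(output):
    # Per the transcripts above, the exit code is unreliable: a failed
    # spin-down without "-d sat" still exits 0. The success message is
    # the one consistent signal.
    return "Device placed in STANDBY mode" in output

def should_retry_with_sat(output):
    # smartctl itself hints when the device is an ATA disk behind a SAT
    # layer and suggests retrying with "-d sat".
    return "'-d sat'" in output
```

With these two checks, a spin-down attempt that neither succeeded nor produced the SAT hint can be reported as a real failure.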
Hi Leifi,
Before, I was using "hdparm" to spin down. But yep, you are correct. Using smartctl is likely a better option.
Just implemented it.
I've also added a new "smartctl" option in the snapraid.conf file that allows configuring special smartctl options for each disk.
So, if you like, you can set -d sat for the disks you know need it, without having SnapRAID retry the command twice.
Note that you can see the exact commands used by generating a log with "-l test.log". This may be useful in testing.
Thanks!
Andrea
SnapRAID up and down seem to be working for all motherboard SATAs as well.
C:\Snapraid>snapraid down
Spindown...
Spundown device '/dev/pd11' for disk 'parity' in 32 ms.
Spundown device '/dev/pd9' for disk 'd100' in 47 ms.
Spundown device '/dev/pd13' for disk '2-parity' in 47 ms.
Spundown device '/dev/pd8' for disk 'd700' in 47 ms.
Spundown device '/dev/pd6' for disk 'd800' in 47 ms.
Spundown device '/dev/pd7' for disk 'd400' in 47 ms.
Spundown device '/dev/pd12' for disk '2-parity' in 47 ms.
Spundown device '/dev/pd10' for disk 'd700' in 47 ms.
Spundown device '/dev/pd5' for disk 'd500' in 453 ms.
Spundown device '/dev/pd3' for disk 'd600' in 453 ms.
Spundown device '/dev/pd2' for disk 'd300' in 453 ms.
Spundown device '/dev/pd4' for disk 'd200' in 640 ms.
Spundown device '/dev/pd0' for disk 'parity' in 1186 ms.
C:\Snapraid>snapraid up
Spinup...
Spunup device '/dev/volb4d94a7b-0a6d-45d9-a6fb-330825f1e449' for disk 'd300' in 15 ms.
Spunup device '/dev/volb52f9f52-c942-460d-a572-25bf397b1347' for disk 'd400' in 31 ms.
Spunup device '/dev/vol81c1c2fb-10d9-11e4-bccb-240a645537ee' for disk 'd700' in 31 ms.
Spunup device '/dev/volf67a1f0f-9f08-11e3-8825-50e549ef3a2e' for disk '2-parity' in 78 ms.
Spunup device '/dev/vol5f8a6c3a-9da8-45f3-88e6-b87ba7fba7e7' for disk 'd100' in 826 ms.
Spunup device '/dev/vol6b374911-77c6-4d9f-802c-45ee11b28c80' for disk 'd800' in 826 ms.
Spunup device '/dev/vol0adaea1e-57a5-4f98-a4c0-70ac2fd12fad' for disk 'd600' in 8346 ms.
Spunup device '/dev/vol3a0a63f8-4f47-4253-a26a-be4f9114101c' for disk 'd500' in 8845 ms.
Spunup device '/dev/vole34a0780-4043-4be3-a353-af5ac3b1f637' for disk 'd200' in 9750 ms.
Spunup device '/dev/vol61c5833b-2886-11e4-9855-240a645537ee' for disk 'parity' in 9859 ms.
I guess the device names in the up command could be polished :)
Hello Andrea, this is working well, but how are the failure percentages calculated? /dev/sdk is at 100% to fail this year, and its SMART values (other than age) are all good.
~~~~~~
root@backups:~# snapraid smart
SnapRAID SMART report:
Temp Power Error FP Size
C OnDays Count TB Serial Device Disk
root@fileserver:~# smartctl -a /dev/sdk
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.18.6-aufs] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (AF)
Device Model: ST2000DL003-9VT166
Serial Number: 6YD1R3YF
LU WWN Device Id: 5 000c50 0465861eb
Firmware Version: CC3C
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 5 17:49:25 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 623) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 355) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 116 099 006 Pre-fail Always - 106740472
3 Spin_Up_Time 0x0003 090 082 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 095 095 020 Old_age Always - 5636
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 067 060 030 Pre-fail Always - 43007647350
9 Power_On_Hours 0x0032 089 011 000 Old_age Always - 10489
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 146
183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 093 000 Old_age Always - 8590065690
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 074 060 045 Old_age Always - 26 (Min/Max 21/28)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 108
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 6467
194 Temperature_Celsius 0x0022 026 040 000 Old_age Always - 26 (0 13 0 0 0)
195 Hardware_ECC_Recovered 0x001a 021 006 000 Old_age Always - 106740472
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 113266877544912
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1243065368
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3276093544
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
1 Extended offline Completed without error 00% 9474 -
2 Short offline Completed without error 00% 9466 -
3 Short offline Completed without error 00% 9442 -
4 Short offline Completed without error 00% 9418 -
5 Short offline Completed without error 00% 9394 -
6 Short offline Completed without error 00% 9370 -
7 Short offline Completed without error 00% 9346 -
8 Short offline Completed without error 00% 9322 -
9 Extended offline Completed without error 00% 9306 -
10 Short offline Completed without error 00% 9298 -
11 Short offline Completed without error 00% 9274 -
12 Short offline Completed without error 00% 9250 -
13 Short offline Completed without error 00% 9209 -
14 Short offline Completed without error 00% 9175 -
15 Short offline Completed without error 00% 9151 -
16 Short offline Completed without error 00% 9127 -
17 Extended offline Completed without error 00% 9111 -
18 Short offline Completed without error 00% 9103 -
19 Short offline Completed without error 00% 9079 -
20 Short offline Completed without error 00% 9055 -
21 Short offline Completed without error 00% 9031 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
~~~~~
Also, up and down both work well. Finally, I also have an IBM M1015 (flashed to IT mode, so it's a 9211-8i) connected to an Intel SAS expander on this box, and snapraid smart had no problem getting values without the -d ata option.
Last edit: rubylaser 2015-04-05
Hi rubylaser,
The problem is attribute 188. SnapRAID misread the value 8590065690; in truth, it should be masked to 16 bits, resulting in a value of 26.
I've uploaded a new RC version that interprets the value in the correct way. Could you please retry?
So the failure probability should be less than 100%, but still high, as these command timeout errors are serious ones.
Ciao,
Andrea
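The masking Andrea describes is easy to check against rubylaser's report. The raw value of attribute 188 (Command_Timeout) packs several counters into one field; only the low 16 bits hold the plain timeout count (the layout of the upper bits is vendor-specific, so treat that part as an assumption):

```python
raw = 8590065690  # Command_Timeout raw value from the smartctl output above

# Keep only the low 16 bits; the upper bits pack separate counters.
timeouts = raw & 0xFFFF
print(timeouts)  # 26
```

So the drive has 26 command timeouts, not eight billion, which is why the corrected RC no longer reports 100% FP for this disk.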
Thanks Andrea. I tried the new RC, and as you said, the results are still terrible.
Looks like it's time to replace the old 2TB drive with a new 4TB one. Thanks again for your great work!
Last edit: rubylaser 2015-04-06
This is slightly off topic, but will your script run SnapRAID 8.0 out of the gate, or will there need to be an update?
I've been using the 8 betas for quite a while, all fine.
Thank you again for the "negative wasted" in snapraid status, that's very useful for full disks.
test-devices could be helped by including the path from snapraid.conf.
up/down I've never used; I can only imagine what they do, but I leave the disks to time out to sleep.
It is very good to include the SMART data, even if I don't know what triggers it precisely (I'm sure it is straightforward, but I have no time to go through the source now). I do have one drive showing 100% failure next year :-)
The 8.0 RC also seems to have changed the fix command:
................................................
C:\snapraidXU>snapraid -e fix
Self test...
Loading state from C:/cab/m42/SnapRAID.content...
Scanning disk d0...
...................
Scanning disk d20...
Filtering...
Using 6766 MiB of memory.
Initializing...
Fixing...
100% completed, 12 MB processed in 0:08
Everything OK
.................................................
Great!!! Only 12 MB processed.
After fixing for 8 minutes, it shows:
100% completed, 12 MB processed in 0:08 ...
It would be nice if some progress indicator were shown during those 8 minutes.
I think the 8 minutes may have something to do with the size (45 GB) of that particular file with one bad 512 KB block.
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 100%.
Some problems with smartctl, but another drive, 4 days old, shows: /dev/pd10 FP 6%
new 8.0 RC:
C:\snapraidXU>snapraid smart
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 26%.
Hi Taishan,
I just added a new "smartctl" option that allows passing special configuration options to smartctl.
You can first try to make smartctl work manually with the USB controller. See: https://www.smartmontools.org/wiki/Supported_USB-Devices
Something like:
(note that usbjmicron is only an example; I don't know which enclosure you have)
Then you can add the options in snapraid.conf. Like:
See the new manual about this new "smartctl" option.
Ciao,
Andrea
RC0409 64bit:
configuration 1: added in conf file:
smartctl d0 -d usbjmicron,0 %s
smartctl d1 -d usbjmicron,0 %s
smartctl d2 -d usbjmicron,0 %s
smartctl d3 -d usbjmicron,0 %s
smartctl d4 -d usbjmicron,0 %s
smartctl d5 -d usbjmicron,0 %s
smartctl d6 -d usbjmicron,0 %s
smartctl d7 -d usbjmicron,0 %s
smartctl d8 -d usbjmicron,0 %s
smartctl d9 -d usbjmicron,0 %s
smartctl d10 -d usbjmicron,0 %s
smartctl d11 -d usbjmicron,0 %s
smartctl d12 -d usbjmicron,0 %s
smartctl parity -d usbjmicron,0 %s
smartctl 2-parity -d usbjmicron,0 %s
snapraid smart:
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 100%.
The only strange thing is disk d11 /dev/pd35; it should be the same as d0 through d12 and the 2 parity disks, but some data is missing.
(There are 4 HW RAID and 5 DrivePool virtual drives with device name only.)
configuration 2: added in conf file:
smartctl d0 -d usbjmicron,0 %s
smartctl d1 -d usbjmicron,0 %s
smartctl d2 -d usbjmicron,0 %s
smartctl d3 -d usbjmicron,0 %s
smartctl d4 -d usbjmicron,0 %s
smartctl d5 -d usbjmicron,0 %s
smartctl d6 -d usbjmicron,0 %s
smartctl d7 -d usbjmicron,0 %s
smartctl d8 -d usbjmicron,0 %s
smartctl d9 -d usbjmicron,0 %s
smartctl d10 -d usbjmicron,0 %s
smartctl d11 -d usbjmicron,0 %s
smartctl d12 -d usbjmicron,0 %s
smartctl d13 -d usbjmicron,0 %s
smartctl d14 -d usbjmicron,0 %s
smartctl d15 -d usbjmicron,0 %s
smartctl d16 -d ata %s
smartctl d17 -d ata %s
smartctl d18 -d ata %s
smartctl d19 -d ata %s
smartctl d20 -d ata %s
smartctl parity -d sat %s
smartctl 2-parity -d sat %s
smartctl 3-parity -d sat %s
snapraid smart
SnapRAID SMART report:
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 97%.
DANGER! SMART is reporting that one or more disks are FAILING!
Please take immediate action!
I am not aware of any SMART parameters that allow one to calculate or even guesstimate the probability that a drive will die in the next year. There must be some misunderstanding about what the SMART data represents.
The column about failure probability should be removed, since the numbers it reports are never going to be accurate.
Hi Jessie,
The failure probability is an estimation obtained by correlating the SMART attributes with the data for the 40,000 disks that Backblaze recently released.
These are the Backblaze data files:
https://www.backblaze.com/blog/hard-drive-data-feb2015/
And here are some easier-to-read graphs for each attribute:
https://www.backblaze.com/blog-smart-stats-2014-8.html
Obviously, it's only an estimation that could be more or less accurate, but it's a simple way to keep an eye on all the SMART attributes by checking a single value.
Ciao,
Andrea
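One detail the thread leaves implicit is how the per-disk FP values combine into the array-level figure. Assuming independent disk failures (my assumption; SnapRAID's exact formula may differ), it would be the standard complement rule:

```python
def array_failure_probability(per_disk_fp):
    # P(at least one disk fails) = 1 - product of P(each disk survives),
    # assuming failures are independent.
    survive_all = 1.0
    for p in per_disk_fp:
        survive_all *= 1.0 - p
    return 1.0 - survive_all

# Example: four disks with FP 10%, 20%, 5% and 50%
print(round(array_failure_probability([0.10, 0.20, 0.05, 0.50]) * 100))  # 66
```

This would also explain the reports above: a single disk estimated at 100% FP forces the array-level figure to 100%, no matter how healthy the other disks are.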
Not too sure I believe this:
SnapRAID SMART report:
Temp Power Error FP Size
C OnDays Count TB Serial Device Disk
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 100%.
disk5 and disk6 are both brand new, only added in the last month.
Pasting the full SMART output from these would be way TMI. What snapshots of data can I provide here?
Cheers.