From: Justin P. <jp...@lu...> - 2006-06-10 10:23:39
|
SUMMARY:
I pose the following question in the subject, as over the years running smartd and having failed disks, I have always first been alerted to bad sectors and such through dmesg or logcheck. Even with a bad disk I currently have, smartd does not pick up any errors, except those which the kernel writes to syslog.

LKML INFO:
I've cc'd the LKML to show that, when a disk is failing, I had received similar stat errors, but those were due to buffer or other disk issues.

[4485617.826000] ata2: status=0x51 { DriveReady SeekComplete Error }
[4485619.292000] ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
[4485619.292000] ata2: status=0x51 { DriveReady SeekComplete Error }
[4485620.749000] ata2: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
[4485620.749000] ata2: status=0x51 { DriveReady SeekComplete Error }
[4494582.951000] ata2: command 0x25 timeout, stat 0x50 host_stat 0x22
[4494831.267000] ata2: command 0x25 timeout, stat 0x50 host_stat 0x22

--------------
Now for the problem and analysis:

The Death and Diagnosis of a Dying Hard Drive - Is S.M.A.R.T. useful?

1] SMARTMONTOOLS: I pose the following question: Is running the smartd daemon with short and long S.M.A.R.T. tests enough?

2] FAILED HARD DRIVE: A Maxtor of course! (1.38 years old)
------------------------- snip -------------------------------------------------
Model Family:     Maxtor DiamondMax 10 family
Device Model:     Maxtor 6B250S0
Serial Number:    ******** (out of warranty on 02/19/2006)
Firmware Version: BANC1B70
User Capacity:    251,000,193,024 bytes
------------------------- snip -------------------------------------------------

3] DMESG DATA DUMP: Occurred while [reading] a file from the HDD.
------------------------- snip -------------------------------------------------
ATA: abnormal status 0x80 on port 0xC807
ATA: abnormal status 0x80 on port 0xC807
ATA: abnormal status 0x80 on port 0xC807
ata2: command 0x25 timeout, stat 0x80 host_stat 0x21
ata2: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0x80 { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 130483823
ATA: abnormal status 0x80 on port 0xC807
ATA: abnormal status 0x80 on port 0xC807
ATA: abnormal status 0x80 on port 0xC807
ata2: command 0x25 timeout, stat 0x50 host_stat 0x21
------------------------- snip -------------------------------------------------

4] SMARTCTL-SHORT TEST: The short test shows nothing wrong with the drive.
------------------------- snip -------------------------------------------------
# smartctl -d ata -t short /dev/sdc
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     12097         -
------------------------- snip -------------------------------------------------

5] SMARTCTL-LONG TEST:
------------------------- snip -------------------------------------------------
# smartctl -d ata -t long /dev/sdc
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12099         -
------------------------- snip -------------------------------------------------

6] TRY ANOTHER METHOD: USE DD.
------------------------- snip -------------------------------------------------
# /usr/bin/time dd if=/dev/sdc bs=4096 | pipebench > /x6/failed_hdd.img
# This also checked out but some interesting messages in dmesg:
ata3: no sense translation for status: 0x51
ata3: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata3: status=0x51 { DriveReady SeekComplete Error }
------------------------- snip -------------------------------------------------

7] CHECK WITH BADBLOCKS(READ-ONLY)...?
------------------------- snip -------------------------------------------------
# /usr/bin/time badblocks -b 512 -s -v /dev/sdc
Checking blocks 0 to 490234752
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
5.56user 439.85system 1:31:29elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+230minor)pagefaults 0swaps
# mount -a
------------------------- snip -------------------------------------------------

8] CHECK WITH BADBLOCKS(READ+WRITE)...?
------------------------- snip -------------------------------------------------
# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdc
Checking for bad blocks in read-write mode
From block 0 to 490234752
Testing with pattern 0xaa: 369800128/ 490234752
------------------------- snip -------------------------------------------------
After 12 hours of testing, FINALLY, it says I have a bad disk, see below.

233537658
233537659
233537660
233537661
233537662
233537663
done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 26368 bad blocks found.
1496.54user 3582.18system 12:14:45elapsed 11%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+282minor)pagefaults 0swaps
--
Also in dmesg:

System Events
=-=-=-=-=-=-=
Jun  9 23:14:51 p34 smartd[32213]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Jun  9 23:44:52 p34 smartd[32213]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
------------------------- snip -------------------------------------------------

9] Now review the SMART log again!
------------------------- snip -------------------------------------------------
Error 252 occurred at disk power-on lifetime: 11354 hours (473 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  78 00 08 b0 19 eb e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 08 b0 19 eb e0 00      01:39:12.107  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:10.649  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:09.191  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:07.716  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:06.258  NOP [Abort queued commands]

Error 251 occurred at disk power-on lifetime: 11354 hours (473 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  78 00 08 b0 19 eb e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 08 b0 19 eb e0 00      01:39:10.649  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:09.191  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:07.716  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:06.258  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:04.791  NOP [Abort queued commands]
------------------------- snip -------------------------------------------------

10] What about those self-tests, do they find anything now? Nope.
------------------------- snip -------------------------------------------------
# smartctl -d ata -t short /dev/sdc
Nope.
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     12116         -
------------------------- snip -------------------------------------------------

11] What about the long test? Does not find anything.
------------------------- snip -------------------------------------------------
# smartctl -d ata -t long /dev/sdc
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12117         -
------------------------- snip -------------------------------------------------

After all of this testing, I must pose the question to all of those who run smartd, is it worth running with scheduled short/long tests if they do not find the errors that badblocks did?

Please advise.

Thanks,

Justin.
|
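As a reading aid for the dmesg lines quoted in this message, here is a minimal Python sketch that expands an ATA status byte into the labels the kernel prints between the braces (for example `status=0x51 { DriveReady SeekComplete Error }`). The bit-to-name mapping follows the conventional ATA status register layout; the function name `decode_ata_status` is hypothetical and used only for illustration.

```python
# Conventional ATA status register bits, highest bit first.
# The names mirror the labels seen in the kernel messages above.
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}


def decode_ata_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    # Status values that appear in the logs of this thread.
    for status in (0x51, 0x80, 0x50):
        names = " ".join(decode_ata_status(status))
        print("status=0x%02x { %s }" % (status, names))
```

Read this way, 0x51 means the drive was ready and had finished seeking but flagged an error, 0x80 means the drive was still busy when the command timed out, and 0x50 is the normal idle state with no error bit set.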
From: Justin P. <jp...@lu...> - 2006-06-10 10:59:12
|
# smartctl -d ata -H /dev/sdc
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# The --all output below:

smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax 10 family
Device Model:     Maxtor 6B250S0
Serial Number:    /* commented out */
Firmware Version: BANC1B70
User Capacity:    251,000,193,024 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Sat Jun 10 06:58:29 2006 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (2283) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 109) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   184   182   063    Pre-fail  Always       -       25693
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       91
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   237   187    Pre-fail  Always       -       56515
  9 Power_On_Minutes        0x0032   218   218   000    Old_age   Always       -       103h+42m
 10 Spin_Retry_Count        0x002b   249   248   157    Pre-fail  Always       -       4
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       191
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   038   253   000    Old_age   Always       -       32
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       312
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       2
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   249   248   000    Old_age   Always       -       4
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   241   241   000    Old_age   Offline      -       152
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 252 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 252 occurred at disk power-on lifetime: 11354 hours (473 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  78 00 08 b0 19 eb e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 08 b0 19 eb e0 00      01:39:12.107  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:10.649  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:09.191  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:07.716  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:06.258  NOP [Abort queued commands]

Error 251 occurred at disk power-on lifetime: 11354 hours (473 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  78 00 08 b0 19 eb e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 08 b0 19 eb e0 00      01:39:10.649  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:09.191  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:07.716  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:06.258  NOP [Abort queued commands]
  00 00 08 b0 19 eb e0 00      01:39:04.791  NOP [Abort queued commands]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12117         -
# 2  Short offline       Completed without error       00%     12116         -
# 3  Extended offline    Completed without error       00%     12099         -
# 4  Short offline       Completed without error       00%     12097         -
# 5  Short offline       Completed without error       00%     12090         -
# 6  Short offline       Completed without error       00%     12044         -
# 7  Short offline       Completed without error       00%     12020         -
# 8  Short offline       Completed without error       00%     11996         -
# 9  Short offline       Completed without error       00%     11972         -
#10  Short offline       Completed without error       00%     11949         -
#11  Short offline       Completed without error       00%     11924         -
#12  Short offline       Completed without error       00%     11900         -
#13  Short offline       Completed without error       00%     11877         -
#14  Short offline       Completed without error       00%     11853         -
#15  Short offline       Completed without error       00%     11829         -
#16  Short offline       Completed without error       00%     11806         -
#17  Short offline       Completed without error       00%     11782         -
#18  Short offline       Completed without error       00%     11758         -
#19  Short offline       Completed without error       00%     11734         -
#20  Short offline       Completed without error       00%     11711         -
#21  Short offline       Completed without error       00%     11687         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

On Sat, 10 Jun 2006, Jan-Benedict Glaw wrote:

> On Sat, 2006-06-10 06:23:32 -0400, Justin Piszcz <jp...@lu...> wrote:
>> SUMMARY:
>> I pose the following question in the subject, as over the years running
>> smartd and having failed disks, I have always first been alerted to bad
>> sectors and such through dmesg or logcheck. Even with a bad disk I
>> currently have, smartd does not pick up any errors, except those which the
>> kernel writes to syslog.
>
> What do
>
>     smartctl -H
>     smartctl --all
>
> tell you?
>
> Regards, JBG
>
> --
> Jan-Benedict Glaw       jb...@lu...    . +49-172-7608481                 _ O _
> "A free opinion in a free mind            | Against censorship | Against war  _ _ O
>  for a free state full of free citizens"  | on the Internet!   | in Iraq!     O O O
> ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
>
|
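To make explicit why the overall-health result in the output above is PASSED, here is a small Python sketch that parses rows of the smartctl attribute table and applies the usual rule that an attribute counts as failed when its normalized VALUE is at or below THRESH. It assumes the ten-column layout shown in this message; the helper names `parse_attribute_line` and `failing_attributes` and the three-row `SAMPLE` excerpt are illustrative, not part of smartmontools.

```python
# Three rows copied from the attribute table in this message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x002b   249   248   157    Pre-fail  Always       -       4
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
"""


def parse_attribute_line(line):
    """Parse one attribute row; return None for headers or other text."""
    parts = line.split()
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "type": parts[6],
        "raw": " ".join(parts[9:]),
    }


def failing_attributes(lines):
    """Return attributes whose normalized value is at or below a non-zero threshold."""
    failed = []
    for line in lines:
        attr = parse_attribute_line(line)
        if attr and attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
            failed.append(attr)
    return failed


if __name__ == "__main__":
    for attr in failing_attributes(SAMPLE.splitlines()):
        print(attr["id"], attr["name"], attr["value"], attr["thresh"])
```

On the rows above nothing is flagged, since every normalized value sits well above its threshold. That is consistent with the drive reporting PASSED even though badblocks found errors.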
From: Volker K. <lis...@pa...> - 2006-06-10 11:47:04
|
> After all of this testing, I must pose the question to all of those who run
> smartd, is it worth running with scheduled short/long tests if they do
> not find the errors that badblocks did?

Good analysis, with a wealth of detailed info. Question: is all this one and the same hard disk?

The smart short and long self tests are surface tests, which probably also test a few other things internally. They do not test the outside electrical interface of the disk, not the IDE cable, not the IDE controller, not the mobo, yet all these things outside the disk can easily show up with badblocks, in dmesg, etc.

Yes I do find it worthwhile to run smartd. See it as potentially alerting you to a problem. No guarantees it will, you'll never get that with anything, but its chances of giving you a false alert are very slim.

Similar with badblocks. It may prove a disk is faulty, it'll *never* prove a disk is faultless. I've had a disk (also Maxtor) where large parts of the surface would not hold data for longer than a minute. Did badblocks find a problem? Nope, because to pass badblocks, the surface needs to hold data only for one millisecond. (Ok actual times made up, but you see the point.)

Volker

--
Volker Kuhlmann is list0570 with the domain in header
http://volker.dnsalias.net/
Please do not CC list postings to me.
|
From: Jan-Benedict G. <jb...@lu...> - 2006-06-10 10:51:47
|
On Sat, 2006-06-10 06:23:32 -0400, Justin Piszcz <jp...@lu...> wrote:
> SUMMARY:
> I pose the following question in the subject, as over the years running
> smartd and having failed disks, I have always first been alerted to bad
> sectors and such through dmesg or logcheck. Even with a bad disk I
> currently have, smartd does not pick up any errors, except those which the
> kernel writes to syslog.

What do

    smartctl -H
    smartctl --all

tell you?

Regards, JBG

--
Jan-Benedict Glaw       jb...@lu...    . +49-172-7608481                 _ O _
"A free opinion in a free mind            | Against censorship | Against war  _ _ O
 for a free state full of free citizens"  | on the Internet!   | in Iraq!     O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
|
From: Bruce A. <ba...@gr...> - 2006-06-11 03:16:11
|
Justin,

It's an unfortunate fact of life that SMART will not detect all disk failures.

My research group in the U. Wisconsin - Milwaukee Physics Department runs two large computing clusters (approximately 2000 hard disks total). We run weekly extended self-tests with smartd. Our experience over about five years is that about 2/3 of drive failures are predicted by smartd. The other 1/3 of failures have no warning.

I am surprised that the extended self-test does not detect the bad sectors on your disk. Our experience is that the typical SYSLOG 'seek failure' error messages do correlate very well with the failing LBAs found via SMART self-tests.

On your disk, it may be the case that these bad sectors are *sometimes* readable, or that the sequential scanning done during a SMART self-test does not provoke these errors. If you have some time to follow up, you could do some experiments with a recent release of dd using the 'direct' option to bypass the block layers in the Linux kernel.

Cheers,
Bruce
|
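One way to set up the dd experiment suggested above is sketched below in Python. It only assembles the argument list for a direct-I/O read of a sector range; it does not touch any disk. The assumptions are a GNU dd that accepts `iflag=direct` and a drive with 512-byte sectors, and the helper name `dd_direct_read_argv` is hypothetical.

```python
def dd_direct_read_argv(device, first_sector, sector_count, sector_size=512):
    """Build a dd command that reads a sector range with O_DIRECT.

    skip= and count= are expressed in units of bs, so with bs equal to the
    sector size they are plain sector numbers as reported in dmesg.
    """
    return [
        "dd",
        "if=%s" % device,
        "of=/dev/null",          # discard the data, we only want the read result
        "bs=%d" % sector_size,
        "skip=%d" % first_sector,
        "count=%d" % sector_count,
        "iflag=direct",          # assumption: GNU dd with direct-I/O support
    ]


if __name__ == "__main__":
    # Sector 130483823 is the one named in the end_request error in this thread;
    # read a 2048-sector window starting 1024 sectors before it.
    suspect = 130483823
    argv = dd_direct_read_argv("/dev/sdc", suspect - 1024, 2048)
    print(" ".join(argv))
```

The resulting list could be handed to `subprocess.run` as root and repeated in a loop, watching dmesg for new errors, to see whether the sector fails only some of the time.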
From: Theodore T. <ty...@mi...> - 2006-06-11 13:00:10
|
On Sat, Jun 10, 2006 at 10:15:59PM -0500, Bruce Allen wrote:
> I am surprised that the extended self-test does not detect the bad sectors
> on your disk. Our experience is that the typical SYSLOG 'seek failure'
> error messages do correlate very well with the failing LBAs found via
> SMART self-tests.

My guess is that it didn't detect the errors for the same reason that a read-only scan using badblocks didn't detect the problems, while a read/write scan did.

What *did* surprise me a little is that after the bad block had been detected by badblocks -w and was remapped by the disk drive, but before it had been forcibly rewritten (so that now reads of the block would return errors to the OS) that the extended self-test didn't return an error. I guess as far as the disk was concerned, the block had been remapped, so everything was OK.

The real question though is whether the disk continues to work OK from this point forward, or whether it is a prelude to an ever-increasing number of bad blocks. If it is the latter, and S.M.A.R.T. still didn't give any warning, then it would certainly be an indictment of that particular manufacturer's S.M.A.R.T. implementation.

- Ted
|
From: Bruce A. <ba...@gr...> - 2006-06-11 16:22:47
|
Theodore Tso wrote:
> The real question though is whether the disk continues to work OK from
> this point forward, or whether it is a prelude to an ever-increasing
> number of bad blocks. If it is the latter, and S.M.A.R.T. still didn't
> give any warning, then it would certainly be an indictment of that
> particular manufacturer's S.M.A.R.T. implementation.

I have a practical suggestion. Most recent disk drives have a new type of self-test option called 'selective self-tests'. This allows you to run a self-test on up to five user-defined ranges of LBAs. For example, if you suspect that LBA=12345678 is failing, then instead of having to wait an hour or two for the entire disk surface to be scanned, you can tell the disk to scan (say) the range LBA_1=12345000 to LBA_2=12345999 five times in a row, which takes only a few seconds. By repeating this process many times you can scan a trouble area on the disk a few thousand times in an hour.

For a couple of years, smartmontools smartctl has had the functionality to invoke these selective self-tests if the disk supports them. But (until just last week) it was awkward: it required a kernel built with TASKFILE support enabled, and only worked with (some of the) ide drivers.

This has changed. Thanks to hard work by Doug Gilbert and Jeff Garzik to build a SAT (SCSI to ATA Translation) layer in libata and to put a SAT interface into smartmontools, anyone can easily access this functionality with any SATA disk that supports selective self-test via libata.

Note: no smartmontools release incorporates this yet. You have to build from CVS.
Here are the instructions (4 lines):

cvs -d:pserver:ano...@sm...:/cvsroot/smartmontools login
(when prompted for a password, just press Enter)
cvs -d:pserver:ano...@sm...:/cvsroot/smartmontools co sm5
cd sm5
./autogen.sh && ./configure && make

Here is an example of running a selective self-test five times on the same range of LBAs as above:

[slave0123 ~]# ./smartctl -d sat -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 -t select,12345000-12345999 /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN         STARTING_LBA           ENDING_LBA
   0             12345000             12345999
   1             12345000             12345999
   2             12345000             12345999
   3             12345000             12345999
   4             12345000             12345999
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
Wait a few seconds, then see the results of the selective self-testing:

[slave0123 ~]# ./smartctl -d sat -l selective -l selftest /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed without error       00%      1473         -
# 2  Selective offline   Completed without error       00%      1473         -
# 3  Extended offline    Completed without error       00%      1467         -

SMART Selective self-test log data structure revision number 1
 SPAN   MIN_LBA   MAX_LBA  CURRENT_TEST_STATUS
    1  12345000  12345999  Not_testing
    2  12345000  12345999  Not_testing
    3  12345000  12345999  Not_testing
    4  12345000  12345999  Not_testing
    5  12345000  12345999  Not_testing

Justin, I hope that this is of some help to you and others with similar issues.

Cheers,
Bruce
|
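Because the selective self-test command repeats the same `-t select,START-END` option once per span, it is easy to mistype by hand. The Python sketch below builds that argument list from a start and end LBA, following the invocation shown in this message. The helper name `selective_test_argv` and its defaults are hypothetical; the limit of five spans comes from the description above.

```python
def selective_test_argv(device, start_lba, end_lba, repeats=5,
                        smartctl="smartctl", dev_type="sat"):
    """Build a smartctl command that scans one LBA range `repeats` times.

    Each repeat is passed as its own span, so at most five are allowed.
    """
    if not 1 <= repeats <= 5:
        raise ValueError("selective self-tests accept between 1 and 5 spans")
    if start_lba < 0 or end_lba < start_lba:
        raise ValueError("need 0 <= start_lba <= end_lba")
    span = "select,%d-%d" % (start_lba, end_lba)
    argv = [smartctl, "-d", dev_type]
    for _ in range(repeats):
        argv += ["-t", span]
    argv.append(device)
    return argv


if __name__ == "__main__":
    # Same range as in the example above: a suspect LBA of 12345678
    # covered by the span 12345000-12345999, scanned five times.
    print(" ".join(selective_test_argv("/dev/sda", 12345000, 12345999)))
```

The printed command matches the one shown in the example apart from the `./` prefix, which can be supplied through the `smartctl` parameter when running a freshly built binary from the source directory.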