From: Travis B. <tr...@be...> - 2004-11-11 20:38:07
Hello everyone,

I strongly suspect my PowerBook's hard drive (IBM TravelStar 20GB) is developing problems. The simple solution is to just buy a new drive, but as a student, I can't afford to do that unless it's absolutely necessary. Incidentally, my AppleCare warranty expired three days before the problem started.

The main symptom is that read/write tasks sometimes stall for 5-30 seconds, and the hard drive makes the same sequence of clicking and seeking noises over and over. The noises themselves are not too unusual, but the repetition of the same noise pattern, combined with the I/O delay, is suspicious (sounds like it's repeatedly trying to read the same block, or perhaps is recalibrating itself). None of the OS X disk utilities show any problems.

From running smartctl, it seems that the drive is logging an UNC (unrecoverable) error on most of these stall/strange noise occasions. So far, I haven't actually gotten any I/O error messages from the OS, which leads me to think that the drive is eventually able to read the data.

Looking at the smartctl output, I have a "raw" count of 38 for "Current_Pending_Sector"--it just increased from 37 after one of these stall incidents. The trick in the FAQ for forcing a write to the bad sector doesn't seem applicable to OS X. Running a long self-test finds nothing (offline and short tests are not supported).

It also seems like errors are occurring more frequently than average. My drive shows a total of 439 errors over 6835 power-on hours. That's an average of 15.6 hours between errors. However, the last five errors all fell within a 47-hour span, or about 9.4 hours per error.

So, is it game over for my drive? Am I living on borrowed time?

Please be sure to email me at trbeals at berkeley followed by edu, as I'm not on the list.

Thanks!
-Travis

Here's the output from smartctl -a disk0:

smartctl version 5.33 [powerpc-apple-darwin7.6.0] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     IBM-IC25N020ATDA04-0
Serial Number:    63A63135398
Firmware Version: DA3AA72A
User Capacity:    20,003,880,960 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 3
Local Time is:    Thu Nov 11 11:05:44 2004 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 645) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  27) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   085   085   062    Pre-fail  Always       -       16384012
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   142   142   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   094   094   000    Old_age   Always       -       10363
  5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   085   085   000    Old_age   Always       -       6835
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       7153
191 G-Sense_Error_Rate      0x000a   098   098   000    Old_age   Always       -       262145
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       48
193 Load_Cycle_Count        0x0012   065   065   000    Old_age   Always       -       352433
194 Temperature_Celsius     0x0002   189   189   000    Old_age   Always       -       29 (Lifetime Min/Max 13/55)
196 Reallocated_Event_Count 0x0032   088   088   000    Old_age   Always       -       698
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       38
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 439 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 439 occurred at disk power-on lifetime: 6834 hours (284 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 36 4a e6 08 e0  Error: UNC 54 sectors at LBA = 0x0008e64a = 583242

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 40 e6 08 e0 00      00:00:06.800  READ DMA
  ca 00 00 16 4e 00 e0 00      00:00:06.800  WRITE DMA
  ca 00 05 a0 02 35 e1 00      00:00:06.800  WRITE DMA
  c8 00 35 28 36 22 e0 00      00:00:06.800  READ DMA
  ca 00 01 38 b7 60 e1 00      00:00:06.800  WRITE DMA

Error 438 occurred at disk power-on lifetime: 6834 hours (284 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 36 4a e6 08 e0  Error: UNC 54 sectors at LBA = 0x0008e64a = 583242

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 40 e6 08 e0 00      00:00:02.700  READ DMA
  c8 00 03 f0 80 0d e2 00      00:00:02.600  READ DMA
  c8 00 10 d0 bd 03 e0 00      00:00:02.600  READ DMA
  c8 00 10 10 93 00 e0 00      00:00:02.500  READ DMA
  c8 00 03 00 7f 0d e2 00      00:00:02.500  READ DMA

Error 437 occurred at disk power-on lifetime: 6825 hours (284 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 40 c3 54 e0  Error: UNC 8 sectors at LBA = 0x0054c340 = 5555008

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 40 c3 54 e0 00      00:00:02.600  READ DMA
  c8 00 08 e0 13 52 e0 00      00:00:02.600  READ DMA
  c8 00 40 40 a3 52 e0 00      00:00:02.500  READ DMA
  c8 00 08 50 ab 3b e2 00      00:00:02.500  READ DMA
  c8 00 40 00 a3 52 e0 00      00:00:02.500  READ DMA

Error 436 occurred at disk power-on lifetime: 6807 hours (283 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 d1 2a 1d e0  Error: UNC 6 sectors at LBA = 0x001d2ad1 = 1911505

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 07 d0 2a 1d e0 00      00:00:49.000  READ DMA
  c8 00 08 28 99 0d e1 00      00:00:48.900  READ DMA
  c8 00 08 b8 90 15 e1 00      00:00:48.900  READ DMA
  c8 00 08 b8 2a 1d e0 00      00:00:41.200  READ DMA
  c8 00 08 28 97 0d e1 00      00:00:41.200  READ DMA

Error 435 occurred at disk power-on lifetime: 6787 hours (282 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 1c a4 80 01 e0  Error: UNC 28 sectors at LBA = 0x000180a4 = 98468

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 20 a0 80 01 e0 00      00:00:45.900  READ DMA
  ef 03 22 00 00 00 a0 00      00:00:45.900  SET FEATURES [Set transfer mode]
  c8 00 20 a0 80 01 e0 00      00:00:14.900  READ DMA
  c8 00 20 30 ad 02 e0 00      00:00:14.800  READ DMA
  c8 00 20 d0 64 03 e0 00      00:00:14.600  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6830         -

Device does not support Selective Self Tests/Logging
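For readers following the numbers in the message above, here is a minimal Python sketch (not part of the original post) that rechecks two things from the log: how the reported LBA follows from the register bytes, and how the error-frequency figures are obtained. It assumes the standard 28-bit LBA layout (SN, CL, CH, plus the low four bits of DH); the names `lba28`, `mean_gap`, and `ERROR_HOURS` are made up for illustration.

```python
# Values copied from the smartctl output above.
ERROR_HOURS = [6787, 6807, 6825, 6834, 6834]  # power-on hours of errors 435..439
TOTAL_ERRORS = 439
POWER_ON_HOURS = 6835


def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the task-file registers (assumed layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def mean_gap(hours):
    """Average spacing between consecutive events: n events give n-1 gaps."""
    return (hours[-1] - hours[0]) / (len(hours) - 1)


# Error 439 registers: SN=4a CL=e6 CH=08 DH=e0
print(lba28(0x4A, 0xE6, 0x08, 0xE0))  # 583242, matching the log

lifetime_avg = POWER_ON_HOURS / TOTAL_ERRORS       # about 15.6 hours per error
recent_span = ERROR_HOURS[-1] - ERROR_HOURS[0]     # 47 hours
print(round(lifetime_avg, 1), recent_span / 5, mean_gap(ERROR_HOURS))
```

Dividing the 47-hour span by five errors gives the 9.4 figure quoted in the message; counting the four gaps between five errors gives 11.75 hours instead. Either way the recent errors are closer together than the lifetime average of about 15.6 hours.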
From: Ryan U. <nem...@ic...> - 2004-11-12 17:17:05
Hi,

Based on this:

> The main symptom is that read/write tasks sometimes stall for 5-30
> seconds, and the hard drive makes the same sequence of clicking and
> seeking noises over and over.

and this:

> From running smartctl, it seems that the drive is logging an UNC
> (unrecoverable) error on most of these stall/strange noise occasions.

I would recommend backing up your data immediately.

--
Ryan Underwood, <ne...@ic...>
From: Geoffrey K. <ge...@ge...> - 2004-11-12 21:15:43
Travis Beals <tr...@be...> writes:

>   5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always       -       0
> 196 Reallocated_Event_Count 0x0032   088   088   000    Old_age   Always       -       698
> 197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       38

> # 1  Extended offline    Completed without error       00%      6830

It looks like your drive is slowly developing bad sectors, which have been corrected either by rewriting the sector or by remapping it elsewhere. You've used up about 5% of the total available sectors for remapping, so there's lots of capacity to correct future problems; the drive will not completely fail for a long time. The extended test didn't find any problems besides the ones that the drive already knew about.

This kind of drive tries to keep itself contiguous, so instead of just saying 'sector 1234 is now sector 456789 at the end of the disk', it tries to shift everything up one so that the former sector 1235 now holds sector 1234. I think that's why you have pending sectors even though the self-test is passing.

I would suggest:

- Keep up-to-date backups. It sounds like you haven't lost any data yet, but if your drive is slowly developing bad sectors then eventually you might.
- Keep monitoring the drive. If the '095' in 'reallocated_sector_ct' drops below about '030', start shopping for a new drive. If the drive is reported with a failing SMART status, replace it as soon as possible.
- Run self-tests at least once a week.
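The monitoring advice above could be automated along these lines. This is only a sketch in Python, not something from the thread: it parses attribute rows pasted from `smartctl -a` output and applies the rule of thumb given in the reply (normalized value below about 30 means start shopping). Treating a value at or below THRESH as a failing attribute is an assumption about how the SMART status is derived; the function names and return strings are hypothetical.

```python
# Attribute rows pasted from the smartctl output quoted in this thread.
ATTRIBUTE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   088   088   000    Old_age   Always       -       698
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       38
"""


def parse_rows(text):
    """Map attribute name -> normalized value, worst, threshold, and raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),
        }
    return attrs


def advise(attrs, shop_below=30):
    """Apply the rule of thumb from the reply (shop_below is its 'about 030')."""
    realloc = attrs["Reallocated_Sector_Ct"]
    if realloc["value"] <= realloc["thresh"]:
        return "replace as soon as possible"  # assumed failing condition
    if realloc["value"] < shop_below:
        return "start shopping for a new drive"
    return "keep backups and keep monitoring"


attrs = parse_rows(ATTRIBUTE_ROWS)
print(attrs["Current_Pending_Sector"]["raw"], advise(attrs))
```

With the values posted in this thread the normalized Reallocated_Sector_Ct is 95, well above both limits, so the sketch falls through to the "keep monitoring" branch. Logging the raw Current_Pending_Sector count on each run would also show whether it keeps climbing from 38.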