From: Frode E. M. <fr...@fr...> - 2004-10-24 23:10:18
|
Hi.

Just recently, during very heavy disk activity, I noticed two events
in the winxp event log claiming "The device, \Device\Ide\IdePort0, did
not respond within the timeout period.". There is also an event from
yesterday claiming "An error was detected on device \Device\Harddisk0\D
during a paging operation." This laptop drive is 2-3 months old and was
replaced because the original drive became faulty.

Booting into linux and running smartctl listed two IDNF error events.
One of these is for LBA "0x0000003f", which I think is very strange - I
thought the first 63 sectors were reserved for the MBR/partition table
and cannot fathom why winxp would want to write there.

I'm also puzzled by the "Reallocated event count" being 18 (what does
this attribute really mean?) while there is 0 raw read errors, 0
reallocated sectors, 0 current_pending_sectors, 0 offline
uncorrectable and 0 udma crc errors. Did the drive really reallocate
any sectors?

(I later discovered smartctl was ported to cygwin, running that version
provides the same results, except it appears the drive is now in the
smartctl database (v5.33 vs v5.32).)

Performing a short self-test PASSES, and trying to read sector 63 by
doing "hexdump -C /dev/hda" for a while seems to work perfectly.

Is this a cause for concern?

Please cc: me on any replies as I am not subscribed to the list.

Here's the output from running smartctl under linux (Debian unstable)

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     HTS548060M9AT00
Serial Number:    <removed>
Firmware Version: MGBOA53A
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a
Local Time is:    Sun Oct 24 22:47:49 2004 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 645) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  46) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   200   200   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       22
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
191 G-Sense_Error_Rate      0x000a   081   081   000    Old_age   Always       -       6357592
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       16583
194 Temperature_Celsius     0x0002   125   125   000    Old_age   Always       -       44 (Lifetime Min/Max 14/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 673 hours (28 days + 1 hours)
  When the command that caused the error occurred, the device was
  active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 02 3f 00 00 e0  Error: IDNF at LBA = 0x0000003f = 63

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca d0 02 3f 00 00 e0 00      03:06:52.600  WRITE DMA
  ca d0 20 35 cb 6f e3 00      03:06:52.500  WRITE DMA
  ca d0 20 c5 51 67 e3 00      03:06:52.000  WRITE DMA
  c8 d0 20 75 da ba e3 00      03:06:51.700  READ DMA
  c8 d0 20 15 de e6 e3 00      03:06:51.500  READ DMA

Error 1 occurred at disk power-on lifetime: 673 hours (28 days + 1 hours)
  When the command that caused the error occurred, the device was
  active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 08 6d 97 12 e3  Error: IDNF at LBA = 0x0312976d = 51550061

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 6d 97 12 e3 00      03:00:20.000  WRITE DMA
  c8 00 20 75 37 b9 e3 00      03:00:19.900  READ DMA
  c8 00 20 f5 56 d2 e3 00      03:00:19.800  READ DMA
  c8 00 20 a5 95 e3 e3 00      03:00:19.600  READ DMA
  c8 00 20 35 08 e1 e3 00      03:00:19.600  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
|
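For anyone wanting to check the "Error: IDNF at LBA = ..." figures by hand, here is a minimal Python sketch of how the SN, CL, CH and DH register bytes in the error log combine into a 28-bit LBA. The function name is made up for illustration, and the sketch assumes plain 28-bit LBA addressing, where only the low four bits of DH carry address bits.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes 28-bit LBA mode: SN holds bits 0-7, CL bits 8-15,
    CH bits 16-23, and the low nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from the two error log entries in the post.
error2 = lba28_from_registers(0x3F, 0x00, 0x00, 0xE0)
error1 = lba28_from_registers(0x6D, 0x97, 0x12, 0xE3)

print(hex(error2), error2)
print(hex(error1), error1)
```

Both results agree with the LBA values smartctl printed (63 and 51550061), so the log entries are at least internally consistent.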
From: Bruce A. <ba...@gr...> - 2004-10-27 07:51:13
|
> Hi. Just recently, during very heavy disk activity, I noticed two events
> in the winxp event log claiming "The device, \Device\Ide\IdePort0, did
> not respond within the timeout period.". There is also an event from
> yesterday claiming "An error was detected on device \Device\Harddisk0\D
> during a paging operation." This laptop drive is 2-3 months old and was
> replaced because the original drive became faulty.
>
> Booting into linux and running smartctl listed two IDNF error events.
> One of these is for LBA "0x0000003f", which I think is very strange - I
> thought the first 63 sectors were reserved for the MBR/partition table
> and cannot fathom why winxp would want to write there.

I agree -- why would the OS want to write there unless you were trying to
repartition the disk or install some other booter there?

> I'm also puzzled by the "Reallocated event count" being 18 (what does
> this attribute really mean?) while there is 0 raw read errors, 0
> reallocated sectors, 0 current_pending_sectors, 0 offline
> uncorrectable and 0 udma crc errors. Did the drive really reallocate
> any sectors?

I think it means that there were 18 instances where it 'thought about'
doing reallocation. But it didn't, in the end, decide to do this.

> (I later discovered smartctl was ported to cygwin, running that version
> provides the same results, except it appears the drive is now in the
> smartctl database (v5.33 vs v5.32).)
>
> Performing a short self-test PASSES, and trying to read sector 63 by
> doing "hexdump -C /dev/hda" for a while seems to work perfectly.
>
> Is this a cause for concern?

I'm not sure. As far as I can tell, the disk looks OK. For confidence, I
suggest that you do a long self-test (-t long).

> Please cc: me on any replies as I am not subscribed to the list.

OK, done. Please continue to copy the list on any replies.

Cheers,
        Bruce
|
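As a rough illustration of why the attribute table looks healthy, the sketch below applies the usual SMART convention that an attribute counts as failed when its normalized VALUE is at or below THRESH. The rows are a hand-copied subset of the posted table; the tuple layout and the function name are assumptions made for this example, not anything smartctl provides.

```python
# (id, name, value, worst, thresh) copied by hand from a few rows of the
# posted attribute table; this layout is an assumption for illustration.
ATTRIBUTES = [
    (1, "Raw_Read_Error_Rate", 100, 100, 62),
    (5, "Reallocated_Sector_Ct", 100, 100, 5),
    (10, "Spin_Retry_Count", 100, 100, 60),
    (196, "Reallocated_Event_Count", 100, 100, 0),
    (197, "Current_Pending_Sector", 100, 100, 0),
]


def failing(attrs):
    """Return names of attributes whose normalized value is <= threshold."""
    return [
        name
        for _id, name, value, _worst, thresh in attrs
        if value <= thresh
    ]


print(failing(ATTRIBUTES))
```

With the posted numbers the list comes back empty. Note that this only looks at normalized values: the raw count of 18 for Reallocated_Event_Count does not enter into the comparison at all.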
From: Frode E. M. <fr...@fr...> - 2004-10-27 08:01:58
|
On Wed, Oct 27, 2004 at 02:51:07 -0500, Bruce Allen wrote:
> > Booting into linux and running smartctl listed two IDNF error events.
> > One of these is for LBA "0x0000003f", which I think is very strange - I
> > thought the first 63 sectors were reserved for the MBR/partition table
> > and cannot fathom why winxp would want to write there.
> I agree -- why would the OS want to write there unless you were trying to
> repartition the disk or install some other booter there?

I wasn't doing anything like that. Strange :)

> > I'm also puzzled by the "Reallocated event count" being 18 (what does
> > this attribute really mean?) while there is 0 raw read errors, 0
> > reallocated sectors, 0 current_pending_sectors, 0 offline
> > uncorrectable and 0 udma crc errors. Did the drive really reallocate
> > any sectors?
> I think it means that there were 18 instances where it 'thought about'
> doing reallocation. But it didn't, in the end, decide to do this.

Ok, thanks for the explanation!

> > Performing a short self-test PASSES, and trying to read sector 63 by
> > doing "hexdump -C /dev/hda" for a while seems to work perfectly.
> > Is this a cause for concern?
> I'm not sure. As far as I can tell, the disk looks OK. For confidence, I
> suggest that you do a long self-test (-t long).

I'll do that. Thanks for the analysis!
|
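Instead of letting hexdump run from the start of the disk, a single sector can be read directly. The following Python sketch shows one way to do it; the helper name is hypothetical and the 512-byte sector size is an assumption (typical for ATA drives of this kind). Reading a raw device such as /dev/hda normally needs root, and the data may be served from the OS cache rather than the platter, so a clean read here is only weak evidence.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector at the given LBA from a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(lba * sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError("short read at LBA %d: got %d bytes" % (lba, len(data)))
    return data


# Example (not run here; needs read access to the device):
#   data = read_sector("/dev/hda", 63)
#   print(data[:16].hex())
```

The same function works on a disk image file, which makes it easy to try out without touching a real drive.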