From: João C. M. L. <jo...@jo...> - 2008-11-26 05:37:22
|
Hi,

I've already read the instructions at http://smartmontools.sourceforge.net/BadBlockHowTo.txt, but I still have problems with an Offline Uncorrectable sector in my home sata disk.

Here is a full report:

# smartctl -d ata -a /dev/sdf
smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.8 family
Device Model:     ST3250823AS
Serial Number:    3ND02SXR
Firmware Version: 3.02
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Nov 26 03:04:48 2008 BRST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  84) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   048   044   006    Pre-fail  Always       -       130922921
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       732
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       758914439
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       19733
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       839
194 Temperature_Celsius     0x0022   046   051   000    Old_age   Always       -       46 (Lifetime Min/Max 0/21)
195 Hardware_ECC_Recovered  0x001a   048   044   000    Old_age   Always       -       130922921
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 9 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 occurred at disk power-on lifetime: 7336 hours (305 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 3f 0e 17 ae e0  Error: UNC 63 sectors at LBA = 0x00ae170e = 11409166

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 cf 16 ae e0 00      00:12:06.015  READ DMA EXT
  25 00 80 4f 16 ae e0 00      00:12:06.013  READ DMA EXT
  25 00 80 cf 15 ae e0 00      00:12:06.011  READ DMA EXT
  25 00 80 4f 15 ae e0 00      00:12:06.007  READ DMA EXT
  25 00 40 0f 15 ae e0 00      00:12:06.007  READ DMA EXT

Error 8 occurred at disk power-on lifetime: 7336 hours (305 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 f7 62 77 e0  Error: UNC 8 sectors at LBA = 0x007762f7 = 7824119

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ef 62 77 e0 00      00:11:25.387  READ DMA EXT
  25 00 80 6f 62 77 e0 00      00:11:25.386  READ DMA EXT
  25 00 80 ef 61 77 e0 00      00:11:25.385  READ DMA EXT
  25 00 80 6f 61 77 e0 00      00:11:25.376  READ DMA EXT
  25 00 80 ef 60 77 e0 00      00:11:25.375  READ DMA EXT

Error 7 occurred at disk power-on lifetime: 6733 hours (280 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 04 33 ed d7 e0  Error: UNC 4 sectors at LBA = 0x00d7ed33 = 14150963

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 20 2f ed d7 e0 00      02:08:44.983  READ DMA EXT
  25 00 20 4f ed d7 e0 00      02:08:44.983  READ DMA EXT
  25 00 11 6f ed d7 e0 00      02:08:45.476  READ DMA EXT
  25 00 2f 80 ed d7 e0 00      02:08:45.137  READ DMA EXT
  25 00 20 af ed d7 e0 00      02:08:45.035  READ DMA EXT

Error 6 occurred at disk power-on lifetime: 3625 hours (151 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 18 67 ec a4 e0  Error: UNC 24 sectors at LBA = 0x00a4ec67 = 10808423

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 40 4f ec a4 e0 00      14:08:38.247  READ DMA EXT
  25 00 08 ac af bb e0 00      14:08:38.205  READ DMA EXT
  25 00 2f cf c0 20 e0 00      14:08:38.204  READ DMA EXT
  25 00 15 0f ec a4 e0 00      14:08:38.203  READ DMA EXT
  25 00 2e cf eb a4 e0 00      14:08:38.167  READ DMA EXT

Error 5 occurred at disk power-on lifetime: 5085 hours (211 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 1f 6e ed a0 e0  Error: UNC 31 sectors at LBA = 0x00a0ed6e = 10546542

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 20 4f ed a0 e0 00   1d+03:52:43.819  READ DMA EXT
  25 00 60 af 0e 1a e0 00   1d+03:52:43.779  READ DMA EXT
  25 00 40 ef fc 19 e0 00   1d+03:52:43.736  READ DMA EXT
  25 00 40 af fc 19 e0 00   1d+03:52:43.736  READ DMA EXT
  25 00 20 ef ed a0 e0 00   1d+03:52:43.709  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        19723        -
# 2  Extended offline    Aborted by host               90%        19719        -
# 3  Short offline       Completed without error       00%        19718        -
# 4  Extended offline    Completed without error       00%        18950        -
# 5  Extended offline    Completed without error       00%        18947        -
# 6  Extended offline    Completed without error       00%        18939        -
# 7  Extended offline    Completed without error       00%        18927        -
# 8  Extended offline    Completed without error       00%        18738        -
# 9  Short offline       Completed without error       00%        18737        -
#10  Extended offline    Interrupted (host reset)      90%        10838        -
#11  Short offline       Completed without error       00%        10837        -
#12  Short offline       Completed without error       00%        10836        -
#13  Extended offline    Interrupted (host reset)      50%         9821        -
#14  Extended offline    Interrupted (host reset)      90%          262        -
#15  Extended offline    Interrupted (host reset)      70%          262        -
#16  Short offline       Completed without error       00%          258        -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
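For readers who want to check the error log entries in the report by hand, here is a minimal sketch of how the LBA printed on each "Error: UNC" line relates to the register bytes next to it. It assumes the 28-bit register layout that this log prints (low nibble of DH, then CH, CL, SN); the function name `decode_error_registers` is made up for illustration and is not part of smartmontools.

```python
def decode_error_registers(sc, sn, cl, ch, dh):
    """Return (lba, sector_count) from 28-bit style ATA error registers.

    Assumed layout: LBA bits 27..24 are the low nibble of DH,
    bits 23..16 are CH, bits 15..8 are CL, and bits 7..0 are SN.
    """
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return lba, sc


# Register row of "Error 9" from the report: ER ST SC SN CL CH DH
row = "40 51 3f 0e 17 ae e0"
er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in row.split())

lba, count = decode_error_registers(sc, sn, cl, ch, dh)
print(f"UNC {count} sectors at LBA = 0x{lba:08x} = {lba}")
# prints: UNC 63 sectors at LBA = 0x00ae170e = 11409166
```

The same arithmetic reproduces the LBA shown for the other four logged errors, which can help when mapping a logged error to a block on the disk.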
I've already zeroed the whole disk (dd if=/dev/zero of=/dev/sdf) and run many self-tests, but the error did not go away.

Is there another option to reset that flag? Maybe some kind of SMART reset to clear all SMART data and start from the beginning?

What do you recommend? I'll gladly help with any test you want me to try. I have some background in kernel programming and electrical engineering, just in case, although both are probably a little bit rusty now. My main job is Operating Systems Support at a big Brazilian ISP.

My system is currently CentOS 5.2, x86_64, but I may "upgrade" it to Fedora 10 soon.

Thanks in advance,

Jonny |
From: Bruce A. <ba...@gr...> - 2008-12-16 13:12:06
|
Apparently some disk drives do not reset the pending sector counts to zero, even after they have been reallocated or have become readable again. Your disk shows one reallocated sector, so that's at least a sign that the disk has repaired the bad sector.
|
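The comparison described in this reply (one reallocated sector alongside pending and offline-uncorrectable counts that stay at 1) can be sketched as a small parser over the attribute table. This is only an illustration under stated assumptions: the names `raw_values` and `counters_may_be_stale` are hypothetical, the regular expression assumes the column layout shown in the report above, and the check is a hint to a human reader, not a diagnosis and not something smartctl does itself.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the leading integer of RAW_VALUE.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
194 Temperature_Celsius     0x0022   046   051   000    Old_age   Always       -       46 (Lifetime Min/Max 0/21)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""


def raw_values(text):
    """Map attribute ID to the integer at the start of its RAW_VALUE column."""
    values = {}
    for line in text.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            values[int(match.group(1))] = int(match.group(9))
    return values


def counters_may_be_stale(values):
    """Heuristic only: a reallocation happened (ID 5 > 0) while the pending
    (ID 197) or offline-uncorrectable (ID 198) count is still nonzero."""
    reallocated = values.get(5, 0)
    pending = values.get(197, 0)
    uncorrectable = values.get(198, 0)
    return reallocated > 0 and (pending > 0 or uncorrectable > 0)


values = raw_values(SAMPLE)
print(values, counters_may_be_stale(values))
```

A true result only says the counters are consistent with the situation described here; a sector that is still unreadable would produce the same numbers, so a clean read of the whole disk or a completed extended self-test remains the stronger evidence.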
From: Bruce A. <ba...@gr...> - 2008-12-18 02:44:01
|
Yes, we need a patch to smartd, so that this number is only reported if it CHANGES.

On Wed, 17 Dec 2008, João Carlos Mendes Luís wrote:
> I don't remember seeing anything about this in the FAQ. Sorry if it's there.
>
> The biggest problem for me is that I keep receiving emails with this "error".
> Maybe we should then have another option to configure smartd. Something like
> "Do not report pending and unreadable sectors if less than N". As far as I
> could see, we can only test it for non-zero. Disabling this report
> completely is not very safe.
>
> .....
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdf, 1 Offline uncorrectable sectors
>
> For details see host's SYSLOG (default: /var/log/messages).
> ....
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdf, 1 Currently unreadable (pending) sectors
>
> For details see host's SYSLOG (default: /var/log/messages).
> .....
|
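The two ideas in this exchange, reporting only when the count changes and ignoring counts below some N, can be sketched together as follows. This is not smartd code and not a proposed patch; `should_report`, `check_counter`, `min_count` and the JSON state file are all hypothetical names chosen for illustration. The last seen value is written to disk so that a change that happens while the monitor is not running is still noticed on the next check.

```python
import json
from pathlib import Path


def load_state(path):
    """Read the last recorded counts, or start empty if no state exists yet."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return {}


def save_state(path, state):
    Path(path).write_text(json.dumps(state))


def should_report(current, previous, min_count=1):
    """Report when the count is at least min_count and differs from the
    last recorded value (or when nothing has been recorded yet)."""
    if current < min_count:
        return False
    return previous is None or current != previous


def check_counter(device, attr_id, current, state_path, min_count=1):
    """Compare one raw counter against the persisted value, then record it."""
    state = load_state(state_path)
    key = f"{device}:{attr_id}"
    report = should_report(current, state.get(key), min_count)
    state[key] = current
    save_state(state_path, state)
    return report
```

With this sketch, a disk whose pending count stays at 1 would trigger one report and then stay quiet until the number moves. Whether quiet is desirable is a policy question; a periodic reminder could be layered on top by also storing the time of the last report.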
From: João C. M. L. <jo...@jo...> - 2008-12-18 12:55:00
|
The problem with "report on changes" is that the value could change while smartd is not running (for example, on a reboot, due to a crash), and the admins would never know that the problem is there and needs fixing.

Also, I personally like applications that send email many times, until somebody notices and fixes the problem.

Bruce Allen wrote:
> Yes, we need a patch to smartd, so that this number is only reported
> if it CHANGES.
|
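The two positions in this exchange (report only on change, versus keep reminding and never miss a change across a restart) can be combined if the last seen count is kept on disk. The Python sketch below only illustrates that idea and is not smartd code: the function name `check_count`, the JSON state file, and the one-day reminder interval are all assumptions made for the example.

```python
import json
import os
import time


def check_count(state_path, device, attr, count, now=None, remind_every=86400):
    """Return "changed", "reminder", or None (no mail) for one attribute reading.

    Hypothetical helper: the last count and the last mail time are persisted
    in state_path, so a change that happens while the monitor is down is
    still noticed on the next run.
    """
    now = time.time() if now is None else now

    # Load the previous state; a missing or corrupt file means "first run".
    try:
        with open(state_path) as fh:
            state = json.load(fh)
    except (FileNotFoundError, ValueError):
        state = {}

    key = f"{device}:{attr}"
    prev = state.get(key)

    reason = None
    if count > 0:
        if prev is None or prev["count"] != count:
            reason = "changed"      # new or different non-zero count
        elif now - prev["mailed_at"] >= remind_every:
            reason = "reminder"     # same count, but nobody fixed it yet

    mailed_at = now if reason else (prev["mailed_at"] if prev else 0)
    state[key] = {"count": count, "mailed_at": mailed_at}

    # Write to a temporary file and rename, so a crash cannot truncate the state.
    tmp = state_path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, state_path)
    return reason
```

With this policy a count that stays at 1 produces one mail per reminder interval instead of one per polling cycle, and a count that changes while the daemon is stopped is reported on the first check after it comes back.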
From: João C. M. L. <jo...@jo...> - 2008-12-17 18:15:13
|
I don't remember seeing anything about this in the FAQ. Sorry if it's there.

The biggest problem for me is that I keep receiving emails with this "error". Maybe we should have another option to configure smartd, something like "Do not report pending and unreadable sectors if less than N". As far as I can see, we can only test for non-zero. Disabling this report completely is not very safe.

.....
The following warning/error was logged by the smartd daemon:

Device: /dev/sdf, 1 Offline uncorrectable sectors

For details see host's SYSLOG (default: /var/log/messages).
....
The following warning/error was logged by the smartd daemon:

Device: /dev/sdf, 1 Currently unreadable (pending) sectors

For details see host's SYSLOG (default: /var/log/messages).
.....

Bruce Allen wrote:
> Apparently some disk drives do not reset the pending sector counts to
> zero, even after they have been reallocated or have become readable
> again.
> Your disk shows one reallocated sector, so that's at least a sign that
> the disk has repaired the bad sector.
|
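The "do not report if less than N" idea proposed in this message can be sketched outside smartd as a small filter over the attribute table. The Python below is only an illustration under stated assumptions: `raw_values` and `sectors_to_report` are hypothetical names, the rows are copied from the report in this thread, and the parser assumes the ten-column layout that this smartctl version prints.

```python
def raw_values(attr_lines):
    """Map attribute ID -> raw value from 'smartctl -A' style table rows."""
    values = {}
    for line in attr_lines:
        fields = line.split()
        # A data row has at least 10 columns, a numeric ID first and a
        # numeric raw value in column 10; header rows are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def sectors_to_report(values, min_count):
    """Return pending (197) and offline uncorrectable (198) counts >= min_count."""
    return {
        attr: values[attr]
        for attr in (197, 198)
        if attr in values and values[attr] >= min_count
    }


# Rows taken from the smartctl report quoted in this thread.
ROWS = [
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1",
    "197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1",
    "198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1",
]

if __name__ == "__main__":
    counts = raw_values(ROWS)
    print(sectors_to_report(counts, min_count=1))
    print(sectors_to_report(counts, min_count=2))
```

With N = 2 the single stuck sector on this disk would no longer trigger a mail, while any further growth of either counter still would.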
From: Bruce A. <ba...@gr...> - 2008-12-18 11:20:15
|
You can turn off this reporting with -C 0 and/or -U 0 in the smartd config file. See the smartd man page.
Not_testing >>> 4 0 0 Not_testing >>> 5 0 0 Not_testing >>> Selective self-test flags (0x0): >>> After scanning selected spans, do NOT read-scan remainder of disk. >>> If Selective self-test is pending on power-up, resume after 0 minute >>> delay. >>> >>> I've already zeroed the whole disk (dd if=/dev/zero of=/dev/sdf), and >>> made many self tests but the error did not go away. >>> >>> Is there another option to reset that flag? Maybe some kind of Smart >>> Reset to clear all smart data and start from beginning? >>> >>> What do you recommend? I'll gladly help with any test you want me to >>> try. I have some background in kernel programming and electrical >>> engineering, just in case, although both are probably a little bit rusty >>> now. My main job is Operating Systems Support in a big Brazilian ISP >>> Provider. >>> >>> My system is currently a CentOS 5.2, x86_64, but I may "upgrade" it to >>> Fedora 10 soon. >>> >>> Thanks in advance, >>> >>> Jonny >>> >>> >>> ------------------------------------------------------------------------------ >>> >>> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, >>> Nevada. >>> The future of the web can't happen without you. Join us at MIX09 to >>> help >>> pave the way to the Next Web now. Learn more and register at >>> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ >>> >>> _______________________________________________ >>> Smartmontools-support mailing list >>> Sma...@li... >>> https://lists.sourceforge.net/lists/listinfo/smartmontools-support >>> > > ------------------------------------------------------------------------------ > SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada. > The future of the web can't happen without you. Join us at MIX09 to help > pave the way to the Next Web now. 
Learn more and register at > http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/ > _______________________________________________ > Smartmontools-support mailing list > Sma...@li... > https://lists.sourceforge.net/lists/listinfo/smartmontools-support > |
From: João C. M. L. <jo...@jo...> - 2008-12-18 13:03:47
|
I've seen those options, thanks. But if I turn them off, I will not see the count go above 1, which would mean I have another error.

My best solution, if a patch to smartd does not come in a reasonable time, is to disable the check in smartd and write my own cron script around smartctl. That's very easy, but still only a workaround.

Bruce Allen wrote:
> You can turn off this reporting with -C 0 and/or -U 0 in the
> smartd config file. See the smartd man page.
>
> On Wed, 17 Dec 2008, João Carlos Mendes Luís wrote:
>
>> I don't remember seeing anything about this in the FAQ. Sorry if
>> it's there.
>>
>> The biggest problem for me is that I keep receiving emails with this
>> "error". Maybe we should then have another option to configure smartd.
>> Something like "Do not report pending and unreadable sectors if less
>> than N". As far as I could see, we can only test for non-zero.
>> Disabling this report completely is not very safe.
>>
>> .....
>> The following warning/error was logged by the smartd daemon:
>>
>> Device: /dev/sdf, 1 Offline uncorrectable sectors
>>
>> For details see host's SYSLOG (default: /var/log/messages).
>> ....
>> The following warning/error was logged by the smartd daemon:
>>
>> Device: /dev/sdf, 1 Currently unreadable (pending) sectors
>>
>> For details see host's SYSLOG (default: /var/log/messages).
>> .....
>>
>> Bruce Allen wrote:
>>> Apparently some disk drives do not reset the pending sector counts to
>>> zero, even after they have been reallocated or have become readable
>>> again.
>>> Your disk shows one reallocated sector, so that's at least a sign that
>>> the disk has repaired the bad sector.
>>>
>>> On Wed, 26 Nov 2008, João Carlos Mendes Luís wrote:
>>>
>>>> Hi,
>>>>
>>>> I've already read the instructions at
>>>> http://smartmontools.sourceforge.net/BadBlockHowTo.txt, but I still
>>>> have problems with an Offline Uncorrectable sector in my home sata disk.
|
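The cron workaround mentioned in the message above could look roughly like the following. This is only a sketch under stated assumptions, not part of smartmontools: the `LIMITS` table and the helper names `parse_raw_values` and `over_limit` are made up for illustration, and the parsing assumes the ten-column attribute table that `smartctl -A` prints in the report quoted in this thread. It stays silent while attributes 197 and 198 are at or below the chosen limit and prints a line only when one rises above it.

```python
import subprocess
import sys

# Hypothetical limits: report only when the raw value goes ABOVE these numbers.
# 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable.
LIMITS = {197: 1, 198: 1}


def parse_raw_values(report, wanted):
    """Return {attribute_id: raw_value} for the wanted IDs in `smartctl -A` output."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the tenth column is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in wanted:
            if fields[9].isdigit():
                values[int(fields[0])] = int(fields[9])
    return values


def over_limit(values, limits):
    """Keep only the attributes whose raw value exceeds its limit."""
    return {attr: raw for attr, raw in values.items() if raw > limits[attr]}


def main(device):
    # "-d ata" mirrors the invocation used in this thread; adjust for other setups.
    result = subprocess.run(
        ["smartctl", "-d", "ata", "-A", device],
        capture_output=True,
        text=True,
    )
    exceeded = over_limit(parse_raw_values(result.stdout, LIMITS), LIMITS)
    for attr, raw in sorted(exceeded.items()):
        print(f"{device}: attribute {attr} raw value {raw} exceeds {LIMITS[attr]}")
    return 1 if exceeded else 0


# Runs only when a device is given, e.g.: check_pending.py /dev/sdf
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Run from cron, a script like this produces output (and therefore mail, on a typical cron setup) only when a count grows past the known value of 1. Combined with `-C 0 -U 0` in the smartd config file, that silences the repeated warnings without losing sight of new bad sectors.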