From: Skye S. <sk...@fl...> - 2011-07-10 01:16:59
|
Synopsis:

I am at my wits' end and could use some pointers as to what I am doing wrong, or whether I just need to buy a new drive. I realize this may not be the perfect forum for this question, and I would be happy with just a pointer to the right place.

I have been getting SMART errors on a backup drive on my Fedora 12 file server. I have tried the instructions in "Bad block HOWTO for smartmontools" to no avail. I have visited many websites and have not found anything that sheds more light on my problem.

The drive is a backup of my data drive, written with rdiff-backup once a night. It is a Western Digital SATA 1 TB full-size drive. It is my /dev/sdc drive and has only one partition, /dev/sdc1.

The file system used to be ext4, but since the instructions for fixing blocks only called out ext2/3, I reformatted the drive as ext3 and used the following procedure, to no avail.

Details:

I get the following in my email each day:

--------------------- Smartd Begin ------------------------

Currently unreadable (pending) sectors detected:
    /dev/sdc [SAT] - 48 Time(s)
    44 unreadable sectors detected

Offline uncorrectable sectors detected:
    /dev/sdc [SAT] - 48 Time(s)
    30 offline uncorrectable sectors detected

---------------------- Smartd End -------------------------

This ends up in /var/log/messages each day:

Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 44 Currently unreadable (pending) sectors
Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 30 Offline uncorrectable sectors (changed -165)

The steps I took to try to fix these problems:

1) Get SMART info

[root@tux ~]# smartctl -d ata -a /dev/sdc
smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EARS-00Y5B1
Serial Number:    WD-WMAV51375649
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Jun 28 20:13:46 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EARS-00Y5B1
Serial Number:    WD-WMAV51375649
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Jun 28 20:14:25 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21300) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 245) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   130   126   021    Pre-fail  Always       -       6475
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       619
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       9119
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       143
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       71
193 Load_Cycle_Count        0x0032   197   197   000    Old_age   Always       -       9117
194 Temperature_Celsius     0x0022   111   108   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       253
198 Offline_Uncorrectable   0x0030   199   199   000    Old_age   Offline      -       195
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   199   199   000    Old_age   Offline      -       291

SMART Error Log Version: 1
ATA Error Count: 805 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 805 occurred at disk power-on lifetime: 9119 hours (379 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 67 6f 24 e1  Error: UNC 8 sectors at LBA = 0x01246f67 = 19165031

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 67 6f 24 e1 08      00:39:36.070  READ DMA
  ec 00 00 00 00 00 a0 08      00:39:36.061  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:39:36.058  SET FEATURES [Set transfer mode]

Error 804 occurred at disk power-on lifetime: 9119 hours (379 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 67 6f 24 e1  Error: UNC 8 sectors at LBA = 0x01246f67 = 19165031

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 67 6f 24 e1 08      00:39:33.506  READ DMA
  b0 d5 01 09 4f c2 00 08      00:39:33.494  SMART READ LOG
  b0 d5 01 06 4f c2 00 08      00:39:33.490  SMART READ LOG
  b0 d5 01 01 4f c2 00 08      00:39:33.485  SMART READ LOG
  b0 d1 01 01 4f c2 00 08      00:39:33.477  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]

Error 803 occurred at disk power-on lifetime: 9119 hours (379 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 67 6f 24 e1  Error: UNC 8 sectors at LBA = 0x01246f67 = 19165031

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 67 6f 24 e1 08      00:39:30.754  READ DMA
  ec 00 00 00 00 00 a0 08      00:39:30.746  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:39:30.746  SET FEATURES [Set transfer mode]

Error 802 occurred at disk power-on lifetime: 9119 hours (379 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 67 6f 24 e1  Error: UNC 8 sectors at LBA = 0x01246f67 = 19165031

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 67 6f 24 e1 08      00:39:28.178  READ DMA
  ec 00 00 00 00 00 a0 08      00:39:28.169  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:39:28.166  SET FEATURES [Set transfer mode]

Error 801 occurred at disk power-on lifetime: 9119 hours (379 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 67 6f 24 e1  Error: UNC 8 sectors at LBA = 0x01246f67 = 19165031

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 67 6f 24 e1 08      00:39:25.615  READ DMA
  ec 00 00 00 00 00 a0 08      00:39:25.607  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:39:25.607  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6491         2777760
# 2  Short offline       Completed: read failure       40%      6312         2773712

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@tux]# smartctl -l selftest /dev/sdc
smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6491         2777760
# 2  Short offline       Completed: read failure       40%      6312         2773712

2) Get the block size

[root@tux]# dumpe2fs /dev/sdc | grep "Block size"
dumpe2fs 1.41.9 (22-Aug-2009)
Block size:               4096

3) LBA of bad chunk is 2773712

4) LBA of start of partition is (63)

[root@tux]# # LBA of start of /dev/sdc is:
[root@tux]# fdisk -lu /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x0e30349b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  1953520064   976760001   83  Linux

5) Compute offset

(2773712-63)*512/4096 = 346706.125

6) Use dd to nuke the single block at 346706

[root@tux]# dd if=/dev/zero of=/dev/sdc bs=4096 count=1 seek=346706
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 5.0141e-05 s, 81.7 MB/s

7) Nuke the block at the other error location (347212)

[root@tux]# dd if=/dev/zero of=/dev/sdc bs=4096 count=1 seek=347212
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 4.8141e-05 s, 85.1 MB/s

8) At this point I rebooted the system, and I still get the errors on boot-up and once a day.

-- 
-Skye Sweeney |
From: Alex S. <ml...@os...> - 2011-07-10 09:35:40
|
My recommendation is to put this drive in the trashcan or RMA it ASAP. It does make sense to repair a drive if you have 1-2 pending sectors, but in your case I think the drive will die soon. And you don't need dumpe2fs to find a bad block; you already have it in the short/long test report.

On 07/10/2011 02:48 AM, Skye Sweeney wrote:
>
> Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 44 Currently
> unreadable (pending) sectors
> Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 30 Offline
> uncorrectable sectors (changed -165) |
From: Tim S. <ti...@bu...> - 2011-07-10 13:35:01
|
On 10/07/11 10:35, Alex Samorukov wrote:
> My recommendation is to put this drive in trashcan/RMA ASAP. It does
> make a sense to repair the drive if you have 1-2 pending sectors, but in
> your case i think drive will die soon.

Probably, but not definitely. I've had a drive develop a run of hundreds of bad sectors in one location (maybe a bit of contamination scratched a track or something), and it then went on for years with no further problems.

I've also had chassis vibration cause bad writes (which then show up as UNC sectors) where there was no physical problem at all, and rewriting the drive caused the sectors to be reused without being reallocated.

That having been said, the SMART output shows bad sectors in at least two different locations.

Tim. |
From: Tim S. <ti...@bu...> - 2011-07-10 13:30:11
|
On 10/07/11 01:48, Skye Sweeney wrote:

[Trim]

> 2) Get the block size
> [root@tux]# dumpe2fs /dev/sdc | grep "Block size"
> dumpe2fs 1.41.9 (22-Aug-2009)
> Block size: 4096

As has already been said, you don't really need to touch the e2fs tools in this case, because you have the LBA direct from the drive.

> 3) LBA of bad chunk is 2773712
>
> 4) LBA of start of partition is (63)
>
> [root@tux]# # LBA of start of dev/sdc is:
> [root@tux]# fdisk -lu /dev/sdc
>
> Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Disk identifier: 0x0e30349b
>
> Device Boot Start End Blocks Id System
> /dev/sdc1 63 1953520064 976760001 83 Linux
>
> 5) Compute offset
>
> (2773712-63)*512/4096 = 346706.125

This would be an offset into the first partition, though, because you subtracted the start LBA of the first partition. Your dd command writes to the whole-disk device, not to the partition device, so you are going to miss the bad sector by writing to a point 63 sectors too early.

    dd if=/dev/zero of=/dev/sdc bs=512 count=1 seek=2773712

would get that first block, but if you are going to reformat the drive anyway, or already have, perhaps you'd be better off just writing zeros to the whole disk?

If you haven't actually nuked the entire disk contents already, then perhaps you'd better do a read check first, to verify that the sector in question is in fact bad before writing over it:

    dd if=/dev/sdc of=/dev/null bs=512 count=1 skip=2773712

If that fails, then the sector is still bad / unreadable.

Tim. |
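
[Editor's note] The difference between the partition-relative and whole-disk offsets that Tim describes can be checked with a little shell arithmetic. This is only a sketch using the two LBAs and the partition start (63) reported in the thread; the function names `fs_block` and `disk_block` are made up for the illustration, and nothing here touches a disk.

```shell
#!/bin/sh
# Hypothetical helpers (not part of smartmontools): convert a 512-byte LBA
# reported by the drive into a block index usable as a dd seek/skip value.

# Block index relative to the start of a partition (for of=/dev/sdc1).
# Args: lba partition_start_lba sector_size block_size
fs_block() {
    echo $(( ($1 - $2) * $3 / $4 ))   # integer division truncates
}

# Block index relative to the start of the whole disk (for of=/dev/sdc).
# Args: lba sector_size block_size
disk_block() {
    echo $(( $1 * $2 / $3 ))
}

# The two LBA_of_first_error values from the self-test log.
for lba in 2773712 2777760; do
    echo "LBA $lba: partition block $(fs_block "$lba" 63 512 4096)," \
         "whole-disk block $(disk_block "$lba" 512 4096)"
done
```

The partition-relative results (346706 and 347212) are the values used in the original steps 6 and 7, but they only line up with the bad sectors when the target is /dev/sdc1. Against /dev/sdc the matching 4096-byte blocks would be 346714 and 347220, or, with bs=512 as in Tim's command, simply the LBA itself.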
From: Geoff K. <ge...@ge...> - 2011-07-10 21:15:03
|
On 09/07/2011, at 5:48 PM, Skye Sweeney wrote:

> Synopsis:
>
> I am at wits end and could use some pointers as to what I am doing wrong or if I just need to buy a new drive. I realize this may not the perfect forum for this question, and would be happy with just a pointer to the right place.
>
> I have been getting SMART errors on a backup drive on my Fedora 12 file server. I have tried the instructions in "Bad block HOWTO for smartmontools" without avail. I have visited many websites and have not found anything more illuminating to my problem.

One thing that may not be obvious is that your drive doesn't have just one or two bad blocks. It had

198 Offline_Uncorrectable 0x0030 199 199 000 Old_age Offline - 195

nearly two hundred, and this has been reduced to

Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 44 Currently unreadable (pending) sectors
Jul 9 19:53:58 tux smartd[1658]: Device: /dev/sdc [SAT], 30 Offline uncorrectable sectors (changed -165)

74 now. (This may not be as bad as it sounds; it might be that there's just a run of bad blocks due to a scratch or a defect on the disk.)

The commands you're using could have fixed at most 16 sectors (two 4096-byte writes of eight 512-byte sectors each).

I would suggest, if there's nothing valuable on the disk, just writing zeros to the whole disk, with

dd if=/dev/zero of=/dev/sdc bs=1M

or similar. Obviously THIS WILL ERASE EVERYTHING ON THE DISK, so you might want to double-check there's really nothing on it you want. |
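
[Editor's note] Geoff's counts can be reproduced with shell arithmetic. This is just a worked check using the figures quoted in the thread; the variable names are invented for the illustration.

```shell
#!/bin/sh
# Figures copied from the thread; variable names are illustrative only.
pending_now=44             # "Currently unreadable (pending) sectors" per smartd
uncorrectable_now=30       # "Offline uncorrectable sectors" per smartd
uncorrectable_before=195   # Offline_Uncorrectable raw value in the smartctl output

# Sectors the drive still flags as problematic.
flagged_now=$(( pending_now + uncorrectable_now ))

# Change in offline uncorrectable sectors, which smartd logged as "changed -165".
change=$(( uncorrectable_now - uncorrectable_before ))

# Two dd writes of one 4096-byte block each, counted in 512-byte sectors.
sectors_written=$(( 2 * 4096 / 512 ))

echo "flagged now: $flagged_now"
echo "change in offline uncorrectable: $change"
echo "sectors the two dd writes could touch: $sectors_written"
```

The 16-sector ceiling is far below the number of flagged sectors, which is why two single-block writes could not have cleared the warnings even if they had landed on the right blocks.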
From: Skye S. \(FLL-Freak\) <sk...@fl...> - 2011-07-12 01:34:27
|
Geoff and Tim,

I have spent the time since my posting reviewing your suggestions and implementing them.

Since I have a robot that burns all new data to DVD every night, I was able to nuke this live backup disk without significant danger. I took the suggestion to use dd to write zeros to the whole disk (/dev/sdc). I then repartitioned and formatted the drive, and copied the primary drive to this backup disk. Finally I rebooted, and I now have fewer errors. I am left with an email from the machine:

Device: /dev/sdc [SAT], 30 Offline uncorrectable sectors

Is having 30 Offline uncorrectable sectors a "Bad Thing"? Should I be buying a replacement disk? Or does it mean that "30 sectors are bad and will not be used anymore, so relax!"?

Thanks for the help. I like the fact that you were able to point out errors that I had made. Nice to learn something.

-Skye

> Synopsis:
>
> I am at wits end and could use some pointers as to what I am doing wrong
> or if I just need to buy a new drive. I realize this may not the perfect
> forum for this question, and would be happy with just a pointer to the
> right place.
>
> I have been getting SMART errors on a backup drive on my Fedora 12 file
> server. I have tried the instructions in "Bad block HOWTO for
> smartmontools" without avail. I have visited many websites and have not
> found anything more illuminating to my problem.

[Trim] |
From: David R. <dr...@gm...> - 2011-07-12 03:19:01
|
On Mon, Jul 11, 2011 at 6:34 PM, Skye Sweeney (FLL-Freak) <sk...@fl...> wrote:
> Is having 30 Offline uncorrectable sectors a "Bad Thing"? Should I be buying
> a replacement disc? Or does it mean that "30 sectors are bad and will not be
> used anymore, so relax!"?

Typically, if you still see 30 Offline uncorrectable sectors after writing zeros to the disk, that means the drive was not able to write to all of its sectors. Try a long SMART test; I suspect it will fail.

You might try writing zeros to the disk again, but I suspect this disk is on the way out and shouldn't be trusted to store data reliably unless you test it significantly.

-Dave |
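
[Editor's note] The long test Dave suggests is started with `smartctl -t long` and its result read back with `smartctl -l selftest`. The sketch below assumes the device is /dev/sdc, as in the thread. `first_failed_lbas` is a hypothetical helper, not a smartmontools command; it picks the last column out of log lines that report a read failure, and it is shown here against the two log lines from the original post rather than a live drive.

```shell
#!/bin/sh
# To start the test and read the log on the real drive (as root):
#   smartctl -t long /dev/sdc       # the drive's own estimate was 245 minutes
#   smartctl -l selftest /dev/sdc

# Hypothetical helper: print the LBA_of_first_error field (the last column)
# of every self-test log line whose status is "read failure".
first_failed_lbas() {
    awk '/read failure/ { print $NF }'
}

# Sample input copied from the self-test log in the original post.
first_failed_lbas <<'EOF'
# 1  Extended offline    Completed: read failure       90%      6491         2777760
# 2  Short offline       Completed: read failure       40%      6312         2773712
EOF
```

On a live system the same helper could be fed from `smartctl -l selftest /dev/sdc`; a new LBA appearing after a full zero-fill would support Dave's suspicion that the drive is still failing.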
From: Skye S. \(FLL-Freak\) <sk...@fl...> - 2011-07-14 00:01:51
|
As suggested I reran a long test and had another uncorrectable sector pop up. I will assume that the failure rate will simply pick up speed with time and that it is time for a new drive.

Thank you all for the education and the help.

-Skye

----- Original Message -----
From: "Tim Small" <ti...@bu...>
To: "Alex Samorukov" <ml...@os...>
Cc: "Skye Sweeney" <sk...@fl...>; <sma...@li...>
Sent: Sunday, July 10, 2011 9:34 AM
Subject: Re: [smartmontools-support] Problem clearing SMART errors on WD 1T drive

> On 10/07/11 10:35, Alex Samorukov wrote:
>> My recommendation is to put this drive in trashcan/RMA ASAP. It does
>> make a sense to repair the drive if you have 1-2 pending sectors, but in
>> your case i think drive will die soon.
>
> Probably but not definitely. I've had a drive get a bad run of hundreds
> of sectors in one location on the drive (maybe a bit of contamination
> scratched a track or something), but then has gone on for years later
> with no further problems.
>
> I've also had chassis vibration cause bad writes (which are then UNC
> sectors), but there was no physical problem at all, and rewriting the
> drive caused them to be reused without being reallocated.
>
> That having been said, the look of the SMART output shows bad sectors in
> at least two different locations.
>
>
> Tim.
> |