From: Evren Y. <yur...@is...> - 2007-06-06 23:42:50
Manfred Schwarb wrote:
>> Justin Piszcz wrote:
>>> Agree here, after you go through the disk with badblocks it should remap
>>> it and you should be OK.
>> I don't understand, this should have been done by the disk firmware
>> automatically. Once you rewrite to the same sector it might start
>> working and the disk doesn't reallocate this sector. After a while the
>> error will re-appear and this time the user will lose data.
>>
>
> Evren,
> please don't spread FUD and BS.

If you write over a sector that could not be read earlier, it can start
to work (one can read from it again) for a while. The disk will not
re-allocate the sector unless a special program (probably from the disk
vendor) is used.

I guess I should have said 'after a while the error might re-appear'
instead of 'will re-appear'... sorry about that.

> [re-added Ivan to receiver list]
>
> you are right,
> 1) having broken sectors on a new drive is not a good sign.
> 2) a disk with broken sectors has an increased likelihood to break
>    completely in some near or distant future.
> 3) if you want to be on the very safe side, replace it.
>
> but:
> 1) your citation below is not correct. On all disks I know (and
>    I have disks of almost all vendors) it is possible to reallocate
>    broken sectors by writing to them. For how to do it, see
>    http://smartmontools.sourceforge.net/badblockhowto.html.
>    There are disks where the correct firmware log updates only work
>    when done with vendor software (i.e. moving the pending sectors count
>    to reallocated sector count and the like). But reallocation works.
>    I have a Fujitsu thing that shows this behaviour.
>    This text should be corrected eventually.
>
> 2) Bruce is correct, unfortunately.
>    This reallocation mechanism is there for a reason, and disk
>    manufacturers rely on it.
>
> 3) An example: I have a Samsung disk which showed 9 broken sectors
>    in the first week of its life. The disk is now over 2 years old and
>    works flawlessly in a heavy-duty 24h environment, without any further
>    broken sectors.
>
> 4) your scandisk example: yes this happens. But this is mostly because
>    it is likely that when you have a broken sector, adjacent sectors
>    are damaged too. Disks are physical, and the cause of damage is often
>    physical, i.e. a damaged recording substrate.
>
>
> For the mentioned case, it would be wise to stress test the disk:
> either it breaks completely and you have no problem getting it replaced,
> or it stays sane and you can try to use the disk.
>
> A simple example of a stress test would be e.g.
> http://www.mail-archive.com/lin...@vg.../msg00745.html
>
>
> regards,
> Manfred
>
>
>
>> Also writing over the entire disk surface will not result in sector
>> allocation anyway unless done with a special disk tool, please read the
>> smartmontools web page
>>
>> http://smartmontools.sourceforge.net/
>> "These utilities have an important role to fill. If your disk has bad
>> sectors (for example, as revealed by running self-tests with
>> smartmontools) and the disk is not able to recover the data from those
>> sectors, then the disk will not automatically reallocate those damaged
>> sectors from its set of spare sectors, because forcing the reallocation
>> to take place may entail some loss of data. Because the commands that
>> force such reallocation are Vendor Specific, most manufacturers provide a
>> utility for this purpose. It may cause data loss but can repair damaged
>> sectors (at least, until it runs out of replacement sectors)."
>>
>> From experience I can say that even if this is done, it might miss
>> certain sectors and then the problem may come back. Haven't you ever had
>> the thing that you scandisk the drive and mark bad sectors, then after a
>> while there come more...
>>
>> Thanks,
>> Evren
>>
>>> Justin.
>>>
>>> On Tue, 5 Jun 2007, Bruce Allen wrote:
>>>
>>>> Hi Evren,
>>>>
>>>> I don't think that the disk is faulty (the disk SMART status is
>>>> 'healthy'). The disk has an unreadable data block. This is
>>>> (unfortunately) quite normal for modern disks and does not mean that
>>>> the disk is 'faulty'.
>>>>
>>>> Cheers,
>>>> Bruce
>>>>
>>>>
>>>> On Mon, 4 Jun 2007, Evren Yurtesen wrote:
>>>>
>>>>> Bruce Allen wrote:
>>>>>> Your read error might be fixable by writing to the entire disk
>>>>>> surface to force sector reallocation. Note that this will wipe out
>>>>>> any data on the disk.
>>>>> This is possible, certain disk recovery programs suggest it. However,
>>>>> in my opinion the disk is faulty: if the error disappears after a
>>>>> rewrite, there is no guarantee that it won't come back after 1 month
>>>>> or 1 year, and I believe there is a high possibility that it will
>>>>> re-appear.
>>>>>
>>>>> Also, I think that disk manufacturers accept this disk to be faulty so
>>>>> it should be changed under warranty without any problems.
>>>>>
>>>>> Thanks,
>>>>> Evren
>>>>>
>>>>>> Cheers,
>>>>>> Bruce
>>>>>>
>>>>>>
>>>>>> On Mon, 4 Jun 2007, Ivan Carey wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I have just installed 2 new 500 GB HDDs in a RAID 1 server using
>>>>>>> gmirror and FreeBSD 6.2
>>>>>>> the disks are at /dev/ad4 and /dev/ad6
>>>>>>>
>>>>>>> After setting up the gmirror I received an error when testing the
>>>>>>> main HDD and smartd also gives an error.
>>>>>>> I also had to run fsck on one of the partitions to repair a
>>>>>>> SUPERBLOCK error.
>>>>>>>
>>>>>>> I did not see any problems occur with the system.
>>>>>>>
>>>>>>> Are these errors indicating I have a faulty new HDD and how may I
>>>>>>> repair this error?
>>>>>>>
>>>>>>> I have included the smartctl output below and the smartd output
>>>>>>> below that
>>>>>>>
>>>>>>> *****************************************************************
>>>>>>> server# smartctl -a /dev/ad4
>>>>>>> smartctl version 5.36 [i386-portbld-freebsd6.2] Copyright (C) 2002-6 Bruce Allen
>>>>>>> Home page is http://smartmontools.sourceforge.net/
>>>>>>>
>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>> Device Model:     WDC WD5000AAKS-65TMA0
>>>>>>> Serial Number:    WD-WCAPW1675560
>>>>>>> Firmware Version: 12.01C01
>>>>>>> User Capacity:    500,107,862,016 bytes
>>>>>>> Device is:        Not in smartctl database [for details use: -P showall]
>>>>>>> ATA Version is:   7
>>>>>>> ATA Standard is:  Exact ATA specification draft version not indicated
>>>>>>> Local Time is:    Mon Jun 4 17:08:06 2007 EST
>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>> SMART support is: Enabled
>>>>>>>
>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>
>>>>>>> General SMART Values:
>>>>>>> Offline data collection status:  (0x84) Offline data collection activity
>>>>>>>                                         was suspended by an interrupting
>>>>>>>                                         command from host.
>>>>>>>                                         Auto Offline Data Collection: Enabled.
>>>>>>> Self-test execution status:      ( 121) The previous self-test completed having
>>>>>>>                                         the read element of the test failed.
>>>>>>> Total time to complete Offline
>>>>>>> data collection:                 (12600) seconds.
>>>>>>> Offline data collection
>>>>>>> capabilities:                    (0x7b) SMART execute Offline immediate.
>>>>>>>                                         Auto Offline data collection on/off support.
>>>>>>>                                         Suspend Offline collection upon new
>>>>>>>                                         command.
>>>>>>>                                         Offline surface scan supported.
>>>>>>>                                         Self-test supported.
>>>>>>>                                         Conveyance Self-test supported.
>>>>>>>                                         Selective Self-test supported.
>>>>>>> SMART capabilities:            (0x0003) Saves SMART data before entering
>>>>>>>                                         power-saving mode.
>>>>>>>                                         Supports SMART auto save timer.
>>>>>>> Error logging capability:        (0x01) Error logging supported.
>>>>>>>                                         General Purpose Logging supported.
>>>>>>> Short self-test routine
>>>>>>> recommended polling time:        (   2) minutes.
>>>>>>> Extended self-test routine
>>>>>>> recommended polling time:        ( 157) minutes.
>>>>>>> Conveyance self-test routine
>>>>>>> recommended polling time:        (   6) minutes.
>>>>>>>
>>>>>>> SMART Attributes Data Structure revision number: 16
>>>>>>> Vendor Specific SMART Attributes with Thresholds:
>>>>>>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>>>>>>   1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always   -           0
>>>>>>>   3 Spin_Up_Time            0x0003   175   173   021    Pre-fail  Always   -           6250
>>>>>>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           24
>>>>>>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
>>>>>>>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always   -           0
>>>>>>>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           56
>>>>>>>  10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always   -           0
>>>>>>>  11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always   -           0
>>>>>>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           24
>>>>>>> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           23
>>>>>>> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           24
>>>>>>> 194 Temperature_Celsius     0x0022   114   108   000    Old_age   Always   -           36
>>>>>>> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
>>>>>>> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           1
>>>>>>> 198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline  -           0
>>>>>>> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
>>>>>>> 200 Multi_Zone_Error_Rate   0x0008   100   253   051    Old_age   Offline  -           0
>>>>>>>
>>>>>>> SMART Error Log Version: 1
>>>>>>> ATA Error Count: 1
>>>>>>>         CR = Command Register [HEX]
>>>>>>>         FR = Features Register [HEX]
>>>>>>>         SC = Sector Count Register [HEX]
>>>>>>>         SN = Sector Number Register [HEX]
>>>>>>>         CL = Cylinder Low Register [HEX]
>>>>>>>         CH = Cylinder High Register [HEX]
>>>>>>>         DH = Device/Head Register [HEX]
>>>>>>>         DC = Device Command Register [HEX]
>>>>>>>         ER = Error register [HEX]
>>>>>>>         ST = Status register [HEX]
>>>>>>> Powered_Up_Time is measured from power on, and printed as
>>>>>>> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
>>>>>>> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>>>>>>>
>>>>>>> Error 1 occurred at disk power-on lifetime: 38 hours (1 days + 14 hours)
>>>>>>>   When the command that caused the error occurred, the device was
>>>>>>>   active or idle.
>>>>>>>
>>>>>>>   After command completion occurred, registers were:
>>>>>>>   ER ST SC SN CL CH DH
>>>>>>>   -- -- -- -- -- -- --
>>>>>>>   40 51 00 98 c1 65 e5  Error: UNC at LBA = 0x0565c198 = 90554776
>>>>>>>
>>>>>>>   Commands leading to the command that caused the error were:
>>>>>>>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>>>>>>>   -- -- -- -- -- -- -- --  ----------------  --------------------
>>>>>>>   c8 00 80 80 c1 65 05 00      10:55:44.882  READ DMA
>>>>>>>   c8 00 80 00 c1 65 05 00      10:55:44.882  READ DMA
>>>>>>>   c8 00 80 80 c0 65 05 00      10:55:44.881  READ DMA
>>>>>>>   c8 00 80 00 c0 65 05 00      10:55:44.881  READ DMA
>>>>>>>   c8 00 80 80 bf 65 05 00      10:55:44.880  READ DMA
>>>>>>>
>>>>>>> SMART Self-test log structure revision number 1
>>>>>>> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
>>>>>>> # 1  Extended offline    Completed: read failure  90%        53               90554776
>>>>>>> # 2  Short offline       Completed: read failure  90%        53               90554776
>>>>>>> # 3  Short offline       Completed: read failure  90%        52               90554776
>>>>>>> # 4  Short offline       Completed: read failure  90%        52               90554776
>>>>>>> # 5  Short offline       Completed without error  00%        37               -
>>>>>>>
>>>>>>> SMART Selective self-test log data structure revision number 1
>>>>>>>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>>>>>>     1        0        0  Not_testing
>>>>>>>     2        0        0  Not_testing
>>>>>>>     3        0        0  Not_testing
>>>>>>>     4        0        0  Not_testing
>>>>>>>     5        0        0  Not_testing
>>>>>>> Selective self-test flags (0x0):
>>>>>>>   After scanning selected spans, do NOT read-scan remainder of disk.
>>>>>>> If Selective self-test is pending on power-up, resume after 0 minute delay.
>>>>>>>
>>>>>>> *****************************************************************************************************
>>>>>>> Mon Jun 4 13:27:27 EST 2007
>>>>>>> Jun 4 13:27:58 server login: ROOT LOGIN (root) ON ttyv0
>>>>>>> Jun 4 13:57:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 13:57:25 server smartd[661]: Device: /dev/ad4, Self-Test Log error count increased from 0 to 4
>>>>>>> Jun 4 14:27:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 14:57:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 15:27:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 15:57:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 16:27:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> Jun 4 16:57:25 server smartd[661]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
>>>>>>> ***********************************************************************************************
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>> -------------------------------------------------------------------------
>>>>>>> This SF.net email is sponsored by DB2 Express
>>>>>>> Download DB2 Express C - the FREE version of DB2 express and take
>>>>>>> control of your XML. No limits. Just data. Click to get it now.
>>>>>>> http://sourceforge.net/powerbar/db2/
>>>>>>> _______________________________________________
>>>>>>> Smartmontools-support mailing list
>>>>>>> Sma...@li...
>>>>>>> https://lists.sourceforge.net/lists/listinfo/smartmontools-support
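
For anyone who wants to locate the unreadable sector from the quoted error log: the row "40 51 00 98 c1 65 e5" can be turned back into the LBA that smartctl reports. Below is a minimal sketch in Python, assuming 28-bit LBA addressing (the failing commands are READ DMA, 0xc8) and assuming 512-byte sectors for the byte offset. The function names are made up for illustration and are not part of smartmontools.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Bits 0-7 come from SN, bits 8-15 from CL, bits 16-23 from CH,
    and bits 24-27 from the low nibble of DH.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def byte_offset(lba: int, sector_size: int = 512) -> int:
    """Byte offset of a sector; the 512-byte sector size is an assumption."""
    return lba * sector_size


# Register values copied from the error log row "40 51 00 98 c1 65 e5"
# (ER ST SC SN CL CH DH); only SN, CL, CH and DH carry the address.
lba = lba28_from_registers(sn=0x98, cl=0xC1, ch=0x65, dh=0xE5)

print(hex(lba), lba)  # prints: 0x565c198 90554776
print(byte_offset(lba))
```

The result matches both the "UNC at LBA" value and the LBA_of_first_error column of the self-test log, so all four failed self-tests and the logged read error point at the same single sector, which agrees with the Current_Pending_Sector raw value of 1.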