From: Chris L. <cl...@ya...> - 2009-09-23 22:25:31
Hi Franc and folks,

Thanks for that. Yes, I have the same manual and page 134 also confuses me.

I'm also interested in the details about the errors that have occurred on the drive, such as this one:

  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 03 00 00 05 22 a4 4c 40 00

If I convert the LBA of 00 00 05 22 a4 4c to decimal, I get 86156364, which is the LBA mentioned here in the self-test:

  Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
  # 1  Conveyance offline  Completed: read failure  90%        4830             86156364
  # 2  Extended offline    Completed: read failure  90%        4830             86156364
  # 3  Short offline       Completed: read failure  90%        4829             86156364

And I am trying to decode the "ER" value of 40. From page 96 of that manual (http://www.fel.fujitsu.com/support/disk/manuals/product_manual___a160_.pdf), it has the structure described below for the error field. 40 hex is binary 0100 0000, so I take that to mean that bit #6 (see below) is set, which means "Uncorrectable Data Error (UNC)".

But what does this really mean? Does it imply that other blocks are likely to currently have or ultimately (in the future) get the same error?

I've tracked down LBA 86156364 to a specific file in my filesystem and can confirm that trying to access that file does indeed seem to hang my system and add another error to the total error count reported for the drive. And of course this new error mentions the same LBA of 86156364.

Error field structure from drive manual:

- Bit 7: Unused
- Bit 6: Uncorrectable Data Error (UNC). This bit indicates that an uncorrectable data error has been encountered.
- Bit 5: Unused
- Bit 4: ID Not Found (IDNF). This bit indicates an error except for bad sector, uncorrectable error and SB not found. Or, SATA Frame Error Write (SFRW). This bit indicates that a SATA communication error has been encountered during the write process. In this case, bit4 and bit2 are set both.
- Bit 3: SATA Frame Error Read (SFRR). This bit indicates that a SATA communication error has been encountered during the read process. In this case, bit3 and bit2 are set both.
- Bit 2: Aborted Command (ABRT). This bit indicates that the requested command was aborted due to a device status error (e.g. Not Ready, Write Fault) or the command code was invalid.
- Bit 1: Track 0 Not Found (TK0NF). This bit indicates that track 0 was not found during RECALIBRATE command execution.
- Bit 0: Address Mark Not Found (AMNF). This bit indicates that the SB Not Found error occurred.

> Date: Thu, 24 Sep 2009 05:17:20 +1000
> From: Franc Zabkar <fz...@in...>
> Subject: Re: [smartmontools-support] Fujitsu SATA drive failing
> self-tests - help with diagnosis
> To: sma...@li...
> Message-ID: <125...@ma...>
> Content-Type: text/plain; charset="us-ascii"; format=flowed
>
> I believe the raw value of the Raw_Read_Error_Rate attribute reflects
> a sector count rather than an error rate. My older model Fujitsu
> drive appears to count the number of read errors in each block of
> 0x40000 reads and then adjust the cooked value accordingly.
>
> See this Usenet discussion (includes my test results):
>
> http://groups.google.com/group/comp.sys.ibm.pc.hardware.storage/browse_thread/thread/b6eb8aa2476f9cac/030c515959145d44#030c515959145d44
>
> The Reallocated_Sector_Count attribute is a 48-bit number (as are all
> the others). The actual raw value is 0x07d000000000, i.e. (0x07d0,
> 0x00000000). The uppermost 16 bits indicate the total number of
> remaining sectors available to be reallocated, in this case 2000,
> whereas the lower 32 bits hold the actual number of reallocated sectors.
>
> The UDMA_CRC_Error_Count is 0x0300c61f which suggests it may be a
> two-part value, 0x0300 and 0xc61f.
>
> Multi_Zone_Error_Rate = 168589527 = 0x0a0c78d7
>
> 240 Head_Flying_Hours = 282001
>
> 241 Unknown_Attribute = 71965295247363 = 0x4173b9dc0003 <--- probably 3
>
> 242 Unknown_Attribute = 31232272760836 = 0x1c67d4860004 <--- probably 4
>
> Page 134 of the following Fujitsu manual only serves to confuse me.
>
> MHZ2320BJ, MHZ2250BJ, MHZ2200BJ, MHZ2160BJ, MHZ2120BJ, MHZ2080BJ
> product manual:
> http://www.fel.fujitsu.com/support/disk/manuals/product_manual___a160_.pdf
>
> Attribute 240 is described as a Transfer Error Rate, not Head_Flying_Hours.
>
> Attribute 200 is a Write Error Rate, not Multi_Zone_Error_Rate.
>
> Attribute 191, Sense_Error_Rate, is not listed. Neither are 241 or 242.
>
> -Franc Zabkar
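For anyone wanting to repeat the arithmetic in the message above, here is a minimal Python sketch. It joins the six LBA bytes from the error-log row, lists which bits are set in the ER byte using the bit table quoted from the drive manual, and splits a 48-bit raw attribute value into the upper 16 and lower 32 bits as Franc describes. The function names are made up for illustration; whether the two-part reading of the raw value applies to a given drive is the thread's interpretation, not something the code can confirm.

```python
# Sketch only: helper names are hypothetical, bit names come from the
# error-field table quoted from the drive manual in the message above.

ERROR_BITS = {
    6: "UNC (Uncorrectable Data Error)",
    4: "IDNF (ID Not Found) / SFRW",
    3: "SFRR (SATA Frame Error Read)",
    2: "ABRT (Aborted Command)",
    1: "TK0NF (Track 0 Not Found)",
    0: "AMNF (Address Mark Not Found)",
}


def lba48_from_bytes(hex_bytes):
    """Join space-separated hex bytes, most significant first, into one integer."""
    value = 0
    for b in hex_bytes.split():
        value = (value << 8) | int(b, 16)
    return value


def decode_error_register(er):
    """Return the names of the bits set in the ER byte, highest bit first."""
    return [
        name
        for bit, name in sorted(ERROR_BITS.items(), reverse=True)
        if er & (1 << bit)
    ]


def split_raw48(raw):
    """Split a 48-bit raw attribute value into (upper 16 bits, lower 32 bits)."""
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


print(lba48_from_bytes("00 00 05 22 a4 4c"))  # 86156364
print(decode_error_register(0x40))            # ['UNC (Uncorrectable Data Error)']
print(split_raw48(0x07D000000000))            # (2000, 0)
```

The same `split_raw48` call can be tried on the other raw values in the report to see whether a two-part reading looks plausible.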
From: Chris L. <cl...@ya...> - 2009-09-24 10:29:56
Thanks again Franc. Do you know of a tool (Windows, linux, bootable media) I could use to do the brute force read of that sector in a loop that I could actually use without the OS or the tool crashing from the drive error itself?

Maybe I need something on a Knoppix (linux) bootable CD/DVD, because Windows gets very unhappy when reading that sector. Is MHDD any good?

> Date: Thu, 24 Sep 2009 09:58:25 +1000
> From: Franc Zabkar <fz...@in...>
> Subject: Re: [smartmontools-support] Fujitsu SATA drive failing
> self-tests - help with diagnosis
> To: sma...@li...
> Message-ID: <125...@ma...>
> Content-Type: text/plain; charset="us-ascii"; format=flowed
>
> I think that the only thing you can infer about your drive at the
> moment is that it has grown one solid, reproducible defect.
>
> Page 34 refers to attribute 197, Current Pending Sector Count. This
> appears to be missing from your SMART report. Normally one would
> expect that LBA 86156364 would show up as "pending reallocation". I
> suggest you delete the file that is causing problems and restore it
> from a backup. This will free up the clusters occupied by that file,
> including the damaged sector. Next time the OS tries to write to that
> sector, the drive should automatically remap it and increment the
> reallocated sector count. Another way that you may be able to recover
> LBA 86156364 is by brute force. Keep reading that particular sector
> in a loop until you achieve just one successful read, at which time
> the sector should be transparently remapped.
>
> This article has an explanation of the SMART attributes:
> http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
>
> It may not be completely accurate, though.
>
> -Franc
From: Eric S. <ej...@sh...> - 2009-09-24 16:46:28
The linux "dd" command is good for that. You might need to write a little script for it to do the looping. Knoppix is a great live distro, and a good choice for troubleshooting, system recovery, and this sort of thing.

Chris Lopes wrote:
> Thanks again Franc. Do you know of a tool (Windows, linux, bootable media) I could use to do the brute force read of that sector in a loop that I could actually use without the OS or the tool crashing from the drive error itself?
>
> Maybe I need something on a Knoppix (linux) bootable CD/DVD, because Windows gets very unhappy when reading that sector. Is MHDD any good?
>
>> Date: Thu, 24 Sep 2009 09:58:25 +1000
>> From: Franc Zabkar <fz...@in...>
>> Subject: Re: [smartmontools-support] Fujitsu SATA drive failing
>> self-tests - help with diagnosis
>> To: sma...@li...
>> Message-ID: <125...@ma...>
>> Content-Type: text/plain; charset="us-ascii"; format=flowed
>>
>> I think that the only thing you can infer about your drive at the
>> moment is that it has grown one solid, reproducible defect.
>>
>> Page 34 refers to attribute 197, Current Pending Sector Count. This
>> appears to be missing from your SMART report. Normally one would
>> expect that LBA 86156364 would show up as "pending reallocation". I
>> suggest you delete the file that is causing problems and restore it
>> from a backup. This will free up the clusters occupied by that file,
>> including the damaged sector. Next time the OS tries to write to that
>> sector, the drive should automatically remap it and increment the
>> reallocated sector count. Another way that you may be able to recover
>> LBA 86156364 is by brute force. Keep reading that particular sector
>> in a loop until you achieve just one successful read, at which time
>> the sector should be transparently remapped.
>>
>> This article has an explanation of the SMART attributes:
>> http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
>>
>> It may not be completely accurate, though.
>>
>> -Franc

--
-Eric 'shubes'
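As a rough illustration of the "little script" idea, here is a Python sketch of a retry loop that keeps reading one sector until a read succeeds or an attempt limit is hit. It is not the dd approach itself: it uses `os.pread` instead, assumes a Unix-like system, and assumes 512-byte logical sectors. The function name and its parameters are invented for this example. On a real device the kernel may read more than one sector at a time and may spend a long while on its own retries, so treat this as a sketch rather than a recovery tool.

```python
import os
import time

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_sector_with_retries(path, lba, max_attempts=100, delay=0.1):
    """Read one sector at `lba` from `path`, retrying on I/O errors.

    Returns (data, attempts) on success, or (None, attempts) if every
    attempt failed or came back short.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                data = os.pread(fd, SECTOR_SIZE, lba * SECTOR_SIZE)
            except OSError:
                data = b""  # a failed read (e.g. EIO) counts as a miss
            if len(data) == SECTOR_SIZE:
                return data, attempt
            time.sleep(delay)
        return None, max_attempts
    finally:
        os.close(fd)
```

Run as root from a live CD, a call might look like `read_sector_with_retries("/dev/sdX", 86156364)`, where `/dev/sdX` is a placeholder for the actual device node.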
From: Christian F. <Chr...@t-...> - 2009-09-24 17:52:42
Eric Shubert wrote:
> The linux "dd" command is good for that. You might need to write a
> little script for it to do the looping.

GNU ddrescue is, IMO, a very good tool to do this. It works automatically, no script needed. It also makes it easy to overwrite the bad sectors to force reallocation.

http://www.gnu.org/software/ddrescue/ddrescue.html
http://www.forensicswiki.org/wiki/Ddrescue

The Debian package is 'gddrescue' (package 'ddrescue' contains another tool named dd_rescue).

> Knoppix is a great live distro,
> and a good choice for troubleshooting, system recovery, and this sort of
> thing.
>

My favorite: http://grml.org/

> Chris Lopes wrote:
>
>> Thanks again Franc. Do you know of a tool (Windows, linux, bootable media) I could use to do the brute force read of that sector in a loop that I could actually use without the OS or the tool crashing from the drive error itself?
>>
>> Maybe I need something on a Knoppix (linux) bootable CD/DVD, because Windows gets very unhappy when reading that sector. Is MHDD any good?
>>

ddrescue is also available as a Cygwin package.

Cheers,
Christian
From: Eric S. <ej...@sh...> - 2009-09-24 18:00:26
Christian Franke wrote:
> Eric Shubert wrote:
>> The linux "dd" command is good for that. You might need to write a
>> little script for it to do the looping.
>
> GNU ddrescue is a IMO very good tool do to this. It works automatic, no
> script needed. It also allows to easily overwrite the bad sectors to
> force reallocation.
>
> http://www.gnu.org/software/ddrescue/ddrescue.html
> http://www.forensicswiki.org/wiki/Ddrescue
>
> The Debian package is 'gddrescue' (Package 'ddrescue' contains another
> tool named dd_rescue).
>
>> Knoppix is a great live distro,
>> and a good choice for troubleshooting, system recovery, and this sort of
>> thing.
>>
>
> My favorite: http://grml.org/
>
>> Chris Lopes wrote:
>>
>>> Thanks again Franc. Do you know of a tool (Windows, linux, bootable media) I could use to do the brute force read of that sector in a loop that I could actually use without the OS or the tool crashing from the drive error itself?
>>>
>>> Maybe I need something on a Knoppix (linux) bootable CD/DVD, because Windows gets very unhappy when reading that sector. Is MHDD any good?
>>>
>
> ddrescue is also available as a Cygwin package.
>
> Cheers,
> Christian
>

Thanks Christian!

--
-Eric 'shubes'
From: Franc Z. <fz...@in...> - 2009-09-24 19:03:45
MHDD was written by a data recovery professional. It appears to be highly acclaimed by the DR community. I haven't yet needed to use it, though.

There is a commercial tool named SpinRite that claims to brute-force bad sectors. If it fails, it then reverts to turning off ECC and repetitively reading the raw uncorrected data from the faulty sector until it produces some kind of statistical frequency map for each bit.

If the data are part of a text document, then a simple Read Long ATA command may retrieve the raw uncorrected data (plus ECC bytes), with your brain providing the corrections, hopefully minor ones.

-Franc

At 08:29 PM 24/09/09, you wrote:
>Do you know of a tool (Windows, linux, bootable media) I could use
>to do the brute force read of that sector in a loop that I could
>actually use without the OS or the tool crashing from the drive error itself?
>
>Maybe I need something on a Knoppix (linux) bootable CD/DVD, because
>Windows gets very unhappy when reading that sector. Is MHDD any good?
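To make the "statistical frequency map for each bit" idea concrete, here is a small Python sketch of a per-bit majority vote across several raw reads of the same sector. This is only one plausible reading of the description above, not SpinRite's actual method, which the thread does not document. The function name is invented, and the inputs are assumed to be equal-length byte strings obtained some other way (for example via a Read Long command).

```python
def bitwise_majority(reads):
    """Return the per-bit majority value across equal-length byte strings.

    A bit is set in the result only if it is set in more than half of the
    reads, so ties on an even number of reads resolve to 0.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")

    threshold = len(reads) / 2
    result = bytearray(length)
    for i in range(length):
        byte = 0
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if ones > threshold:
                byte |= 1 << bit
        result[i] = byte
    return bytes(result)


# Three noisy copies of b"AB": each byte position has one copy with a flipped bit.
print(bitwise_majority([b"\x41\x42", b"\x41\x40", b"\x43\x42"]))  # b'AB'
```

An odd number of reads avoids ties; with an even number, a tied bit silently comes out as 0 in this sketch.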
From: Tim S. <ti...@bu...> - 2009-09-24 19:35:15
Chris Lopes wrote:
> Thanks again Franc. Do you know of a tool (Windows, linux, bootable media) I could use to do the brute force read of that sector in a loop that I could actually use without the OS or the tool crashing from the drive error itself?
>

You can use "hdparm --read-sector" - any recent hdparm on a bootable Linux (e.g. knoppix or grml) should do.

It's probably not going to recover tho, in my experience - in which case you'll need to use hdparm --write-sector to force a remap...

It's probably worth checking the surrounding sectors as well (using --read-sector, or SMART selective self-test) whilst you're at it...

Ta,
Tim.
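For the suggestion to check the surrounding sectors, here is a minimal Python sketch that works out an LBA window around the failing sector and builds the matching smartctl selective self-test command line. The window radius of 1000 sectors, the helper names and the `/dev/sdX` device path are all arbitrary placeholders. The sketch only constructs the argument list and does not run anything.

```python
def neighbour_range(lba, radius=1000, max_lba=None):
    """Return (first, last) LBAs of a window centred on `lba`, clamped to the disk."""
    first = max(0, lba - radius)
    last = lba + radius
    if max_lba is not None:
        last = min(last, max_lba)
    return first, last


def selective_test_command(device, first, last):
    """Build the argument list for a SMART selective self-test over first..last."""
    return ["smartctl", "-t", f"select,{first}-{last}", device]


first, last = neighbour_range(86156364)
print(first, last)  # 86155364 86157364
print(" ".join(selective_test_command("/dev/sdX", first, last)))
# smartctl -t select,86155364-86157364 /dev/sdX
```

The printed command can be reviewed and then run by hand as root; the result should show up in the self-test log afterwards.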