From: Praveen <pra...@gm...> - 2004-11-04 09:53:58
Hi,

I have a Maxtor 6Y160P0 drive with one FAT32 and one ReiserFS partition. Recently I realised (after some compilation errors which always used to hang the system; I was using the kdecvs-build program) that syslog showed two specific SMART messages regarding the Maxtor drive:

1. Device: /dev/hda, 137 Offline uncorrectable sectors
2. Device: /dev/hda, 138 Currently unreadable (pending) sectors

I then ran a long test using smartctl -t long, which showed status "Completed: Read failure" and listed the LBA of the first error. I then ran 'badblocks' on each of the partitions to verify whether there are any bad sectors, but it showed nothing. I ran the badblocks program again with the -n option, which does a non-destructive write test, and it showed a lot of errors on the FAT32 partition, like:

Nov 3 15:58:37 hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=146560385, high=8, low=12342657, sector=146560291
Nov 3 15:58:37 end_request: I/O error, dev hda, sector 146560291

After running this, all smartctl tests now pass with status "Completed without error". The syslog message from smartctl now only indicates:

1. Device: /dev/hda, 138 Currently unreadable (pending) sectors

I am not sure what just happened with the drive. Did 'badblocks' force the drive to remap bad sectors? What do the "currently unreadable sectors" signify, and why does the SMART test now pass without errors? Should I do a low-level format of the drive?

The errors only occurred on the FAT32 partition, not the ReiserFS partition. Why did the system hang when compiling on the ReiserFS partition when the errors were found on the FAT32 partition? (I double-checked the LBAs of the DMA errors above as well as the smartctl LBA of the first error, and they are in the FAT32 partition.) By the way, the same compilations finished fine on another drive, so I guess it couldn't have been anything else. I have another drive which actually holds the OS, and that one runs just fine.

I have read the badblocks HOWTO and I am unsure how to apply it to a FAT partition, but anyway, to prevent a screwup on my part I would rather back up the data on the drive and do a low-level format.

Thanks,
praveen.
From: Volker K. <lis...@pa...> - 2004-11-05 03:22:33
> I then ran a long test using smartctl -t long which showed status
> "Completed: Read failure" and listed the LBA of first error.

Confirmation that the disk has suffered from read error(s).

> I then ran 'badblocks' on each of the partition to verify if there
> are any bad sectors but it showed nothing.

In my own experience, running badblocks to obtain any information about errors on the disk is a complete and utter waste of time. The program is (essentially) suitable for 1980s hard disks only. If it finds anything "bad", your hard disk is so bad you'd know about it without having to run badblocks. Some people do use it successfully for testing brand-new disks, though (if a disk fails there, at least it didn't yet waste any of your time).

What badblocks does, however, is write(!) every disk block at least once. The actual data written is irrelevant; the point is that the magnetisation on the disk surface is recreated, so a previously unreadable sector can be read again. Don't think this is good news though: the fact is that the surface at that point had previously degraded below what's needed to read the data. It will happen again; you may roll your dice to find out for how long that block will store its data this time.

> badblocks program with -n option which does a non-destructive write
> test and it showed lot of errors on the fat32 partition like

If badblocks finds the disk to be stuffed, that means the disk is stuffed. Not necessarily every sector is stuffed, but a "lot" of errors means you ought to retire the disk to the junkyard today. Not tomorrow. Copy off what you need before it's too late.

> Nov 3 15:58:37 hda: dma_intr: error=0x40 { UncorrectableError },
> LBAsect=146560385, high=8, low=12342657, sector=146560291
> Nov 3 15:58:37 end_request: I/O error, dev hda, sector 146560291.

Some braindamage in the Linux kernel. It reports DMA errors and does this, that, and the other in the DMA department when the disk simply doesn't return the data in time because it's not readable. You get the same problems with CDs which have unreadable blocks.

> After running this all smartctl tests now pass with status "Completed
> without error".

The data retention time on all blocks tested by the selftest is longer than the time it takes to run the test, yet

> 1. Device: /dev/hda, 138 Currently unreadable (pending) sectors.

the disk still has blocks it can't read. Well, the bottom line is: who cares, the disk is stuffed.

> but anyway to prevent screwup on my part i would rather backup the
> data on the drive and do a low level format.

Don't waste your time. The disk may seem to work for a while, then fail again. The sooner you hit that disk with a large brick, the less time you'll waste on it and the less data you'll lose.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-05 17:20:19
> What badblocks does however is write(!) every disk block at least once.
> The actual data written is irrelevant, the point is that the
> magnetisation on the disk surface is recreated, so a previously
> unreadable sector can be read again.
>
> Don't think this is good news though - fact is the surface at that
> point had previously degraded below what's needed to read the data. It
> will happen again, you may roll your dice to find out for how long
> that block will store its data this time.

Volker, is this right? I thought that if badblocks re-writes a sector, and that sector is marked as bad by the disk, then writing it will force the disk to reallocate it, not merely write the data to 'the same' sector. The disk firmware does this transparently: the sector's LBA stays the same, but the actual physical position where the sector with this LBA is stored on the disk changes.

Of course, if badblocks can't READ the sector first, I don't see how it can rewrite it (;-).

Cheers,
Bruce
From: Volker K. <lis...@pa...> - 2004-11-06 08:58:38
> Volker, is this right? I thought that if badblocks re-writes a sector,
> and that sector is marked as bad by the disk, then writing it will force
> it to reallocate, not merely write the data to 'the same' sector.

You're right of course; that's what should happen. I was thinking of the case where the disk fails to read a sector but still doesn't reallocate it. Does this ever happen? Guess we won't know. How is "fails" defined here? "Fails finally", or "fails the first few attempts"? One would be totally at the mercy of the firmware.

As a ballpark figure, it seems that one shouldn't trust a disk when there is a steep jump in the number of reallocated sectors, or when the number of reallocated sectors is larger than (this is arguable though) 5...50 per year of age.

> Of course if badblocks can't READ the sector first, I don't see how it can
> rewrite it (;-).

I assume that is what produces the read errors logged in syslog. In that case the non-destructive test becomes destructive if badblocks decides to proceed with the test of that sector; if it doesn't proceed, the sector would have to be left untested. The source knows which will happen.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-06 11:13:12
> > Of course if badblocks can't READ the sector first, I don't see how it can
> > rewrite it (;-).
>
> I assume that's the read errors logged in syslog. In that case the
> non-destructive test becomes destructive if badblocks decides to
> proceed with the test of that sector; if it doesn't proceed, the
> sector would have to be left untested. The source knows which will
> happen.

Volker, would you be willing to browse the badblocks source to see? I think this question will come up again. If it's not obvious, I can write to Theodore Ts'o and ask him.

Cheers,
Bruce
From: Volker K. <lis...@pa...> - 2004-11-19 06:00:28
On Sun 07 Nov 2004 00:11:39 NZDT +1300, Bruce Allen wrote:

> Volker, would you be willing to browse the badblocks source to see? I
> think this question will come up again. If it's not obvious I can write to
> Theodore Ts'o and ask him.

OK, I had a glance. The relevant file is e2fsprogs-1.34/misc/badblocks.c.

Blocks already marked bad on the filesystem are skipped while testing.

The read-only test (function test_ro) does a simple block read, and some comparison, but I don't quite get what for.

The read/write test does a write, a read, and a compare, and adds any unwritable blocks and any blocks failing the comparison to the bad blocks list.

The non-destructive test reads blocks into memory first. In answer to the main question: any blocks with read errors are simply added to the bad blocks list and henceforth skipped. After that it's a write, a compare, and a restore. Any blocks producing more errors are added to the bad list. As an aside, any catastrophic program termination skips the restore stage, and the test is then no longer non-destructive.

The whole thing relies on the kernel to produce suitable error results. When I tested a known bad disk on which badblocks didn't find any bad blocks although there were plenty of I/O errors (IIRC), it seems the kernel didn't report these back to the application properly.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-19 08:45:20
When badblocks calls open(), does it use the O_DIRECT flag?

On Fri, 19 Nov 2004, Volker Kuhlmann wrote:

> The whole thing relies on the kernel to produce suitable error results.
> When I tested a known bad disk on which badblocks didn't find any bad
> blocks although there were plenty of I/O errors IIRC, it seems the
> kernel didn't report this back to the app properly.
From: Volker K. <lis...@pa...> - 2004-11-19 09:47:48
> When badblocks calls open() does it use the O_DIRECT flag?

Not with open(), but if O_DIRECT is defined, it calls fcntl() and manipulates the O_DIRECT bit in some flag field.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-19 10:05:08
> > When badblocks calls open() does it use the O_DIRECT flag?
>
> Not with open(), but if O_DIRECT is defined, it calls fcntl() and
> manipulates the O_DIRECT bit in some flag field.

OK, that's the right behavior. It should make badblocks bypass buffering in the kernel block layer.