From: Praveen <pra...@gm...> - 2004-11-04 09:53:58
Hi,

I have a Maxtor 6Y160P0 drive with one FAT32 and one ReiserFS partition. Recently I realised (after some compilation errors which always used to hang the system; I was using the kdecvs-build program) that syslog showed two specific SMART messages regarding the Maxtor drive:

1. Device: /dev/hda, 137 Offline uncorrectable sectors
2. Device: /dev/hda, 138 Currently unreadable (pending) sectors

I then ran a long test using smartctl -t long, which showed status "Completed: Read failure" and listed the LBA of the first error. I then ran 'badblocks' on each of the partitions to verify whether there are any bad sectors, but it showed nothing. I ran the badblocks program again with the -n option, which does a non-destructive write test, and it showed a lot of errors on the FAT32 partition, like:

Nov 3 15:58:37 hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=146560385, high=8, low=12342657, sector=146560291
Nov 3 15:58:37 end_request: I/O error, dev hda, sector 146560291

After running this, all smartctl tests now pass with status "Completed without error". The syslog message from smartctl now only indicates:

1. Device: /dev/hda, 138 Currently unreadable (pending) sectors

I am not sure what just happened with the drive. Did 'badblocks' force the drive to remap bad sectors? What do the "currently unreadable sectors" signify, and why does the SMART test now pass without errors? Should I do a low-level format of the drive?

The errors only occurred on the FAT32 partition, not the ReiserFS partition. Why did the system hang when compiling on the ReiserFS partition when the errors were found on the FAT32 partition? (I double-checked the LBAs of the DMA errors above as well as the smartctl LBA of the first error, and they are in the FAT32 partition.) By the way, the same compilations finished fine on another drive, so I guess it couldn't have been anything else. I have another drive which actually holds the OS, and that one runs just fine.

I have read the badblocks HOWTO and I am unsure how to apply it to a FAT partition, but anyway, to prevent a screwup on my part I would rather back up the data on the drive and do a low-level format.

Thanks,
praveen.
From: Volker K. <lis...@pa...> - 2004-11-05 03:22:33
> I then ran a long test using smartctl -t long which showed status
> "Completed: Read failure" and listed the LBA of first error.

Confirmation that the disk has suffered from read error(s).

> I then ran 'badblocks' on each of the partition to verify if there
> are any bad sectors but it showed nothing.

In my own experience, running badblocks to obtain any information about errors on the disk is a complete and utter waste of time. The program is (essentially) suitable for 1980s hard disks only. If it finds anything "bad", your hard disk is so bad you'd know about it without having to run badblocks. Some people do use it successfully for testing brand-new disks, though (if a disk fails there, at least it didn't yet waste any of your time).

What badblocks does, however, is write(!) every disk block at least once. The actual data written is irrelevant; the point is that the magnetisation on the disk surface is recreated, so a previously unreadable sector can be read again. Don't think this is good news though: the fact is that the surface at that point had previously degraded below what's needed to read the data. It will happen again; you may roll your dice to find out for how long that block will store its data this time.

> badblocks program with -n option which does a non-destructive write
> test and it showed lot of errors on the fat32 partition like

If badblocks finds the disk to be stuffed, that means the disk is stuffed. Not necessarily every sector is stuffed, but a "lot" of errors means you ought to retire the disk to the junkyard today. Not tomorrow. Copy off what you need before it's too late.

> Nov 3 15:58:37 hda: dma_intr: error=0x40 { UncorrectableError },
> LBAsect=146560385, high=8, low=12342657, sector=146560291
> Nov 3 15:58:37 end_request: I/O error, dev hda, sector 146560291.

Some braindamage in the Linux kernel. It reports DMA errors and does this, that, and the other in the DMA department when the disk simply doesn't return the data in time because it's not readable. You get the same problems with CDs which have unreadable blocks.

> After running this all smartctl tests now pass with status "Completed
> without error".

The data retention time on all blocks tested by the selftest is longer than the time it takes to run the test, yet

> 1. Device: /dev/hda, 138 Currently unreadable (pending) sectors.

the disk still has blocks it can't read. Well, the bottom line is: who cares, the disk is stuffed.

> but anyway to prevent screwup on my part i would rather backup the
> data on the drive and do a low level format.

Don't waste your time. The disk may seem to work for a while, then fail again. The sooner you hit that disk with a large brick, the less time you'll waste on it and the less data you'll lose.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-05 17:20:19
> What badblocks does however is write(!) every disk block at least once.
> The actual data written is irrelevant, the point is that the
> magnetisation on the disk surface is recreated, so a previously
> unreadable sector can be read again.
>
> Don't think this is good news though - fact is the surface at that
> point had previously degraded below what's needed to read the data. It
> will happen again, you may roll your dice to find out for how long
> that block will store its data this time.

Volker, is this right? I thought that if badblocks re-writes a sector, and that sector is marked as bad by the disk, then writing it will force the disk to reallocate it, not merely write the data to 'the same' sector. The disk firmware does this transparently: the sector's LBA stays the same, but the actual physical position where the sector with this LBA is stored on the disk changes.

Of course, if badblocks can't READ the sector first, I don't see how it can rewrite it (;-).

Cheers,
Bruce
From: Volker K. <lis...@pa...> - 2004-11-06 08:58:38
> Volker, is this right? I thought that if badblocks re-writes a sector,
> and that sector is marked as bad by the disk, then writing it will force
> it to reallocate, not merely write the data to 'the same' sector.

You're right of course; that's what should happen. I was thinking of the case where the disk fails to read a sector but still doesn't reallocate it. Does this ever happen? Guess we won't know. How is "fails" defined here? "Fails finally", or "fails the first few attempts"? One would be totally at the mercy of the firmware.

As a ballpark figure, it seems that one shouldn't trust a disk when there is a steep jump in the number of reallocated sectors, or when the number of reallocated sectors is larger than (this is arguable though) 5...50 per year of age.

> Of course if badblocks can't READ the sector first, I don't see how it can
> rewrite it (;-).

I assume that is what produces the read errors logged in syslog. In that case the non-destructive test becomes destructive if badblocks decides to proceed with the test of that sector; if it doesn't proceed, the sector would have to be left untested. The source knows which will happen.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-06 11:13:12
> > Of course if badblocks can't READ the sector first, I don't see how it can
> > rewrite it (;-).
>
> I assume that's the read errors logged in syslog. In that case the
> non-destructive test becomes destructive if badblocks decides to
> proceed with the test of that sector; if it doesn't proceed, the
> sector would have to be left untested. The source knows which will
> happen.

Volker, would you be willing to browse the badblocks source to see? I think this question will come up again. If it's not obvious, I can write to Theodore Ts'o and ask him.

Cheers,
Bruce
From: Volker K. <lis...@pa...> - 2004-11-19 06:00:28
On Sun 07 Nov 2004 00:11:39 NZDT +1300, Bruce Allen wrote:

> Volker, would you be willing to browse the badblocks source to see? I
> think this question will come up again. If it's not obvious I can write to
> Theodore Ts'o and ask him.

OK, I had a glance. The relevant file is e2fsprogs-1.34/misc/badblocks.c.

Blocks already marked bad on the filesystem are skipped while testing.

The read-only test (function test_ro) does a simple block read, and some comparison, but I don't quite get what for.

The read/write test does a write, a read, and a compare, and adds any unwritable blocks and any blocks failing the comparison to the bad blocks list.

The non-destructive test reads blocks into memory first. In answer to the main question: any blocks with read errors are simply added to the bad blocks list and henceforth skipped. After that it's a write, a compare, and a restore. Any blocks producing more errors are added to the bad list. As an aside, any catastrophic program termination skips the restore stage, and the test is then no longer non-destructive.

The whole thing relies on the kernel to produce suitable error results. When I tested a known bad disk on which badblocks didn't find any bad blocks although there were plenty of I/O errors (IIRC), it seems the kernel didn't report these back to the application properly.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-19 08:45:20
When badblocks calls open(), does it use the O_DIRECT flag?

On Fri, 19 Nov 2004, Volker Kuhlmann wrote:

> The whole thing relies on the kernel to produce suitable error results.
> When I tested a known bad disk on which badblocks didn't find any bad
> blocks although there were plenty of I/O errors IIRC, it seems the
> kernel didn't report this back to the app properly.
From: Volker K. <lis...@pa...> - 2004-11-19 09:47:48
> When badblocks calls open() does it use the O_DIRECT flag?

Not with open(), but if O_DIRECT is defined, it calls fcntl() and manipulates the O_DIRECT bit in some flag field.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.
From: Bruce A. <ba...@gr...> - 2004-11-19 10:05:08
> > When badblocks calls open() does it use the O_DIRECT flag?
>
> Not with open(), but if O_DIRECT is defined, it calls fcntl() and
> manipulates the O_DIRECT bit in some flag field.

OK, that's the right behavior. It should make badblocks bypass buffering in the kernel block layer.