From: Justin P. <jp...@lu...> - 2006-04-21 19:14:51
Yet a new problem: under 2.6.16, when I fill up the disk, smartmontools
reports this:

Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors

What made it error under 2.6.16?

$ time dd if=/dev/zero of=file.out
dd: writing to `file.out': No space left on device
781118873+0 records in
781118872+0 records out
399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s

real    147m53.092s
user    8m1.395s
sys     42m4.500s

$

Under 2.6.15.x I did not see this behavior. Is this drive going bad, or?

Thanks,

Justin.
From: Jeff G. <jg...@po...> - 2006-04-21 22:47:13
Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>> You can force the disk to replace the bad sectors by doing a
>> disk-level write:
>>
>> dd if=/dev/zero of=/dev/sda1 bs=4k
>
> NOTE! Obviously don't do this before you've backed up the disk.
> Depending on the filesystem, you might just have overwritten something
> important, or just your pr0n collection ;)
>
> Jeff, please be a little more careful about telling people commands
> like that. Some people might cut-and-paste the command without
> realizing what it's doing as a way to "fix" their problem.

Agreed, though the original poster had already done a 400GB dd from
/dev/zero...

	Jeff
From: Linus T. <tor...@os...> - 2006-04-22 00:06:10
On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> Agreed, though the original poster had already done a 400GB dd from
> /dev/zero...

Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
existing data, just the empty parts of a filesystem).

I realize that it's not enough for the "re-allocate on write" behaviour,
and for that you really _do_ need to re-write the whole disk to get all
the broken blocks reallocated, but my argument was just that we should
make sure to _tell_ people when they are overwriting all their old data ;)

		Linus
From: Leon W. <le...@ma...> - 2006-05-06 15:09:22
Hi all,

On Fri, 2006-04-21 at 17:05 -0700, Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
> >
> > Agreed, though the original poster had already done a 400GB dd from
> > /dev/zero...
>
> Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
> existing data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)

I did not realize this before, and asked badblocks maintainer Theodore
whether "badblocks /some/file" was supported (the man page says no); but
of course any filesystem can decide to re-allocate blocks for a file.

However, for large files where parts may be bad sectors, I am still
searching for a way to read, then re-write, every physical sector
occupied by the file, with the purpose of remapping the bad sectors
inside large MPEG files (where I would rather have a few zeroed holes
than a read error in them).

Does anyone know whether such tooling exists? I suspect it has to use
filesystem-specific ioctls to query for the blocks involved.

Regards, Leon
From: Justin P. <jp...@lu...> - 2006-06-11 11:13:46
[4597362.011000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4597362.011000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4597362.011000] ata3: error=0x04 { DriveStatusError }

Now under 2.6.16.20. (This happened while doing an rsync from one IDE
drive to this SATA drive.) The SATA drive AFAIK does not have any
issues, no bad sectors etc.; it is the same drive as before, except that
this is the replacement unit from the previous RMA.

Just FYI.

On Fri, 21 Apr 2006, Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>>
>> Agreed, though the original poster had already done a 400GB dd from
>> /dev/zero...
>
> Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
> existing data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>
> 		Linus
From: Ingo O. <ioe...@ra...> - 2006-05-07 12:48:19
On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> However, for large files where parts may be bad sectors, I am still
> searching for a way to read, then re-write every physical sector
> occupied by the file.
>
> With the purpose to remap the bad sectors inside large MPEG files (where
> I would rather have a few zeroed holes than a read error in them).

This is much easier to solve in the player software:

	do {
		ret = read(fd, buffer, size);
		if (ret > 0) {
			playbuffer(buffer, ret);
		} else if (ret < 0) {
			switch (errno) {
			case EIO:
				/* skip over this frame because of disk problems */
				playbuffer(allzeroesbuffer, size);
				lseek(fd, size, SEEK_CUR);
				/* TODO: handle the return value of lseek() here */
				break;
			}
		}
	} while (ret != 0);

> Anyone know such tooling exists? I suspect it has to use filesystem
> specific IOCTL's to query for the blocks involved.

The (somewhat) portable ioctl() FIBMAP would suffice. That way you find
out which blocks this file is mapped to, and could add some of these
blocks to the badblock list of e2fsck.

Regards

Ingo Oeser
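Ingo's FIBMAP suggestion can be sketched in a few lines of C. This is a
minimal illustration, not code from the thread: map_block() and
fs_block_size() are hypothetical helper names. Note that the FIBMAP
ioctl requires root (CAP_SYS_RAWIO) on most systems, while FIGETBSZ,
which reports the filesystem block size that FIBMAP's block numbers are
counted in, works for any user.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIBMAP, FIGETBSZ */

/* Return the physical (filesystem) block backing logical block
 * 'logical' of the open file 'fd', or -1 on error (no permission,
 * filesystem without FIBMAP support, bad fd, ...).
 * A result of 0 means the logical block is a hole. */
long map_block(int fd, unsigned int logical)
{
    unsigned int blk = logical;   /* FIBMAP uses this as input and output */
    if (ioctl(fd, FIBMAP, &blk) < 0)
        return -1;
    return (long)blk;
}

/* The filesystem block size that 'logical' above is measured in,
 * or -1 on error. */
int fs_block_size(int fd)
{
    int bsz = 0;
    if (ioctl(fd, FIGETBSZ, &bsz) < 0)
        return -1;
    return bsz;
}
```

Walking map_block(fd, 0 .. file_blocks-1) yields every block of the
file; a badblock list for e2fsck, as Ingo suggests, or a targeted
re-write could then be driven from those numbers.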
From: Leon W. <le...@ma...> - 2006-05-10 20:42:36
Hello Ingo,

On Sun, 2006-05-07 at 14:44 +0200, Ingo Oeser wrote:
> On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> > However, for large files where parts may be bad sectors, I am still
> > searching for a way to read, then re-write every physical sector
> > occupied by the file.
> >
> > With the purpose to remap the bad sectors inside large MPEG files (where
> > I would rather have a few zeroed holes than a read error in them).
>
> This is much easier to solve in the player software:
>
> 	do {
> 		ret = read(fd, buffer, size);
> 		if (ret > 0) {
> 			playbuffer(buffer, ret);
> 		} else if (ret < 0) {
> 			switch (errno) {
> 			case EIO:

I haven't done any tests yet, but I know a drive can be slow in
returning a read error, and then there might be all kinds of retries
involved in the kernel block I/O layer that add to the latency.

For high-bitrate material, I would like to catch such errors in advance,
in a background process.

> The (somewhat) portable ioctl() FIBMAP would suffice.
> That way you find out which blocks this file is mapped to, and could
> add some of these blocks to the badblock list of e2fsck.

Except that I am not using an ext2 filesystem per se.

Regards, Leon.
From: Ingo O. <ioe...@ra...> - 2006-05-11 19:53:26
Hi Leon,

On Wednesday, 10. May 2006 22:42, Leon Woestenberg wrote:
> I haven't done any tests yet, but I know a drive can be slow in
> returning a read error, and then there might be all kinds of retries
> involved in the kernel block I/O layer that add to the latency.

That's correct.

> For high-bitrate material, I would like to catch such errors in advance,
> in a background process.

Interesting idea. Isn't S.M.A.R.T. supposed to implement this via its
(long) self-test mechanism? These tests should suspend on any disk
activity and resume when the disk is idle.

Doing this in software would be quite strange. The retries you see are
per disk, so your playback will be interrupted anyway if any such retry
happens. Just measure the maximum retry time and choose your read
buffer large enough.

> > The (somewhat) portable ioctl() FIBMAP would suffice.
> > That way you find out which blocks this file is mapped to, and could
> > add some of these blocks to the badblock list of e2fsck.
>
> Except that I am not using an ext2 filesystem per se.

That's true, but filesystems usually have such mechanisms. Even better
would be using device-mapper tables and the zero target (which returns
just zeroes). But this would not work with a mounted filesystem, since
there is no system-wide "break pagecache to block mapping" API, AFAIK.

BTW: does your player support large ranges of zeroes without any error?
I remember having to set GOP_START marks within those ranges.

And all that stuff only helps for hard disk recorders, or maybe for
recordings on rewritable media. It doesn't help with read-only media,
which your player has to handle anyway. A big buffer helps in both
cases.

Regards

Ingo Oeser
From: Leon W. <le...@ma...> - 2006-05-11 21:28:14
Hello Ingo,

On Thu, 2006-05-11 at 21:50 +0200, Ingo Oeser wrote:
> Interesting idea. Isn't S.M.A.R.T. supposed to implement this via its
> (long) self-test mechanism? These tests should suspend on any disk
> activity and resume when the disk is idle.

SMART will detect read errors, but will not decide to happily overwrite
your data.

> Doing this in software would be quite strange.

There is no other way to have bad blocks replaced. You need to initiate
a write to such a block from software.

> The retries you see are per disk, so your playback will be interrupted
> anyway if any such retry happens.

Not if I get rid of bad blocks *before* playing the files. The idea is
that all files are regularly checked for bad blocks, and bad blocks are
replaced.

> Just measure the maximum retry time and choose your read buffer large
> enough.

Yes, but "large enough" is hard to measure; bad blocks often come
together.

> And all that stuff only helps for hard disk recorders, or maybe for
> recordings on rewritable media. It doesn't help with read-only media,
> which your player has to handle anyway. A big buffer helps in both
> cases.

Our application is very specific; it writes all the time. Most recorded
media is never played out, but if it is, it must be available. It uses
RAID-5.

As of a few weeks ago, Linux software RAID-5 is supposed to do bad block
remapping (it can do this losslessly, because the data is available
redundantly). I suggested this last year, but unfortunately could not
spend time on working on the Linux md system. Glad someone picked it up!

Regards,

Leon Woestenberg.
From: Ingo O. <ioe...@ra...> - 2006-05-12 16:34:28
Hi Leon,

On Thursday, 11. May 2006 23:28, Leon Woestenberg wrote:
> SMART will detect read errors, but will not decide to happily overwrite
> your data.

Ok, that's true if you are out of sectors for transparent sector
remapping.

> Our application is very specific; it writes all the time. Most recorded
> media is never played out, but if it is, it must be available. It uses
> RAID-5.
>
> As of a few weeks ago, Linux software RAID-5 is supposed to do bad block
> remapping (it can do this losslessly, because the data is available
> redundantly).
>
> I suggested this last year, but unfortunately could not spend time on
> working on the Linux md system. Glad someone picked it up!

Ok, these details of your requirements and ideas are completely new to
me. Now I understand your issue completely. Did you mention this before?
I certainly didn't read that.

And the problem you see is that the faulty blocks are not remapped
properly? Is it that the bad block is not rebuilt properly? Is the drive
marked faulty instead of remapping? I understand what you are trying to
do now, but not what the actual problem is.

Thanks for your patience. I know I can be quite slow sometimes :-/

Regards

Ingo Oeser
From: Leon W. <le...@ma...> - 2006-05-13 23:40:54
Hi Ingo,

On Fri, 2006-05-12 at 18:31 +0200, Ingo Oeser wrote:
> > SMART will detect read errors, but will not decide to happily
> > overwrite your data.
>
> Ok, that's true if you are out of sectors for transparent sector
> remapping.

No, *even* if there are plenty of sectors available to remap to, the
drive will not remap a bad sector unless you (as the user) *overwrite*
the bad sector.

AFAIK, no drive will remap a bad sector on *read* actions from the host.
The drive may *mark* a bad sector for its own purposes, and then later,
when the user writes to it, remap it. (Although theoretically, driven by
the error-correcting codes, it might still read the data correctly and
decide to remap the sector anyway -- however, as said, I do not know of
a drive that does just that.)

> And the problem you see is that the faulty blocks are not remapped
> properly? Is it that the bad block is not rebuilt properly?
> I understand what you are trying to do now, but not what the actual
> problem is.

I want to proactively have the drive remap the bad sectors *before* they
are read by a time-critical application. So, I want to implement a
periodic scan [*] that gives me the bad blocks, which I then overwrite
with zero blocks.

[*] Either through smartd or smartctl (which, with some drives, give
reliable bad sector locations), or otherwise by reading all files on
disk at a slow rate, scanning for read errors.

> Thanks for your patience. I know I can be quite slow sometimes :-/

No, indeed I did not explain the application, so no apology needed :-)

Regards, Leon.
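Leon's periodic scan-and-overwrite idea could be sketched roughly as
below. This is an illustrative sketch, not code from the thread:
scrub_file() is a hypothetical name; zeroing a block in place destroys
whatever data it held (the "zeroed hole" trade-off Leon accepts); and
since buffered reads may be satisfied from the page cache, a real
scrubber would open with O_DIRECT (with its alignment requirements) so
each read actually reaches the disk.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read 'path' block by block; wherever a read fails with EIO, overwrite
 * that block with zeroes so the drive can remap the bad sector on write.
 * Returns the number of blocks zeroed, or -1 if the file can't be
 * opened or memory allocation fails. */
long scrub_file(const char *path, size_t blksz)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *buf = malloc(blksz);
    char *zeroes = calloc(1, blksz);
    if (!buf || !zeroes) {
        free(buf); free(zeroes); close(fd);
        return -1;
    }

    long zeroed = 0;
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(fd, buf, blksz, off);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno != EIO)
                break;              /* give up on other errors */
            /* bad block: replace its contents with zeroes
             * (note: on a bad final partial block this rounds the
             * file size up to a whole block) */
            if (pwrite(fd, zeroes, blksz, off) > 0)
                zeroed++;
            off += blksz;
            continue;
        }
        off += n;                   /* healthy (possibly partial) block */
    }

    free(buf);
    free(zeroes);
    close(fd);
    return zeroed;
}
```

Run at a slow rate from a background process, as Leon describes, this
would surface and clear pending sectors before a time-critical reader
ever hits them.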
From: Justin P. <jp...@lu...> - 2006-04-22 16:39:49
Thanks for all the responses. RMA'd the drive; will test the replacement
in the same manner once it arrives, in 1-2 weeks.

Justin.

On Fri, 21 Apr 2006, Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable
> sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?
>
> Thanks,
>
> Justin.
From: Jeff G. <jg...@po...> - 2006-04-21 19:20:31
Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable
> sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?

That's a disk-level problem. You've got bad sectors.

You can force the disk to replace the bad sectors by doing a disk-level
write:

	dd if=/dev/zero of=/dev/sda1 bs=4k

and then test the disk with

	smartctl -d ata -t long /dev/sda

If sectors continue to die, the disk is toast.

	Jeff
From: Linus T. <tor...@os...> - 2006-04-21 19:29:22
On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> You can force the disk to replace the bad sectors by doing a disk-level
> write:
>
> dd if=/dev/zero of=/dev/sda1 bs=4k

NOTE! Obviously don't do this before you've backed up the disk.
Depending on the filesystem, you might just have overwritten something
important, or just your pr0n collection ;)

Jeff, please be a little more careful about telling people commands like
that. Some people might cut-and-paste the command without realizing what
it's doing as a way to "fix" their problem.

		Linus