From: Justin P. <jp...@lu...> - 2006-04-21 19:14:51
Yet a new problem: under 2.6.16, when I fill up the disk, smartmontools
reports this:

Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors

What made it error under 2.6.16?

$ time dd if=/dev/zero of=file.out
dd: writing to `file.out': No space left on device
781118873+0 records in
781118872+0 records out
399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s

real    147m53.092s
user    8m1.395s
sys     42m4.500s

$

Under 2.6.15.x I did not see this behavior. Is this drive going bad, or?

Thanks,

Justin.
From: Jeff G. <jg...@po...> - 2006-04-21 22:47:13
Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>> You can force the disk to replace the bad sectors by doing a
>> disk-level write:
>>
>> dd if=/dev/zero of=/dev/sda1 bs=4k
>
> NOTE! Obviously don't do this before you've backed up the disk.
> Depending on the filesystem, you might just have overwritten something
> important, or just your pr0n collection ;)
>
> Jeff, please be a little more careful about telling people commands
> like that. Some people might cut-and-paste the command without
> realizing what it's doing as a way to "fix" their problem.

Agreed, though the original poster had already done a 400GB dd from
/dev/zero...

	Jeff
From: Linus T. <tor...@os...> - 2006-04-22 00:06:10
On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> Agreed, though the original poster had already done a 400GB dd from
> /dev/zero...

Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
existing data, just the empty parts of a filesystem).

I realize that it's not enough for the "re-allocate on write" behaviour,
and for that you really _do_ need to re-write the whole disk to get all
the broken blocks reallocated, but my argument was just that we should
make sure to _tell_ people when they are overwriting all their old data ;)

		Linus
From: Leon W. <le...@ma...> - 2006-05-06 15:09:22
Hi all,

On Fri, 2006-04-21 at 17:05 -0700, Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
> >
> > Agreed, though the original poster had already done a 400GB dd from
> > /dev/zero...
>
> Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
> existing data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)

I did not realize this before, and asked badblocks maintainer Theodore
whether "badblocks /some/file" was supported (the man page says no); but
of course any filesystem can decide to re-allocate blocks for a file.

However, for large files where parts may be bad sectors, I am still
searching for a way to read, then re-write, every physical sector
occupied by the file, with the purpose of remapping the bad sectors
inside large MPEG files (where I would rather have a few zeroed holes
than a read error in them).

Does anyone know whether such tooling exists? I suspect it has to use
filesystem-specific ioctls to query for the blocks involved.

Regards, Leon
From: Justin P. <jp...@lu...> - 2006-06-11 11:13:46
[4597362.011000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4597362.011000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4597362.011000] ata3: error=0x04 { DriveStatusError }

Now under 2.6.16.20. (This happened while doing an rsync from one IDE
drive to this SATA drive.) The SATA drive AFAIK does not have any
issues, no bad sectors etc.; it is the same drive as before, except that
this is the replacement unit from the previous RMA.

Just FYI.

On Fri, 21 Apr 2006, Linus Torvalds wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>>
>> Agreed, though the original poster had already done a 400GB dd from
>> /dev/zero...
>
> Yes, but to a _file_ on the partition (i.e. he didn't overwrite any
> existing data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>
> 		Linus
From: Ingo O. <ioe...@ra...> - 2006-05-07 12:48:19
On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> However, for large files where parts may be bad sectors, I am still
> searching for a way to read, then re-write every physical sector
> occupied by the file.
>
> With the purpose to remap the bad sectors inside large MPEG files (where
> I would rather have a few zeroed holes than a read error in them).

This is much easier to solve in the player software:

	do {
		ret = read(fd, buffer, size);
		if (ret > 0) {
			playbuffer(buffer, ret);
		} else if (ret < 0) {
			switch (errno) {
			case EIO:
				/* skip over this frame because of disk problems */
				playbuffer(allzeroesbuffer, size);
				lseek(fd, size, SEEK_CUR);
				/* TODO: handle the return value of lseek() here */
				break;
			}
		}
	} while (ret != 0);

> Anyone know such tooling exists? I suspect it has to use filesystem
> specific IOCTL's to query for the blocks involved.

The (somewhat) portable ioctl() FIBMAP would suffice. That way you find
out which blocks this file is mapped to, and could add some of these
blocks to the badblock list of e2fsck.

Regards

Ingo Oeser
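Ingo's FIBMAP suggestion can be sketched in a few lines of C. This is a
minimal illustration, not code from the thread: map_block() and
fs_block_size() are hypothetical helper names. Note that the FIBMAP
ioctl requires root (CAP_SYS_RAWIO) on most systems, while FIGETBSZ,
which reports the filesystem block size that FIBMAP's block numbers are
counted in, works for any user.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIBMAP, FIGETBSZ */

/* Return the physical (filesystem) block backing logical block
 * 'logical' of the open file 'fd', or -1 on error (no permission,
 * filesystem without FIBMAP support, bad fd, ...).
 * A result of 0 means the logical block is a hole. */
long map_block(int fd, unsigned int logical)
{
    unsigned int blk = logical;   /* FIBMAP uses this as input and output */
    if (ioctl(fd, FIBMAP, &blk) < 0)
        return -1;
    return (long)blk;
}

/* The filesystem block size that 'logical' above is measured in,
 * or -1 on error. */
int fs_block_size(int fd)
{
    int bsz = 0;
    if (ioctl(fd, FIGETBSZ, &bsz) < 0)
        return -1;
    return bsz;
}
```

Walking map_block(fd, 0 .. file_blocks-1) yields every block of the
file; a badblock list for e2fsck, as Ingo suggests, or a targeted
re-write could then be driven from those numbers.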
From: Leon W. <le...@ma...> - 2006-05-10 20:42:36
Hello Ingo,

On Sun, 2006-05-07 at 14:44 +0200, Ingo Oeser wrote:
> On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> > However, for large files where parts may be bad sectors, I am still
> > searching for a way to read, then re-write every physical sector
> > occupied by the file.
> >
> > With the purpose to remap the bad sectors inside large MPEG files (where
> > I would rather have a few zeroed holes than a read error in them).
>
> This is much easier to solve in the player software:
>
> 	do {
> 		ret = read(fd, buffer, size);
> 		if (ret > 0) {
> 			playbuffer(buffer, ret);
> 		} else if (ret < 0) {
> 			switch (errno) {
> 			case EIO:

I haven't done any tests yet, but I know a drive can be slow in
returning a read error, and then there might be all kinds of retries
involved in the kernel block I/O layer that add to the latency.

For high-bitrate material, I would like to catch such errors in advance,
in a background process.

> The (somewhat) portable ioctl() FIBMAP would suffice.
> That way you find out which blocks this file is mapped to, and could
> add some of these blocks to the badblock list of e2fsck.

Except that I am not using an ext2 filesystem per se.

Regards, Leon.
From: Ingo O. <ioe...@ra...> - 2006-05-11 19:53:26
Hi Leon,

On Wednesday, 10. May 2006 22:42, Leon Woestenberg wrote:
> I haven't done any tests yet, but I know a drive can be slow in
> returning a read error, and then there might be all kinds of retries
> involved in the kernel block I/O layer that add to the latency.

That's correct.

> For high-bitrate material, I would like to catch such errors in advance,
> in a background process.

Interesting idea. Isn't S.M.A.R.T. supposed to implement this via its
(long) self-test mechanism? These tests should suspend on any disk
activity and resume when the disk is idle.

Doing this in software would be quite strange. The retries you see are
per disk, so your playback will be interrupted anyway if any such retry
happens. Just measure the maximum retry time and choose your read
buffer large enough.

> > The (somewhat) portable ioctl() FIBMAP would suffice.
> > That way you find out which blocks this file is mapped to, and could
> > add some of these blocks to the badblock list of e2fsck.
>
> Except that I am not using an ext2 filesystem per se.

That's true, but filesystems usually have such mechanisms. Even better
would be using device-mapper tables and the zero target (which returns
just zeroes). But this would not work with a mounted filesystem, since
there is no system-wide "break pagecache to block mapping" API, AFAIK.

BTW: does your player support large ranges of zeroes without any error?
I remember having to set GOP_START marks within those ranges.

And all that stuff only helps for hard disk recorders, or maybe for
recordings on rewritable media. It doesn't help with read-only media,
which your player has to handle anyway. A big buffer helps in both
cases.

Regards

Ingo Oeser
From: Leon W. <le...@ma...> - 2006-05-11 21:28:14
Hello Ingo,

On Thu, 2006-05-11 at 21:50 +0200, Ingo Oeser wrote:
> Interesting idea. Isn't S.M.A.R.T. supposed to implement this via its
> (long) self-test mechanism? These tests should suspend on any disk
> activity and resume when the disk is idle.

SMART will detect read errors, but will not decide to happily overwrite
your data.

> Doing this in software would be quite strange.

There is no other way to have bad blocks replaced. You need to initiate
a write to such a block from software.

> The retries you see are per disk, so your playback will be interrupted
> anyway if any such retry happens.

Not if I get rid of bad blocks *before* playing the files. The idea is
that all files are regularly checked for bad blocks, and bad blocks are
replaced.

> Just measure the maximum retry time and choose your read buffer large
> enough.

Yes, but "large enough" is hard to measure; bad blocks often come
together.

> And all that stuff only helps for hard disk recorders, or maybe for
> recordings on rewritable media. It doesn't help with read-only media,
> which your player has to handle anyway. A big buffer helps in both
> cases.

Our application is very specific; it writes all the time. Most recorded
media is never played out, but if it is, it must be available. It uses
RAID-5.

As of a few weeks ago, Linux software RAID-5 is supposed to do bad block
remapping (it can do this losslessly, because the data is available
redundantly). I suggested this last year, but unfortunately could not
spend time on working on the Linux md system. Glad someone picked it up!

Regards,

Leon Woestenberg.
From: Ingo O. <ioe...@ra...> - 2006-05-12 16:34:28
Hi Leon,

On Thursday, 11. May 2006 23:28, Leon Woestenberg wrote:
> SMART will detect read errors, but will not decide to happily overwrite
> your data.

Ok, that's true if you are out of sectors for transparent sector
remapping.

> Our application is very specific; it writes all the time. Most recorded
> media is never played out, but if it is, it must be available. It uses
> RAID-5.
>
> As of a few weeks ago, Linux software RAID-5 is supposed to do bad block
> remapping (it can do this losslessly, because the data is available
> redundantly).
>
> I suggested this last year, but unfortunately could not spend time on
> working on the Linux md system. Glad someone picked it up!

Ok, these details of your requirements and ideas are completely new to
me. Now I understand your issue completely. Did you mention this before?
I certainly didn't read that.

And the problem you see is that the faulty blocks are not remapped
properly? Is it that the bad block is not rebuilt properly? Is the drive
marked faulty instead of remapping? I understand what you are trying to
do now, but not what the actual problem is.

Thanks for your patience. I know I can be quite slow sometimes :-/

Regards

Ingo Oeser
From: Leon W. <le...@ma...> - 2006-05-13 23:40:54
Hi Ingo,

On Fri, 2006-05-12 at 18:31 +0200, Ingo Oeser wrote:
> > SMART will detect read errors, but will not decide to happily
> > overwrite your data.
>
> Ok, that's true if you are out of sectors for transparent sector
> remapping.

No, *even* if there are plenty of sectors available to remap to, the
drive will not remap a bad sector unless you (as the user) *overwrite*
the bad sector.

AFAIK, no drive will remap a bad sector on *read* actions from the host.
The drive may *mark* a bad sector for its own purposes, and then later,
when the user writes to it, remap it. (Although theoretically, driven by
the error-correcting codes, it might still read the data correctly and
decide to remap the sector anyway -- however, as said, I do not know of
a drive that does just that.)

> And the problem you see is that the faulty blocks are not remapped
> properly? Is it that the bad block is not rebuilt properly?
> I understand what you are trying to do now, but not what the actual
> problem is.

I want to proactively have the drive remap the bad sectors *before* they
are read by a time-critical application. So, I want to implement a
periodic scan [*] that gives me the bad blocks, which I then overwrite
with zero blocks.

[*] Either through smartd or smartctl (which, with some drives, give
reliable bad sector locations), or otherwise by reading all files on
disk at a slow rate, scanning for read errors.

> Thanks for your patience. I know I can be quite slow sometimes :-/

No, indeed I did not explain the application, so no apology needed :-)

Regards, Leon.
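Leon's periodic scan-and-overwrite idea could be sketched roughly as
below. This is an illustrative sketch, not code from the thread:
scrub_file() is a hypothetical name; zeroing a block in place destroys
whatever data it held (the "zeroed hole" trade-off Leon accepts); and
since buffered reads may be satisfied from the page cache, a real
scrubber would open with O_DIRECT (with its alignment requirements) so
each read actually reaches the disk.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read 'path' block by block; wherever a read fails with EIO, overwrite
 * that block with zeroes so the drive can remap the bad sector on write.
 * Returns the number of blocks zeroed, or -1 if the file can't be
 * opened or memory allocation fails. */
long scrub_file(const char *path, size_t blksz)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *buf = malloc(blksz);
    char *zeroes = calloc(1, blksz);
    if (!buf || !zeroes) {
        free(buf); free(zeroes); close(fd);
        return -1;
    }

    long zeroed = 0;
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(fd, buf, blksz, off);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno != EIO)
                break;              /* give up on other errors */
            /* bad block: replace its contents with zeroes
             * (note: on a bad final partial block this rounds the
             * file size up to a whole block) */
            if (pwrite(fd, zeroes, blksz, off) > 0)
                zeroed++;
            off += blksz;
            continue;
        }
        off += n;                   /* healthy (possibly partial) block */
    }

    free(buf);
    free(zeroes);
    close(fd);
    return zeroed;
}
```

Run at a slow rate from a background process, as Leon describes, this
would surface and clear pending sectors before a time-critical reader
ever hits them.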
From: Justin P. <jp...@lu...> - 2006-04-22 16:39:49
Thanks for all the responses. RMA'd the drive; will test the replacement
in the same manner once it arrives, in 1-2 weeks.

Justin.

On Fri, 21 Apr 2006, Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable
> sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?
>
> Thanks,
>
> Justin.
From: Jeff G. <jg...@po...> - 2006-04-21 19:20:31
Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable
> sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?

That's a disk-level problem. You've got bad sectors.

You can force the disk to replace the bad sectors by doing a disk-level
write:

	dd if=/dev/zero of=/dev/sda1 bs=4k

and then test the disk with

	smartctl -d ata -t long /dev/sda

If sectors continue to die, the disk is toast.

	Jeff
From: Linus T. <tor...@os...> - 2006-04-21 19:29:22
On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> You can force the disk to replace the bad sectors by doing a disk-level
> write:
>
> dd if=/dev/zero of=/dev/sda1 bs=4k

NOTE! Obviously don't do this before you've backed up the disk.
Depending on the filesystem, you might just have overwritten something
important, or just your pr0n collection ;)

Jeff, please be a little more careful about telling people commands like
that. Some people might cut-and-paste the command without realizing what
it's doing as a way to "fix" their problem.

		Linus