From: Kai S. <mai...@co...> - 2008-10-28 19:31:38
I have two Samsung HD502IJ disks in a software RAID 1 array on CentOS 5. There are two mirrored partitions (/dev/md0, /dev/md1) on each: a small one for the Xen hypervisor system, and a large one with the remainder of the disk that is managed by LVM to provide a bunch of smaller partitions for Xen guests.

Both disks show one Offline_Uncorrectable error, at different offsets. Earlier this month I started to get errors from md in the logs about reallocation attempts, which seemed to fail on the first tries and then succeed (I can make the logs available if somebody thinks it would help). Since then it has happened three or four more times. The RAID has been fine all the time and shows as clean.

I checked with smartctl -a -d ata; the output can be found here:
http://winware.org/smart/raid-error.sda.txt
http://winware.org/smart/raid-error.sdb.txt

The short offline tests then found the Offline_Uncorrectable errors, and I also started getting two pending sectors on sda in the smartd messages. However, these disappeared after some time and only the two Offline_Uncorrectable errors remain.

It looks to me as if I have a bunch of closely related bad blocks (after offset 208645048) on sda that keep upping my Raw_Read_Error_Rate each time a read attempt is made on them (which happens fairly frequently because of the daily backups). (Forget about the error on sdb for now.)

How can I trigger a reallocation so that these blocks are no longer used? If I understand correctly, I cannot follow the bad block how-to exactly, or only partly? I read this thread
http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.64.0806270653350.5844%40gc.phys.uwm.edu
which suggests I could copy over good data from the other disk, but it's not clear to me at all how I find out where exactly the problem is and how I copy the correct data over.

To identify the block I followed the section of the bad block how-to concerning LVM, and I think I have identified the correct bad block number for that LVM partition. However, I can't prove that there is a problem with either of the two methods from the how-to.

1. Using dd to read from that block and around it is always fine. This might be due to the RAID? As I'm reading from the LVM device on the RAID partition, I might always get readable data. Would I need to destroy the RAID before I can get any errors? But if I remove the RAID I also break the LVM that sits on it. How would I then access a specific LVM partition on a specific disk?

2. When I try debugfs on it I always get

debugfs: icheck 1000
Block Inode number
1000 <block not found>

even on a low number like 1000.

Thanks for any hints.

Kai
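On point 1 above: a read through /dev/md1 or an LVM volume can be served from either mirror, so one way to test the suspect sector without breaking the array is to read it straight from the member disk. The following is only a minimal sketch under assumptions: the function names are made up for illustration, the partition start value in the usage note is hypothetical (the real one can be read from /sys/block/sda/sda2/start), and the calculation does not account for md superblock or LVM extent offsets.

```shell
#!/bin/sh
# Offset of a whole-disk LBA inside a partition that starts at PART_START
# (both in 512-byte sectors). Fails if the LBA lies before the partition.
lba_in_partition() {
    lba=$1
    part_start=$2
    [ "$lba" -ge "$part_start" ] || return 1
    echo $((lba - part_start))
}

# Read a single sector directly from the member disk, bypassing md and the
# page cache (iflag=direct is a GNU dd option). Needs root; not called here.
read_raw_sector() {
    disk=$1
    lba=$2
    dd if="$disk" of=/dev/null bs=512 skip="$lba" count=1 iflag=direct
}
```

For example, `read_raw_sector /dev/sda 208645048` should return an I/O error if that sector is really unreadable, and `lba_in_partition 208645048 208845` (with 208845 as a made-up partition start) gives the sector offset inside the partition.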
From: Christian F. <Chr...@t-...> - 2008-10-30 12:09:14
Kai Schaetzl wrote:
> I have two Samsung HD502IJ disks in a software RAID 1 array on CentOS 5.
> There are two mirrored partitions (/dev/md0, /dev/md1) on each, a
> small one for the xen hypervisor system and a large one with the
> remainder of the disk that is managed by LVM to get a bunch of smaller
> partitions for xen guests.
> Both disks show one Offline_Uncorrectable error at different offsets.
> ...
> The short offline tests then found the Offline_Uncorrectable errors
> and I also started getting two pending sectors in sda in the smartd
> messages.
> However, these disappeared after some time and only the two
> Offline_Uncorrectable errors remain.
> ...
> I read this thread
> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.64.0806270653350.5844%40gc.phys.uwm.edu
> which suggests I could copy over good data from the other disk, but
> it's not clear to me at all how I find out where exactly the problem
> is and how I copy the correct data over.

AFAIK, the Linux software RAID does this for you if it encounters a bad block on one of the disks:
http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621

So a raw read through the RAID driver may force the reallocation - with a probability of 50% :-)
(e.g. 'ddrescue -v /dev/md0 /dev/null read.log')

Note: Some older Samsung disks (at least the SP1614C from the P80 series) do not increment Reallocated_Sector_Ct and do not reset Offline_Uncorrectable on bad sector reallocation. I don't know whether this is the case for T- or F1-series disks.

Cheers,
Christian
From: Bruce A. <ba...@gr...> - 2008-10-30 15:13:18
> AFAIK, the Linux software RAID does this for you if it encounters a bad
> block on one of the disks:
> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621

That's great!! When was this feature added to Linux software RAID? Does it work for all redundant RAID levels like RAID-5 or RAID-6, or only for mirroring?

Cheers,
Bruce
From: David G. <da...@dg...> - 2008-10-30 16:00:01
Bruce Allen wrote:
>> AFAIK, the Linux software RAID does this for you if it encounters a bad
>> block on one of the disks:
>> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621
>
> That's great!! When was this feature added to Linux software RAID? Does
> it work for all redundant RAID levels like RAID-5 or RAID-6, or only for
> mirroring?

I think you guys want:
http://linux-raid.osdl.org/index.php/RAID_Administration

Looking at:

echo check > /sys/block/mdX/md/sync_action
echo repair > /sys/block/mdX/md/sync_action

David

PS Bruce - did you get a recent email re Samsung?

--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
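The two sync_action commands above can be wrapped in a small helper. This is a hedged sketch, not an official tool: the function names and the SYSFS_ROOT override are made up for illustration (the override only exists so the helper can be tried against a scratch directory), while sync_action and mismatch_cnt are the md sysfs attributes described in the kernel's Documentation/md.txt.

```shell
#!/bin/sh
# Request a scrub action on an md array, e.g. "md_sync_action md0 check".
# SYSFS_ROOT is an illustrative override; it defaults to the real /sys.
md_sync_action() {
    md=$1
    action=$2
    case "$action" in
        check|repair|idle) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    echo "$action" > "${SYSFS_ROOT:-/sys}/block/$md/md/sync_action"
}

# Print the mismatch counter; it is updated while a check or repair runs.
md_mismatch_cnt() {
    cat "${SYSFS_ROOT:-/sys}/block/$1/md/mismatch_cnt"
}
```

A typical sequence would be `md_sync_action md0 check`, waiting until /proc/mdstat no longer shows the check in progress, then `md_mismatch_cnt md0`. Both need root on a real array.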
From: Bruce A. <ba...@gr...> - 2008-10-30 16:11:15
>>> AFAIK, the Linux software RAID does this for you if it encounters a bad
>>> block on one of the disks:
>>> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621
>>
>> That's great!! When was this feature added to Linux software RAID? Does
>> it work for all redundant RAID levels like RAID-5 or RAID-6, or only for
>> mirroring?
>
> I think you guys want:
> http://linux-raid.osdl.org/index.php/RAID_Administration
>
> Looking at:
>
> echo check > /sys/block/mdX/md/sync_action
> echo repair > /sys/block/mdX/md/sync_action

Hi David,

I am not sure if these are directly relevant. For example, the docs say 'check on a raid 5 with 1 missing device will not do anything. After all there is nothing it can do.' But this is NOT true if there is a bad block on a drive. Then the RAID driver *could* determine the correct data from the other drives.

> PS Bruce - did you get a recent email re Samsung?

Not sure -- still catching up on my email.

Bruce
From: David G. <da...@dg...> - 2008-10-30 16:34:03
Bruce Allen wrote:
>>>> AFAIK, the Linux software RAID does this for you if it encounters a bad
>>>> block on one of the disks:
>>>> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621
>>>
>>> That's great!! When was this feature added to Linux software RAID?

Oh: pre-2.6.17

>>> Does
>>> it work for all redundant RAID levels like RAID-5 or RAID-6, or only for
>>> mirroring?
>>
>> I think you guys want:
>> http://linux-raid.osdl.org/index.php/RAID_Administration
>>
>> Looking at:
>>
>> echo check > /sys/block/mdX/md/sync_action
>> echo repair > /sys/block/mdX/md/sync_action
>
> Hi David,
>
> I am not sure if these are directly relevant. For example the docs say
> 'check on a raid 5 with 1 missing device will not do anything. After
> all there is nothing it can do.' But this is NOT true if there is a bad
> block on a drive. Then the RAID driver *could* determine the correct
> data from the other drives.

I think it is.

It partly depends on your understanding of 'missing'... md no longer kicks a device instantly on a read failure; I'm pretty sure it attempts a corrective write.

Anyhow, the repair is preventative - with a full array it asks md to read blocks on all devices. If the device has already been kicked entirely, then you have no redundancy and you are stuck.

OTOH if there is a bad block read _during the "repair"_ then md will attempt to use the other device/blocks to calculate the correct value and write the block.

David

--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
From: Bruce A. <ba...@gr...> - 2008-10-30 20:53:28
>> I am not sure if these are directly relevant. For example the docs say
>> 'check on a raid 5 with 1 missing device will not do anything. After
>> all there is nothing it can do.' But this is NOT true if there is a bad
>> block on a drive. Then the RAID driver *could* determine the correct
>> data from the other drives.
>
> I think it is.
>
> It partly depends on your understanding of 'missing'... md no longer kicks
> instantly on a read failure; I'm pretty sure it attempts a corrective write.

OK, that was my question. Does it attempt a corrective write with a RAID-5 or RAID-6 array, or only with mirrored drives?

> Anyhow the repair is preventative - with a full array it asks md to read
> blocks on all devices. If the device has been totally kicked already
> then you have no redundancy and you are stuck.

Clear.

> OTOH if there is a bad block read _during the "repair"_ then md will
> attempt to use the other device/blocks to calculate the correct value
> and write the block.

OK. So the question is, what if there is a bad block read _during "normal operation"_? Will md attempt to use the other device/blocks to calculate the correct value and write the block?

Cheers,
Bruce
From: Michal S. <so...@zi...> - 2008-10-31 01:15:48
Bruce Allen wrote:
>
> OK, that was my question. Does it attempt a corrective write with a
> RAID-5 or RAID-6 array, or only with mirrored drives?
>

All RAID levels with redundancy offer that feature, as far as I know (and a quick peek at the sources seems to confirm it).

A successful attempt to correct a read error will increase /sys/block/md.../md/errors. An appropriate message will be available in the kernel log as well. The errors value is preserved across reboots if you use a 1.x md superblock.

I haven't had any drives failing on me recently, so you might just ask on the linux-raid mailing list to be 100% sure - the above is based on Documentation/md.txt though.

>
> OK. So the question is, what if there is a bad block read _during "normal
> operation"_. Will md attempt to use the other device/blocks to calculate
> the correct value and write the block?
>

Yes.

'Check' / 'resync' / 'repair' (like 'resync' but ignoring the bitmap) are more suited to situations where the data can be mismatched (e.g. due to a power failure). 'Check' will report such mismatches but won't attempt to fix them. 'Repair' and 'resync' will fix them.

I'm not sure whether 'check' attempts to fix bad blocks (detecting mismatches won't work in that case with e.g. raid5) or whether it operates in a completely read-only fashion. 'Repair' will probably assume that the data in readable sectors is valid and attempt to fix the bad sector.
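The corrected-read-error counter mentioned above can be polled from a script. This is a sketch under one stated assumption: that the counter lives in a per-component file, md/dev-XXX/errors, which is how I read Documentation/md.txt. The function name and the SYSFS_ROOT override are hypothetical and only there so the loop can be tried against a scratch directory.

```shell
#!/bin/sh
# Print "<component> <count>" for every member of an md array, assuming the
# per-component layout /sys/block/<md>/md/dev-<component>/errors.
# SYSFS_ROOT is an illustrative override; it defaults to the real /sys.
md_read_errors() {
    md=$1
    for f in "${SYSFS_ROOT:-/sys}/block/$md/md"/dev-*/errors; do
        [ -r "$f" ] || continue      # skip if the glob matched nothing
        dev=${f%/errors}             # strip the trailing file name
        dev=${dev##*/dev-}           # keep only the component name
        echo "$dev $(cat "$f")"
    done
}
```

Running `md_read_errors md1` before and after a large read of the array would show whether md has corrected any read errors in between; the kernel log should carry a matching message.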
From: Oliver B. <oli...@ae...> - 2008-10-30 15:28:39
Hi Bruce,

I think the question is also whether you want to use software RAID in a RAID-5 or RAID-6 setup. Those usually put a considerable burden on your system's CPU. That's why (good) hardware controllers have dedicated processors that run the XOR engine.

Cheers,
Oliver

Bruce Allen wrote:
>> AFAIK, the Linux software RAID does this for you if it encounters a bad
>> block on one of the disks:
>> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621
>
> That's great!! When was this feature added to Linux software RAID? Does
> it work for all redundant RAID levels like RAID-5 or RAID-6, or only for
> mirroring?
>
> Cheers,
> Bruce
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Smartmontools-devel mailing list
> Sma...@li...
> https://lists.sourceforge.net/lists/listinfo/smartmontools-devel
From: Bruce A. <ba...@gr...> - 2008-10-30 15:34:40
On Thu, 30 Oct 2008, Oliver Bock wrote:

> I think the question is also whether you want to use software RAID in a
> RAID-5 or RAID-6 setup. Those usually put a considerable burden on your
> system's CPU. That's why (good) hardware controllers have dedicated
> processors that run the XOR-engine.

For server-class hardware this used to be the case, but it is no longer true. For example, the Sun X4500 boxes use software RAID on 4 Opteron cores and can get around 500 MB/s in RAID-6 mode.

The Areca RAID cards have a single 800 MHz Intel CPU core. Compare this with a modern storage server that might have four 3 GHz cores.

Cheers,
Bruce

> Bruce Allen wrote:
>>> AFAIK, the Linux software RAID does this for you if it encounters a bad
>>> block on one of the disks:
>>> http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621
>>
>> That's great!! When was this feature added to Linux software RAID? Does
>> it work for all redundant RAID levels like RAID-5 or RAID-6, or only for
>> mirroring?
>>
>> Cheers,
>> Bruce
From: Bruce A. <ba...@gr...> - 2008-10-31 08:05:48
Hi Michal,

>> OK, that was my question. Does it attempt a corrective write with a
>> RAID-5 or RAID-6 array, or only with mirrored drives?

> All raid levels with redundancy offer that feature, as far as I know
> (and quick peek over sources seems to confirm it).
>
> Successful attempt to correct read error will increase
> /sys/block/md.../md/errors . Appropriate message will be available in
> kernel log as well. Mentioned errors value is preserved across reboots
> if you use 1.x md superblock.
>
> I haven't had any drives failing on me recently, so you might just ask
> on linux-raid mailinglist to be 100% sure - the above is based on
> Documentation/md.txt though.

A generally useful comment for those interested in these issues:

One can now TEST these RAID features very easily, using a new program that Mark Lord recently added to the hdparm package. This program has the ability to create uncorrectable (UNC) sectors on a drive. An attempt to read these sectors will result in a READ error. The UNC sector can only be corrected with a write.

The code is called make_bad_sector.c:
http://hdparm.sourcearchive.com/documentation/8.9/make__bad__sector_8c-source.html

The code makes use of the 'WRITE LONG' ATA command. I understand from Mark that future versions will also offer the option to use the new ATA-8 'WRITE UNC' command.

You can use make_bad_sector to create unreadable sectors, and use this to test the behavior of RAID systems like Linux software RAID when they encounter unreadable/corrupted data sectors on a disk.

Cheers,
Bruce
From: Mark L. <ml...@po...> - 2008-10-31 13:10:49
Bruce Allen wrote:
..
> One can now TEST these RAID features very easily, using a new program
> that Mark Lord recently added to the hdparm package. This program has
> the ability to create uncorrectable (UNC) sectors on a drive. An
> attempt to read these sectors will result in a READ error. The UNC
> sector can only be corrected with a write.
>
> The code is called make_bad_sector.c:
> http://hdparm.sourcearchive.com/documentation/8.9/make__bad__sector_8c-source.html
>
> The code makes use of the 'WRITE LONG' ATA command. I understand from
> Mark that future versions will also offer the option to use the new
> ATA-8 'WRITE UNC' command.
>
> You can use make_bad_sector to create unreadable sectors, and use this
> to test the behavior of RAID systems like Linux software RAID, when they
> encounter unreadable/corrupted data sectors on a disk.
..

The "future version" is already here: this functionality is part of current versions of hdparm, and automatically chooses between WRITE_UNC and WRITE_LONG according to the device capabilities.

To corrupt a sector:

hdparm --make-bad-sector nnnnn /dev/sdX

(replace nnnnn with the sector number (LBA), and /dev/sdX with the actual device name).

Or, for faster failures, modern drives support "flagged" bad sectors, which can be created by prepending the letter f to the nnnnn value.

To later repair the bad sector manually, just overwrite it (with zeros) using this command:

hdparm --write-sector nnnnn /dev/sdX

Cheers
--
Mark Lord
Real-Time Remedies Inc.
ml...@po...
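The corrupt / verify / repair cycle described above can be laid out as a dry run before touching a real disk. This is only a sketch: the function name is invented for illustration, it prints the commands instead of executing them, and it assumes the hdparm build in use has --read-sector and asks for --yes-i-know-what-i-am-doing before destructive operations (recent versions do; check the local man page).

```shell
#!/bin/sh
# Print (do not run) the hdparm commands for a bad-sector test cycle.
# LBA may carry a leading "f" to request a "flagged" bad sector.
bad_sector_plan() {
    dev=$1
    lba=$2
    num=${lba#f}                      # plain sector number without the flag
    case "$num" in
        ''|*[!0-9]*) echo "invalid sector number: $lba" >&2; return 1 ;;
    esac
    confirm="--yes-i-know-what-i-am-doing"
    echo "hdparm $confirm --make-bad-sector $lba $dev"   # corrupt the sector
    echo "hdparm --read-sector $num $dev"                # expect a read error
    echo "hdparm $confirm --write-sector $num $dev"      # repair by overwriting with zeros
}
```

For RAID testing the middle step would normally be replaced by a read through the array (or a check/repair pass), followed by a direct read of the sector to see whether md has rewritten it. Only use this on a disk whose contents you can afford to lose.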
From: Bruce A. <ba...@gr...> - 2008-10-31 13:14:29
Hi Mark,

>> You can use make_bad_sector to create unreadable sectors, and use this
>> to test the behavior of RAID systems like Linux software RAID, when
>> they encounter unreadable/corrupted data sectors on a disk.
> ..
>
> The "future version" is already here: this functionality is part of
> current versions of hdparm, and automatically chooses between WRITE_UNC
> and WRITE_LONG according to the device capabilities.

Thanks for the correction and clarification! I should have read the code.

> To corrupt a sector: hdparm --make-bad-sector nnnnn /dev/sdX
>
> (replace nnnnn with sector number (LBA), and /dev/sdX with actual device
> name).
>
> Or, for faster failures, modern drives support "flagged" bad sectors,
> which can be created by prepending the letter f to the nnnnn value.
>
> To later repair the bad sector manually, just overwrite it (with zeros)
> using this command: hdparm --write-sector nnnnn /dev/sdX

I think this is *really* useful. We have already used this extensively in Hannover to verify that our Areca hardware RAID controllers behave correctly in read, scrub and verify modes. (They DO behave correctly!)

Cheers,
Bruce
From: Bruce A. <ba...@gr...> - 2009-01-15 20:46:27
Hi Mark,

Wow, yuk. Using the -r ioctl,2 option as Christian suggested should help to sort it out. I just can't see this being flaky drive firmware.

Something that might be interesting is to dig out an old version of smartmontools (use '-d ata') and see if that shows the same problems. Also, on the current version try switching between '-d ata' and '-d sat' to see if that shows any difference.

Cheers,
Bruce

On Thu, 15 Jan 2009, Mark Lord wrote:

> Mark Lord wrote:
>> Guys,
>>
>> I have a Hitachi 750GB drive here, which smartctl reports as FAILING NOW
>> when installed on an Intel AHCI controller in a pure 64-bit Core2Duo box
>> (linux-2.6.28).
>>
>> 5 Reallocated_Sector_Ct 0x0033 003 003 005 Pre-fail
>> Always FAILING_NOW 1839
>>
>> But.. that exact same drive, when inspected on a SiliconImage 3132 port
>> on my notebook computer shows zero problems -- pure 32-bit system there.
>>
>> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
>> Always - 0
>>
>> Oddly, running the 32-bit executable from the notebook, but on the 64-bit
>> box, gives the same error that the 64-bit version gives on that box.
>>
>> Peculiar, and I suspect a 32/64 bit bug somewhere.
>> Is there a standard easy way to trace this, something akin
>> to the "--verbose" flag that (my) hdparm utility has for such cases?
>>
>> Meanwhile, I'm going to boot a 32-bit livecd on the 64-bit box
>> and try again with that.
> ..
>
> Mmm.. something fishy is going on here -- now it also seems to be
> showing the error on the 32-bit notebook, where earlier it didn't.
> Must be flaky drive firmware or something.
>
> Cheers
From: Mark L. <ml...@po...> - 2009-01-15 21:05:09
Bruce Allen wrote:
> Hi Mark,
>
> Wow, yuk. Using the -r ioctl,2 option as Christian suggested should
> help to sort it out. I just can't see this being flaky drive firmware.
>
> Something that might be interesting is to dig out an old version of
> smartmontools (use '-d ata') and see if that shows the same problems.
> Also, on the current version try switching between '-d ata' and '-d sat'
> to see if that shows any difference.
..

Thanks. I moved the drive to a third system, and there the BIOS even complains about SMART failure before booting.

Bad drive, all of a sudden.

Oddly enough, googling around found a near-identical complaint for this exact model (HUA721075KLA330) from somebody else:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822145184

Pity, as the 32MB onboard cache made this beast noticeably faster than the "equivalent" Seagate model.

Cheers
--
Mark Lord
Real-Time Remedies Inc.
ml...@po...
From: Bruce A. <ba...@gr...> - 2009-01-15 21:23:29
Hi Mark,

It's weird: the drive showed 1800+ reallocated sectors, then showed zero! I guess if the memory that hosts the firmware on the drive is flaky, that might explain it, but I would have expected a drive lock-up instead, not even responding to SATA commands for example.

Anyway, I am reluctantly starting to agree with you that this looks like a bad drive, not bad code...

Cheers,
Bruce

On Thu, 15 Jan 2009, Mark Lord wrote:

> Bruce Allen wrote:
>> Hi Mark,
>>
>> Wow, yuk. Using the -r ioctl,2 option as Christian suggested should
>> help to sort it out. I just can't see this being flaky drive firmware.
>>
>> Something that might be interesting is to dig out an old version of
>> smartmontools (use '-d ata') and see if that shows the same problems.
>> Also, on the current version try switching between '-d ata' and '-d sat'
>> to see if that shows any difference.
> ..
>
> Thanks. I moved the drive to a third system, and there the BIOS
> even complains about SMART failure before booting.
>
> Bad drive, all of a sudden.
>
> Oddly enough, googling around found a near-identical complaint
> for this exact model (HUA721075KLA330) from somebody else:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822145184
>
> Pity, as the 32MB onboard cache made this beast noticeably faster
> than the "equivalent" seagate model.
>
> Cheers
From: Mark L. <ml...@po...> - 2009-01-15 17:00:23
Guys,

I have a Hitachi 750GB drive here, which smartctl reports as FAILING NOW when installed on an Intel AHCI controller in a pure 64-bit Core2Duo box (linux-2.6.28).

5 Reallocated_Sector_Ct 0x0033 003 003 005 Pre-fail Always FAILING_NOW 1839

But.. that exact same drive, when inspected on a SiliconImage 3132 port on my notebook computer shows zero problems -- pure 32-bit system there.

5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0

Oddly, running the 32-bit executable from the notebook, but on the 64-bit box, gives the same error that the 64-bit version gives on that box.

Peculiar, and I suspect a 32/64 bit bug somewhere. Is there a standard easy way to trace this, something akin to the "--verbose" flag that (my) hdparm utility has for such cases?

Meanwhile, I'm going to boot a 32-bit livecd on the 64-bit box and try again with that.

Thanks
--
Mark Lord
Real-Time Remedies Inc.
ml...@po...
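When comparing readings like the two attribute lines above across hosts, it can help to reduce each one to a verdict mechanically. A minimal sketch, assuming the usual smartctl -A column order (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value) and the usual rule that an attribute is failing when its normalized value is at or below its threshold. The function name is made up for illustration.

```shell
#!/bin/sh
# Read "smartctl -A" output on stdin and print "<verdict> <raw value>" for
# the attribute with the given ID. Verdict is FAILING when VALUE <= THRESH.
attr_status() {
    awk -v id="$1" '$1 == id {
        s = ($4 + 0 <= $6 + 0) ? "FAILING" : "ok"
        print s, $NF
    }'
}
```

Typical use would be `smartctl -A /dev/sdX | attr_status 5`, run once on each host (or once per `-d` mode) so the outputs can be compared side by side.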
From: Christian F. <Chr...@t-...> - 2009-01-15 18:02:54
Mark Lord wrote:
> ...
> Peculiar, and I suspect a 32/64 bit bug somewhere.
> Is there a standard easy way to trace this, something akin
> to the "--verbose" flag that (my) hdparm utility has for such cases?
>

Hi Mark,

please try "smartctl -r ioctl,2 ..."

Cheers,
Christian
From: Mark L. <ml...@po...> - 2009-01-15 17:25:02
Mark Lord wrote:
> Guys,
>
> I have a Hitachi 750GB drive here, which smartctl reports as FAILING NOW
> when installed on an Intel AHCI controller in a pure 64-bit Core2Duo box
> (linux-2.6.28).
>
> 5 Reallocated_Sector_Ct 0x0033 003 003 005 Pre-fail
> Always FAILING_NOW 1839
>
> But.. that exact same drive, when inspected on a SiliconImage 3132 port
> on my notebook computer shows zero problems -- pure 32-bit system there.
>
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
> Always - 0
>
> Oddly, running the 32-bit executable from the notebook, but on the 64-bit
> box, gives the same error that the 64-bit version gives on that box.
>
> Peculiar, and I suspect a 32/64 bit bug somewhere.
> Is there a standard easy way to trace this, something akin
> to the "--verbose" flag that (my) hdparm utility has for such cases?
>
> Meanwhile, I'm going to boot a 32-bit livecd on the 64-bit box
> and try again with that.
..

Mmm.. something fishy is going on here -- now it also seems to be showing the error on the 32-bit notebook, where earlier it didn't. Must be flaky drive firmware or something.

Cheers
--
Mark Lord
Real-Time Remedies Inc.
ml...@po...