From: Richard a. M. B. <reg...@co...> - 2008-02-21 11:34:59

All,

I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
Linux. The machine was recently power cycled to change a PSU and on
reboot sda was bumped from the RAID array and would not re-add.

It turns out the disk has 'shrunk':

ted ~ # cat /proc/partitions
major minor  #blocks  name

   8     0  488385527 sda
   8    16  488386584 sdb
   8    32  488386584 sdc
   8    48  488386584 sdd
   8    64  488386584 sde
   8    80  488386584 sdf
   9     0 1953545984 md0

I'm assuming this is due to 57 bad blocks 'known' to the disk and
therefore the disk is reporting fewer available blocks to Linux.

However, the disk is reporting no errors to smartctl. I've run the Short
and Extended offline tests and these both give "Completed without
error". Can smartctl 'prove' that the disk has bad blocks prior to me
RMAing the disk?

Cheers,

Rich.

p.s., I've run badblocks, but this only analyses blocks reported to
Linux, i.e. the 488385527 'good' ones.

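A minimal sketch of the size comparison, assuming the /proc/partitions listing above is pasted in as text. The helper names (`parse_partitions`, `odd_ones_out`) are hypothetical and only illustrate the arithmetic; the sizes are in 1 KiB blocks, as /proc/partitions reports them.

```python
from collections import Counter

# Rows copied from the /proc/partitions listing above (sizes in 1 KiB blocks).
# md0 is left out because it is the array, not a member disk.
PARTITIONS = """\
major minor #blocks name
8 0 488385527 sda
8 16 488386584 sdb
8 32 488386584 sdc
8 48 488386584 sdd
8 64 488386584 sde
8 80 488386584 sdf
"""


def parse_partitions(text):
    """Return {device_name: size_in_1k_blocks}, skipping the header row."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def odd_ones_out(sizes):
    """Return {device_name: shortfall} for disks that differ from the most common size."""
    expected, _ = Counter(sizes.values()).most_common(1)[0]
    return {name: expected - size for name, size in sizes.items() if size != expected}


sizes = parse_partitions(PARTITIONS)
shortfall = odd_ones_out(sizes)
print(shortfall)  # {'sda': 1057}
```

With these figures sda comes out 1057 blocks of 1 KiB (roughly 1 MiB) smaller than the other five members.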
From: Jim P. <ji...@jt...> - 2008-02-24 06:49:03

Richard and Monica Bland wrote:
> I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
> Linux. The machine was recently power cycled to change a PSU and on
> reboot sda was bumped from the RAID array and would not re-add.
>
> It turns out the disk has 'shrunk' :
..
> I'm assuming this is due to 57 bad blocks 'known' to the disk and
> therefore the disk is reporting fewer available blocks to Linux.

Hi Rich,

Disks don't do that. Bad blocks are reallocated from a spare area on
disk, and that's why there are a limited number of reallocations that
can occur.

The problem must lie elsewhere -- maybe something like HPA (host
protected area) enabled on the disk by the BIOS, or maybe some driver
issue. The full dmesg output from boot would be the best place to
track that down.

> However, the disk is reporting no errors to smartctl. I've run the Short
> and Extended offline tests and these both give "Completed without
> error". Can smartctl 'prove' that the disk has bad blocks prior to me
> RMAing the disk?

Full "smartctl -a" output would show any reallocations that occurred,
although if the extended offline test is successful then it's likely
the disk does not need to be returned.

-jim

From: Bruce A. <ba...@gr...> - 2008-02-24 06:56:44

Richard,

I agree with Jim's comments below.

Bruce

On Sun, 24 Feb 2008, Jim Paris wrote:

> Richard and Monica Bland wrote:
>> I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
>> Linux. The machine was recently power cycled to change a PSU and on
>> reboot sda was bumped from the RAID array and would not re-add.
>>
>> It turns out the disk has 'shrunk' :
> ..
>> I'm assuming this is due to 57 bad blocks 'known' to the disk and
>> therefore the disk is reporting fewer available blocks to Linux.
>
> Hi Rich,
>
> Disks don't do that. Bad blocks are reallocated from a spare area on
> disk, and that's why there are a limited number of reallocations that
> can occur.
>
> The problem must lie elsewhere -- maybe something like HPA (host
> protected area) enabled on the disk by the BIOS, or maybe some driver
> issue. The full dmesg output from boot would be the best place to
> track that down.
>
>> However, the disk is reporting no errors to smartctl. I've run the Short
>> and Extended offline tests and these both give "Completed without
>> error". Can smartctl 'prove' that the disk has bad blocks prior to me
>> RMAing the disk?
>
> Full "smartctl -a" output would show any reallocations that occurred,
> although if the extended offline test is successful then it's likely
> the disk does not need to be returned.
>
> -jim
>
> _______________________________________________
> Smartmontools-support mailing list
> Sma...@li...
> https://lists.sourceforge.net/lists/listinfo/smartmontools-support

From: Richard a. M. B. <reg...@co...> - 2008-02-24 13:40:36

Jim Paris wrote:
> Richard and Monica Bland wrote:
>> I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
>> Linux. The machine was recently power cycled to change a PSU and on
>> reboot sda was bumped from the RAID array and would not re-add.
>>
>> It turns out the disk has 'shrunk' :
> ..
>> I'm assuming this is due to 57 bad blocks 'known' to the disk and
>> therefore the disk is reporting fewer available blocks to Linux.
>
> The problem must lie elsewhere -- maybe something like HPA (host
> protected area) enabled on the disk by the BIOS, or maybe some driver
> issue. The full dmesg output from boot would be the best place to
> track that down.
>

Good call, Jim. dmesg did indeed show a HPA on (only) sda:

[   52.548387] ata1.00: Host Protected Area detected:
[   52.548388]         current size: 976771055 sectors
[   52.548389]         native size: 976773168 sectors

Not 100% sure how it got there, although there was a BIOS upgrade on the
machine around the same time as the PSU upgrade. Briefly looking through
the BIOS, I couldn't see where it was enabled, so will have to do a bit
more digging.

I would imagine if I removed the HPA'd sda, the BIOS would HPA the 'new'
sda (currently sdb) and REALLY mess up the RAID5 :(

Cheers,

Rich

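The dmesg figures can be cross-checked against the earlier /proc/partitions sizes with a little arithmetic. This is a minimal sketch: the `hpa_sizes` helper is hypothetical, and the 512-byte logical sector size is an assumption (it is consistent with a 500 GB drive reporting 976773168 sectors).

```python
import re

# The three HPA lines quoted from dmesg above.
DMESG = """\
[ 52.548387] ata1.00: Host Protected Area detected:
[ 52.548388] current size: 976771055 sectors
[ 52.548389] native size: 976773168 sectors
"""

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def hpa_sizes(text):
    """Return (current_sectors, native_sectors) parsed from the HPA message."""
    current = int(re.search(r"current\s+size:\s*(\d+)\s+sectors", text).group(1))
    native = int(re.search(r"native\s+size:\s*(\d+)\s+sectors", text).group(1))
    return current, native


current, native = hpa_sizes(DMESG)
hidden_sectors = native - current
hidden_bytes = hidden_sectors * SECTOR_BYTES

# Convert sector counts to the 1 KiB blocks that /proc/partitions reports.
current_kib = current * SECTOR_BYTES // 1024
native_kib = native * SECTOR_BYTES // 1024

print(hidden_sectors, hidden_bytes)  # 2113 1081856
print(current_kib, native_kib)       # 488385527 488386584
```

Under that assumption the HPA hides 2113 sectors (about 1 MiB), and the converted sizes match the 488385527 blocks reported for sda and the 488386584 blocks reported for the other disks.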
From: Bruce A. <ba...@gr...> - 2008-02-24 10:11:36

Does 'smartctl -i' report that the disks have shrunk (under User
Capacity)?

Bruce

On Thu, 21 Feb 2008, Richard and Monica Bland wrote:

> All,
>
> I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
> Linux. The machine was recently power cycled to change a PSU and on
> reboot sda was bumped from the RAID array and would not re-add.
>
> It turns out the disk has 'shrunk' :
>
> ted ~ # cat /proc/partitions
> major minor  #blocks  name
>
>    8     0  488385527 sda
>    8    16  488386584 sdb
>    8    32  488386584 sdc
>    8    48  488386584 sdd
>    8    64  488386584 sde
>    8    80  488386584 sdf
>    9     0 1953545984 md0
>
> I'm assuming this is due to 57 bad blocks 'known' to the disk and
> therefore the disk is reporting fewer available blocks to Linux.
>
> However, the disk is reporting no errors to smartctl. I've run the Short
> and Extended offline tests and these both give "Completed without
> error". Can smartctl 'prove' that the disk has bad blocks prior to me
> RMAing the disk?
>
> Cheers,
>
> Rich.
>
> p.s., I've run badblocks, but this only analyses blocks reported to
> Linux, i.e. the 488385527 'good' ones.

From: Richard a. M. B. <reg...@co...> - 2008-02-24 19:48:25

Jim Paris wrote:
> Richard and Monica Bland wrote:
>> I have 6 WDC WD5000AAKS 500GB disks in a Software RAID 5 array under
>> Linux. The machine was recently power cycled to change a PSU and on
>> reboot sda was bumped from the RAID array and would not re-add.
>>
>> It turns out the disk has 'shrunk' :
> ..
>> I'm assuming this is due to 57 bad blocks 'known' to the disk and
>> therefore the disk is reporting fewer available blocks to Linux.
>
> The problem must lie elsewhere -- maybe something like HPA (host
> protected area) enabled on the disk by the BIOS, or maybe some driver
> issue. The full dmesg output from boot would be the best place to
> track that down.
>

OK, the HPA problem is now 'fixed'. I think a HPA was created when I
chose to back up the current BIOS prior to flashing a new version. I
assumed this would back up the BIOS to the secondary BIOS location, since
this is a 'Dual BIOS' motherboard, but apparently not - it looks like it
backs it up to a HPA in the first disk it finds :( Not good on a RAIDed
system...

I say the problem is now 'fixed'... I went into the BIOS Reflash utility
to delete the HPA, but there was no such option. However, on rebooting
the HPA has gone... *shrug*

Sorry for bothering y'all.

Rich.

From: Bruno W. I. <br...@wo...> - 2008-02-29 22:54:14

On Sun, Feb 24, 2008 at 19:48:16 +0000,
Richard and Monica Bland <reg...@co...> wrote:
>
> OK, the HPA problem is now 'fixed'. I think a HPA was created when I
> chose to backup the current BIOS prior to flashing a new version. I
> assumed this would backup the BIOS to the secondary BIOS location, since
> this is a 'Dual BIOS' motherboard, but apparently not - it looks like it
> backs it up to a HPA in the first disk it finds :( Not good on a RAIDed
> system...
>
> I say the problem is now 'fixed'... I went into the BIOS Reflash utility
> to delete the HPA, but there was no such option. However, on rebooting
> the HPA has gone... *shrug*

There is a program which can set or query the max address on PATA drives
(I am not sure about SATA or SCSI) under Linux at:
http://www.win.tue.nl/~aeb/linux/setmax.c