Re: [smartmontools-support] How to fix bad blocks on software RAID?


Kai Schaetzl wrote:
> I have two Samsung HD502IJ disks in a software RAID 1 array on CentOS
> 5.
> There are two mirrored partitions (/dev/md0, /dev/md1) on each, a
> small one for the xen hypervisor system and a large one with the
> remainder of the disk that is managed by LVM to get a bunch of smaller
> partitions for xen guests.
> Both disks show one Offline_Uncorrectable error at different offsets.
> ...
> The short offline tests then found the Offline_Uncorrectable errors
> and I also started getting two pending sectors in sda in the smartd
> messages.
> However, these disappeared after some time and only the two
> Offline_Uncorrectable errors remain.
> ...
> I read this thread
> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.64.0806270653350.5844%40gc.phys.uwm.edu
> which suggests I could copy over good data from the other disk, but
> it's not clear to me at all how I find out where exactly the problem
> is and how I copy the correct data over.
> 

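AFAIK, Linux software RAID does this for you when it encounters a read
error on one of the disks: it fetches the data from the other mirror and
writes it back over the failing sector, which gives the drive the chance
to reallocate it:
http://lxr.linux.no/linux+v2.6.27/drivers/md/raid1.c#L1621

So a raw read through the RAID driver may force the reallocation, but
only if the read of the bad block happens to be served by the disk that
holds it. With two mirrors that is a probability of roughly 50% :-)
(e.g. 'ddrescue -v /dev/md0 /dev/null read.log')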
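
One way to sidestep the 50% chance, not something suggested in this
thread: md can be asked to run a "check" scrub, which reads every member
of the array, and read errors found during the scrub go through the same
repair path. The sketch below assumes a kernel whose md driver exposes
sync_action in sysfs. The function name md_start_check and the
SYSFS_ROOT override (for dry runs against a fake directory tree) are
made up for illustration and are not part of md.

```shell
#!/bin/sh
# Sketch only: request a "check" scrub so that md reads all mirrors.
# SYSFS_ROOT is a hypothetical override for dry runs, not an md setting.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

md_start_check() {
    md="$1"                                    # array name, e.g. md0
    action="$SYSFS_ROOT/block/$md/md/sync_action"

    # Needs root on a real system; also fails if $md is not an md array.
    if [ ! -w "$action" ]; then
        echo "cannot write $action" >&2
        return 1
    fi

    # Do not interrupt a resync, recovery or an already running check.
    state=$(cat "$action")
    if [ "$state" != "idle" ]; then
        echo "$md is busy ($state), not starting a check" >&2
        return 1
    fi

    echo check > "$action"
    echo "check requested on $md"
}
```

Usage would be 'md_start_check md0', run as root. Progress shows up in
/proc/mdstat while the scrub is running.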

Note: Some older Samsung disks (at least SP1614C from P80 series) do not
increment Reallocated_Sector_Ct and do not reset Offline_Uncorrectable
on bad sector reallocation. I don't know whether this is the case for T-
or F1-Series disks.
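
Given that caveat, it may help to record the relevant attributes before
and after the read pass and compare them, rather than trusting any
single counter. A minimal sketch, assuming the usual 'smartctl -A' table
layout with the raw value in the tenth column. The function name
smart_sector_counts is made up for illustration.

```shell
#!/bin/sh
# Sketch: extract the raw values of the bad-sector related attributes
# from 'smartctl -A' output supplied on stdin, one NAME=RAW per line.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct"  ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable"  { print $2 "=" $10 }'
}
```

For example, 'smartctl -A /dev/sda | smart_sector_counts > before.txt'
ahead of the read, the same into after.txt once it has finished, then
'diff before.txt after.txt'. On disks that behave like the SP1614C the
diff can stay empty even though the sector was reallocated, so a fresh
offline or long self-test is the more reliable confirmation.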

Cheers,
Christian
