I was investigating the function e2fsck_handle_read_error() in ehandler.c which is normally invoked when a read doesn't return the expected amount of data (or experiences an outright error). In case the offending "short read" returned at least some data, this function tries to read in the rest. This is quite sensible.
However, if the offending "short read" returned no data at all (e.g. hard error) then this is what this function does:
    if (ask(ctx, _("Ignore error"), 1)) {
        if (ask(ctx, _("Force rewrite"), 1))
            io_channel_write_blk(channel, block, 1, data);
        return 0;
    }
So basically it gives the operator the option to ignore the read error and then, if the operator consents, WRITES over the offending block.
Could someone please explain to me why it makes sense to WRITE a block which you failed to read?
BTW, it has been my observation that most people invoke fsck with "-y", thereby unwittingly giving their consent to overwrite those blocks.
Furthermore, if I understand the code correctly, once e2fsck_handle_read_error() has rewritten the offending block and returned 0, the caller doesn't do anything sensible either, such as marking the block as bad.
It makes sense to do the force rewrite because it may cause the hard drive to remap the block to one of its spare blocks that are reserved specifically for this purpose. This is a much easier way to deal with a bad block in the inode table than trying to relocate the entire inode table (which must be contiguous).
Forcing the drive to remap the block by writing is something that didn't occur to me, and I suppose it makes a certain amount of sense. However, I'm uncomfortable with this: what if the disk is not directly attached to the host? For example, what if it is on a SAN or iSCSI? The read error could have been caused by a transient fibre channel or ethernet switch problem, so a subsequent force-rewrite could easily end up destroying good blocks.
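To make the concern concrete, here is a rough sketch of the kind of safeguard I have in mind: re-read the block a few times, and only fall back to the rewrite if every attempt fails. This is not e2fsck code. The callback types, retry_then_rewrite(), and the fake device are all hypothetical names made up for illustration, and a real version would probably also want a delay between attempts, since a switch problem can easily outlast a few back-to-back retries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types (NOT part of libext2fs). Return 0 on success. */
typedef int (*read_blk_fn)(void *dev, unsigned long block, void *buf);
typedef int (*write_blk_fn)(void *dev, unsigned long block, const void *buf);

enum recovery_result {
    RECOVERED_BY_REREAD,  /* a retry succeeded; nothing was written */
    REWRITTEN,            /* every retry failed; the block was overwritten */
    REWRITE_FAILED        /* every retry failed and the write failed too */
};

/*
 * Retry the read up to max_retries times. Only if all attempts fail is the
 * block overwritten with whatever the caller left in buf, which is the same
 * situation as in the snippet quoted at the top of the thread.
 */
static enum recovery_result
retry_then_rewrite(void *dev, unsigned long block, void *buf,
                   int max_retries, read_blk_fn rd, write_blk_fn wr)
{
    assert(max_retries > 0);
    for (int i = 0; i < max_retries; i++) {
        if (rd(dev, block, buf) == 0)
            return RECOVERED_BY_REREAD;
    }
    return wr(dev, block, buf) == 0 ? REWRITTEN : REWRITE_FAILED;
}

/* Fake device for demonstration only: the first fail_reads reads fail. */
struct fake_dev {
    int fail_reads;
    int reads;
    int writes;
    unsigned char data[16];
};

static int fake_read(void *dev, unsigned long block, void *buf)
{
    struct fake_dev *d = dev;
    (void)block;
    d->reads++;
    if (d->reads <= d->fail_reads)
        return -1;               /* simulated transient or hard error */
    memcpy(buf, d->data, sizeof d->data);
    return 0;
}

static int fake_write(void *dev, unsigned long block, const void *buf)
{
    struct fake_dev *d = dev;
    (void)block;
    d->writes++;
    memcpy(d->data, buf, sizeof d->data);
    return 0;
}
```

With a fake device whose first two reads fail, three retries recover the data without any write. A device that never reads successfully still ends up being rewritten, so the remapping behaviour described in the reply above is preserved for genuinely bad blocks.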