I have a failing hard drive with some bad sectors. On those sectors, hdparm's read-sector command reports a successful read whenever the kernel resets the SATA link because the drive froze and hdparm's 60-second timeout expired.
If the kernel returns a failure without resetting the link, hdparm's sb[] output is "72 03 13 00 00 00 00 0e 09 0c 00 01 00 01 00 70 00 a4 00 0c e0 51 00 00 00 00 00 00 00 00 00 00" and hdparm reports an I/O error. If the kernel returns a failure due to a timeout, the sb[] output is "72 0b 00 00 00 00 00 0e 09 0c 00 00 00 01 00 70 00 a4 00 0c e0 40 00 00 00 00 00 00 00 00 00 00" and hdparm reports a successful read. The io_hdr output is "ATA_16 status=0x2, host_status=0x0, driver_status=0x8" in both cases. For non-failing sectors, both the io_hdr and sb[] output are all 0.
hdparm 9.43 additionally returns sb[10]=1 when reading the bad sector, but it also says it is using LBA48. The rest of sb[] is the same for hdparm 9.39 and hdparm 9.43.
Here is a sample of the kernel timeout report:
[ 2277.856096] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2277.856102] ata2.00: failed command: READ SECTOR(S)
[ 2277.856110] ata2.00: cmd 20/00:01:70:a4:0c/00:00:00:00:00/e0 tag 0 pio 512 in
[ 2277.856112] res 40/00:01:70:a4:0c/00:00:00:00:00/e0 Emask 0x4 (timeout)
[ 2277.856116] ata2.00: status: { DRDY }
[ 2277.856122] ata2: hard resetting link
[ 2278.161026] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2278.165726] ata2.00: configured for UDMA/33
[ 2278.165751] ata2: EH complete
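The cmd and res lines in the log above pack the ATA taskfile registers into hex bytes. Below is a rough Python sketch for pulling them apart. It assumes the field order is command-or-status / feature-or-error : count : LBA low : LBA mid : LBA high / five high-order bytes / device; that order is an assumption about the libata error-handler output and is not stated in this report. parse_taskfile and its dictionary keys are hypothetical names used only for illustration.

```python
import re

# Matches the register dump in libata "cmd"/"res" log lines (assumed layout).
_TASKFILE_RE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)


def parse_taskfile(line):
    """Return the registers from a cmd/res log line, or None if absent."""
    m = _TASKFILE_RE.search(line)
    if m is None:
        return None
    kind, first, low, _high, device = m.groups()
    feat_or_err, count, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    return {
        "kind": kind,
        # "cmd" lines carry command/feature here; "res" lines carry status/error.
        "command_or_status": int(first, 16),
        "feature_or_error": feat_or_err,
        "count": count,
        # Low 24 LBA bits only; the high-order group is ignored in this sketch.
        "lba_low24": lbal | lbam << 8 | lbah << 16,
        "device": int(device, 16),
    }


res = parse_taskfile(
    "[ 2277.856112] res 40/00:01:70:a4:0c/00:00:00:00:00/e0 Emask 0x4 (timeout)"
)
print(res)
```

Under that assumed layout, the result line carries status 0x40 (DRDY) with error 0x00 and device 0xe0, the same e0 40 byte pair that appears at sb[20] and sb[21] in the timeout case.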
I have tried hdparm 9.39 and 9.43 and kernel versions 3.6.6 and 3.2.33, all with similar results. Is this a bug in hdparm, a bug in the kernel, a bug in the motherboard SATA chipset or drive firmware, or is the hard drive really reading successfully after the SATA reset?
sg_decode_sense 72 03 13 00 00 00 00 0e 09 0c 00 01 00 01 00 70 00 a4 00 0c e0 51 00 00 00 00 00 00 00 00 00 00
Descriptor format, current; Sense key: Medium Error
Additional sense: Address mark not found for data field
Descriptor type: ATA Status Return
extend=0 error=0x1 sector_count=0x1
lba=0x0ca470
device=0xe0 status=0x51
Thus I suspect a bug in hdparm: it does not interpret the status correctly (see bug [#65]).
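As a cross-check of the sg_decode_sense output, here is a minimal Python sketch that decodes descriptor-format sense data and the ATA Status Return descriptor (descriptor code 0x09) from the two sb[] dumps in this report. The name decode_sense and the dictionary keys are made up for illustration. The byte offsets are an assumption based on the SAT descriptor layout and have only been checked against the sg_decode_sense output above. The sketch also assumes extend=0, so only the low byte of each register pair is used.

```python
def decode_sense(hex_string):
    """Decode descriptor-format sense data given as space-separated hex bytes."""
    sb = bytes.fromhex(hex_string)
    if sb[0] & 0x7F != 0x72:
        raise ValueError("not descriptor-format current sense data")
    result = {"sense_key": sb[1] & 0x0F, "asc": sb[2], "ascq": sb[3]}
    # Descriptors start at byte 8; byte 7 holds the additional sense length.
    end = min(len(sb), 8 + sb[7])
    pos = 8
    while pos + 2 <= end:
        code, length = sb[pos], sb[pos + 1]
        if code == 0x09 and length >= 0x0C and pos + 14 <= len(sb):
            d = sb[pos:pos + 14]  # ATA Status Return descriptor (assumed layout)
            result.update(
                extend=d[2] & 0x01,
                error=d[3],
                sector_count=d[5],
                # Low bytes of LBA low/mid/high; valid as-is when extend == 0.
                lba=d[7] | d[9] << 8 | d[11] << 16,
                device=d[12],
                status=d[13],
            )
            break
        pos += 2 + length
    return result


IO_ERROR_SB = ("72 03 13 00 00 00 00 0e 09 0c 00 01 00 01 00 70 "
               "00 a4 00 0c e0 51 00 00 00 00 00 00 00 00 00 00")
TIMEOUT_SB = ("72 0b 00 00 00 00 00 0e 09 0c 00 00 00 01 00 70 "
              "00 a4 00 0c e0 40 00 00 00 00 00 00 00 00 00 00")

for name, dump in (("io error", IO_ERROR_SB), ("timeout", TIMEOUT_SB)):
    info = decode_sense(dump)
    err_bit = info["status"] & 0x01  # ATA status bit 0 is ERR
    print(name, {k: hex(v) for k, v in info.items()}, "ERR set:", bool(err_bit))
```

In the timeout case the sense key is 0xb (Aborted Command) while the status byte is 0x40, which has the ERR bit clear. That would be consistent with code that looks only at the status ERR bit treating the read as successful, but this is a guess about the cause and is not confirmed here.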
Related
Bugs: #65
Last edit: Markus 2015-08-26