From: Maciej W. R. <ma...@ds...> - 2003-05-14 14:24:53
Hello,
I have a problem running smartmontools for devices attached to a BusLogic
BT-958 HBA. The HBA reports command timeouts such as the one below:
scsi : aborting command due to timeout : pid 412, scsi0, channel 0, id 1, lun 0
Log Sense 00 40 00 00 00 00 00 fc 00
scsi0: Aborting CCB #400 to Target 1
SCSI host 0 abort (pid 412) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Sending Bus Device Reset CCB #401 to Target 1
SCSI host 0 channel 0 reset (pid 412) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 414) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 416) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 418) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 420) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
Under certain conditions, especially if the root/swap disk is involved, the
resets continue indefinitely or the NMI watchdog kicks in.
I've tracked the problem down to the Log Sense command. It appears that
the HBA's firmware aborts whenever a device returns less data than the
buffer length specified in the SEND_COMMAND ioctl (that length is passed
down to the firmware; the length specified in the CDB is independent and
doesn't matter directly). I suppose the firmware treats the value passed
as a transfer length, while for the Log Sense command it really means an
allocation length.
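To make the distinction concrete, here is a minimal sketch (not
smartmontools code) of where the allocation length sits in a 10-byte Log
Sense CDB. The names build_log_sense_cdb and LOG_SENSE_OPCODE are made up
for illustration; the byte layout follows the CDB setup in the patch
below. The value in bytes 7-8 is only an upper bound on what the device
may return, so a device with a short page legitimately sends less.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_SENSE_OPCODE 0x4d   /* SCSI LOG SENSE, 10-byte CDB */

/* Fill a 10-byte LOG SENSE CDB (hypothetical helper, for illustration).
 * alloc_len is an upper bound on the data the device may return; it is
 * not a promise that exactly this many bytes will be transferred. */
static void build_log_sense_cdb(uint8_t cdb[10], int pagenum,
                                uint16_t alloc_len)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = LOG_SENSE_OPCODE;
    cdb[2] = 0x40 | (pagenum & 0x3f);   /* page control (PC) == 1 */
    cdb[7] = (alloc_len >> 8) & 0xff;   /* allocation length, high byte */
    cdb[8] = alloc_len & 0xff;          /* allocation length, low byte */
}
```

Called with page 0 and an allocation length of 0xfc, this yields the bytes
00 40 00 00 00 00 00 fc 00 after the opcode, which appears to match the
command printed in the kernel log above.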
Unfortunately, BusLogic has changed owners three times by now (it is
currently a part of LSI Logic), so it's hard to get any support from them
and getting the firmware fixed is impossible. Additionally, the BusLogic
driver's maintainer, Leonard Zubkoff, cannot help anymore.
Under these conditions I decided to work around the problem in
smartmontools. Here is my proposal: Log Sense pages are retrieved in two
steps, first fetching only the page header to learn the page length and
then fetching the full page using the length obtained. A patch follows.
It works fine for me -- no more timeouts.
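The two-step idea can be sketched independently of the smartmontools
internals. Everything named here (log_sense_fn, fetch_log_page, fake_dev,
fake_xfer) is hypothetical and exists only for this illustration; the
transport callback stands in for whatever actually issues the command.
Note that the sketch adds the 4 header bytes to the length field, since in
the SCSI log page format that field counts only the bytes following the
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport: issue LOG SENSE for pagenum, asking for exactly
 * len bytes into buf. Returns 0 on success, nonzero on failure. */
typedef int (*log_sense_fn)(void *ctx, int pagenum, uint8_t *buf, size_t len);

/* Two-step fetch: read the 4-byte header, then the page at its real size.
 * Returns the number of bytes requested in step 2, or -1 on error. */
static int fetch_log_page(log_sense_fn xfer, void *ctx, int pagenum,
                          uint8_t *buf, size_t buf_len)
{
    size_t page_len;

    assert(xfer != NULL && buf != NULL);
    if (buf_len < 4)
        return -1;
    if (xfer(ctx, pagenum, buf, 4) != 0)          /* step 1: header only */
        return -1;

    /* Bytes 2..3 hold the length of the data that follows the header. */
    page_len = (((size_t)buf[2] << 8) | buf[3]) + 4;
    if (page_len > buf_len)
        page_len = buf_len;                       /* never overrun caller */

    if (xfer(ctx, pagenum, buf, page_len) != 0)   /* step 2: whole page */
        return -1;
    return (int)page_len;
}

/* Fake device for demonstration: holds one page, records request sizes. */
struct fake_dev {
    const uint8_t *page;    /* full page, header included */
    size_t page_size;
    size_t requests[2];     /* length asked for by the first two calls */
    int calls;
};

static int fake_xfer(void *ctx, int pagenum, uint8_t *buf, size_t len)
{
    struct fake_dev *dev = ctx;
    size_t n = len < dev->page_size ? len : dev->page_size;

    (void)pagenum;
    if (dev->calls < 2)
        dev->requests[dev->calls] = len;
    dev->calls++;
    memcpy(buf, dev->page, n);
    return 0;
}
```

With a 7-byte page in the fake device, the first request asks for 4 bytes
and the second for exactly 7, so the requested length never exceeds what
the device has to send, which is the condition the workaround relies on.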
What do you think?
Maciej
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: ma...@ds..., PGP key available +
smartmontools-5.1-11-log_sense.patch
diff -up --recursive --new-file smartmontools-5.1-11.macro/scsicmds.c smartmontools-5.1-11/scsicmds.c
--- smartmontools-5.1-11.macro/scsicmds.c 2003-05-01 11:06:27.000000000 +0000
+++ smartmontools-5.1-11/scsicmds.c 2003-05-12 09:39:00.000000000 +0000
@@ -347,17 +347,49 @@ int scsiLogSense(int device, int pagenum
struct scsi_sense_disect sinfo;
UINT8 cdb[10];
UINT8 sense[32];
+ int pageLen;
int status, res;
+ /* Get page length first. */
+ pageLen = 4;
+ if (pageLen > bufLen)
+ return -EIO;
+
memset(&io_hdr, 0, sizeof(io_hdr));
memset(cdb, 0, sizeof(cdb));
io_hdr.dxfer_dir = DXFER_FROM_DEVICE;
- io_hdr.dxfer_len = bufLen;
+ io_hdr.dxfer_len = pageLen;
io_hdr.dxferp = pBuf;
cdb[0] = LOG_SENSE;
cdb[2] = 0x40 | (pagenum & 0x3f); /* Page control (PC)==1 */
- cdb[7] = (bufLen >> 8) & 0xff;
- cdb[8] = bufLen & 0xff;
+ cdb[7] = (pageLen >> 8) & 0xff;
+ cdb[8] = pageLen & 0xff;
+ io_hdr.cmnd = cdb;
+ io_hdr.cmnd_len = sizeof(cdb);
+ io_hdr.sensep = sense;
+ io_hdr.max_sense_len = sizeof(sense);
+
+ status = do_scsi_cmnd_io(device, &io_hdr);
+ scsi_do_sense_disect(&io_hdr, &sinfo);
+ if ((res = scsiSimpleSenseFilter(&sinfo)))
+ return res;
+ if (status > 0)
+ return -EIO;
+
+ /* Now get the whole page; the length field excludes the 4-byte header. */
+ pageLen = ((pBuf[2] << 8) | pBuf[3]) + 4;
+ if (pageLen > bufLen)
+ pageLen = bufLen;
+
+ memset(&io_hdr, 0, sizeof(io_hdr));
+ memset(cdb, 0, sizeof(cdb));
+ io_hdr.dxfer_dir = DXFER_FROM_DEVICE;
+ io_hdr.dxfer_len = pageLen;
+ io_hdr.dxferp = pBuf;
+ cdb[0] = LOG_SENSE;
+ cdb[2] = 0x40 | (pagenum & 0x3f); /* Page control (PC)==1 */
+ cdb[7] = (pageLen >> 8) & 0xff;
+ cdb[8] = pageLen & 0xff;
io_hdr.cmnd = cdb;
io_hdr.cmnd_len = sizeof(cdb);
io_hdr.sensep = sense;
@@ -369,6 +401,7 @@ int scsiLogSense(int device, int pagenum
return res;
if (status > 0)
status = -EIO;
+
return status;
}