From: Maciej W. R. <ma...@ds...> - 2003-05-14 14:24:53
Hello,

I have a problem running smartmontools for devices attached to a
BusLogic BT-958 HBA. The HBA reports command timeouts such as the one
below:

scsi : aborting command due to timeout : pid 412, scsi0, channel 0, id 1, lun 0 Log Sense 00 40 00 00 00 00 00 fc 00
scsi0: Aborting CCB #400 to Target 1
SCSI host 0 abort (pid 412) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Sending Bus Device Reset CCB #401 to Target 1
SCSI host 0 channel 0 reset (pid 412) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 414) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 416) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 418) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***
SCSI host 0 abort (pid 420) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 1
scsi0: *** BusLogic BT-958 Initialized Successfully ***

Under certain conditions, especially if the root/swap disk is involved,
the resets continue indefinitely or the NMI watchdog kicks in.

I've tracked the problem down to the Log Sense command. It appears that
the HBA's firmware aborts whenever the amount of data returned by a
device is smaller than specified as the buffer length in the
SEND_COMMAND ioctl (the length is passed down to the firmware; the
length specified in the CDB is independent and doesn't matter directly).
I suppose the firmware treats the value passed as a transfer length,
while for the Log Sense command it really means an allocation length.

Unfortunately, now that BusLogic has changed owners three times (it is
currently a part of LSI Logic), it's hard to get any support from them
and getting the firmware fixed is impossible. Additionally, the BusLogic
driver's maintainer, Leonard Zubkoff, cannot help anymore.

Under these conditions I decided to work around the problem in
smartmontools. Here is my proposal: Log Sense pages are retrieved in two
steps, getting the page length first and then getting the full page
using the length obtained.

A patch follows. It works fine for me -- no timeouts anymore. What do
you think?

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: ma...@ds..., PGP key available        +

smartmontools-5.1-11-log_sense.patch
diff -up --recursive --new-file smartmontools-5.1-11.macro/scsicmds.c smartmontools-5.1-11/scsicmds.c
--- smartmontools-5.1-11.macro/scsicmds.c	2003-05-01 11:06:27.000000000 +0000
+++ smartmontools-5.1-11/scsicmds.c	2003-05-12 09:39:00.000000000 +0000
@@ -347,17 +347,49 @@ int scsiLogSense(int device, int pagenum
     struct scsi_sense_disect sinfo;
     UINT8 cdb[10];
     UINT8 sense[32];
+    int pageLen;
     int status, res;
 
+    /* Get page length first. */
+    pageLen = 4;
+    if (pageLen > bufLen)
+        return -EIO;
+
     memset(&io_hdr, 0, sizeof(io_hdr));
     memset(cdb, 0, sizeof(cdb));
     io_hdr.dxfer_dir = DXFER_FROM_DEVICE;
-    io_hdr.dxfer_len = bufLen;
+    io_hdr.dxfer_len = pageLen;
     io_hdr.dxferp = pBuf;
     cdb[0] = LOG_SENSE;
     cdb[2] = 0x40 | (pagenum & 0x3f);  /* Page control (PC)==1 */
-    cdb[7] = (bufLen >> 8) & 0xff;
-    cdb[8] = bufLen & 0xff;
+    cdb[7] = (pageLen >> 8) & 0xff;
+    cdb[8] = pageLen & 0xff;
+    io_hdr.cmnd = cdb;
+    io_hdr.cmnd_len = sizeof(cdb);
+    io_hdr.sensep = sense;
+    io_hdr.max_sense_len = sizeof(sense);
+
+    status = do_scsi_cmnd_io(device, &io_hdr);
+    scsi_do_sense_disect(&io_hdr, &sinfo);
+    if ((res = scsiSimpleSenseFilter(&sinfo)))
+        return res;
+    if (status > 0)
+        return -EIO;
+
+    /* Now get the whole page. */
+    pageLen = (pBuf[2] << 8) | pBuf[3];
+    if (pageLen > bufLen)
+        pageLen = bufLen;
+
+    memset(&io_hdr, 0, sizeof(io_hdr));
+    memset(cdb, 0, sizeof(cdb));
+    io_hdr.dxfer_dir = DXFER_FROM_DEVICE;
+    io_hdr.dxfer_len = pageLen;
+    io_hdr.dxferp = pBuf;
+    cdb[0] = LOG_SENSE;
+    cdb[2] = 0x40 | (pagenum & 0x3f);  /* Page control (PC)==1 */
+    cdb[7] = (pageLen >> 8) & 0xff;
+    cdb[8] = pageLen & 0xff;
     io_hdr.cmnd = cdb;
     io_hdr.cmnd_len = sizeof(cdb);
     io_hdr.sensep = sense;
@@ -369,6 +401,7 @@ int scsiLogSense(int device, int pagenum
         return res;
     if (status > 0)
         status = -EIO;
+
     return status;
 }
 
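
For readers who want the two-step idea in isolation, here is a minimal
sketch, separate from the patch and not part of smartmontools. The names
build_log_sense_cdb() and full_page_len() are hypothetical helpers made
up for illustration. The sketch assumes the usual log page layout, in
which bytes 2-3 of the 4-byte page header hold the number of bytes that
follow the header, so it adds 4 to obtain the total transfer size (the
patch uses the field value as-is). No device I/O is performed; only the
CDB construction and the length computation are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_SENSE_OPCODE    0x4d  /* SCSI LOG SENSE(10) operation code */
#define LOG_PAGE_HEADER_LEN 4     /* page code, reserved, 2-byte length */

/* Hypothetical helper: fill a 10-byte LOG SENSE CDB for the given page,
 * with page control (PC) == 1 and the given allocation length. */
static void build_log_sense_cdb(uint8_t cdb[10], int pagenum,
                                unsigned alloc_len)
{
    memset(cdb, 0, 10);
    cdb[0] = LOG_SENSE_OPCODE;
    cdb[2] = 0x40 | (pagenum & 0x3f);
    cdb[7] = (alloc_len >> 8) & 0xff;
    cdb[8] = alloc_len & 0xff;
}

/* Hypothetical helper: given the 4-byte header returned by the first
 * (short) request, compute how many bytes to ask for in the second
 * request, clamped to the caller's buffer size. */
static unsigned full_page_len(const uint8_t hdr[LOG_PAGE_HEADER_LEN],
                              unsigned buf_len)
{
    unsigned len;

    /* The first step already needed room for the header. */
    assert(buf_len >= LOG_PAGE_HEADER_LEN);

    /* Assumption: the length field excludes the header itself. */
    len = (((unsigned)hdr[2] << 8) | hdr[3]) + LOG_PAGE_HEADER_LEN;
    return len > buf_len ? buf_len : len;
}
```

A caller would issue the first request with an allocation length and
data buffer length of 4, pass the returned header to full_page_len(),
and then use the result for both the CDB allocation length and the
buffer length of the second request, so the two values never disagree.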