From: Tony B. <to...@cy...> - 2025-11-03 15:44:46
On 9/29/25 10:28, Tony Battersby wrote:
> v1 -> v2
> - Add new patch "scsi: qla2xxx: clear cmds after chip reset" suggested
>   by Dmitry Bogdanov.
> - Rename "scsi: qla2xxx: fix oops during cmd abort" to "scsi: qla2xxx:
>   fix races with aborting commands" and make SCST reset the ISP on a HW
>   timeout instead of unmapping DMA that might still be in use.
> - Fix "scsi: qla2xxx: fix TMR failure handling" to free mcmds properly
>   for LIO.
> - In "scsi: qla2xxx: add back SRR support", detect more buggy HBA fw
>   versions based on the fw release notes.
> - Shorten code comment in "scsi: qla2xxx: improve safety of cmd lookup
>   by handle" and improve patch description.
> - Rebase other patches as needed.
>
> v1:
> https://lore.kernel.org/r/f89...@cy.../
>
> This patch series improves the qla2xxx FC driver in target mode. I
> developed these patches using the out-of-tree SCST target-mode subsystem
> (https://scst.sourceforge.net/), although most of the improvements will
> also apply to the other target-mode subsystems such as the in-tree LIO.
> Unfortunately qla2xxx+LIO does not pass all of my tests, but my patches
> do not make it any worse (results below). These patches have been
> well-tested at my employer with qla2xxx+SCST in both initiator mode and
> target mode and with a variety of FC HBAs and initiators. Since SCST is
> out-of-tree, some of the patches have parts that apply in-tree and other
> parts that apply out-of-tree to SCST. I am going to include the
> out-of-tree SCST patches to provide additional context; feel free to
> ignore them if you are not interested.
>
> All patches apply to linux 6.17 and SCST 3.10 master branch.
>
> Summary of patches:
> - bugfixes
> - cleanups
> - improve handling of aborts and task management requests
> - improve log message
> - add back SLER / SRR support (removed in 2017)
>
> Some of these patches improve handling of aborts and task management
> requests.
> This is some of the testing that I did:
>
> Test 1: Use /dev/sg to queue random disk I/O with short timeouts; make
> sure cmds are aborted successfully.
> Test 2: Queue lots of disk I/O, then use "sg_reset -N -d /dev/sg" on
> initiator to reset logical unit.
> Test 3: Queue lots of disk I/O, then use "sg_reset -N -t /dev/sg" on
> initiator to reset target.
> Test 4: Queue lots of disk I/O, then use "sg_reset -N -b /dev/sg" on
> initiator to reset bus.
> Test 5: Queue lots of disk I/O, then use "sg_reset -N -H /dev/sg" on
> initiator to reset host.
> Test 6: Use fiber channel attenuator to trigger SRR during
> write/read/compare test; check data integrity.
>
> With my patches, SCST passes all of these tests.
>
> Results with in-tree LIO target-mode subsystem:
>
> Test 1: Seems to abort the same cmd multiple times (both
> qlt_24xx_retry_term_exchange() and __qlt_send_term_exchange()). But
> cmds get aborted, so give it a pass?
>
> Test 2: Seems to work; cmds are aborted.
>
> Test 3: Target reset doesn't seem to abort cmds, instead, a few seconds
> later:
> qla2xxx [0000:04:00.0]-f058:9: qla_target(0): tag 1314312, op 2a: CTIO
> with TIMEOUT status 0xb received (state 1, port 51:40:2e:c0:18:1d:9f:cc,
> LUN 0)
>
> Tests 4 and 5: The initiator is unable to log back in to the target; the
> following messages are repeated over and over on the target:
> qla2xxx [0000:04:00.0]-e01c:9: Sending TERM ELS CTIO (ha=00000000f8811390)
> qla2xxx [0000:04:00.0]-f097:9: Linking sess 000000008df5aba8 [0] wwn
> 51:40:2e:c0:18:1d:9f:cc with PLOGI ACK to wwn 51:40:2e:c0:18:1d:9f:cc
> s_id 00:00:01, ref=2 pla 00000000835a9271 link 0
>
> Test 6: passes with my patches; SRR not supported previously.
>
> So qla2xxx+LIO seems a bit flaky when handling exceptions, but my
> patches do not make it any worse. Perhaps someone who is more familiar
> with LIO can look at the difference between LIO and SCST and figure out
> how to improve it.
>
> Tony Battersby
> https://www.cybernetics.com/
>
> Tony Battersby (16):
>   Revert "scsi: qla2xxx: Perform lockless command completion in abort
>     path"
>   scsi: qla2xxx: fix initiator mode with qlini_mode=exclusive
>   scsi: qla2xxx: fix lost interrupts with qlini_mode=disabled
>   scsi: qla2xxx: use reinit_completion on mbx_intr_comp
>   scsi: qla2xxx: remove code for unsupported hardware
>   scsi: qla2xxx: improve debug output for term exchange
>   scsi: qla2xxx: fix term exchange when cmd_sent_to_fw == 1
>   scsi: qla2xxx: clear cmds after chip reset
>   scsi: qla2xxx: fix races with aborting commands
>   scsi: qla2xxx: improve checks in qlt_xmit_response / qlt_rdy_to_xfer
>   scsi: qla2xxx: fix TMR failure handling
>   scsi: qla2xxx: fix invalid memory access with big CDBs
>   scsi: qla2xxx: add cmd->rsp_sent
>   scsi: qla2xxx: improve cmd logging
>   scsi: qla2xxx: add back SRR support
>   scsi: qla2xxx: improve safety of cmd lookup by handle
>
>  drivers/scsi/qla2xxx/qla_dbg.c     |    3 +-
>  drivers/scsi/qla2xxx/qla_def.h     |    1 -
>  drivers/scsi/qla2xxx/qla_gbl.h     |    2 +-
>  drivers/scsi/qla2xxx/qla_init.c    |    1 +
>  drivers/scsi/qla2xxx/qla_isr.c     |   32 +-
>  drivers/scsi/qla2xxx/qla_mbx.c     |    2 +
>  drivers/scsi/qla2xxx/qla_mid.c     |    4 +-
>  drivers/scsi/qla2xxx/qla_os.c      |   35 +-
>  drivers/scsi/qla2xxx/qla_target.c  | 1775 +++++++++++++++++++++++----
>  drivers/scsi/qla2xxx/qla_target.h  |  112 +-
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c |   17 +
>  11 files changed, 1646 insertions(+), 338 deletions(-)
>
>
> base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a

Martin,

Could you apply this patch series for 6.19? I have addressed all review
comments and no one has given me any objections. All patches are on v2
except patch #11 which is on v3.

https://lore.kernel.org/linux-scsi/e95...@cy.../

Thanks,
Tony Battersby
https://www.cybernetics.com/
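
For anyone who wants to repeat the reset tests (Tests 2 through 5 in the
quoted cover letter), the initiator-side commands could be wrapped roughly
as follows. This is only a sketch: the reset_cmd helper, the level names,
the SG_DEV variable and the /dev/sg3 default are assumptions made for
illustration. Only the sg_reset flags (-N with -d, -t, -b, -H) come from
the test descriptions. The script prints the commands rather than running
them, so it is safe to try on any machine.

```shell
#!/bin/sh
# Hypothetical helper: map a reset level to the sg_reset invocation
# quoted in Tests 2-5. It only prints the command line; it does not
# issue any reset itself.
reset_cmd() {
    level=$1
    dev=$2
    case "$level" in
        lun)    flag=-d ;;  # Test 2: reset logical unit
        target) flag=-t ;;  # Test 3: reset target
        bus)    flag=-b ;;  # Test 4: reset bus
        host)   flag=-H ;;  # Test 5: reset host
        *)
            echo "unknown reset level: $level" >&2
            return 1
            ;;
    esac
    # -N is passed exactly as in the test descriptions.
    echo "sg_reset -N $flag $dev"
}

# SG_DEV is an assumed placeholder; substitute the sg node of the LUN
# under test on the initiator.
SG_DEV=${SG_DEV:-/dev/sg3}

for level in lun target bus host; do
    reset_cmd "$level" "$SG_DEV"
done
```

To actually issue a reset while disk I/O is queued, the printed line can
be run by hand or the echo replaced with a direct call. Printing first
makes it easy to confirm the device node before resetting a host or bus.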