|
From: Rob T. <ro...@rt...> - 2022-05-01 08:57:59
|
Reporting back after some more tests. It appears that adding and then deleting an NPIV port causes corruption. This can still be a hardware issue on my end but I'd like to rule out an SCST issue. Just interacting with sysfs, something gets corrupted at kernel level to the point where Linux hangs. This is reproducable on my system, without any of my code. Fresh power-on boot, no tools or code running. I'm sending this sequence: root@fctgt:~# echo "1">/sys/kernel/scst_tgt/targets/qla2x00t/21:00:00:24:ff:2f:90:5f/enabled root@fctgt:~# echo "add_target 51:01:04:f0:18:0c:02:00 node_name=50:01:04:f0:18:0c:02:00;parent_host=21:00:00:24:ff:2f:90:5f">/sys/kernel/scst_tgt/targets/qla2x00t/mgmt root@fctgt:~# echo "1">/sys/kernel/scst_tgt/targets/qla2x00t/51:01:04:f0:18:0c:02:00/enabled root@fctgt:~# echo "0">/sys/kernel/scst_tgt/targets/qla2x00t/51:01:04:f0:18:0c:02:00/enabled root@fctgt:~# echo "del_target 51:01:04:f0:18:0c:02:00">/sys/kernel/scst_tgt/targets/qla2x00t/mgmt root@fctgt:~# reboot root@fctgt:~# On the VM console, Linux shows the system shutting down until SCST being shutdown, counts up to 30 seconds, then hard hangs. Even the cursor doesn't blink anymore. Dmesg output: [Sun May 1 10:20:59 2022] sqatgt: Initializing SCST Cavium adapter target driver interface - driver version=10.02.07.400-k, SCST version=3.7.0-pre, Cavium version=10.02.07.400-k [Sun May 1 10:20:59 2022] [889]: scst: Target template qla2x00t registered successfully [Sun May 1 10:20:59 2022] sqatgt(17/0): Registering target pwwn=21:00:00:24:ff:2f:90:5e [Sun May 1 10:20:59 2022] [889]: scst: Target 21:00:00:24:ff:2f:90:5e for template qla2x00t registered successfully [Sun May 1 10:20:59 2022] sqatgt(17/0): Registering target pwwn=21:00:00:24:ff:2f:90:5e scst_tgt 0000000094cfc6a1 sqa_tgt 000000003e29951e [Sun May 1 10:20:59 2022] sqatgt(34/0): Registering target pwwn=21:00:00:24:ff:2f:90:5f [Sun May 1 10:20:59 2022] [889]: scst: Target 21:00:00:24:ff:2f:90:5f for template qla2x00t registered successfully [Sun May 1 10:20:59 2022] sqatgt(34/0): Registering target pwwn=21:00:00:24:ff:2f:90:5f scst_tgt 00000000d067a987 sqa_tgt 000000009815396b [Sun May 1 10:20:59 2022] [892]: iscsi-scst: iser portal with cm_id 000000000d66db6a listens on 0.0.0.0:3260 [Sun May 1 10:20:59 2022] [892]: iscsi-scst: Created iser portal cm_id:000000000d66db6a [Sun May 1 10:20:59 2022] [892]: iscsi-scst: iser portal with cm_id 00000000743a842f listens on [::]:3260 [Sun May 1 10:20:59 2022] [892]: iscsi-scst: Created iser portal cm_id:00000000743a842f [Sun May 1 10:20:59 2022] [870]: scst: Removed all devices from group copy_manager_tgt [Sun May 1 10:20:59 2022] [870]: scst: Removed all devices from group 21:00:00:24:ff:2f:90:5e [Sun May 1 10:20:59 2022] [870]: scst: Removed all devices from group 21:00:00:24:ff:2f:90:5f [Sun May 1 10:20:59 2022] [870]: scst: Removed all devices from group scst_local_tgt [Sun May 1 10:21:01 2022] [870]: scst: Waiting for 1 active commands to complete... [Sun May 1 10:21:01 2022] [870]: scst: All active commands completed [Sun May 1 10:21:01 2022] [870]: scst: Target scst_local_tgt for template scst_local unregistered successfully [Sun May 1 10:21:01 2022] loop7: detected capacity change from 0 to 8 [Sun May 1 10:24:44 2022] [870]: scst: Using autogenerated relative target id 1 for target 21:00:00:24:ff:2f:90:5f [Sun May 1 10:24:44 2022] sqatgt(34/0): Enabling target pwwn=21:00:00:24:ff:2f:90:5f [Sun May 1 10:24:44 2022] qla2xxx [0000:0b:00.1]-00af:34: Performing ISP error recovery - ha=000000009e05ce0f. [Sun May 1 10:24:45 2022] qla2xxx [0000:0b:00.1]-500a:34: LOOP UP detected (8 Gbps). [Sun May 1 10:24:47 2022] sqatgt(34/0): Registering initiator: pwwn=21:01:00:1b:32:2f:94:1e [Sun May 1 10:24:47 2022] [5]: scst: Using security group "21:00:00:24:ff:2f:90:5f" for initiator "21:01:00:1b:32:2f:94:1e" (target 21:00:00:24:ff:2f:90:5f) [Sun May 1 10:24:47 2022] qla2xxx [0000:0b:00.1]-d034:34: qla24xx_do_nack_work create sess success 000000009cbacb66 [Sun May 1 10:25:00 2022] qla2xxx [0000:0b:00.1]-7080:35: VP entry id 1 assigned. [Sun May 1 10:25:00 2022] scsi host35: qla2xxx [Sun May 1 10:25:00 2022] sqatgt: add target 51:01:04:f0:18:0c:02:00 [Sun May 1 10:25:00 2022] sqatgt(35/1): Registering target pwwn=51:01:04:f0:18:0c:02:00 [Sun May 1 10:25:00 2022] [870]: scst: Target 51:01:04:f0:18:0c:02:00 for template qla2x00t registered successfully [Sun May 1 10:25:00 2022] sqatgt(35/1): Registering target pwwn=51:01:04:f0:18:0c:02:00 scst_tgt 0000000055d33ec2 sqa_tgt 00000000a4d0e7e1 [Sun May 1 10:25:00 2022] scsi host34: vport-34:0-0 created via shost34 channel 0 [Sun May 1 10:25:01 2022] qla2xxx [0000:0b:00.1]-5013:34: RSCN database changed -- 0001 1701 0000. [Sun May 1 10:25:02 2022] qla2xxx [0000:0b:00.1]-5013:34: RSCN database changed -- 0001 1701 0000. [Sun May 1 10:26:05 2022] [870]: scst: Using autogenerated relative target id 2 for target 51:01:04:f0:18:0c:02:00 [Sun May 1 10:26:05 2022] sqatgt(35/1): Enabling target pwwn=51:01:04:f0:18:0c:02:00 [Sun May 1 10:26:07 2022] qla2xxx [0000:0b:00.1]-5013:34: RSCN database changed -- 0001 1701 0000. [Sun May 1 10:26:27 2022] sqatgt(35/1): Disabling target pwwn=51:01:04:f0:18:0c:02:00 [Sun May 1 10:27:19 2022] qlt_del_vtarget: 510104f0180c0200 [Sun May 1 10:27:19 2022] qla2xxx 0000:0b:00.1: qlt_del_vtarget: port_name 510104f0180c0200 [Sun May 1 10:27:19 2022] Unregistering target for host 35(00000000d01d51a2) [Sun May 1 10:27:19 2022] sqatgt(35/1): Target release finished sqa_tgt 00000000a4d0e7e1 [Sun May 1 10:27:19 2022] [870]: scst: Waiting for 1 active commands to complete... [Sun May 1 10:27:19 2022] [870]: scst: All active commands completed [Sun May 1 10:27:19 2022] [870]: scst: Target 51:01:04:f0:18:0c:02:00 for template qla2x00t unregistered successfully [Sun May 1 10:27:19 2022] qla2xxx [0000:0b:00.1]-7088:35: VP[1] deleted. [Sun May 1 10:27:20 2022] qla2xxx [0000:0b:00.1]-5013:34: RSCN database changed -- 0001 1701 0000. Not sure if it's relevant but on the FC client side (other VM attached to SAN) I see this console message: [Sun May 1 10:28:41 2022] rport-35:0-0 blocked FC remote port time out: removing target and saving binding Any ideas? Rob On 4/28/2022 7:34 PM, Rob Turk wrote: > I'm a few steps further in development, hitting an issue that triggers > a kernel NULL pointer BUG report. > > Hardware setup: Two VM's (FC Target, FC CLient), both with their own > dual-port FC adapter and pass-through to VM. > One port of the FC client point-to-point to a port on the FC target. > The other port for client and target attach to a Brocade 300 switch > with NPIV support. > > The sequence I'm running first creates a SCST User device. It then > registers the device on a port of the FC adapter. On the > point-to-point physical port I add a LUN and that works fine. When I > use the NPIV-capable port then I add an NPIV virtual port, and > register the device as LUN 0 on that port. > > This all works well. But then I want to shutdown and remove the device > and the NPIV virtual port, and then disable the physical port if > nothing is running on it anymore. This is when I hit a kernel BUG report. > > This is the dmesg output, showing the creation of the device and the > NPIV port, as well as the sequence of removing: > > [Thu Apr 28 18:56:23 2022] [16942]: scst: Virtual device handler > UVTL_DEVICE12 for type 1 registered successfully > [snip] > [Thu Apr 28 18:56:34 2022] qla2xxx [0000:0b:00.1]-5013:34: RSCN > database changed -- 0001 1701 0000. > [Thu Apr 28 18:56:44 2022] qla2xxx [0000:0b:00.1]-00af:34: Performing > ISP error recovery - ha=000000003bca05dd. > [Thu Apr 28 18:56:55 2022] sqatgt(21/0): Disabling target > pwwn=21:00:00:24:ff:2f:90:5e > [Thu Apr 28 18:57:05 2022] qla2xxx [0000:0b:00.0]-00af:21: Performing > ISP error recovery - ha=000000003a674198. > > This is my software log which shows what commands I'm sending to the > sysfs filesystem: > 21:00:00:24:ff:2f:90:5e (point to point) and 21:00:00:24:ff:2f:90:5f > (to switch) are physical ports on the QLogic card. > 51:01:04:f0:18:0c:02:00 is the WWPN for the NPIV port. > > [16942I] 18:56:26.012 - buildDefaultIdentity: V=STK , > P=T10000C , devid=12, type=1 (Tape), > wwnn=50:01:04:f0:18:0c:02:00, wwpn=51:01:04:f0:18:0c:02:00 > [snip] > [16942I] 18:57:19.024 - _clearFcPortList: Removing Physical port > 21:00:00:24:ff:2f:90:5e > [16942I] 18:57:19.024 - -- END OF TRACE for ID 12 V0.1 -- > [16942] 18:57:19.024 - -- Trace end: Thursday, April 28, 2022 > > Filtering just the sysfs interactions: > [16942I] 18:56:28.048 - writeStringToFile: Wrote "1" to > /sys/kernel/scst_tgt/targets/qla2x00t/21:00:00:24:ff:2f:90:5e/enabled > [16942I] 18:56:30.096 - writeStringToFile: Wrote "1" to > /sys/kernel/scst_tgt/targets/qla2x00t/21:00:00:24:ff:2f:90:5f/enabled > [16942I] 18:56:30.105 - writeStringToFile: Wrote "add_target > 51:01:04:f0:18:0c:02:00 > node_name=50:01:04:f0:18:0c:02:00;parent_host=21:00:00:24:ff:2f:90:5f" > to /sys/kernel/scst_tgt/targets/qla2x00t/mgmt > [16942I] 18:56:31.105 - writeStringToFile: Wrote "add UVTL_DEVICE12 0" > to > /sys/kernel/scst_tgt/targets/qla2x00t/51:01:04:f0:18:0c:02:00/luns/mgmt > [16942I] 18:56:31.115 - writeStringToFile: Wrote "1" to > /sys/kernel/scst_tgt/targets/qla2x00t/51:01:04:f0:18:0c:02:00/enabled > [16942I] 18:56:36.117 - writeStringToFile: Wrote "0" to > /sys/kernel/scst_tgt/targets/qla2x00t/51:01:04:f0:18:0c:02:00/enabled > [16942I] 18:56:36.148 - writeStringToFile: Wrote "del_target > 51:01:04:f0:18:0c:02:00" to /sys/kernel/scst_tgt/targets/qla2x00t/mgmt > [16942I] 18:56:57.520 - writeStringToFile: Wrote "0" to > /sys/kernel/scst_tgt/targets/qla2x00t/21:00:00:24:ff:2f:90:5f/enabled > [16942I] 18:57:19.024 - writeStringToFile: Wrote "0" to > /sys/kernel/scst_tgt/targets/qla2x00t/21:00:00:24:ff:2f:90:5e/enabled > > After the event, the FC target VM sometimes completely freezes with > CPU going to 100%, requiring a power down/up cycle. Sometimes this > also happens when I manually echo "0" to one of the /enabled controls. > > To remove the SCST User device I simply close the corresponding > handle. I then disable and remove the NPIV port. Is this safe or > should I first remove the device from the NPIV port before disabling > and removing the NPIV port? What is the proper cleanup sequence? > > Anything else I can do to debug this further? > > Thanks, > Rob > > > > _______________________________________________ > Scst-devel mailing list > https://lists.sourceforge.net/lists/listinfo/scst-devel |