Today I have already run into the following hang three times while running /etc/init.d/scst stop, without having triggered any iSCSI or iSER login or logout:
sysrq: SysRq : Show Blocked State
task PC stack pid father
iscsi-scstd D ffff8800625f7a58 0 31819 1 0x00000006
Call Trace:
[<ffffffff81622ab7>] schedule+0x37/0x90
[<ffffffff816270d9>] schedule_timeout+0x249/0x470
[<ffffffff810d2ed8>] msleep+0x38/0x50
[<ffffffffa05df9e1>] isert_portal_release+0xf1/0x140 [isert_scst]
[<ffffffffa05db869>] isert_portal_remove+0x9/0x10 [isert_scst]
[<ffffffffa05db4da>] isert_close_all_portals+0x2a/0x50 [isert_scst]
[<ffffffffa05b407d>] target_del_all+0x8d/0x2d0 [iscsi_scst]
[<ffffffffa05adea5>] release+0x45/0xe0 [iscsi_scst]
[<ffffffff811d6dd8>] __fput+0xe8/0x1f0
[<ffffffff811d6f19>] ____fput+0x9/0x10
[<ffffffff81084493>] task_work_run+0x83/0xb0
[<ffffffff8106622e>] do_exit+0x3ee/0xc40
[<ffffffff81066b0b>] do_group_exit+0x4b/0xc0
[<ffffffff81073f5a>] get_signal+0x2ca/0x940
[<ffffffff8101bf43>] do_signal+0x23/0x660
[<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
[<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
[<ffffffff81628573>] entry_SYSCALL_64_fastpath+0xa6/0xa8
Did you succeed in unloading the module after this hang?
From the call trace it looks like the hang is caused by the msleep of 100 ms in the code.
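To illustrate why a 100 ms sleep can show up as a permanent hang, here is a small user-space model in Python. It is not the isert_scst code: it assumes the release path polls some count of outstanding objects and sleeps between checks, which is what the msleep frame in the trace suggests. The names wait_for_connections, get_active and max_polls are hypothetical and exist only for this sketch.

```python
import time


def wait_for_connections(get_active, poll_interval_s=0.1, max_polls=None,
                         sleep=time.sleep):
    """Poll until get_active() reports zero outstanding objects.

    Hypothetical model of a "sleep 100 ms and re-check" release loop.
    Returns True once the count reaches zero. Returns False if max_polls
    is set and the count is still non-zero after that many sleeps.
    With max_polls=None the loop never returns while the count stays
    non-zero, which is how a leaked object would appear as a hang.
    """
    polls = 0
    while get_active() > 0:
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_interval_s)
        polls += 1
    return True
```

Calling it with a counter that never drops, for example wait_for_connections(lambda: 1, max_polls=5), gives up after five sleeps and returns False. Without the max_polls bound the same call would block forever, sleeping 100 ms at a time, much like the blocked task shown above.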
I had observed the following three times yesterday:
I was not able to reproduce this issue.
Which tests did you run? Which kernel? Did you compile against MOFED?
This has been observed on a system running a 4.7.0 kernel with a whole bunch of kernel debugging options enabled, without MOFED and with the attached SCST configuration file.
Could you please attach the SCST configuration file again? The link is broken.
This hang probably means that we didn't free all the connections, so you will see the problem only when trying to unload the module or kill iscsi-scstd.
You can check the isert_scst module refcount to see whether there is a connection leak before trying to unload the module (every connection takes one reference).
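One way to do that check from a script is to read the refcnt file that Linux exposes for loadable modules under /sys/module. The sketch below assumes the kernel was built with module unloading support, since the file is not present otherwise. The helper name read_module_refcount and its sysfs_root parameter are made up for this example.

```python
from pathlib import Path
from typing import Optional


def read_module_refcount(module: str,
                         sysfs_root: str = "/sys/module") -> Optional[int]:
    """Return the reference count of a loaded kernel module.

    Returns None when the module is not loaded, or when the kernel does
    not expose a readable refcnt file for it.
    """
    refcnt_path = Path(sysfs_root) / module / "refcnt"
    try:
        return int(refcnt_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


if __name__ == "__main__":
    count = read_module_refcount("isert_scst")
    if count is None:
        print("isert_scst is not loaded, or refcnt is not exposed")
    else:
        print(f"isert_scst refcount: {count}")
```

The same number appears in the "Used by" column of lsmod. A count that stays above zero after all initiators have logged out would be consistent with a leaked connection, although other users of the module can also hold references.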
I did some tests on this kernel but it worked for me.
This happened while I was testing an ib_srpt modification. No iSCSI login was performed.
I have a fix, could you check it on your setup?
Sorry but even with that patch applied I still see the following hang during shutdown:
I have debug prints that helped me with my last fix.
If you can reproduce this issue with my attached prints and send me the log, it would help me a lot in fixing this bug.
This behavior is very easy to reproduce. All that is needed is to start and try to stop SCST in a VM (maybe scst.conf is relevant). Have you already tried to reproduce this behavior yourself?
Sure, we have many tests that start and stop SCST.
I fixed many flows that cause this bug, and with those fixes it works for me and for our QA (I pushed all my fixes).
Maybe I am missing something.
Which branch are you running? Could you upload your scst.conf?
I'm using the trunk. I have attached the scst.conf file I use in my tests to this comment.
Fixed by trunk r7184.