Good day Bart, thanks again for fixing the drop in read speed, but I have run into a new problem. After about 16000 cycles of re-creating targets and block devices, new targets are no longer exported to the network while the old ones stay active. Then it also freezes: a connected initiator stops executing commands and hangs. This happens after about 16000+ re-creations. I tried Debian 8.9 through 9.8 and the symptom is the same at 16000, so possibly there is some internal limit on the total number of LUNs. If you disconnect from a target, command execution continues, but new targets do not work. Please help me understand why: after two weeks of testing, the problem shows up at roughly the same place and then hangs, and after a while the kernel reports: [ 2991.945912] BUG: soft lockup - CPU#1 stuck for 22s! [scst_uid132268:21137]. The cycle consists of deleting and then re-adding a block device. I will add a new log from the moment of the failure shortly.
[ ... ]
With which SCST version did this occur? Please provide the output of cat /sys/kernel/scst_tgt/trace_cmds while the hang occurs and the output of cat /sys/kernel/scst_tgt/trace_mcmds. Please provide the call trace from /var/log/messages, /var/log/syslog or from the journalctl output.
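Reading these files can itself fail with EAGAIN ("Resource temporarily unavailable") while SCST is busy, so a small helper that retries the read may make it easier to capture both files during the hang. This is a minimal Python sketch, not an SCST tool: the helper names, the retry count and the delay are arbitrary assumptions.

```python
import errno
import time
from pathlib import Path

# Diagnostic files requested above.
TRACE_FILES = (
    "/sys/kernel/scst_tgt/trace_cmds",
    "/sys/kernel/scst_tgt/trace_mcmds",
)


def read_with_retry(path, attempts=5, delay=1.0):
    """Read a sysfs file, retrying while the kernel answers EAGAIN."""
    for attempt in range(attempts):
        try:
            return Path(path).read_text()
        except OSError as exc:
            # Give up on any other error, or after the last attempt.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay)
    return ""  # only reached when attempts <= 0


def collect_traces(paths=TRACE_FILES, **retry):
    """Map each trace file to its contents, or to an error string."""
    report = {}
    for path in paths:
        try:
            report[path] = read_with_retry(path, **retry)
        except OSError as exc:
            report[path] = f"<error: {exc.strerror}>"
    return report


if __name__ == "__main__":
    for path, text in collect_traces().items():
        print(f"== {path}\n{text}")
```

The call trace itself still has to come from /var/log/messages, /var/log/syslog or journalctl as requested.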
Up to the failure, everything is still fine.
Then it crashes, and if an initiator is connected, its commands are no longer executed.
These files are empty:
cat /sys/kernel/scst_tgt/trace_cmds
cat /sys/kernel/scst_tgt/trace_mcmds
root@COMP11:~# modinfo scst
filename: /lib/modules/3.16.0-4-amd64/extra/scst.ko
version: 3.4.0-pre1
description: SCSI target core
license: GPL
author: Vladislav Bolkhovitin
srcversion: 107CB1D6111F08729766E9D
depends: dlm,scsi_mod,crc-t10dif
vermagic: 3.16.0-4-amd64 SMP mod_unload modversions
parm: alua_invariant_check:Enables a run-time ALUA state invariant check. (bool)
parm: scst_threads:SCSI target threads count (int)
parm: scst_max_cmd_mem:Maximum memory allowed to be consumed by all SCSI commands of all devices at any given time in MB (int)
parm: scst_max_dev_cmd_mem:Maximum memory allowed to be consumed by all SCSI commands of a device at any given time in MB (int)
parm: forcibly_close_sessions:If enabled, close the sessions associated with an access control group (ACG) when an ACG is deleted via sysfs instead of returning -EBUSY (int)
parm: auto_cm_assignment:Enables the copy managers auto registration (int)
root@COMP11:~# modinfo iscsi_scst
filename: /lib/modules/3.16.0-4-amd64/extra/iscsi-scst.ko
description: SCST iSCSI Target
license: GPL
version: 3.4.0-pre1
srcversion: 0A30D7D27B21CE845A4541D
depends: scst,libcrc32c
vermagic: 3.16.0-4-amd64 SMP mod_unload modversions
root@COMP11:~#
Last edit: valera 2020-09-24
New test: checked out revision 8180.
Linux Debian 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
A couple of minutes before the failure.
At the point of failure (journalctl):
messages
syslog
Apr 13 00:46:01 Debian kernel: [664]: scst: scst_translate_lun:5025:tgt_dev for LUN 74 not found, command to unexisting LU (initiator copy_manager_sess, target copy_manager_tgt)?
It feels as though it runs into some limit; if so, how can it be made unlimited? If it helps, I can record a video of the reproduction showing when and what causes the failure.
Last edit: valera 2019-04-12
Thank you for having provided the call traces. These show that waiting for commands happens on the context of a worker thread (inside sysfs_work_thread_fn()). I will see whether I can rework that code such that the waiting no longer happens on the context of a worker thread.
If you can fix the crash, please let me know and I will test it. Thank you.
Please correct this error, thanks in advance.
Good day. I also get a crash after prolonged use. Rebooting the server solves the problem, but this is not the best solution. Can I count on a fix for this bug?
Please create a new ticket and report all relevant details instead of replying to an existing bug report.
Is there any chance the crash has been fixed? Have all the experiments stopped, or is this bug not a priority for the developers?
The first step in fixing a reported issue is to reproduce it. I have not yet been able to reproduce the reported behavior.
Although I still have not been able to reproduce the reported behavior, a candidate fix has been checked in on the trunk. Further feedback is welcome.
The kernel panic takes about half an hour to appear. Please check whether you can fix it. Thank you.
Last edit: valera 2020-09-24
New test: id 16386, remount.
Linux DEBIAN10 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u1 (2019-07-19) x86_64 GNU/Linux
cat /sys/kernel/scst_tgt/trace_cmds
cmd 000000000ddac654: state EXEC_CHECK_BLOCKING; op REPORT LUNS; proc time 608 sec; tgtt iscsi; tgt scst1; session iqn.2000-01.org.etherboot:COMP01; grp scst1; LUN 0; ini iqn.2000-01.org.etherboot:COMP01; cdb a0 00 00 00 00 00 00 00 00 10 00 00
Jul 23 09:38:47 DEBIAN10 kernel: INFO: task scst_uid:339 blocked for more than 120 seconds.
Jul 23 09:38:47 DEBIAN10 kernel: Tainted: G OE 4.19.0-5-amd64 #1 Debian 4.19.37-5+deb10u1
Jul 23 09:38:47 DEBIAN10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 23 09:38:47 DEBIAN10 kernel: scst_uid D 0 339 2 0x80000000
Jul 23 09:38:47 DEBIAN10 kernel: Call Trace:
Jul 23 09:38:47 DEBIAN10 kernel: ? __schedule+0x2a2/0x870
Jul 23 09:38:47 DEBIAN10 kernel: schedule+0x28/0x80
Jul 23 09:38:47 DEBIAN10 kernel: schedule_timeout+0x26d/0x390
Jul 23 09:38:47 DEBIAN10 kernel: ? vdev_find+0x60/0x60 [scst_vdisk]
Jul 23 09:38:47 DEBIAN10 kernel: ? wake_up_klogd+0x30/0x40
Jul 23 09:38:47 DEBIAN10 kernel: wait_for_completion+0x11f/0x190
Jul 23 09:38:47 DEBIAN10 kernel: ? wake_up_q+0x70/0x70
Jul 23 09:38:47 DEBIAN10 kernel: scst_free_device+0x6d/0x90 [scst]
Jul 23 09:38:47 DEBIAN10 kernel: vdev_del_device+0x1e/0x40 [scst_vdisk]
Jul 23 09:38:47 DEBIAN10 kernel: vdisk_del_device+0x3c/0x50 [scst_vdisk]
Jul 23 09:38:47 DEBIAN10 kernel: scst_devt_mgmt_store_work_fn+0x18b/0x1a0 [scst]
Jul 23 09:38:47 DEBIAN10 kernel: sysfs_work_thread_fn+0xff/0x330 [scst]
Jul 23 09:38:47 DEBIAN10 kernel: ? finish_wait+0x80/0x80
Jul 23 09:38:47 DEBIAN10 kernel: ? scst_devt_mgmt_store+0x20/0x20 [scst]
Jul 23 09:38:47 DEBIAN10 kernel: kthread+0x112/0x130
Jul 23 09:38:47 DEBIAN10 kernel: ? kthread_bind+0x30/0x30
Jul 23 09:38:47 DEBIAN10 kernel: ret_from_fork+0x22/0x40
Jul 23 14:35:57 DEBIAN10 kernel: scst: Removed all devices from group scst3
Jul 23 14:35:57 DEBIAN10 kernel: scst: Removed LUN 0 from group scst3 (target scst3)
Jul 23 14:35:57 DEBIAN10 kernel: ------------[ cut here ]------------
Jul 23 14:35:57 DEBIAN10 kernel: kernel BUG at mm/slub.c:294!
Jul 23 14:35:57 DEBIAN10 kernel: invalid opcode: 0000 [#1] SMP NOPTI
Jul 23 14:35:57 DEBIAN10 kernel: CPU: 0 PID: 11583 Comm: kworker/0:1 Tainted: G OE 4.19.0-5-amd64 #1 Debian 4.19.37-5+deb10u1
Jul 23 14:35:57 DEBIAN10 kernel: Hardware name: System manufacturer System Product Name/M3A76-CM, BIOS 1001 07/09/2009
Jul 23 14:35:57 DEBIAN10 kernel: Workqueue: events scst_tgt_dev_free_workfn [scst]
Jul 23 14:35:57 DEBIAN10 kernel: RIP: 0010:kmem_cache_free+0x1ac/0x1d0
Jul 23 14:35:57 DEBIAN10 kernel: Code: e4 5b 5d 41 5c c3 48 89 c5 e9 8e fe ff ff 48 89 fe 41 b8 01 00 00 00 48 89 d9 48 89 da 48 89 ef e8 e9 fa ff ff e9 10 ff ff ff <0f> 0b
Jul 23 14:35:57 DEBIAN10 kernel: RSP: 0018:ffffb0690617fdf0 EFLAGS: 00010246
Jul 23 14:35:57 DEBIAN10 kernel: RAX: ffff9fedac8b2000 RBX: ffff9fedac8b2000 RCX: ffff9fedac8b2000
Jul 23 14:35:57 DEBIAN10 kernel: RDX: 000000000000286c RSI: ffff9fedb3a276d0 RDI: ffffd41844b22c00
Jul 23 14:35:57 DEBIAN10 kernel: RBP: ffff9fedb2812c00 R08: 0000000000000001 R09: ffffffffc097d371
Jul 23 14:35:57 DEBIAN10 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9fedafa7b480
Jul 23 14:35:57 DEBIAN10 kernel: R13: ffff9fedac833d20 R14: ffff9fedb07586c0 R15: ffff9fedac8b2028
Jul 23 14:35:57 DEBIAN10 kernel: FS: 0000000000000000(0000) GS:ffff9fedb3a00000(0000) knlGS:0000000000000000
Jul 23 14:35:57 DEBIAN10 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 14:35:57 DEBIAN10 kernel: CR2: 00007f7ecf00e690 CR3: 000000012d47a000 CR4: 00000000000006f0
Jul 23 14:35:57 DEBIAN10 kernel: Call Trace:
Jul 23 14:35:57 DEBIAN10 kernel: scst_free_tgt_dev+0xe1/0x170 [scst]
Jul 23 14:35:57 DEBIAN10 kernel: scst_tgt_dev_free_workfn+0x32/0xd0 [scst]
Jul 23 14:35:57 DEBIAN10 kernel: process_one_work+0x1a7/0x3a0
Jul 23 14:35:57 DEBIAN10 kernel: worker_thread+0x30/0x390
Jul 23 14:35:57 DEBIAN10 kernel: ? create_worker+0x1a0/0x1a0
Jul 23 14:35:57 DEBIAN10 kernel: kthread+0x112/0x130
Jul 23 14:35:57 DEBIAN10 kernel: ? kthread_bind+0x30/0x30
Jul 23 14:35:57 DEBIAN10 kernel: ret_from_fork+0x22/0x40
Jul 23 14:35:57 DEBIAN10 kernel: Modules linked in: dm_mod loop scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) rdma_cm iw_cm ib_cm ib_core dlm configfs libcrc32c snd_
Jul 23 14:35:57 DEBIAN10 kernel: libphy floppy
Jul 23 14:35:57 DEBIAN10 kernel: ---[ end trace 46def2565fa40a20 ]---
Jul 23 14:35:57 DEBIAN10 kernel: RIP: 0010:kmem_cache_free+0x1ac/0x1d0
Jul 23 14:35:57 DEBIAN10 kernel: Code: e4 5b 5d 41 5c c3 48 89 c5 e9 8e fe ff ff 48 89 fe 41 b8 01 00 00 00 48 89 d9 48 89 da 48 89 ef e8 e9 fa ff ff e9 10 ff ff ff <0f> 0b
Jul 23 14:35:57 DEBIAN10 kernel: RSP: 0018:ffffb0690617fdf0 EFLAGS: 00010246
Jul 23 14:35:57 DEBIAN10 kernel: RAX: ffff9fedac8b2000 RBX: ffff9fedac8b2000 RCX: ffff9fedac8b2000
Jul 23 14:35:57 DEBIAN10 kernel: RDX: 000000000000286c RSI: ffff9fedb3a276d0 RDI: ffffd41844b22c00
Jul 23 14:35:57 DEBIAN10 kernel: RBP: ffff9fedb2812c00 R08: 0000000000000001 R09: ffffffffc097d371
Jul 23 14:35:57 DEBIAN10 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9fedafa7b480
Jul 23 14:35:57 DEBIAN10 kernel: R13: ffff9fedac833d20 R14: ffff9fedb07586c0 R15: ffff9fedac8b2028
Jul 23 14:35:57 DEBIAN10 kernel: FS: 0000000000000000(0000) GS:ffff9fedb3a00000(0000) knlGS:0000000000000000
Jul 23 14:35:57 DEBIAN10 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 14:35:57 DEBIAN10 kernel: CR2: 00007f7ecf00e690 CR3: 000000012d47a000 CR4: 00000000000006f0
Jul 23 14:36:01 DEBIAN10 iscsi-scstd[374]: Connect from 192.168.0.1:51346 to 192.168.0.11:3260
root@DEBIAN10:~# cat /sys/kernel/scst_tgt/trace_cmds
cat: /sys/kernel/scst_tgt/trace_cmds: Resource temporarily unavailable
Does trunk r8478 work better?
r8477: panic.
r8478: testing now; I will post the result.
Last edit: valera 2019-07-24
It would help a lot if you could mention all the steps involved in the
procedure that fails. What does e.g. replace sda1 on sda2 mean? Do you
change the SCST sysfs filename attribute or do you use LUN replacement?
r8478 test:
Jul 25 12:26:08 Debian kernel: [ 1251.188745] scst: Added device scst1 to group scst1 (LUN 0, flags 0x2) to target scst1
Jul 25 12:26:08 Debian kernel: [ 1251.203667] scst: Removed all devices from group scst1
Jul 25 12:26:08 Debian kernel: [ 1251.203679] scst: Removed LUN 0 from group scst1 (target scst1)
Jul 25 12:26:08 Debian iscsi-scstd: Can't destroy target Device or resource busy 3
Jul 25 12:26:08 Debian iscsi-scstd: Can't send mgmt reply (cookie 65569, result -16, res -22): Invalid argument
Jul 25 12:26:08 Debian kernel: [ 1251.215931] scst: Removed LUN 16390 from group copy_manager_tgt (target copy_manager_tgt)
Jul 25 12:26:08 Debian kernel: [ 1251.231592] dev_vdisk: Detached virtual device scst1 ("/dev/sda2")
Jul 25 12:26:08 Debian kernel: [ 1251.231610] scst: Detached from virtual device scst1 (id 16390)
Jul 25 12:29:25 Debian kernel: [ 1447.404283] INFO: task scst_uid:412 blocked for more than 120 seconds.
Jul 25 12:29:25 Debian kernel: [ 1447.404303] Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
Jul 25 12:29:25 Debian kernel: [ 1447.404315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 25 12:29:25 Debian kernel: [ 1447.404328] scst_uid D 0 412 2 0x00000000
Jul 25 12:29:25 Debian kernel: [ 1447.404337] ffff99b9b10b7ac0 0000000000000000 ffff99b9ae0b23c0 ffff99b9b7c18980
Jul 25 12:29:25 Debian kernel: [ 1447.404344] ffffffff8a611500 ffffa4fd80b8bca0 ffffffff8a0144b9 ffffffffc08f00ae
Jul 25 12:29:25 Debian kernel: [ 1447.404350] 0000000000000096 ffff99b9b7c18980 ffffffff89ac4cf2 ffff99b9ae0b23c0
Jul 25 12:29:25 Debian kernel: [ 1447.404356] Call Trace:
Jul 25 12:29:25 Debian kernel: [ 1447.404371] [<ffffffff8a0144b9>] ? __schedule+0x239/0x6f0
Jul 25 12:29:25 Debian kernel: [ 1447.404381] [<ffffffff89ac4cf2>] ? up+0x12/0x60
Jul 25 12:29:25 Debian kernel: [ 1447.404387] [<ffffffff8a0149a2>] ? schedule+0x32/0x80
Jul 25 12:29:25 Debian kernel: [ 1447.404392] [<ffffffff8a017d4d>] ? schedule_timeout+0x1dd/0x380
Jul 25 12:29:25 Debian kernel: [ 1447.404399] [<ffffffff89ad2676>] ? vprintk_emit+0x316/0x4d0
Jul 25 12:29:25 Debian kernel: [ 1447.404406] [<ffffffff8a0153e1>] ? wait_for_completion+0xf1/0x130
Jul 25 12:29:25 Debian kernel: [ 1447.404410] [<ffffffff89aa5a70>] ? wake_up_q+0x70/0x70
Jul 25 12:29:25 Debian kernel: [ 1447.404457] [<ffffffffc08c69ed>] ? scst_free_device+0x6d/0x90 [scst]
Jul 25 12:29:25 Debian kernel: [ 1447.404469] [<ffffffffc088ef9b>] ? vdev_del_device+0x1b/0x50 [scst_vdisk]
Jul 25 12:29:25 Debian kernel: [ 1447.404479] [<ffffffffc088f00a>] ? vcdrom_del_device+0x3a/0x80 [scst_vdisk]
Jul 25 12:29:25 Debian kernel: [ 1447.404483] [<ffffffff8a01673e>] ? mutex_lock+0xe/0x30
Jul 25 12:29:25 Debian kernel: [ 1447.404520] [<ffffffffc08d49df>] ? scst_devt_mgmt_store_work_fn+0x16f/0x210 [scst]
Jul 25 12:29:25 Debian kernel: [ 1447.404558] [<ffffffffc08d4d49>] ? sysfs_work_thread_fn+0xe9/0x300 [scst]
Jul 25 12:29:25 Debian kernel: [ 1447.404564] [<ffffffff89abd350>] ? prepare_to_wait_event+0xf0/0xf0
Jul 25 12:29:25 Debian kernel: [ 1447.404600] [<ffffffffc08d4c60>] ? scst_alloc_sysfs_work+0xc0/0xc0 [scst]
Jul 25 12:29:25 Debian kernel: [ 1447.404606] [<ffffffff89a9a5d9>] ? kthread+0xd9/0xf0
Jul 25 12:29:25 Debian kernel: [ 1447.404612] [<ffffffff89a9a500>] ? kthread_park+0x60/0x60
Jul 25 12:29:25 Debian kernel: [ 1447.404617] [<ffffffff8a0193e4>] ? ret_from_fork+0x44/0x70
An example of a failure.
Replacing sda1 with sda2 is just an example of switching the target to a different backing device; the real workload will use a snapshot of a block device. After each re-creation the machine receives a pristine reference copy of the data, so when you restart Windows its data is always clean and all debris and viruses are gone. The problem is that when the target reaches about 16000 cycles, it freezes, and the only way to recover is to restart the server. The screenshots show the stages of the failure. Should I record a video of the experiment to make it clearer?
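For reproduction, one delete/re-add cycle of the kind described above could be scripted against the SCST sysfs management files. The Python sketch below is an assumption about the procedure, not the reporter's actual script: the handler (vdisk_blockio), the target/group/device names (scst1), the LUN number and the command order are illustrative guesses based on the log messages above, and target creation/removal through iscsi-scstd is left out. With apply=False it only walks the plan; with apply=True it writes each command and stops at the first failing cycle, which may help pin down where around cycle 16000 things go wrong.

```python
from pathlib import Path

# Default SCST sysfs root; overridable so the plan can be tested elsewhere.
SCST_ROOT = Path("/sys/kernel/scst_tgt")


def cycle_plan(target, group, device, filename, lun=0,
               handler="vdisk_blockio", root=SCST_ROOT):
    """Return the (mgmt file, command) writes for one remove/re-add cycle.

    All names are hypothetical; adjust them to the real configuration.
    """
    luns_mgmt = (root / "targets" / "iscsi" / target /
                 "ini_groups" / group / "luns" / "mgmt")
    dev_mgmt = root / "handlers" / handler / "mgmt"
    return [
        (luns_mgmt, "clear"),                                    # unmap the group's LUNs
        (dev_mgmt, f"del_device {device}"),                      # drop the virtual device
        (dev_mgmt, f"add_device {device} filename={filename}"),  # re-create it
        (luns_mgmt, f"add {device} {lun}"),                      # map it again
    ]


def run_cycles(plan, count, apply=False):
    """Run `count` cycles; return the 1-based cycle that failed, else None."""
    for cycle in range(1, count + 1):
        for path, command in plan:
            if not apply:
                continue  # dry run: nothing is written
            try:
                with open(path, "w") as mgmt:
                    mgmt.write(command + "\n")
            except OSError as exc:
                print(f"cycle {cycle}: {command!r} -> {path}: {exc}")
                return cycle
    return None
```

Usage would be something like `run_cycles(cycle_plan("scst1", "scst1", "scst1", "/dev/sda2"), 20000, apply=True)`, run as root on the target host.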
New trunk test r8484: it crashes at approximately the same point, id 16386.
Even if I increase the re-creation interval, SCST freezes when it reaches 16000+ re-creations.
https://www.youtube.com/watch?v=easPv9cis5o
Last edit: valera 2019-07-25
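The failures cluster just above 16384 = 2^14 cycles, and the logs show a copy-manager LUN (16390) and device ids (16386) above that value. One hypothesis worth checking, not a confirmed diagnosis: the SAM flat space addressing method stores a LUN in 14 bits, so values above 16383 cannot be represented, and a value masked to 14 bits would wrap (16384 + 74 becomes 74; compare the earlier "LUN 74 not found" message for copy_manager_tgt). Whether SCST's copy manager actually encodes or truncates LUNs this way is an assumption; the Python below only illustrates the arithmetic, and the function names are hypothetical. The modinfo output above also lists an auto_cm_assignment parameter controlling copy manager auto registration; whether it affects this failure is untested.

```python
# SAM flat space addressing: 2-byte first-level LUN with 14 bits of value.
FLAT_SPACE_BITS = 14
FLAT_SPACE_MAX = (1 << FLAT_SPACE_BITS) - 1  # 16383


def encode_flat_lun(lun):
    """Encode `lun` as the first two bytes of an 8-byte SCSI LUN field."""
    if not 0 <= lun <= FLAT_SPACE_MAX:
        raise ValueError(f"LUN {lun} does not fit in {FLAT_SPACE_BITS} bits")
    # Address method 01b (flat space) goes in bits 7-6 of byte 0.
    return bytes([0x40 | (lun >> 8), lun & 0xFF])


def masked_lun(lun):
    """Hypothetical silent truncation of an oversized LUN to 14 bits."""
    return lun & FLAT_SPACE_MAX


for n in (74, 16383, 16384 + 74, 16390):
    try:
        print(n, encode_flat_lun(n).hex(), "->", masked_lun(n))
    except ValueError as exc:
        print(n, "not encodable:", exc, "-> masked:", masked_lun(n))
```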
Thank you for having provided a call trace. That is very helpful. Does the following patch help?
Trunk r8487: video of the failure:
https://youtu.be/NlvdGOykQkU
Is this patch included in trunk r8487? If not, I don't know how to apply it; if it is included, it did not help, because it crashes again.