From: Matteo T. <ma...@RM...> - 2014-10-13 08:50:56
|
Hi all, after almost 1 month I noticed the following strange BUG: drbd TOP: Starting receiver thread (from swapper/0 [0]) BUG: scheduling while atomic: swapper/0/0/0x00000100 Modules linked in: iscsi_scst(O) ib_srpt(O) scst_vdisk(O) scst(O) crc32c_intel drbd(O) libcrc32c autofs4 cpufreq_ondemand bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa uinput iTCO_wdt iTCO_vendor_support raid0 raid10 microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core sg e1000e ptp pps_core ib_qib dca ib_mad ib_core acpi_cpufreq freq_table ext4 jbd2 mbcache raid1 sd_mod crc_t10dif crct10dif_common ahci libahci video 3w_9xxx dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 0000000000000000 ffff8802370039c0 ffffffff8153b1c2 0000000000000000 ffff880237012d00 ffff8802370039d0 ffffffff8107ac0b ffff880237003a70 ffffffff8153d7dd ffff880237003a00 ffff880230e38480 0000000000000024 Call Trace: <IRQ> [<ffffffff8153b1c2>] dump_stack+0x49/0x5f [<ffffffff8107ac0b>] __schedule_bug+0x4b/0x60 [<ffffffff8153d7dd>] __schedule+0x6ad/0x6f0 [<ffffffff8153d8e9>] schedule+0x29/0x70 [<ffffffff8153b375>] schedule_timeout+0x195/0x220 [<ffffffff8153f6aa>] ? common_interrupt+0x6a/0x6a [<ffffffff8153dcb6>] wait_for_completion+0xc6/0x100 [<ffffffff8107ff80>] ? try_to_wake_up+0x2c0/0x2c0 [<ffffffff8106ee1d>] kthread_create_on_node+0x9d/0x120 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff8153b144>] ? printk+0x4d/0x4f [<ffffffff8105b176>] ? __flush_signals+0x36/0x40 [<ffffffffa0533a7a>] drbd_thread_start+0x11a/0x290 [drbd] [<ffffffffa0533cd8>] _drbd_thread_stop+0xe8/0x100 [drbd] [<ffffffffa0544996>] __drbd_set_state+0xb86/0xcd0 [drbd] [<ffffffffa0525c89>] _drbd_set_state.clone.0+0x39/0x50 [drbd] [<ffffffffa0527f2c>] request_timer_fn+0x3fc/0x560 [drbd] [<ffffffff810668a0>] ? flush_delayed_work+0x40/0x40 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff810581d9>] call_timer_fn+0x49/0x120 [<ffffffff81058db5>] run_timer_softirq+0x225/0x290 [<ffffffff8126bd10>] ? timerqueue_add+0x60/0xb0 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff81050dbf>] __do_softirq+0xef/0x260 [<ffffffff8107263b>] ? hrtimer_interrupt+0x15b/0x250 [<ffffffff8154905c>] call_softirq+0x1c/0x30 [<ffffffff81005805>] do_softirq+0x55/0x90 [<ffffffff81050af5>] irq_exit+0xb5/0xc0 [<ffffffff8154987a>] smp_apic_timer_interrupt+0x4a/0x5a [<ffffffff8154858a>] apic_timer_interrupt+0x6a/0x70 <EOI> [<ffffffff81434a6f>] ? cpuidle_enter_state+0x4f/0xd0 [<ffffffff81434a6b>] ? cpuidle_enter_state+0x4b/0xd0 [<ffffffff81434e6f>] cpuidle_idle_call+0xcf/0x160 [<ffffffff8100c5ee>] arch_cpu_idle+0xe/0x30 [<ffffffff8109ed5a>] cpu_idle_loop+0x9a/0x230 [<ffffffff8109ef60>] cpu_startup_entry+0x70/0x80 [<ffffffff81533f07>] rest_init+0x77/0x80 [<ffffffff81b1017c>] start_kernel+0x3e3/0x3f0 [<ffffffff81b0fbc7>] ? repair_env_string+0x5b/0x5b [<ffffffff81b0f59d>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81b0f6d8>] x86_64_start_kernel+0x139/0x140 bad: scheduling from the idle thread! CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 ffffffff81a10480 ffff880237003970 ffffffff8153b1c2 0000000000000000 ffff880237012d00 ffff880237003990 ffffffff810829f9 ffff8802370039a0 ffff880237012d00 ffff8802370039c0 ffffffff8107add3 ffff8802370039c0 Call Trace: <IRQ> [<ffffffff8153b1c2>] dump_stack+0x49/0x5f [<ffffffff810829f9>] dequeue_task_idle+0x29/0x40 [<ffffffff8107add3>] dequeue_task+0x93/0xb0 [<ffffffff8107ae12>] deactivate_task+0x22/0x30 [<ffffffff8153d5bc>] __schedule+0x48c/0x6f0 [<ffffffff8153d8e9>] schedule+0x29/0x70 [<ffffffff8153b375>] schedule_timeout+0x195/0x220 [<ffffffff8153f6aa>] ? common_interrupt+0x6a/0x6a [<ffffffff8153dcb6>] wait_for_completion+0xc6/0x100 [<ffffffff8107ff80>] ? try_to_wake_up+0x2c0/0x2c0 [<ffffffff8106ee1d>] kthread_create_on_node+0x9d/0x120 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff8153b144>] ? printk+0x4d/0x4f [<ffffffff8105b176>] ? __flush_signals+0x36/0x40 [<ffffffffa0533a7a>] drbd_thread_start+0x11a/0x290 [drbd] [<ffffffffa0533cd8>] _drbd_thread_stop+0xe8/0x100 [drbd] [<ffffffffa0544996>] __drbd_set_state+0xb86/0xcd0 [drbd] [<ffffffffa0525c89>] _drbd_set_state.clone.0+0x39/0x50 [drbd] [<ffffffffa0527f2c>] request_timer_fn+0x3fc/0x560 [drbd] [<ffffffff810668a0>] ? flush_delayed_work+0x40/0x40 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff810581d9>] call_timer_fn+0x49/0x120 [<ffffffff81058db5>] run_timer_softirq+0x225/0x290 [<ffffffff8126bd10>] ? timerqueue_add+0x60/0xb0 [<ffffffffa0527b30>] ? drbd_submit_req_private_bio+0x2b0/0x2b0 [drbd] [<ffffffff81050dbf>] __do_softirq+0xef/0x260 [<ffffffff8107263b>] ? hrtimer_interrupt+0x15b/0x250 [<ffffffff8154905c>] call_softirq+0x1c/0x30 [<ffffffff81005805>] do_softirq+0x55/0x90 [<ffffffff81050af5>] irq_exit+0xb5/0xc0 [<ffffffff8154987a>] smp_apic_timer_interrupt+0x4a/0x5a [<ffffffff8154858a>] apic_timer_interrupt+0x6a/0x70 <EOI> [<ffffffff81434a6f>] ? cpuidle_enter_state+0x4f/0xd0 [<ffffffff81434a6b>] ? cpuidle_enter_state+0x4b/0xd0 [<ffffffff81434e6f>] cpuidle_idle_call+0xcf/0x160 [<ffffffff8100c5ee>] arch_cpu_idle+0xe/0x30 [<ffffffff8109ed5a>] cpu_idle_loop+0x9a/0x230 [<ffffffff8109ef60>] cpu_startup_entry+0x70/0x80 [<ffffffff81533f07>] rest_init+0x77/0x80 [<ffffffff81b1017c>] start_kernel+0x3e3/0x3f0 [<ffffffff81b0fbc7>] ? repair_env_string+0x5b/0x5b [<ffffffff81b0f59d>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81b0f6d8>] x86_64_start_kernel+0x139/0x140 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP Modules linked in: iscsi_scst(O) ib_srpt(O) scst_vdisk(O) scst(O) crc32c_intel drbd(O) libcrc32c autofs4 cpufreq_ondemand bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa uinput iTCO_wdt iTCO_vendor_support raid0 raid10 microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core sg e1000e ptp pps_core ib_qib dca ib_mad ib_core acpi_cpufreq freq_table ext4 jbd2 mbcache raid1 sd_mod crc_t10dif crct10dif_common ahci libahci video 3w_9xxx dm_mirror dm_region_hash dm_log dm_mod [last unloaded: drbd] CPU: 7 PID: 52888 Comm: kthreadd Tainted: G W O 3.12.14 #1 Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012 task: ffff8800a3893540 ti: ffff8800771ac000 task.ti: ffff8800771ac000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8800771add60 EFLAGS: 00010046 RAX: ffffffff81613440 RBX: ffff880237012d00 RCX: 000000000000b7a8 RDX: 0000000000000005 RSI: ffffffff81a10480 RDI: ffff880237012d00 RBP: ffff8800771add88 R08: 0000000000000000 R09: 00087a835b6f23c0 R10: ffffffff816133e5 R11: 0000000000000001 R12: ffffffff81a10480 R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000086 FS: 0000000000000000(0000) GS:ffff8802370e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000001407e0 Stack: ffffffff8107ae8a 00007f1bb02036a0 ffffffff81a10480 ffff880237012d00 ffffffff81a10c84 ffff8800771add98 ffffffff8107b992 ffff8800771addb8 ffffffff8107b9c1 ffff880237012d00 ffffffff81a10480 ffff8800771addd8 Call Trace: [<ffffffff8107ae8a>] ? enqueue_task+0x6a/0x80 [<ffffffff8107b992>] activate_task+0x22/0x30 [<ffffffff8107b9c1>] ttwu_activate+0x21/0x50 [<ffffffff8107c2bc>] ttwu_do_activate.clone.0+0x3c/0x60 [<ffffffff8107fec6>] try_to_wake_up+0x206/0x2c0 [<ffffffff8107ff92>] default_wake_function+0x12/0x20 [<ffffffff81077659>] __wake_up_common+0x59/0x90 [<ffffffffa0533840>] ? drbd_destroy_device+0x180/0x180 [drbd] [<ffffffff81079666>] complete+0x46/0x60 [<ffffffff8106e97e>] kthread+0x9e/0xe0 [<ffffffff8106e8e0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8154792c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106e8e0>] ? kthread_freezable_should_stop+0x70/0x70 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8800771add60> CR2: 0000000000000000 ---[ end trace f6ab84a85de85d6c ]--- Kernel panic - not syncing: Fatal exception Is scst bad here? Regards, -- matteo |
From: Bart V. A. <bva...@ac...> - 2014-10-13 12:21:47
|
On 10/13/14 10:34, Matteo Tescione wrote: > after almost 1 month I noticed the following strange BUG: > > drbd TOP: Starting receiver thread (from swapper/0 [0]) > > BUG: scheduling while atomic: swapper/0/0/0x00000100 That's indeed a strange bug report. It would help if you could provide us the SCST configuration file (/etc/scst.conf) and also the SCST version (branch + revision, e.g. the output of the svn info command if SCST was retrieved via svn). Apparently this occurred with kernel version 3.12.14. Were any patches applied on top of this kernel, e.g. the SCST exec_req_fifo and/or put_page_callback patches ? With which DRBD version did this occur ? The upstream version or a version that has been obtained directly from LinBit ? Thanks, Bart. |
From: Matteo T. <ma...@RM...> - 2014-10-13 20:13:00
|
Bart, HANDLER vdisk_fileio { DEVICE RMNET_BC0 { filename /dev/drbd0 t10_dev_id 999999-999999 usn 999999 } DEVICE RMNET_BC1 { filename /dev/drbd1 t10_dev_id 888888-888888 usn 888888 rotational 0 } } TARGET_DRIVER iscsi { enabled 1 TARGET iqn.2014.RMnet-BC { allowed_portal xxxxxxxxxxx cpu_mask 00000000,000000e0 enabled 1 rel_tgt_id 1009 LUN 0 RMNET_BC0 LUN 1 RMNET_BC1 } } TARGET_DRIVER ib_srpt { TARGET fe80:0000:0000:0000:0011:7500:0070:a4b4 { enabled 1 cpu_mask 00000000,000000e0 LUN 0 RMNET_BC0 LUN 1 RMNET_BC1 } TARGET fe80:0000:0000:0000:0011:7500:0070:a4b5 { enabled 0 } } Path: scst-trunk URL: svn://svn.code.sf.net/p/scst/svn/trunk Repository Root: svn://svn.code.sf.net/p/scst/svn Repository UUID: d57e44dd-8a1f-0410-8b47-8ef2f437770f Revision: 5704 drbd version: 8.4.4 (api:1/proto:86-101) from standard package, running kernel 3.12.14 patched with scst_exec_req_fifo and put_page_callback applied. Thank you for your time, -- matteo ----- Messaggio originale ----- > Da: "Bart Van Assche" <bva...@ac...> > A: "Matteo Tescione" <ma...@rm...>, scs...@li... > Inviato: Lunedì, 13 ottobre 2014 14:21:34 > Oggetto: Re: [Scst-devel] strane kernel BUG > > On 10/13/14 10:34, Matteo Tescione wrote: > > after almost 1 month I noticed the following strange BUG: > > > > drbd TOP: Starting receiver thread (from swapper/0 [0]) > > > > BUG: scheduling while atomic: swapper/0/0/0x00000100 > > That's indeed a strange bug report. It would help if you could > provide > us the SCST configuration file (/etc/scst.conf) and also the SCST > version (branch + revision, e.g. the output of the svn info command > if > SCST was retrieved via svn). Apparently this occurred with kernel > version 3.12.14. Were any patches applied on top of this kernel, e.g. > the SCST exec_req_fifo and/or put_page_callback patches ? With which > DRBD version did this occur ? The upstream version or a version that > has > been obtained directly from LinBit ? > > Thanks, > > Bart. > > > ------------------------------------------------------------------------------ > Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer > Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS > Reports > Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper > Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer > http://p.sf.net/sfu/Zoho > _______________________________________________ > Scst-devel mailing list > https://lists.sourceforge.net/lists/listinfo/scst-devel > |
From: Bart V. A. <bva...@ac...> - 2014-10-14 07:02:22
|
On 10/13/14 22:12, Matteo Tescione wrote: > Path: scst-trunk > URL: svn://svn.code.sf.net/p/scst/svn/trunk > Repository Root: svn://svn.code.sf.net/p/scst/svn > Repository UUID: d57e44dd-8a1f-0410-8b47-8ef2f437770f > Revision: 5704 > > drbd version: 8.4.4 (api:1/proto:86-101) from standard package, > running kernel 3.12.14 patched with scst_exec_req_fifo and > put_page_callback applied. Hello Matteo, To me this looks like a DRBD bug. If a timeout of a DRBD request occurs the DRBD timeout handler (request_timer_fn()) is called, that function grabs a spinlock and next calls kthread_create_on_node() with that spinlock held. kthread_create_on_node() can sleep which means that it is not allowed to call that function with a spinlock held. I'm not sure why this particular behavior has not yet been noticed before. Can you post the first call stack from the e-mail at the start of this thread together with the kernel version number on a DRBD-related mailing list and ask the DRBD authors for their opinion ? Thanks, Bart. |