From: Daniel S. <Dan...@dt...> - 2014-07-22 17:44:45
Hi all,
we have been using SCST as an FC target, which has worked fine for some months. Now we have additionally set up SCST as an iSCSI target on a new Supermicro server (motherboard X9SRH-7T, Adaptec RAID 72405, CentOS 6 x64).
The scst.conf defines 4 targets (one LUN each), each bound ('allowed_portal' + CHAP) to one IP address (3x VLAN eth1.x, 1x eth0). The currently active initiators (2 x RHEL6, 1 x Debian) are able to use the target LUNs without problems.
From time to time, the Linux kernel panics and the server stops working. "Time to time" means
- after 2-3 hours of high I/O load
- sometimes while restarting the open-iscsi initiators
- sometimes just after some hours of running 3 initiators without load.
/var/log/messages shows nothing, so the kernel was apparently not able to write anything to this file. Enabling remote syslogging to another server didn't show anything related to the panic either )-:
On the IPMI console, I can only see something like
do_IRQ: 0.154 No irq handler for vector (irq -1)
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
CPU: 3 PID: 2763 Comm: irqbalance Tainted: G W 3.14.13 #1
Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F/7TF, BIOS 3.00 07/05/2013
On another crash, the console showed:
ixgbe 0000:01:00.0 eth0: Reset adapter
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.10.48 #1
Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F/7TF, BIOS 3.00 07/05/2013
Call Trace
dump_stack
panic
native_sched_clock
watchdog_overflow_callback
__perf_event_overflow
...
We tried kernel 3.10.48, and I also updated to 3.14.13, patched with the corresponding SCST patches:
scst-3.0/scst/kernel/scst_exec_req_fifo-3.XX.patch
scst-3.0/iscsi-scst/kernel/patches/put_page_callback-3.xx.xx.patch
We use the current SCST version r5701 (branches/3.0.x). The motherboard BIOS is up-to-date (BIOS 3.00 07/05/2013).
Because one of the kernel panics was preceded by an ixgbe adapter reset, I updated the Ethernet module ixgbe from 3.19.1-k (in-kernel) to 3.21.2 (from the Intel website), but this didn't help either.
Without SCST, the server runs fine for many days. After installing AND using SCST, I get a panic every few hours.
So, is there a way to debug this issue? I'm not that familiar with kernel debugging (-:
best regards
Danny