From: Daniel S. <Dan...@dt...> - 2014-07-22 17:44:45
Hi all,

we have been using SCST as an FC target, which has worked fine for some months. Now we have additionally set up SCST as an iSCSI target on a new Supermicro server (motherboard X9SRH-7T, Adaptec RAID 72405, CentOS 6 x64). The scst.conf defines 4 targets (one LUN each), each bound ('allowed_portal' + CHAP) to one IP address (3x VLAN eth1.x, 1x eth0). The currently active initiators (2 x RHEL6, 1 x Debian) can use the target LUNs without problems.

From time to time, the Linux kernel panics and the server stops working. "From time to time" means:
- during 2-3 hours of high I/O load
- sometimes while restarting the open-iscsi initiators
- sometimes just after some hours of running 3 initiators without load

/var/log/messages shows nothing, so the kernel was not able to write anything to this file. Enabling remote syslogging to another server didn't show anything related to the panic either )-:

On the IPMI console, I can only see something like

  do_IRQ: 0.154 No irq handler for vector (irq -1)
  Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
  CPU: 3 PID: 2763 Comm: irqbalance Tainted: G W 3.14.13 #1
  Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F77TF, BIOS 3.00 07/05/2013

After another crash, the console showed:

  ixgbe 0000:01:00.0 eth0: Reset adapter
  Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
  CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.10.48 #1
  Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F77TF, BIOS 3.00 07/05/2013
  Call Trace
  dump_stack
  panic
  native_sched_clock
  watchdog_overflow_callback
  __perf_event_overflow
  ...

We tried kernel 3.10.48, and I also updated to 3.14.13, in both cases patched with the corresponding SCST patches:

  scst-3.0/scst/kernel/scst_exec_req_fifo-3.XX.patch
  scst-3.0/iscsi-scst/kernel/patches/put_page_callback-3.xx.xx.patch

We use the current SCST version r5701 (branches/3.0.x). The motherboard BIOS is up to date (BIOS 3.00 07/05/2013).

In response to one of the kernel panics, I updated the ixgbe Ethernet module from 3.19.1-k (in-kernel) to 3.21.2 (from the Intel website), but this didn't help either.

Without SCST, the server runs fine for many days. After installing AND using SCST, I get a panic every few hours.

So, is there a way to debug this issue? I'm not that familiar with kernel debugging (-:

best regards
Danny