From: John A. <joh...@av...> - 2012-04-30 22:46:11
Dear e1000-devel,

I'm wondering which kernel versions people are happily running in production with the ixgbe driver. I'm having network stability and performance problems with a modified Red Hat EL6 2.6.32-131 kernel on a quad-core Xeon (Jasper Forest) CPU. My NIC is a dual-port X520 (82599).

I wonder whether this could be an ixgbe or an ioatdma problem; ixgbe is not mentioned in my stack traces. I'm hoping for advice. I could try a later kernel, especially one recommended by a happy ixgbe user. Any comment is much appreciated.

Here's what I see (output from just one CPU, for brevity). I have seen this with an old version of ixgbe as well as with 3.9.15-NAPI.

ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: ioat2_timer_event: Channel halted (2)
BUG: scheduling while atomic: process_name/6888/0x10000301
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler sunrpc tcp_htcp sr_mod cdrom raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ses enclosure sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ioatdma ixgbe(U) dca pm8001(U) libsas scsi_transport_sas ext3 jbd mbcache sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix [last unloaded: scsi_wait_scan]
Pid: 6888, comm: process_name Not tainted 2.6.32-foo-0 #7
Call Trace:
 <IRQ>  [<ffffffff8104dab6>] ? __schedule_bug+0x66/0x70
 [<ffffffff81477502>] ? thread_return+0x5db/0x779
 [<ffffffff8104f05d>] ? scheduler_tick+0xdd/0x280
 [<ffffffff810128e9>] ? read_tsc+0x9/0x20
 [<ffffffff81090d03>] ? ktime_get+0x63/0xe0
 [<ffffffff81029a2d>] ? lapic_next_event+0x1d/0x30
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff8105748a>] ? __cond_resched+0x2a/0x40
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff814777f0>] ? _cond_resched+0x30/0x40
 [<ffffffff8100df96>] ? is_valid_bugaddr+0x16/0x40
 [<ffffffff8124e4df>] ? report_bug+0x1f/0xc0
 [<ffffffff8100f2af>] ? die+0x7f/0x90
 [<ffffffff8147a184>] ? do_trap+0xc4/0x160
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffff8100ce55>] ? do_invalid_op+0x95/0xb0
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff8105ff11>] ? vprintk+0x1d1/0x4f0
 [<ffffffff81028e89>] ? native_send_call_func_single_ipi+0x39/0x40
 [<ffffffff8109c081>] ? generic_exec_single+0xb1/0xc0
 [<ffffffff8100befb>] ? invalid_op+0x1b/0x20
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffffa01c5579>] ? ioat2_timer_event+0x249/0x270 [ioatdma]
 [<ffffffff810128e9>] ? read_tsc+0x9/0x20
 [<ffffffff81071ea7>] ? run_timer_softirq+0x197/0x340
 [<ffffffff810676a1>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff8100c26c>] ? call_softirq+0x1c/0x30
 <EOI>  [<ffffffff8100dea5>] ? do_softirq+0x65/0xa0
 [<ffffffff81067fe8>] ? local_bh_enable_ip+0x98/0xa0
 [<ffffffff814798fb>] ? _spin_unlock_bh+0x1b/0x20
 [<ffffffffa01c486f>] ? ioat2_cleanup_tasklet+0x8f/0xa0 [ioatdma]
 [<ffffffffa01c3743>] ? ioat2_is_complete+0x83/0xd0 [ioatdma]
 [<ffffffff8141c38f>] ? tcp_recvmsg+0x75f/0xe90
 [<ffffffff81476f75>] ? thread_return+0x4e/0x779
 [<ffffffff8143c55c>] ? inet_recvmsg+0x5c/0x90
 [<ffffffff813d53b3>] ? sock_recvmsg+0x133/0x160
 [<ffffffff81086100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109810e>] ? futex_wake+0x10e/0x120
 [<ffffffff8109a071>] ? do_futex+0x121/0xb00
 [<ffffffff8104ed13>] ? perf_event_task_sched_out+0x33/0x80
 [<ffffffff81168779>] ? fget_light+0x9/0x90
 [<ffffffff813d570e>] ? sys_recvfrom+0xee/0x180
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff81476f75>] ? thread_return+0x4e/0x779
 [<ffffffff8109aacb>] ? sys_futex+0x7b/0x170
 [<ffffffff8100c5d5>] ? math_state_restore+0x45/0x60
 [<ffffffff8100b132>] ? system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
kernel BUG at drivers/dma/ioat/dma_v2.c:315!

In my sources, that line is in ioat2_timer_event, and it looks like the driver concludes that a channel setup (programming) error happened somewhere else:

	/* when halted due to errors check for channel
	 * programming errors before advancing the completion state
	 */
	if (is_ioat_halted(status)) {
		u32 chanerr;

		chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
		dev_err(to_dev(chan), "%s: Channel halted (%x)\n",
			__func__, chanerr);
		if (test_bit(IOAT_RUN, &chan->state))
			BUG_ON(is_ioat_bug(chanerr));
		else /* we never got off the ground */
			return;
	}

Thanks much,
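Since the log reports chanerr = 2, decoding that value shows why the BUG_ON in the excerpt fires. The following is a minimal user-space sketch, assuming the CHANERR bit values and the is_ioat_bug() mask as I read them in the 2.6.32-era drivers/dma/ioat/registers.h and drivers/dma/ioat/dma.h; please verify both against your own tree. The helpers chanerr_is_bug() and decode_chanerr() are hypothetical names for illustration, not functions in the driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED CHANERR bit values, mirrored from registers.h (2.6.32 era).
 * Only a subset is listed; check these against your kernel sources. */
#define IOAT_CHANERR_SRC_ADDR_ERR          0x0001
#define IOAT_CHANERR_DEST_ADDR_ERR         0x0002
#define IOAT_CHANERR_NEXT_ADDR_ERR         0x0004
#define IOAT_CHANERR_NEXT_DESC_ALIGN_ERR   0x0008
#define IOAT_CHANERR_CHAIN_ADDR_VALUE_ERR  0x0010
#define IOAT_CHANERR_CHANCMD_ERR           0x0020
#define IOAT_CHANERR_CONTROL_ERR           0x0400
#define IOAT_CHANERR_LENGTH_ERR            0x0800

struct chanerr_bit {
	uint32_t mask;
	const char *name;
};

/* Table used to turn a raw CHANERR value into readable bit names. */
static const struct chanerr_bit chanerr_bits[] = {
	{ IOAT_CHANERR_SRC_ADDR_ERR,         "SRC_ADDR_ERR" },
	{ IOAT_CHANERR_DEST_ADDR_ERR,        "DEST_ADDR_ERR" },
	{ IOAT_CHANERR_NEXT_ADDR_ERR,        "NEXT_ADDR_ERR" },
	{ IOAT_CHANERR_NEXT_DESC_ALIGN_ERR,  "NEXT_DESC_ALIGN_ERR" },
	{ IOAT_CHANERR_CHAIN_ADDR_VALUE_ERR, "CHAIN_ADDR_VALUE_ERR" },
	{ IOAT_CHANERR_CHANCMD_ERR,          "CHANCMD_ERR" },
	{ IOAT_CHANERR_CONTROL_ERR,          "CONTROL_ERR" },
	{ IOAT_CHANERR_LENGTH_ERR,           "LENGTH_ERR" },
};

/* Hypothetical user-space mirror of is_ioat_bug(): true when chanerr has
 * any bit that (by assumption) the driver treats as a programming error. */
bool chanerr_is_bug(uint32_t chanerr)
{
	const uint32_t bug_mask = IOAT_CHANERR_SRC_ADDR_ERR |
				  IOAT_CHANERR_DEST_ADDR_ERR |
				  IOAT_CHANERR_NEXT_ADDR_ERR |
				  IOAT_CHANERR_CONTROL_ERR |
				  IOAT_CHANERR_LENGTH_ERR;

	return (chanerr & bug_mask) != 0;
}

/* Print the names of the known bits set in chanerr to out, separated by
 * " | ", and return how many known bits were set. */
int decode_chanerr(uint32_t chanerr, FILE *out)
{
	int matched = 0;
	size_t i;

	for (i = 0; i < sizeof(chanerr_bits) / sizeof(chanerr_bits[0]); i++) {
		if (chanerr & chanerr_bits[i].mask) {
			fprintf(out, "%s%s", matched ? " | " : "",
				chanerr_bits[i].name);
			matched++;
		}
	}
	fprintf(out, "\n");
	return matched;
}
```

Under these assumptions, decode_chanerr(0x2, stdout) prints DEST_ADDR_ERR, and chanerr_is_bug(0x2) returns true, so a halt with that value while IOAT_RUN is set would trip the BUG_ON shown in the excerpt.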