#22 IGB LRO is using hardcoded NUMA node 0.

open
3.4.7 (1)
igb
1
2015-02-07
2012-07-17
No

Hi,

On my new AMD-based hardware, igb causes a kernel crash because of a bug in the LRO + NUMA code.

My hardware vendor has started shipping boxes with no NUMA node 0; all I see are nodes 1 and 3. However, the LRO initialization code uses a hardcoded node 0.

The fix is very simple: use the node the adapter already has. Following are the changes I made to avoid this crash.

--- igb_main.c 2012-04-09 14:10:06.000000000 -0700
+++ /home/sp/igb-3.4.7/src/igb_main.c 2012-07-16 17:08:16.469571709 -0700
@@ -1153,6 +1153,7 @@
 	netif_napi_add(adapter->netdev, &q_vector->napi, igb_poll, 64);
 	adapter->q_vector[v_idx] = q_vector;
 #ifndef IGB_NO_LRO
+	q_vector->numa_node = adapter->node;
 	if (v_idx < adapter->num_rx_queues) {
 		int size = sizeof(struct igb_lro_list);
 		q_vector->lrolist = vzalloc_node(size, q_vector->numa_node);

You guys might want to add HAVE_DEVICE_NUMA_NODE functionality, but the patch above is all I need for now.

numactl --hardware
available: 2 nodes (1,3)
node 1 size: 16381 MB
node 1 free: 15130 MB
node 3 size: 16368 MB
node 3 free: 15822 MB

The associated kernel crash:

BUG: unable to handle kernel paging request at 0000000000001c08
IP: [<ffffffff8108c435>] __alloc_pages_nodemask+0x95/0x790
PGD 80db23067 PUD 80d5e9067 PMD 0
Oops: 0000 [#1]
e1000e: Intel(R) PRO/1000 Network Driver - 1.6.3-NAPI-SL
e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
e1000e 0000:03:00.0: Disabling ASPM L0s
e1000e 0000:03:00.0: PCI INT A -> GSI 48 (level, low) -> IRQ 48
e1000e 0000:03:00.0: setting latency timer to 64
PREEMPT SMP oot filesystem in read-only mode:
CPU 12 vpart script:
Modules linked in: e1000e(+) igb(+) streamline
Restarting system.

Pid: 1424, comm: modprobe Not tainted 3.1.10 #1 empty empty/S8236-IL
RIP: 0010:[<ffffffff8108c435>] [<ffffffff8108c435>] __alloc_pages_nodemask+0x95/0x790
RSP: 0018:ffff88080d5d5b48 EFLAGS: 00010202
RAX: 0000000000000010 RBX: 00000000000082d2 RCX: 0000000000000000
RDX: 0000000000001c00 RSI: 0000000000000000 RDI: 00000000000082d2
RBP: 0000000000001c00 R08: 0000000000000000 R09: ffff88080df883e0
R10: 0000000000000000 R11: 0000000000000010 R12: 00000000000082d2
R13: 8000000000000163 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f364778c6e0(0000) GS:ffff88081ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000001c08 CR3: 000000080d422000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 1424, threadinfo ffff88080d5d4000, task ffff88080d53c800)
Stack:
0000000000000000 ffffffff8118088e ffff88080df80358 ffffffff810b9113
000492d000000001 000000001efd9c08 0000000000000010 ffff88041f400080
0000000000000002 00000000000080d0 ffff88080d5d5bc8 ffffffff810b8f27
Call Trace:
[<ffffffff8118088e>] ? rb_insert_color+0xde/0x110
[<ffffffff810b9113>] ? cache_alloc_node+0x103/0x140
[<ffffffff810b8f27>] ? fallback_alloc+0xe7/0x1d0
[<ffffffff810b8b33>] ? kmem_cache_alloc_node+0xf3/0x120
[<ffffffff810ad464>] ? vmalloc_node_range+0x174/0x230
[<ffffffffa00dbcd2>] ? igb_alloc_q_vectors+0xe2/0x1e0 [igb]
[<ffffffff810ad54c>] ? vmalloc_node+0x2c/0x40
[<ffffffffa00dbcd2>] ? igb_alloc_q_vectors+0xe2/0x1e0 [igb]
[<ffffffffa00dbcd2>] ? igb_alloc_q_vectors+0xe2/0x1e0 [igb]
[<ffffffffa00eb55c>] ? igb_probe+0x61c/0x14b0 [igb]
[<ffffffff8111d36a>] ? sysfs_addrm_finish+0x1a/0xc0
[<ffffffff8111cc27>] ? sysfs_add_one+0x27/0xd0
[<ffffffff8119fa72>] ? local_pci_probe+0x12/0x20
[<ffffffff8119fd31>] ? pci_device_probe+0x101/0x110
[<ffffffff81204785>] ? driver_probe_device+0x115/0x1a0
[<ffffffff81204a6b>] ? driver_attach+0x8b/0x90
[<ffffffff812049e0>] ? device_attach+0x90/0x90
[<ffffffff8120386d>] ? bus_for_each_dev+0x4d/0x80
[<ffffffff81204210>] ? bus_add_driver+0xd0/0x270
[<ffffffff81204d80>] ? driver_register+0x60/0x140
[<ffffffff8119ffc5>] ? pci_register_driver+0x55/0xc0
[<ffffffffa00f9000>] ? 0xffffffffa00f8fff
[<ffffffff810002b2>] ? do_one_initcall+0x122/0x170
[<ffffffff8106bfee>] ? sys_init_module+0x8e/0x1d0
[<ffffffff8140d53b>] ? system_call_fastpath+0x16/0x1b
Code: f8 01 89 d8 45 19 f6 c1 e8 13 41 f7 d6 83 e0 01 41 83 e6 02 41 09 c6 41 89 dc 44 23 25 b5 b2 92 00 44 89 e0 83 e0 10 89 44 24 30 83 7d 08 00 0f 84 c1 00 00 00 4d 85 ff 48 c7 c2 e0 75 9b 81
RIP [<ffffffff8108c435>] __alloc_pages_nodemask+0x95/0x790
RSP <ffff88080d5d5b48>
CR2: 0000000000001c08
---[ end trace b76969c731f633a9 ]---

Related

Patches: #1

Discussion

  • Masood Mehmood

    Masood Mehmood - 2012-07-17
    • group: e100 --> igb
     
  • Jesse Brandeburg

    • assigned_to: Carolyn Wyborny
     
  • Carolyn Wyborny

    Carolyn Wyborny - 2012-07-19

    Thanks for the report. I'll take a look at your patch and the issue and get back to you on what our solution will be.

     
