From: Andrew J. S. <as...@te...> - 2012-06-28 16:24:31
Hi,

I sent this a week ago, and haven't gotten any response. Does anybody
have any thoughts to share on these issues? It seems to me that the driver
should be able to provide affinity_hint information to the kernel...

thanks,
Andy

On Thu, Jun 21, 2012 at 10:59:13AM -0400, Andrew J. Schorr wrote:
> Hi,
>
> The affinity_hint patch works as expected. It enables me to deliver the
> interrupts to selected cpus. For example, I am configuring with the following
> options:
>
> sh-4.2$ cat /etc/modprobe.d/igb.conf
> options igb RSS=4,4,4,4
> options igb AFFINITY=1,1,1,1
>
> I can see in /proc/interrupts that all network interrupts are going to CPU0.
> This is basically equivalent to the situation I had in Fedora 14.
> Unfortunately, the Fedora 14 system (with kernel 2.6.35.14-106.fc14.x86_64 and
> igb version 2.1.0-k2) still seems to outperform this Fedora 16 system with a
> fairly current 3.4.2 kernel and newest 3.4.7 igb driver.
>
> The problem with steady packet loss that I think was due to the deep C-states
> on various cpus receiving interrupts has been fixed. Using the affinity_hint
> patch seems to solve that problem by concentrating all the interrupts on CPU0.
> But I am still having occasional hiccups resulting in kernel backtraces like
> this one (using kernel 3.4.2-1.fc16.x86_64):
>
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.487732] swapper/0: page allocation failure: order:0, mode:0x4020
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.501726] Pid: 0, comm: swapper/0 Tainted: G O 3.4.2-1.fc16.x86_64 #1
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.518207] Call Trace:
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.523579] <IRQ> [<ffffffff81123ef6>] warn_alloc_failed+0xf6/0x160
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.537775] [<ffffffff81547b1a>] ? __udp_queue_rcv_skb+0x4a/0x160
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.551383] [<ffffffff81127ac8>] __alloc_pages_nodemask+0x6f8/0x950
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.565376] [<ffffffff8115f866>] alloc_pages_current+0xb6/0x120
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.578663] [<ffffffff8116ab90>] new_slab+0x2e0/0x2f0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.590051] [<ffffffff815f044e>] __slab_alloc+0x30f/0x424
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.602206] [<ffffffff8151f470>] ? ip_rcv_finish+0x370/0x370
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.614933] [<ffffffff814dcfa4>] ? __netdev_alloc_skb+0x24/0x50
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.628235] [<ffffffff8151f87a>] ? ip_local_deliver+0x4a/0x90
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.641153] [<ffffffff8116dab7>] __kmalloc_node_track_caller+0x97/0x1f0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.655983] [<ffffffff814dc93b>] ? __alloc_skb+0x4b/0x230
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.668134] [<ffffffff814dcfa4>] ? __netdev_alloc_skb+0x24/0x50
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.681436] [<ffffffff814dc968>] __alloc_skb+0x78/0x230
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.693208] [<ffffffff814dcfa4>] __netdev_alloc_skb+0x24/0x50
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.706132] [<ffffffffa01ef27d>] igb_alloc_rx_buffers+0xbd/0x2d0 [igb]
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.720776] [<ffffffffa01f000e>] igb_poll+0xb7e/0x1220 [igb]
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.733503] [<ffffffff810e11b0>] ? handle_irq_event+0x50/0x70
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.746422] [<ffffffff8101b903>] ? native_sched_clock+0x13/0x80
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.759725] [<ffffffff8101b979>] ? sched_clock+0x9/0x10
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.771494] [<ffffffff8108d60d>] ? sched_clock_cpu+0xbd/0x110
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.784416] [<ffffffff814ea27b>] net_rx_action+0x12b/0x220
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.796758] [<ffffffff8105ee78>] __do_softirq+0xb8/0x230
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.808718] [<ffffffff810e11b0>] ? handle_irq_event+0x50/0x70
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.821643] [<ffffffff8160171c>] call_softirq+0x1c/0x30
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.833411] [<ffffffff81016215>] do_softirq+0x65/0xa0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.844795] [<ffffffff8105f28e>] irq_exit+0x9e/0xc0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.855797] [<ffffffff81601f73>] do_IRQ+0x63/0xe0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.866415] [<ffffffff815f85ea>] common_interrupt+0x6a/0x6a
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.878951] <EOI> [<ffffffff8131a57a>] ? intel_idle+0xea/0x150
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.892263] [<ffffffff8131a55c>] ? intel_idle+0xcc/0x150
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.904222] [<ffffffff814a7129>] cpuidle_enter+0x19/0x20
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.916183] [<ffffffff814a773c>] cpuidle_idle_call+0xac/0x2a0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.929106] [<ffffffff8101d61f>] cpu_idle+0xcf/0x120
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.940305] [<ffffffff815d510e>] rest_init+0x72/0x74
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.951500] [<ffffffff81cf6c11>] start_kernel+0x3b7/0x3c4
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.963651] [<ffffffff81cf665a>] ? repair_env_string+0x5a/0x5a
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.976760] [<ffffffff81cf6346>] x86_64_start_reservations+0x131/0x135
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57520.991399] [<ffffffff81cf6140>] ? early_idt_handlers+0x140/0x140
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57521.005083] [<ffffffff81cf644c>] x86_64_start_kernel+0x102/0x111
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57521.018576] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57521.032068] cache: kmalloc-1024, object size: 1024, buffer size: 1024, default order: 3, min order: 0
> Jun 21 09:48:07 ti136 kernel: [ID kern.warning] [57521.052908] node 0: slabs: 4812, objs: 66204, free: 0
>
> This sort of thing seems to happen pretty consistently once or twice a day.
> I'm not sure whether this is a kernel problem or something peculiar to
> the igb driver. Can anybody offer any guidance on how to troubleshoot
> this? According to turbostat, CPU0 is not spending much time in C6 states
> (usually <= .1%), but it is in C3 a few percent of the time.
>
> Is it possible that disabling C3 and C6 on CPU0 could help with this problem?
>
> The patch at https://lkml.org/lkml/2012/6/13/615 should allow disabling
> C3 and C6 on CPU0 only, so in combination with my affinity_hint patch
> to direct all interrupts to CPU0, that would be relatively energy-efficient.
>
> Finally, is there any thought to enhancing the driver to set affinity_hint as
> in my patch? If there is interest in this, I could work on fixing some of the
> patch's shortcomings. But for now, I think it serves my purposes.
>
> Thanks,
> Andy
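
For anyone wanting to repeat the /proc/interrupts check described above (confirming that all network interrupts land on CPU0), here is a minimal sketch in Python. It sums the per-CPU counters for every IRQ whose description matches a pattern. The function name per_cpu_counts, the sample table, and the "eth0" pattern are illustrative assumptions, not taken from the original report; substitute whatever names your interface's vectors show in /proc/interrupts.

```python
import re


def per_cpu_counts(text, name_pattern):
    """Sum interrupt counts per CPU for IRQ rows whose description matches name_pattern."""
    lines = text.splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    pat = re.compile(name_pattern)
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        # Rows such as ERR: or MIS: carry a single counter; skip them.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        desc = " ".join(fields[len(cpus):])
        if pat.search(desc):
            for i, c in enumerate(counts):
                totals[i] += int(c)
    return dict(zip(cpus, totals))


# Hypothetical sample in the /proc/interrupts layout (values are invented).
SAMPLE = """\
           CPU0       CPU1
  0:         42          0   IO-APIC-edge      timer
 64:       1000          0   PCI-MSI-edge      eth0-TxRx-0
 65:       2000          5   PCI-MSI-edge      eth0-TxRx-1
ERR:          0
"""

if __name__ == "__main__":
    # On a live Linux system, read the real table instead of SAMPLE.
    with open("/proc/interrupts") as f:
        for cpu, n in per_cpu_counts(f.read(), r"eth0").items():
            print(cpu, n)
```

Running this twice a few seconds apart and comparing the totals shows which CPUs are actually servicing the interface; if the affinity settings are taking effect, only the CPU0 count should grow.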