From: Erik H. <eah...@gm...> - 2005-05-12 09:54:18
On 5/7/05, Dale Harris <ro...@ma...> wrote:
>
> Hey, I'm seeing some page allocation errors. Has anyone seen anything
> like this? Of course it's a 2.6.9 vanilla kernel patched for bproc, and
> the Myrinet GM driver is running.

I don't think this has anything to do with BProc per se. I've seen stuff like this before whenever I turn on jumbo frames on any machine and start shoving a lot of data through.

I'm not 100% sure I'm right about what's going on, but here's my guess:

Once you start using jumbo frame sizes (~9k), it gets harder to allocate skbuffs in the kernel. These are the buffers that hold network packets. Pages for this kind of stuff are allocated in powers of two. With 4k pages, 9k needs three pages, which rounds up to 4 pages (an order-2 allocation, the "order:2" in the message below). Also, since it's kernel-space memory that will be used for DMA buffers, it has to be 4 contiguous pages. That can be hard to find once memory gets fragmented.

Normally the kernel could free something up by paging some stuff out (e.g. disk blocks), but in this case kmalloc was called from an interrupt, which makes that impossible. The "mode:0x20" in the message is GFP_ATOMIC, the flag used for allocations that are not allowed to sleep. The allocator has no options left, so it gives up and the allocation fails.

Allocation failures in cases like this shouldn't be treated as a major problem. Network drivers need to be able to deal with this sort of thing, and they do (typically by dropping the packet). I think the message below is meant to be a helpful debugging aid; it's a warning, not an error.

I don't know if BProc is causing this to happen in some subtle way. If anything, I would expect that BProc could cause bigger traffic bursts than most servers would normally experience.

- Erik

> swapper: page allocation failure. order:2, mode:0x20
> [<c013e28e>] __alloc_pages+0x1b3/0x358
> [<c013e458>] __get_free_pages+0x25/0x3f
> [<c01416dc>] kmem_getpages+0x21/0xc9
> [<c01423bb>] cache_grow+0xab/0x14d
> [<c01425d1>] cache_alloc_refill+0x174/0x219
> [<c0142a24>] __kmalloc+0x85/0x8c
> [<c0219da1>] alloc_skb+0x47/0xe0
> [<f8febfb8>] gmip_recv_interrupt+0x216/0x4c7 [gm]
> [<c011a76a>] load_balance+0x15c/0x170
> [<f8ff1691>] __gm_ethernet_wake_callback+0x6a/0x9c [gm]
> [<f8fde584>] gm_handle_claimed_interrupt+0x580/0x62e [gm]
> [<c025d0ae>] udp_queue_rcv_skb+0x174/0x2a4
> [<c025d6c6>] udp_rcv+0x164/0x407
> [<c023b6d8>] ip_defrag+0x112/0x1bf
> [<c0239d4d>] ip_local_deliver+0xe8/0x279
> [<c023a26d>] ip_rcv+0x38f/0x510
> [<c021ffe3>] netif_receive_skb+0x1c7/0x2a1
> [<c022013b>] process_backlog+0x7e/0x10b
> [<c022023f>] net_rx_action+0x77/0xf6
> [<f8fea40e>] gm_linux_intr+0x9c/0xac [gm]
> [<c010899d>] handle_IRQ_event+0x31/0x65
> [<c0108d0b>] do_IRQ+0x9e/0x130
> [<c0106b6c>] common_interrupt+0x18/0x20
> [<c010401e>] default_idle+0x0/0x2c
> [<c0104047>] default_idle+0x29/0x2c
> [<c01040bc>] cpu_idle+0x3f/0x58
> [<c034a895>] start_kernel+0x197/0x1d5
> [<c034a336>] unknown_bootoption+0x0/0x15c
> [ snip ]