From: <gor...@ph...> - 2003-06-25 13:42:09
Some of the oopses people have been seeing in the 2.4.20 kernels may have
been due to an RPC race condition. Here's a kernel thread on the topic,
and a couple of ksymoops traces from my systems:

http://www.ussg.iu.edu/hypermail/linux/kernel/0302.0/1146.html

These oopses are from systems running 3.2.5, but they also occurred under
3.2.4. They appear to be precipitated by simultaneous spikes in CPU and
network (NFS?) load, which happen regularly on compute clusters.

I've contacted the original poster, who hasn't seen the problem since
upgrading to 2.4.21-rc6. Has anyone tried bproc on 2.4.21 yet?

==============================================================================

ksymoops 2.4.8 on i686 2.4.20.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /boot/System.map (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000058
c0303206
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0303206>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0000002c   ebx: 00000000   ecx: 00000008   edx: 00000001
esi: f7611078   edi: e7bc2480   ebp: f7611000   esp: c2837edc
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c2837000)
Stack: e7bc2480 c03031c0 00000020 00000000 c0304442 e7bc2480 e7bc24d4 c03043c0
       c0123b47 e7bc2480 c2837f0c 00000001 c882a0e4 f7bfbd78 00000000 00000001
       00000020 00000000 c011fafb c0433660 c011f9a1 00000000 00000001 c04095e0
Call Trace:    [<c03031c0>] [<c0304442>] [<c03043c0>] [<c0123b47>] [<c011fafb>]
  [<c011f9a1>] [<c011f72b>] [<c010a8ad>] [<c0106e60>] [<c0106e60>] [<c0106e60>]
  [<c0106e60>] [<c0106e8c>] [<c0106f12>] [<c011af6b>]
Code: 8b 40 2c 83 f8 09 0f 4c c8 b8 01 00 00 00 d3 e0 39 c2 7d 16

>>EIP; c0303206 <xprt_timer+46/e0>   <=====
>>esi; f7611078 <_end+371a0bfc/384b3b84>
>>edi; e7bc2480 <_end+27752004/384b3b84>
>>ebp; f7611000 <_end+371a0b84/384b3b84>
>>esp; c2837edc <_end+23c7a60/384b3b84>

Trace; c03031c0 <xprt_timer+0/e0>
Trace; c0304442 <rpc_run_timer+82/90>
Trace; c03043c0 <rpc_run_timer+0/90>
Trace; c0123b47 <timer_bh+2b7/3f0>
Trace; c011fafb <bh_action+4b/80>
Trace; c011f9a1 <tasklet_hi_action+61/a0>
Trace; c011f72b <do_softirq+7b/e0>
Trace; c010a8ad <do_IRQ+dd/f0>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e8c <default_idle+2c/40>
Trace; c0106f12 <cpu_idle+52/70>
Trace; c011af6b <call_console_drivers+eb/100>

Code;  c0303206 <xprt_timer+46/e0>
00000000 <_EIP>:
Code;  c0303206 <xprt_timer+46/e0>   <=====
   0:   8b 40 2c                  mov    0x2c(%eax),%eax   <=====
Code;  c0303209 <xprt_timer+49/e0>
   3:   83 f8 09                  cmp    $0x9,%eax
Code;  c030320c <xprt_timer+4c/e0>
   6:   0f 4c c8                  cmovl  %eax,%ecx
Code;  c030320f <xprt_timer+4f/e0>
   9:   b8 01 00 00 00            mov    $0x1,%eax
Code;  c0303214 <xprt_timer+54/e0>
   e:   d3 e0                     shl    %cl,%eax
Code;  c0303216 <xprt_timer+56/e0>
  10:   39 c2                     cmp    %eax,%edx
Code;  c0303218 <xprt_timer+58/e0>
  12:   7d 16                     jge    2a <_EIP+0x2a> c0303230 <xprt_timer+70/e0>

==============================================================================

<0>Kernel panic: Aiee, killing interrupt handler!

ksymoops 2.4.8 on i686 2.4.20.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /boot/System.map (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000058
c0303206
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0303206>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0000002c   ebx: 00000000   ecx: 00000008   edx: 00000001
esi: f75d82b8   edi: f423ee40   ebp: f75d8000   esp: f53f3f24
ds: 0018   es: 0018   ss: 0018
Process blastall (pid: 2246, stackpage=f53f3000)
Stack: f423ee40 c03031c0 00000000 00000000 c0304442 f423ee40 f423ee94 c03043c0
       c0123b47 f423ee40 f53f3f54 00000086 c35a7a98 d901a0e4 00000000 00000001
       00000000 00000000 c011fafb c0433660 c011f9a1 00000000 00000001 c04095e0
Call Trace:    [<c03031c0>] [<c0304442>] [<c03043c0>] [<c0123b47>] [<c011fafb>]
  [<c011f9a1>] [<c011f72b>] [<c010a8ad>]
Code: 8b 40 2c 83 f8 09 0f 4c c8 b8 01 00 00 00 d3 e0 39 c2 7d 16

>>EIP; c0303206 <xprt_timer+46/e0>   <=====
>>esi; f75d82b8 <_end+37167e3c/384b3b84>
>>edi; f423ee40 <_end+33dce9c4/384b3b84>
>>ebp; f75d8000 <_end+37167b84/384b3b84>
>>esp; f53f3f24 <_end+34f83aa8/384b3b84>

Trace; c03031c0 <xprt_timer+0/e0>
Trace; c0304442 <rpc_run_timer+82/90>
Trace; c03043c0 <rpc_run_timer+0/90>
Trace; c0123b47 <timer_bh+2b7/3f0>
Trace; c011fafb <bh_action+4b/80>
Trace; c011f9a1 <tasklet_hi_action+61/a0>
Trace; c011f72b <do_softirq+7b/e0>
Trace; c010a8ad <do_IRQ+dd/f0>

Code;  c0303206 <xprt_timer+46/e0>
00000000 <_EIP>:
Code;  c0303206 <xprt_timer+46/e0>   <=====
   0:   8b 40 2c                  mov    0x2c(%eax),%eax   <=====
Code;  c0303209 <xprt_timer+49/e0>
   3:   83 f8 09                  cmp    $0x9,%eax
Code;  c030320c <xprt_timer+4c/e0>
   6:   0f 4c c8                  cmovl  %eax,%ecx
Code;  c030320f <xprt_timer+4f/e0>
   9:   b8 01 00 00 00            mov    $0x1,%eax
Code;  c0303214 <xprt_timer+54/e0>
   e:   d3 e0                     shl    %cl,%eax
Code;  c0303216 <xprt_timer+56/e0>
  10:   39 c2                     cmp    %eax,%edx
Code;  c0303218 <xprt_timer+58/e0>
  12:   7d 16                     jge    2a <_EIP+0x2a> c0303230 <xprt_timer+70/e0>

<0>Kernel panic: Aiee, killing interrupt handler!
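
A quick sanity check on both reports: the faulting instruction is mov 0x2c(%eax),%eax, the register dump shows eax = 0000002c, and the reported fault address is 00000058. The sketch below only redoes that arithmetic (displacement plus base register) from values copied out of the oops text. The helper names are made up for illustration and it handles only this simple disp(%reg) operand form, so treat it as a rough check rather than a general decoder.

```python
import re

# Values copied verbatim from the oops reports above (both reports match here).
OOPS_TEXT = """
Unable to handle kernel NULL pointer dereference at virtual address 00000058
eax: 0000002c ebx: 00000000 ecx: 00000008 edx: 00000001
"""
FAULTING_INSN = "mov 0x2c(%eax),%eax"


def parse_registers(text):
    """Return {register: value} for each 'exx: hhhhhhhh' pair in the dump."""
    pairs = re.findall(r"\b(e[a-z]{2}): ([0-9a-f]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def fault_address(text):
    """Extract the faulting virtual address reported by the kernel."""
    match = re.search(r"virtual address ([0-9a-f]{8})", text)
    return int(match.group(1), 16)


def effective_address(insn, regs):
    """Compute disp + base for a simple AT&T-syntax operand like 0x2c(%eax)."""
    match = re.search(r"(0x[0-9a-f]+)?\(%(e[a-z]{2})\)", insn)
    disp = int(match.group(1), 16) if match.group(1) else 0
    return disp + regs[match.group(2)]


regs = parse_registers(OOPS_TEXT)
computed = effective_address(FAULTING_INSN, regs)
reported = fault_address(OOPS_TEXT)
print(hex(computed), hex(reported))
```

The two values agree, so the base pointer held in eax at the time of the fault was 0x2c rather than a valid kernel address, in both the swapper and the blastall reports.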