[SSI-devel] [ ssic-linux-Bugs-1842982 ] vproc_hold_movement oops
From: SourceForge.net <no...@so...> - 2008-01-02 01:16:45
Bugs item #1842982, was opened at 2007-12-02 18:32
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1842982&group_id=32541

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Process Management
Group: v1.9.1
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Roger Tsang (rogertsang)
Assigned to: Nobody/Anonymous (nobody)
Summary: vproc_hold_movement oops

Initial Comment:
<4>procfs: impossible type (25)
<7>Assertion failed! vp != ((void *)0), cluster/ssi/vproc/dvp_vpops.c, vpop_report_state, line=1369
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000c
<1> printing eip:
<4>c029226c
<1>*pde = 00000000
<1>Oops: 0000 [#1]
<4>SMP
<4>Modules linked in: loop nfsd tun ipt_REJECT ipt_state ipt_multiport iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables softdog nls_iso8859_1 nls_cp437 vfat fat usb_storage binfmt_misc uhci_hcd ehci_hcd usbcore floppy drbd via_rhine sk98lin r8169 forcedeth dm_mod
<4>CPU: 1
<4>EIP: 0060:[<c029226c>] Not tainted VLI
<4>EFLAGS: 00010296 (2.6.11-ssi5.31)
<4>EIP is at vproc_hold_movement+0xc/0x1f0
<4>eax: 00000000 ebx: 00000000 ecx: c04c3a10 edx: 00000000
<4>esi: 00000001 edi: c4c6d400 ebp: f7d43b6c esp: f7d43b0c
<4>ds: 007b es: 007b ss: 0068
<4>Process child_reaper (pid: 2, threadinfo=f7d42000 task=f7d41630)
<4>Stack: 00000000 f7d43b30 c0136f7c f536df2c 00000001 00000000 00000000 00000000
<4> c04c3a10 f7d43b58 c011b6c1 f536df2c 00000001 00000000 00000000 c04c3a10
<4> c04c3a0c 00000001 00000286 f7d43b84 c011b738 00000000 00000001 c4c6d400
<4>Call Trace:
<4> [<c0104eff>] show_stack+0x7f/0xa0
<4> [<c01050a6>] show_registers+0x166/0x230
[1]more> Only 'q' or 'Q' are processed at more prompt, input ignored
<4> [<c0105446>] die+0xf6/0x1c0
<4> [<c011850d>] do_page_fault+0x45d/0x652
<4> [<c0104b5f>] error_code+0x2b/0x30
<4> [<c0296f82>] pvpop_report_state+0x32/0x690
<4> [<c029e191>] vpop_report_state+0x1b1/0x3c0
<4> [<c01214fa>] release_task+0x17a/0x1c0
<4> [<c01227b6>] wait_task_zombie+0xe6/0x240
<4> [<c0122ddb>] pproc_reap+0x29b/0x380
<4> [<c0293704>] pvpop_reap+0x204/0x500
<4> [<c0292e7d>] dpvproc_nocldwait_async_handler+0x13d/0x2f0
<4> [<c02771a5>] async_cleanup_task_structs+0x55/0x90
<4> [<c02b5005>] initproc_postroot_init+0x145/0x230
<4> [<c027d872>] ssisys_cluster_initproc+0x12/0x20
<4> [<c027bd7b>] do_ssisys+0x9b/0x1f0
<4> [<c027bf1e>] sys_ssisys+0x4e/0x70
<4> [<c0103fc5>] sysenter_past_esp+0x52/0x75
<4>Code: 89 42 08 c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 c9 c3 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 54 8b 45 08 <8b> 58 0c 8d 93 c4 00 00 00 89 d0 89 55 b0 e8 f1 a6 1b 00 8b 4d
<4>
[1]kdb> bt
Stack traceback for pid 2
0xf7d41630 2 0 1 1 R 0xf7d41800 *child_reaper
EBP EIP Function (args)
0xf7d43b6c 0xc029226c vproc_hold_movement+0xc (0x0, 0x0, 0xc047ad88, 0x292, 0xf7d43ba4)
0xf7d43c00 0xc0296f82 pvpop_report_state+0x32 (0x0, 0xc4c6d400, 0xf7d43c54, 0x0, 0x1)
0xf7d43c48 0xc029e191 vpop_report_state+0x1b1 (0xc4c6d400, 0x11, 0x0, 0x1, 0x0)
0xf7d43c84 0xc01214fa release_task+0x17a (0xe51239f0, 0x0, 0xf7d43cb8, 0x0, 0x0)
0xf7d43cc8 0xc01227b6 wait_task_zombie+0xe6 (0xe51239f0, 0x0, 0x0, 0xf7d43e4c, 0xf7d43e50)
0xf7d43d18 0xc0122ddb pproc_reap+0x29b (0xe51239f0, 0x0, 0xf7d43e4c, 0xf7d43e50, 0x313d3)
0xf7d43e28 0xc0293704 pvpop_reap+0x204 (0xcfc34000, 0xffffffff, 0x20, 0x313d3, 0xf7d43e4c)
0xf7d43efc 0xc0292e7d dpvproc_nocldwait_async_handler+0x13d (0xc6a4c218, 0xf7d42000, 0xf7d42000, 0xf7d41630, 0x8)
0xf7d43f18 0xc02771a5 async_cleanup_task_structs+0x55 (0xf7d41630, 0x0, 0x40000001, 0x0, 0xc02b4eb0)
0xf7d43f58 0xc02b5005 initproc_postroot_init+0x145
0xf7d43f60 0xc027d872 ssisys_cluster_initproc+0x12

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-26 23:08

Message:
Logged In: YES
user_id=1246761
Originator: YES

The original oops is produced on 2.0.0pre1, but affected code dates back to 1.9.1

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-10 21:58

Message:
Logged In: YES
user_id=1246761
Originator: YES

I cannot reproduce this bug; and I'm not using IB interconnect. Post your OOPS if you can reproduce it with the new VPROC_HASH_LIST code.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-10 21:49

Message:
Logged In: YES
user_id=1246761
Originator: YES

Latest checkin marked #ifdef VPROC_HASH_LIST includes SMP bug fix for possible vproc hash corruption due to duplicate vproc release.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-06 02:14

Message:
Logged In: YES
user_id=1246761
Originator: YES

Not using new ATOMIC_VPROC_REFCNT code. Also reported to occur during Infiniband IPC bring up (with 1.9.3) - which means bug can be reproduced?

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-02 18:41

Message:
Logged In: YES
user_id=1246761
Originator: YES

Assert at vproc_origin_inform_nodedown_node() is related to oops on node 3?

ng over master from node 3.
<4>Node 3 has gone down!!!
<7>Assertion failed! surrogate_origin_node == this_node, cluster/ssi/vproc/nd_origin.c, vproc_origin_inform_nodedown_done, line=289
<4>passed the first scan in ipcname_pull_data
<4>num_objects[MSG] = 0
<4>num_objects[SEM] = 2
<4>num_objects[SHM] = 9
<4>ipcnameserver ready completed

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-12-02 18:34

Message:
Logged In: YES
user_id=1246761
Originator: YES

Oops on dual-core AMD Opteron at vproc_hold_movement() due to null vp returned by tnc_locate_vproc_pid() when pid is at origin node but tnc_locate_vproc_pid() thinks it is not in the vproc hash and should be in the vproc hash.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1842982&group_id=32541
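
Editorial note on the fault address: the oops in the thread reports a NULL pointer dereference at virtual address 0000000c with eax = 00000000, and the marked bytes in the Code: line (<8b> 58 0c) decode to a 32-bit load from offset 0xc of eax. That is consistent with the thread's diagnosis that vproc_hold_movement() was handed a null vp and read a member at offset 0xc through it. The sketch below only illustrates that arithmetic and one possible defensive check. The struct layout, the member names, and the helper functions are hypothetical stand-ins, not the OpenSSI definitions, and the guard is not necessarily the fix that was checked in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real vproc structure. The member names and
 * layout are assumptions, chosen only so that one member sits at offset 0xc. */
struct vproc_sketch {
    uint32_t pad0;
    uint32_t pad1;
    uint32_t pad2;
    uint32_t field_at_0xc;   /* three 4-byte members precede it: offset 12 */
};

/* Address the CPU touches when it reads a member at `member_offset`
 * through a pointer whose value is `base`. */
static uintptr_t access_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}

/* Hypothetical guarded accessor: refuses a NULL vp instead of faulting.
 * Returns 0 and stores the member in *out on success, -1 if vp is NULL. */
static int hold_movement_checked(const struct vproc_sketch *vp, uint32_t *out)
{
    if (vp == NULL)
        return -1;
    *out = vp->field_at_0xc;
    return 0;
}

/* Self-check tying the sketch back to the numbers in the oops. */
static void check_fault_arithmetic(void)
{
    /* A NULL base plus the member offset yields the reported address 0xc. */
    assert(access_address(0, offsetof(struct vproc_sketch, field_at_0xc)) == 0xc);
}
```

A caller in this sketch would test the return value of hold_movement_checked() and bail out on -1. Whether the real code should tolerate a null vp at that point, or whether the lookup that produced it needs fixing, is the question the thread itself addresses.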