Re: [SSI-devel] SSI-1.9 oops child_reaper for process under gdb
From: Roger T. <rog...@gm...> - 2005-12-30 06:28:08
Hi Laura,

I got an oops which seems related since the process is exiting. Maybe one of the semaphores got corrupted on this initnode before init failover because initnode failover at the other node got stuck in UCLEANUP waiting for `pidof` to exit. This is the same pidof process spawned by ntp service that we earlier wanted to do a trace to find who had sem. Back then when I turned on the debug messages I didn't find anything because I wasn't able to reproduce the problem and the debug messages slowed down everything to an unusable level.

Because I'm away from these machines I think I'm gonna recover this node soon. If you catch this email and want more kdb output let me know soon.

Thanks!

Roger

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c01c359d
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipt_MASQUERADE nfsd exportfs tun ipt_REJECT ipt_state ipt_multiport iptable_filter iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd ehci_hcd usbcore floppy drbd bonding sata_via libata sk98lin r8169 via_rhine dm_mod
CPU: 0
EIP: 0060:[<c01c359d>] Not tainted VLI
EFLAGS: 00010246 (2.6.10-bk7-ssi27)
EIP is at ssi_semexit+0x3d/0xd0
eax: d8d6cb08 ebx: 00015c87 ecx: 00000000 edx: eb2a3f20
esi: d8d6cb08 edi: f0c22840 ebp: d5de3ee0 esp: d5de3ec4
ds: 007b es: 007b ss: 0068
Process httpd (pid: 89223, threadinfo=d5de2000 task=cdb82550)
Stack: c061c260 00140000 f7ffebc0 00000286 00140000 d5de2000 f0c22840 d5de3f70
       c01c318d 00140000 00015c87 f16fd954 cdb82550 00000296 c014d011 f7ffff00
       e362fd24 e362fd20 cdb82550 d5de3f2c 00000001 f7ff2e80 f16fd078 f5c071e0
Call Trace:
 [<c0104a8f>] show_stack+0x7f/0xa0
 [<c0104c25>] show_registers+0x155/0x220
 [<c0104fac>] die+0xcc/0x190
 [<c011682d>] do_page_fault+0x46d/0x66b
 [<c010470b>] error_code+0x2b/0x30
 [<c01c318d>] exit_sem+0x15d/0x190
 [<c011d418>] do_exit+0x148/0x3b0
 [<c011d6f5>] do_group_exit+0x35/0x80
 [<c011d755>] sys_exit_group+0x15/0x20
 [<c0103c55>] sysenter_past_esp+0x52/0x75
Code: c0 8b 5d 0c 89 44 24 04 e8 c1 aa ff ff 85 c0 89 c6 74 33 8b 48 3c 8d 50 3c eb 0c 8d 76 00 39 59 04 74 2b 89 ca 8b 09 85 c9 75 f3 <8b> 41 04 c7 04 24 bc 3c 3e c0 89 44 24 04 e8 50 85 f5 ff 89 34

Entering kdb (current=0xcdb82550, pid 89223) Oops: Oops
due to oops @ 0xc01c359d
eax = 0xd8d6cb08 ebx = 0x00015c87 ecx = 0x00000000 edx = 0xeb2a3f20
esi = 0xd8d6cb08 edi = 0xf0c22840 esp = 0xd5de3ec4 eip = 0xc01c359d
ebp = 0xd5de3ee0 xss = 0x00000068 xcs = 0x00000060 eflags = 0x00010246
xds = 0x0000007b xes = 0x0000007b origeax = 0xffffffff &regs = 0xd5de3e90
kdb> bt
Stack traceback for pid 89223
0xcdb82550 89223 69012 1 0 R 0xcdb82710 *httpd
EBP        EIP        Function (args)
0xd5de3ee0 0xc01c359d ssi_semexit+0x3d (0x140000, 0x15c87, 0xf16fd954, 0xcdb82550, 0x296)
0xd5de3f70 0xc01c318d exit_sem+0x15d (0xcdb82550, 0xebd1d940, 0xc060c010, 0x1c3e2, 0x1)
0xd5de3f9c 0xc011d418 do_exit+0x148 (0x4aa60fb4, 0x0, 0x0)
0xd5de3fb0 0xc011d6f5 do_group_exit+0x35 (0x0)
0xd5de3fbc 0xc011d755 sys_exit_group+0x15
           0xc0103c55 sysenter_past_esp+0x52
kdb> id 0xc01c359d
0xc01c359d ssi_semexit+0x3d: mov 0x4(%ecx),%eax
0xc01c35a0 ssi_semexit+0x40: movl $0xc03e3cbc,(%esp,1)
0xc01c35a7 ssi_semexit+0x47: mov %eax,0x4(%esp,1)
0xc01c35ab ssi_semexit+0x4b: call 0xc011bb00 printk
0xc01c35b0 ssi_semexit+0x50: mov %esi,(%esp,1)
0xc01c35b3 ssi_semexit+0x53: call 0xc01be0a0 ipc_unlock
0xc01c35b8 ssi_semexit+0x58: add $0x10,%esp
0xc01c35bb ssi_semexit+0x5b: pop %ebx
0xc01c35bc ssi_semexit+0x5c: pop %esi
0xc01c35bd ssi_semexit+0x5d: pop %edi
0xc01c35be ssi_semexit+0x5e: pop %ebp
0xc01c35bf ssi_semexit+0x5f: ret
0xc01c35c0 ssi_semexit+0x60: mov (%ecx),%eax
0xc01c35c2 ssi_semexit+0x62: xor %edi,%edi
0xc01c35c4 ssi_semexit+0x64: mov %eax,(%edx)
0xc01c35c6 ssi_semexit+0x66: mov 0x40(%esi),%eax

On 12/6/05, Laura Ramirez <lau...@hp...> wrote:
>
> Hi Roger,
>
> A quick look at the code, there seems to be a comment
> about the ptrace vproc path, may need to be
> reworked for 2.6 merge.
> I don't quite remember what the issue was, it's obviously hitting
> BUG_ON with exit_signal == -1. Can you print the vproc and the
> pvproc? Below, you printed the pvproc using the vproc ptr which
> made it look corrupt, but it really isn't, it was just the wrong print
> call.
>
> laura
>
> Roger Tsang wrote:
> > Laura,
> >
> > I got an oops exiting from gdb while attached to check_bacula which had
> > segfaulted. I eventually fixed the bug in check_bacula, but take a look at
> > the oops below. child_reaper was waiting for check_bacula which is in E
> > state. It looks like pvproc got corrupted.
> >
> > I'll leave this in kdb until tomorrow just in case I left out something.
> >
> > Roger
> >
> >
> > pe (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)ptrace_unlink:
> > vpop_reclaim failed
> > <4>------------[ cut here ]------------
> > <1>kernel BUG at kernel/exit.c:1343!
> > <1>invalid operand: 0000 [#1]
> > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd exportfs
> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport iptable_filter
> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd ehci_hcd usbcore
> > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine dm_mod
> > <4>CPU: 0
> > <4>EIP: 0060:[<c011da00>] Not tainted VLI
> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24)
> > <4>EIP is at wait_task_zombie+0x220/0x230
> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: c0430a2c
> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: f7d11c98
> > <4>ds: 007b es: 007b ss: 0068
> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 task=c1aeea80)
> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 00000000 00000000
> > 00000000
> > <4> 00000001 0001183d 00011667 dd600600 00000000 00000286 f7d11d1c
> > c011de68
> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 f7d10000 c3378a40
> > f7d11d1c
> > more>
> > <4>Call Trace:
> > <4> [<c0104a8f>] show_stack+0x7f/0xa0
> > <4> [<c0104c25>] show_registers+0x155/0x220
> > <4> [<c0104fac>] die+0xcc/0x190
> > <4> [<c01050f6>] do_trap+0x86/0xd0
> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0
> > <4> [<c010470b>] error_code+0x2b/0x30
> > <4> [<c011de68>] pproc_reap+0x228/0x2f0
> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480
> > <4> [<c020ea9f>] dpvproc_nocldwait_async_handler+0x13f/0x300
> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80
> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230
> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20
> > <4> [<c01f77c9>] do_ssisys+0x99/0x200
> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70
> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75
> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 77 a0 ff ff e9 24
> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 00 00 <0f> 0b 3f 05
> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00
> > <4>
> > kdb>
> > kdb> bt
> > Stack traceback for pid 2
> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 *child_reaper
> > EBP EIP Function (args)
> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 (0xc8709020, 0x0, 0x0,
> > 0xf7d11e4c, 0xf7d11e50)
> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, 0xf7d11e4c,
> > 0xf7d11e50, 0x11666)
> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, 0xffffffff, 0x20,
> > 0x11666, 0xf7d11e4c)
> > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f (0xed3c9574,
> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8)
> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 (0xc1aeea80, 0x0,
> > 0x40000001, 0x0, 0xc022f870)
> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155
> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12
> > kdb> call print_task_struct 0xc8709020
> > state=0x20
> > flags=0x44c
> > ptrace=0x0
> > lock_depth=-1
> > prio=116
> > static_prio=120
> > array=00000000
> > sleep_avg=899989756
> > interactive_credit=1
> > timestamp=269658653961877
> > activated=0x0
> > policy=0
> > &cpus_allowed=0xc870906c
> > time_slice=49
> > first_time_slice=1
> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8
> > mm=00000000
> > active_mm=00000000
> > binfmt=c04b3e68
> > exit_code=9
> > exit_signal=-1
> > pdeath_signal=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > personality=0x0
> > did_exec=0
> > pid=71274
> > epid=71274
> > ppid=71270
> > tgid=71271
> > cltnode=0
> > p_vproc=0xdd600600
> > p_vfparent=0x00000000
> > group_leader=0xf22f2550
> > &pids=0xc87090c4
> > set_child_tid 0x00000000
> > clear_child_tid 0x00000000
> > rt_priority=0x0
> > it_real_value=0x0
> > it_prof_value=0x0
> > it_virt_value=0x0
> > it_real_incr=0x0
> > it_prof_incr=0x0
> > it_virt_incr=0x0
> > utime=0
> > stime=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > nvcsw=3
> > nivcsw=0
> > sig_utime=0
> > sig_stime=0
> > cutime=0
> > cstime=0
> > sig_nvcsw=0
> > sig_nivcsw=0
> > cnvcsw=0
> > cnivcsw=0
> > start_time.tv_sec=270328
> > start_time.tv_nsec=615704896
> > min_flt=0
> > maj_flt=0
> > sig_min_flt=0
> > sig_maj_flt=0
> > cmin_flt=0
> > cmaj_flt=0
> > uid=0
> > euid=0
> > suid=0
> > fsuid=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > gid=0
> > egid=0
> > sgid=0
> > fsgid=0
> > group_info=0xeaf0b980
> > cap_effective=0xfffffeff
> > cap_inheritable=0x0
> > cap_permitted=0xfffffeff
> > keep_capabilities=0
> > user=0xc0431ae0
> > &rlim=0xc1af02c4
> > used_math=1
> > comm=check_bacula
> > locks=0
> > link_count=0
> > total_link_count=1
> > semvsem.undo_list=f4648200
> > fs=0x00000000
> > files=0x00000000
> > namespace=0x00000000
> > signal=0xc1af0240
> > sighand=0xf6c86580
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > &blocked=0xc8709484
> > &real_blocked=0xc870948c
> > &pending=0xc8709494
> > sas_ss_sp=0x0
> > sas_ss_size=0x00000000
> > notifier_data=0x00000000
> > notifier_mask=0x00000000
> > security=0x00000000
> > audit_context=0x00000000
> > parent_exec_id=0x16
> > self_exec_id=0x16
> > journal_info=0x00000000
> > proc_dentry=0xcb5fe8d4
> > backing_dev_info=0x00000000
> > io_context=0x00000000
> > ptrace_message=0x0
> > last_siginfo=0x00000000
> > p_nodetime=0
> > p_ticks_delta=0
> > icsprio=0x0
> > execnode=0x00000000
> > node_context=1
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > rcopy_task_size=0
> > &mosix=0xc8709504
> > Function print_task_struct returned 0x0
> > kdb> btp 71274
> > Stack traceback for pid 71274
> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 check_bacula
> > EBP EIP Function (args)
> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, 0xc012523f,
> > 0xd8017e9c, 0x0)
> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, 0xd8016000)
> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, 0xd8016000,
> > 0xd8016000)
> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, 0xd8017ef8,
> > 0xd8017fc4, 0x0, 0xc0218c59)
> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, 0xcc7be550,
> > 0xc8709020)
> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57
> > 0xc0103cf6 work_notifysig+0x13
> > kdb> btp 71270
> > Stack traceback for pid 71270
> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb
> > EBP EIP Function (args)
> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, 0xe2d7fecc,
> > 0xf3eade68)
> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, 0xa0, 0x0,
> > 0xf3eadedc)
> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, 0x0, 0x0)
> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, 0x80000000, 0x0)
> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25
> > 0xc0103c55 sysenter_past_esp+0x52
> > kdb> btp 117331
> > Stack traceback for pid 117331
> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash
> > EBP EIP Function (args)
> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, 0x4, 0x1ca53,
> > 0xd5399e94)
> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, 0x24,
> > 0xbffff038, 0xd5399edc)
> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, 0xbffff038, 0x0)
> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, 0xa, 0x0)
> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25
> > 0xc0103c55 sysenter_past_esp+0x52
> > kdb>
> > kdb> call print_pvproc 0xdd600600
> > pvp_flag=0x63727076
> > pvp_wstate=0x1166a
> > pvp_pproc=0x00000007
> > pvp_head_childl=0xdd60061c
> > pvp_childl=0x00000000
> > pvp_head_pgrpl=0x00000000
> > pvp_pgrpl=0x00000000
> > pvp_sessionl=0x00083049
> > pvp_head_oclist=0x00000005
> > pvp_oclist=0xc8709020
> > pvp_ppid=0
> > pvp_oppid=0
> > pvp_sid=0
> > pvp_pgid=-489161216
> > pvp_pp_sid=0
> > pvp_pp_pgid=0
> > pvp_fromnode=0
> > pvp_tonode=71270
> > pvp_cttynode=71271
> > pvp_cttydev=0x1ca53
> > pvp_jobc=71271
> > pvp_pgrp_ldr_seqno=1
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > pvp_pgrp_mem_seqno=-580909344
> > pvp_fork_sigmigarg=-580909332
> > pvp.ml.ml_flag=1
> > pvp.ml.ml_shr_count=-580909376
> > pvp.ml.ml_excl_count=0
> > pvp_loadlevel=-580909332
> > pvp_pin=0
> > pvp_localview=0
> > Function print_pvproc returned 0x0
> > kdb>
> >
>
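
A note on reading the ssi_semexit oops at the top of this message. This is a minimal Python sketch, not part of the original report, that cross-checks the register dump against the faulting instruction. It uses only values copied from the oops; the helper names `parse_regs` and `faulting_bytes` are made up for illustration. The bracketed byte in the Code: line marks the instruction at EIP, `8b 41 04`, which kdb disassembles as `mov 0x4(%ecx),%eax`. With ecx = 0 the load address is 0x4, matching the reported fault address 00000004. ebx holds 0x15c87, which is pid 89223 of the exiting httpd. That is consistent with the loop just before the fault (`cmp %ebx,0x4(%ecx)` ... `mov (%ecx),%ecx`) walking a list for an entry with this pid and running off the end, but that reading is inferred from the bytes only and has not been checked against the SSI source.

```python
import re

# Values copied verbatim from the ssi_semexit oops in this message.
REGS = "eax: d8d6cb08 ebx: 00015c87 ecx: 00000000 edx: eb2a3f20"
CODE = ("c0 8b 5d 0c 89 44 24 04 e8 c1 aa ff ff 85 c0 89 c6 74 33 8b 48 3c "
        "8d 50 3c eb 0c 8d 76 00 39 59 04 74 2b 89 ca 8b 09 85 c9 75 f3 "
        "<8b> 41 04 c7 04 24 bc 3c 3e c0 89 44 24 04 e8 50 85 f5 ff 89 34")


def parse_regs(text):
    """Turn 'eax: d8d6cb08 ebx: ...' into {'eax': 0xd8d6cb08, ...}."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+): ([0-9a-f]{8})", text)}


def faulting_bytes(code, length=3):
    """Return `length` bytes starting at the <xx> marker (the byte at EIP)."""
    tokens = code.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + length]]


regs = parse_regs(REGS)
opcode, modrm, disp8 = faulting_bytes(CODE)

# 0x8b is MOV r32, r/m32. ModRM 0x41 = mod 01 (8-bit displacement),
# reg 000 (eax), rm 001 (ecx), i.e. eax = *(ecx + disp8).
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

fault_addr = (regs["ecx"] + disp8) & 0xFFFFFFFF
print("fault address:", hex(fault_addr))
print("ebx as decimal:", regs["ebx"])
```

If this reading holds, the printk that follows the faulting load in the `id` output is on the not-found path, so the entry pointer should be checked for NULL before it is printed.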