Re: [SSI-devel] SSI-1.9 oops child_reaper for process under gdb
From: Vladimir R. <one...@gm...> - 2006-05-24 21:44:48
Laura, Roger,

I've hit the same problem while debugging a multithreaded program with gdb: BUG_ON with exit_signal == -1 in wait_task_zombie().

I applied a patch to kernel/kernel/exit.c (rev. 1.14, 8 Dec 2005). Is that the correct one? With it applied, dpvproc_nocldwait_async_handler() now spins in an infinite loop, calling pvpop_reap() and receiving -EAGAIN as the error code.

Should wait_task_zombie() return (sometimes ;) ) p->pid instead of -EAGAIN?

Thanks,
Vladimir

I can provide some info about the processes...

On 12/6/05, Laura Ramirez <lau...@hp...> wrote:
>
> Hi Roger,
>
> From a quick look at the code, there seems to be a comment saying
> the ptrace vproc path may need to be reworked for the 2.6 merge.
> I don't quite remember what the issue was, but it's obviously hitting
> BUG_ON with exit_signal == -1. Can you print the vproc and the
> pvproc? Below, you printed the pvproc using the vproc ptr, which
> made it look corrupt, but it really isn't; it was just the wrong print
> call.
>
> laura
>
> Roger Tsang wrote:
> > Laura,
> >
> > I got an oops exiting from gdb while attached to check_bacula which had
> > segfaulted. I eventually fixed the bug in check_bacula, but take a look at
> > the oops below. child_reaper was waiting for check_bacula which is in E
> > state. It looks like pvproc got corrupted.
> >
> > I'll leave this in kdb until tomorrow just in case I left out something.
> >
> > Roger
> >
> >
> > pe (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
> > impossible type (11)procfs: impossible type (11)procfs: impossible type
> > (11)procfs: impossible type (11)procfs: impossible type (11)ptrace_unlink:
> > vpop_reclaim failed
> > <4>------------[ cut here ]------------
> > <1>kernel BUG at kernel/exit.c:1343!
> > <1>invalid operand: 0000 [#1]
> > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd exportfs
> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport iptable_filter
> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd ehci_hcd usbcore
> > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine dm_mod
> > <4>CPU: 0
> > <4>EIP: 0060:[<c011da00>] Not tainted VLI
> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24)
> > <4>EIP is at wait_task_zombie+0x220/0x230
> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: c0430a2c
> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: f7d11c98
> > <4>ds: 007b es: 007b ss: 0068
> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 task=c1aeea80)
> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 00000000 00000000
> > 00000000
> > <4> 00000001 0001183d 00011667 dd600600 00000000 00000286 f7d11d1c
> > c011de68
> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 f7d10000 c3378a40
> > f7d11d1c
> > more>
> > <4>Call Trace:
> > <4> [<c0104a8f>] show_stack+0x7f/0xa0
> > <4> [<c0104c25>] show_registers+0x155/0x220
> > <4> [<c0104fac>] die+0xcc/0x190
> > <4> [<c01050f6>] do_trap+0x86/0xd0
> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0
> > <4> [<c010470b>] error_code+0x2b/0x30
> > <4> [<c011de68>] pproc_reap+0x228/0x2f0
> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480
> > <4> [<c020ea9f>] dpvproc_nocldwait_async_handler+0x13f/0x300
> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80
> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230
> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20
> > <4> [<c01f77c9>] do_ssisys+0x99/0x200
> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70
> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75
> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 77 a0 ff ff e9 24
> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 00 00 <0f> 0b 3f 05
> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00
> > <4>
> > kdb>
> > kdb> bt
> > Stack traceback for pid 2
> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 *child_reaper
> > EBP EIP Function (args)
> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 (0xc8709020, 0x0, 0x0,
> > 0xf7d11e4c, 0xf7d11e50)
> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, 0xf7d11e4c,
> > 0xf7d11e50, 0x11666)
> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, 0xffffffff, 0x20,
> > 0x11666, 0xf7d11e4c)
> > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f (0xed3c9574,
> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8)
> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 (0xc1aeea80, 0x0,
> > 0x40000001, 0x0, 0xc022f870)
> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155
> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12
> > kdb> call print_task_struct 0xc8709020
> > state=0x20
> > flags=0x44c
> > ptrace=0x0
> > lock_depth=-1
> > prio=116
> > static_prio=120
> > array=00000000
> > sleep_avg=899989756
> > interactive_credit=1
> > timestamp=269658653961877
> > activated=0x0
> > policy=0
> > &cpus_allowed=0xc870906c
> > time_slice=49
> > first_time_slice=1
> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8
> > mm=00000000
> > active_mm=00000000
> > binfmt=c04b3e68
> > exit_code=9
> > exit_signal=-1
> > pdeath_signal=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > personality=0x0
> > did_exec=0
> > pid=71274
> > epid=71274
> > ppid=71270
> > tgid=71271
> > cltnode=0
> > p_vproc=0xdd600600
> > p_vfparent=0x00000000
> > group_leader=0xf22f2550
> > &pids=0xc87090c4
> > set_child_tid 0x00000000
> > clear_child_tid 0x00000000
> > rt_priority=0x0
> > it_real_value=0x0
> > it_prof_value=0x0
> > it_virt_value=0x0
> > it_real_incr=0x0
> > it_prof_incr=0x0
> > it_virt_incr=0x0
> > utime=0
> > stime=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > nvcsw=3
> > nivcsw=0
> > sig_utime=0
> > sig_stime=0
> > cutime=0
> > cstime=0
> > sig_nvcsw=0
> > sig_nivcsw=0
> > cnvcsw=0
> > cnivcsw=0
> > start_time.tv_sec=270328
> > start_time.tv_nsec=615704896
> > min_flt=0
> > maj_flt=0
> > sig_min_flt=0
> > sig_maj_flt=0
> > cmin_flt=0
> > cmaj_flt=0
> > uid=0
> > euid=0
> > suid=0
> > fsuid=0
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > gid=0
> > egid=0
> > sgid=0
> > fsgid=0
> > group_info=0xeaf0b980
> > cap_effective=0xfffffeff
> > cap_inheritable=0x0
> > cap_permitted=0xfffffeff
> > keep_capabilities=0
> > user=0xc0431ae0
> > &rlim=0xc1af02c4
> > used_math=1
> > comm=check_bacula
> > locks=0
> > link_count=0
> > total_link_count=1
> > semvsem.undo_list=f4648200
> > fs=0x00000000
> > files=0x00000000
> > namespace=0x00000000
> > signal=0xc1af0240
> > sighand=0xf6c86580
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > &blocked=0xc8709484
> > &real_blocked=0xc870948c
> > &pending=0xc8709494
> > sas_ss_sp=0x0
> > sas_ss_size=0x00000000
> > notifier_data=0x00000000
> > notifier_mask=0x00000000
> > security=0x00000000
> > audit_context=0x00000000
> > parent_exec_id=0x16
> > self_exec_id=0x16
> > journal_info=0x00000000
> > proc_dentry=0xcb5fe8d4
> > backing_dev_info=0x00000000
> > io_context=0x00000000
> > ptrace_message=0x0
> > last_siginfo=0x00000000
> > p_nodetime=0
> > p_ticks_delta=0
> > icsprio=0x0
> > execnode=0x00000000
> > node_context=1
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > rcopy_task_size=0
> > &mosix=0xc8709504
> > Function print_task_struct returned 0x0
> > kdb> btp 71274
> > Stack traceback for pid 71274
> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 check_bacula
> > EBP EIP Function (args)
> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, 0xc012523f,
> > 0xd8017e9c, 0x0)
> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, 0xd8016000)
> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, 0xd8016000,
> > 0xd8016000)
> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, 0xd8017ef8,
> > 0xd8017fc4, 0x0, 0xc0218c59)
> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, 0xcc7be550,
> > 0xc8709020)
> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57
> > 0xc0103cf6 work_notifysig+0x13
> > kdb> btp 71270
> > Stack traceback for pid 71270
> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb
> > EBP EIP Function (args)
> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, 0xe2d7fecc,
> > 0xf3eade68)
> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, 0xa0, 0x0,
> > 0xf3eadedc)
> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, 0x0, 0x0)
> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, 0x80000000, 0x0)
> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25
> > 0xc0103c55 sysenter_past_esp+0x52
> > kdb> btp 117331
> > Stack traceback for pid 117331
> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash
> > EBP EIP Function (args)
> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, 0x4, 0x1ca53,
> > 0xd5399e94)
> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, 0x24,
> > 0xbffff038, 0xd5399edc)
> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, 0xbffff038, 0x0)
> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, 0xa, 0x0)
> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25
> > 0xc0103c55 sysenter_past_esp+0x52
> > kdb>
> > kdb> call print_pvproc 0xdd600600
> > pvp_flag=0x63727076
> > pvp_wstate=0x1166a
> > pvp_pproc=0x00000007
> > pvp_head_childl=0xdd60061c
> > pvp_childl=0x00000000
> > pvp_head_pgrpl=0x00000000
> > pvp_pgrpl=0x00000000
> > pvp_sessionl=0x00083049
> > pvp_head_oclist=0x00000005
> > pvp_oclist=0xc8709020
> > pvp_ppid=0
> > pvp_oppid=0
> > pvp_sid=0
> > pvp_pgid=-489161216
> > pvp_pp_sid=0
> > pvp_pp_pgid=0
> > pvp_fromnode=0
> > pvp_tonode=71270
> > pvp_cttynode=71271
> > pvp_cttydev=0x1ca53
> > pvp_jobc=71271
> > pvp_pgrp_ldr_seqno=1
> > more>
> > Only 'q' or 'Q' are processed at more prompt, input ignored
> > pvp_pgrp_mem_seqno=-580909344
> > pvp_fork_sigmigarg=-580909332
> > pvp.ml.ml_flag=1
> > pvp.ml.ml_shr_count=-580909376
> > pvp.ml.ml_excl_count=0
> > pvp_loadlevel=-580909332
> > pvp_pin=0
> > pvp_localview=0
> > Function print_pvproc returned 0x0
> > kdb>
> >
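
A note on the -EAGAIN loop described at the top of this message. The sketch below is a user-space illustration only, not SSI or kernel code. It assumes the handler keeps calling a reap routine for as long as it gets -EAGAIN back, and shows how a retry cap turns a child that never becomes reapable into an error return instead of a spin. The names fake_task, fake_reap and reap_bounded are hypothetical and were invented for this example; they do not exist in the SSI tree.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a zombie child; NOT the kernel task_struct. */
struct fake_task {
    int pid;
    int busy_polls;   /* >0: polls left that report "busy"; <0: never ready */
};

/* Stand-in for the reap call: returns the pid on success, -EAGAIN while busy. */
static int fake_reap(struct fake_task *t)
{
    if (t->busy_polls != 0) {
        if (t->busy_polls > 0)
            t->busy_polls--;      /* a negative count never reaches zero */
        return -EAGAIN;
    }
    return t->pid;
}

/* Retry on -EAGAIN, but stop after max_tries instead of looping forever. */
static int reap_bounded(struct fake_task *t, int max_tries)
{
    int ret = -EAGAIN;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = fake_reap(t);
    return ret;   /* pid on success, -EAGAIN if the cap was reached */
}
```

With a cap in place the caller can at least log the stuck pid and move on. Whether the real fix belongs in the handler or in the value wait_task_zombie() returns is the open question above.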