Thread: [SSI-devel] SSI-1.9 oops child_reaper for process under gdb
From: Roger T. <rog...@gm...> - 2005-12-06 09:33:16
Laura,

I got an oops exiting from gdb while attached to check_bacula which had
segfaulted. I eventually fixed the bug in check_bacula, but take a look at
the oops below. child_reaper was waiting for check_bacula which is in E
state. It looks like pvproc got corrupted.

I'll leave this in kdb until tomorrow just in case I left out something.

Roger


pe (11)procfs: impossible type (11)procfs: impossible type (11)procfs:
impossible type (11)procfs: impossible type (11)procfs: impossible type
(11)procfs: impossible type (11)procfs: impossible type (11)procfs:
impossible type (11)procfs: impossible type (11)procfs: impossible type
(11)procfs: impossible type (11)procfs: impossible type (11)procfs:
impossible type (11)procfs: impossible type (11)procfs: impossible type
(11)procfs: impossible type (11)procfs: impossible type (11)ptrace_unlink:
vpop_reclaim failed
<4>------------[ cut here ]------------
<1>kernel BUG at kernel/exit.c:1343!
<1>invalid operand: 0000 [#1]
<4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd exportfs
ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport iptable_filter
iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd ehci_hcd usbcore
floppy drbd bonding sata_via libata sk98lin r8169 via_rhine dm_mod
<4>CPU: 0
<4>EIP: 0060:[<c011da00>] Not tainted VLI
<4>EFLAGS: 00010046 (2.6.10-bk7-ssi24)
<4>EIP is at wait_task_zombie+0x220/0x230
<4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: c0430a2c
<4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: f7d11c98
<4>ds: 007b es: 007b ss: 0068
<4>Process child_reaper (pid: 2, threadinfo=f7d10000 task=c1aeea80)
<4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 00000000 00000000 00000000
<4> 00000001 0001183d 00011667 dd600600 00000000 00000286 f7d11d1c c011de68
<4> c8709020 00000000 00000000 f7d11e4c f7d11e50 f7d10000 c3378a40 f7d11d1c
more>
<4>Call Trace:
<4> [<c0104a8f>] show_stack+0x7f/0xa0
<4> [<c0104c25>] show_registers+0x155/0x220
<4> [<c0104fac>] die+0xcc/0x190
<4> [<c01050f6>] do_trap+0x86/0xd0
<4> [<c01053e8>] do_invalid_op+0xb8/0xd0
<4> [<c010470b>] error_code+0x2b/0x30
<4> [<c011de68>] pproc_reap+0x228/0x2f0
<4> [<c020f327>] pvpop_reap+0x1d7/0x480
<4> [<c020ea9f>] dpvproc_nocldwait_async_handler+0x13f/0x300
<4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80
<4> [<c022f9d5>] initproc_postroot_init+0x155/0x230
<4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20
<4> [<c01f77c9>] do_ssisys+0x99/0x200
<4> [<c01f797f>] sys_ssisys+0x4f/0x70
<4> [<c0103c55>] sysenter_past_esp+0x52/0x75
<4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 77 a0 ff ff e9 24
ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 00 00 <0f> 0b 3f 05
53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00
<4>
kdb>
kdb> bt
Stack traceback for pid 2
0xc1aeea80 2 0 1 0 R 0xc1aeec40 *child_reaper
EBP EIP Function (args)
0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 (0xc8709020, 0x0, 0x0, 0xf7d11e4c, 0xf7d11e50)
0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, 0xf7d11e4c, 0xf7d11e50, 0x11666)
0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, 0xffffffff, 0x20, 0x11666, 0xf7d11e4c)
0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f (0xed3c9574, 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8)
0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 (0xc1aeea80, 0x0, 0x40000001, 0x0, 0xc022f870)
0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155
0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12
kdb> call print_task_struct 0xc8709020
state=0x20
flags=0x44c
ptrace=0x0
lock_depth=-1
prio=116
static_prio=120
array=00000000
sleep_avg=899989756
interactive_credit=1
timestamp=269658653961877
activated=0x0
policy=0
&cpus_allowed=0xc870906c
time_slice=49
first_time_slice=1
tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8
mm=00000000
active_mm=00000000
binfmt=c04b3e68
exit_code=9
exit_signal=-1
pdeath_signal=0
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
personality=0x0
did_exec=0
pid=71274
epid=71274
ppid=71270
tgid=71271
cltnode=0
p_vproc=0xdd600600
p_vfparent=0x00000000
group_leader=0xf22f2550
&pids=0xc87090c4
set_child_tid 0x00000000
clear_child_tid 0x00000000
rt_priority=0x0
it_real_value=0x0
it_prof_value=0x0
it_virt_value=0x0
it_real_incr=0x0
it_prof_incr=0x0
it_virt_incr=0x0
utime=0
stime=0
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
nvcsw=3
nivcsw=0
sig_utime=0
sig_stime=0
cutime=0
cstime=0
sig_nvcsw=0
sig_nivcsw=0
cnvcsw=0
cnivcsw=0
start_time.tv_sec=270328
start_time.tv_nsec=615704896
min_flt=0
maj_flt=0
sig_min_flt=0
sig_maj_flt=0
cmin_flt=0
cmaj_flt=0
uid=0
euid=0
suid=0
fsuid=0
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
gid=0
egid=0
sgid=0
fsgid=0
group_info=0xeaf0b980
cap_effective=0xfffffeff
cap_inheritable=0x0
cap_permitted=0xfffffeff
keep_capabilities=0
user=0xc0431ae0
&rlim=0xc1af02c4
used_math=1
comm=check_bacula
locks=0
link_count=0
total_link_count=1
semvsem.undo_list=f4648200
fs=0x00000000
files=0x00000000
namespace=0x00000000
signal=0xc1af0240
sighand=0xf6c86580
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
&blocked=0xc8709484
&real_blocked=0xc870948c
&pending=0xc8709494
sas_ss_sp=0x0
sas_ss_size=0x00000000
notifier_data=0x00000000
notifier_mask=0x00000000
security=0x00000000
audit_context=0x00000000
parent_exec_id=0x16
self_exec_id=0x16
journal_info=0x00000000
proc_dentry=0xcb5fe8d4
backing_dev_info=0x00000000
io_context=0x00000000
ptrace_message=0x0
last_siginfo=0x00000000
p_nodetime=0
p_ticks_delta=0
icsprio=0x0
execnode=0x00000000
node_context=1
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
rcopy_task_size=0
&mosix=0xc8709504
Function print_task_struct returned 0x0
kdb> btp 71274
Stack traceback for pid 71274
0xc8709020 71274 71270 0 0 E 0xc87091e0 check_bacula
EBP EIP Function (args)
0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, 0xc012523f, 0xd8017e9c, 0x0)
0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, 0xd8016000)
0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, 0xd8016000, 0xd8016000)
0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, 0xd8017ef8, 0xd8017fc4, 0x0, 0xc0218c59)
0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, 0xcc7be550, 0xc8709020)
0xd8017fbc 0xc0103b57 do_notify_resume+0x57
0xc0103cf6 work_notifysig+0x13
kdb> btp 71270
Stack traceback for pid 71270
0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb
EBP EIP Function (args)
0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, 0xe2d7fecc, 0xf3eade68)
0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, 0xa0, 0x0, 0xf3eadedc)
0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, 0x0, 0x0)
0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, 0x80000000, 0x0)
0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25
0xc0103c55 sysenter_past_esp+0x52
kdb> btp 117331
Stack traceback for pid 117331
0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash
EBP EIP Function (args)
0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, 0x4, 0x1ca53, 0xd5399e94)
0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, 0x24, 0xbffff038, 0xd5399edc)
0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, 0xbffff038, 0x0)
0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, 0xa, 0x0)
0xd5399fbc 0xc011dbd5 sys_waitpid+0x25
0xc0103c55 sysenter_past_esp+0x52
kdb>
kdb> call print_pvproc 0xdd600600
pvp_flag=0x63727076
pvp_wstate=0x1166a
pvp_pproc=0x00000007
pvp_head_childl=0xdd60061c
pvp_childl=0x00000000
pvp_head_pgrpl=0x00000000
pvp_pgrpl=0x00000000
pvp_sessionl=0x00083049
pvp_head_oclist=0x00000005
pvp_oclist=0xc8709020
pvp_ppid=0
pvp_oppid=0
pvp_sid=0
pvp_pgid=-489161216
pvp_pp_sid=0
pvp_pp_pgid=0
pvp_fromnode=0
pvp_tonode=71270
pvp_cttynode=71271
pvp_cttydev=0x1ca53
pvp_jobc=71271
pvp_pgrp_ldr_seqno=1
more>
Only 'q' or 'Q' are processed at more prompt, input ignored
pvp_pgrp_mem_seqno=-580909344
pvp_fork_sigmigarg=-580909332
pvp.ml.ml_flag=1
pvp.ml.ml_shr_count=-580909376
pvp.ml.ml_excl_count=0
pvp_loadlevel=-580909332
pvp_pin=0
pvp_localview=0
Function print_pvproc returned 0x0
kdb>
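A note on reading the backtraces above: several of the hexadecimal words are process IDs written in hex. The Python sketch below converts a few of them and matches them against the PIDs that kdb prints in decimal (71274 check_bacula, 71270 gdb, 117331 bash). Which words to treat as PIDs is an assumption made for illustration, and the names KNOWN_PIDS and describe are hypothetical; they do not come from kdb or the SSI sources.

```python
# PIDs as kdb prints them in decimal in the dump above.
KNOWN_PIDS = {71274: "check_bacula", 71270: "gdb", 117331: "bash"}


def describe(hex_word):
    """Convert one hex word from the dump and look it up as a pid."""
    value = int(hex_word, 16)
    return value, KNOWN_PIDS.get(value, "no matching pid in the dump")


# 0x1166a: esi in the oops and the first argument of gdb's do_wait
# 0x11666: last argument of pproc_reap
# 0x1ca53: fourth argument of bash's schedule
for word in ("0x1166a", "0x11666", "0x1ca53"):
    value, owner = describe(word)
    print(f"{word} = {value} ({owner})")
```

Read this way, the do_wait(0x1166a, ...) frame in gdb's backtrace suggests gdb is waiting on pid 71274, that is, on check_bacula.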
From: Laura R. <lau...@hp...> - 2005-12-06 19:44:25
Hi Roger,

From a quick look at the code, there seems to be a comment saying that
the ptrace vproc path may need to be reworked for the 2.6 merge.
I don't quite remember what the issue was, but it is obviously hitting the
BUG_ON with exit_signal == -1. Can you print the vproc and the
pvproc? In your kdb output you printed the pvproc using the vproc pointer,
which made it look corrupt, but it really isn't; it was just the wrong print
call.

laura
From: Roger T. <rog...@gm...> - 2005-12-06 20:07:56
Laura,

vproc and pvproc:

kdb> call print_vproc 0xdd600600
vp_magic=0x63727076 (should be 0x63727076)
vp_pid=71274
vp_ref_cnt=7
vp_data=0xdd60061c
vp_hashfwd=0x00000000
vp_hashbwd=0x00000000
Function print_vproc returned 0x0
kdb> call print_pvproc 0xdd60061c
pvp_flag=0x83049
pvp_wstate=0x5
pvp_pproc=0xc8709020
pvp_head_childl=0x00000000
pvp_childl=0x00000000
pvp_head_pgrpl=0x00000000
pvp_pgrpl=0xe2d7fe00
pvp_sessionl=0x00000000
pvp_head_oclist=0x00000000
pvp_oclist=0x00000000
pvp_ppid=71270
pvp_oppid=71271
pvp_sid=117331
pvp_pgid=71271
pvp_pp_sid=117331
pvp_pp_pgid=71271
pvp_fromnode=1
pvp_tonode=1
pvp_cttynode=1
pvp_cttydev=0x8800001
pvp_jobc=0
pvp_pgrp_ldr_seqno=0
more>
pvp_pgrp_mem_seqno=0
pvp_fork_sigmigarg=0
pvp.ml.ml_flag=2
pvp.ml.ml_shr_count=1
pvp.ml.ml_excl_count=0
pvp_loadlevel=0
pvp_pin=0
pvp_localview=0
Function print_pvproc returned 0x0
kdb>
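The mismatch Laura describes can be checked on paper. The Python sketch below packs the values from the print_vproc 0xdd600600 and print_pvproc 0xdd60061c dumps into raw 32-bit words and then reads them back under the pvproc field names, which is in effect what the earlier print_pvproc 0xdd600600 call did. It assumes a 32-bit little-endian layout, that both print helpers list fields in structure order with one word per field, and that the single word print_vproc does not show is zero (it printed as pvp_pgrpl=0x00000000 in the first dump). None of this is taken from the SSI source.

```python
import struct

VPROC_ADDR = 0xdd600600

# Values from "call print_vproc 0xdd600600".
vproc = {
    "vp_magic": 0x63727076,
    "vp_pid": 71274,
    "vp_ref_cnt": 7,
    "vp_data": 0xdd60061c,
    "vp_hashfwd": 0,
    "vp_hashbwd": 0,
}

# First three values from "call print_pvproc 0xdd60061c" (the correct call).
pvproc_head = {"pvp_flag": 0x83049, "pvp_wstate": 0x5, "pvp_pproc": 0xc8709020}

# ASSUMPTION: 4-byte words. vp_data points 0x1c bytes (7 words) past the
# vproc, so one word sits between the six printed fields and the pvproc.
gap_words = (vproc["vp_data"] - VPROC_ADDR) // 4 - len(vproc)

# Lay the words out as they would sit in memory starting at 0xdd600600.
memory = struct.pack(
    "<10I", *vproc.values(), *([0] * gap_words), *pvproc_head.values()
)

# ASSUMPTION: pvproc field order is the order print_pvproc lists them in.
PVPROC_FIELDS = (
    "pvp_flag", "pvp_wstate", "pvp_pproc", "pvp_head_childl", "pvp_childl",
    "pvp_head_pgrpl", "pvp_pgrpl", "pvp_sessionl", "pvp_head_oclist",
    "pvp_oclist",
)

# Read the vproc's memory as if it were a pvproc.
misread = dict(zip(PVPROC_FIELDS, struct.unpack("<10I", memory)))

for name, value in misread.items():
    print(f"{name}=0x{value:08x}")
```

Under these assumptions the values agree with the first ten fields of the original print_pvproc 0xdd600600 dump: pvp_flag is the vproc magic, pvp_wstate is the pid 71274 (0x1166a), pvp_pproc is the reference count 7, and pvp_sessionl=0x00083049 is the real pvp_flag seven words further on. That is consistent with Laura's reading that the structure was not corrupt, only printed with the wrong helper.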
From: Roger T. <rog...@gm...> - 2005-12-30 06:28:08
Hi Laura,

I got an oops which seems related, since the process is exiting. Maybe one
of the semaphores got corrupted on this initnode before init failover,
because initnode failover at the other node got stuck in UCLEANUP waiting
for `pidof` to exit. This is the same pidof process, spawned by the ntp
service, that we earlier wanted to trace to find out who had the sem. Back
then, when I turned on the debug messages, I didn't find anything because I
wasn't able to reproduce the problem, and the debug messages slowed
everything down to an unusable level.

Because I'm away from these machines, I think I'm going to recover this
node soon. If you catch this email and want more kdb output, let me know
soon.

Thanks!

Roger


Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c01c359d
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipt_MASQUERADE nfsd exportfs tun ipt_REJECT ipt_state
ipt_multiport iptable_filter iptable_nat ip_conntrack ip_tables binfmt_misc
uhci_hcd ehci_hcd usbcore floppy drbd bonding sata_via libata sk98lin r8169
via_rhine dm_mod
CPU: 0
EIP: 0060:[<c01c359d>] Not tainted VLI
EFLAGS: 00010246 (2.6.10-bk7-ssi27)
EIP is at ssi_semexit+0x3d/0xd0
eax: d8d6cb08 ebx: 00015c87 ecx: 00000000 edx: eb2a3f20
esi: d8d6cb08 edi: f0c22840 ebp: d5de3ee0 esp: d5de3ec4
ds: 007b es: 007b ss: 0068
Process httpd (pid: 89223, threadinfo=d5de2000 task=cdb82550)
Stack: c061c260 00140000 f7ffebc0 00000286 00140000 d5de2000 f0c22840 d5de3f70
 c01c318d 00140000 00015c87 f16fd954 cdb82550 00000296 c014d011 f7ffff00
 e362fd24 e362fd20 cdb82550 d5de3f2c 00000001 f7ff2e80 f16fd078 f5c071e0
Call Trace:
 [<c0104a8f>] show_stack+0x7f/0xa0
 [<c0104c25>] show_registers+0x155/0x220
 [<c0104fac>] die+0xcc/0x190
 [<c011682d>] do_page_fault+0x46d/0x66b
 [<c010470b>] error_code+0x2b/0x30
 [<c01c318d>] exit_sem+0x15d/0x190
 [<c011d418>] do_exit+0x148/0x3b0
 [<c011d6f5>] do_group_exit+0x35/0x80
 [<c011d755>] sys_exit_group+0x15/0x20
 [<c0103c55>] sysenter_past_esp+0x52/0x75
Code: c0 8b 5d 0c 89 44 24 04 e8 c1 aa ff ff 85 c0 89 c6 74 33 8b 48 3c 8d
50 3c eb 0c 8d 76 00 39 59 04 74 2b 89 ca 8b 09 85 c9 75 f3 <8b> 41 04 c7 04
24 bc 3c 3e c0 89 44 24 04 e8 50 85 f5 ff 89 34

Entering kdb (current=0xcdb82550, pid 89223) Oops: Oops
due to oops @ 0xc01c359d
eax = 0xd8d6cb08 ebx = 0x00015c87 ecx = 0x00000000 edx = 0xeb2a3f20
esi = 0xd8d6cb08 edi = 0xf0c22840 esp = 0xd5de3ec4 eip = 0xc01c359d
ebp = 0xd5de3ee0 xss = 0x00000068 xcs = 0x00000060 eflags = 0x00010246
xds = 0x0000007b xes = 0x0000007b origeax = 0xffffffff &regs = 0xd5de3e90
kdb> bt
Stack traceback for pid 89223
0xcdb82550 89223 69012 1 0 R 0xcdb82710 *httpd
EBP EIP Function (args)
0xd5de3ee0 0xc01c359d ssi_semexit+0x3d (0x140000, 0x15c87, 0xf16fd954, 0xcdb82550, 0x296)
0xd5de3f70 0xc01c318d exit_sem+0x15d (0xcdb82550, 0xebd1d940, 0xc060c010, 0x1c3e2, 0x1)
0xd5de3f9c 0xc011d418 do_exit+0x148 (0x4aa60fb4, 0x0, 0x0)
0xd5de3fb0 0xc011d6f5 do_group_exit+0x35 (0x0)
0xd5de3fbc 0xc011d755 sys_exit_group+0x15
0xc0103c55 sysenter_past_esp+0x52
kdb> id 0xc01c359d
0xc01c359d ssi_semexit+0x3d: mov 0x4(%ecx),%eax
0xc01c35a0 ssi_semexit+0x40: movl $0xc03e3cbc,(%esp,1)
0xc01c35a7 ssi_semexit+0x47: mov %eax,0x4(%esp,1)
0xc01c35ab ssi_semexit+0x4b: call 0xc011bb00 printk
0xc01c35b0 ssi_semexit+0x50: mov %esi,(%esp,1)
0xc01c35b3 ssi_semexit+0x53: call 0xc01be0a0 ipc_unlock
0xc01c35b8 ssi_semexit+0x58: add $0x10,%esp
0xc01c35bb ssi_semexit+0x5b: pop %ebx
0xc01c35bc ssi_semexit+0x5c: pop %esi
0xc01c35bd ssi_semexit+0x5d: pop %edi
0xc01c35be ssi_semexit+0x5e: pop %ebp
0xc01c35bf ssi_semexit+0x5f: ret
0xc01c35c0 ssi_semexit+0x60: mov (%ecx),%eax
0xc01c35c2 ssi_semexit+0x62: xor %edi,%edi
0xc01c35c4 ssi_semexit+0x64: mov %eax,(%edx)
0xc01c35c6 ssi_semexit+0x66: mov 0x40(%esi),%eax

On 12/6/05, Laura Ramirez <lau...@hp...> wrote:
>
> Hi Roger,
>
> A quick look at the code, there seems to be a comment
> about the ptrace vproc path, may need to be
reworked for 2.6 merge. > I dont quite remember what the issue was, it obviously hitting > BUG_ON with exit_signal =3D=3D -1. Can you print the vproc and the > pvproc? Below, you printed the pvproc using the vproc ptr which > made it look corrupt, but it really isnt, it was just the wrong print > call. > > laura > > Roger Tsang wrote: > > Laura, > > > > I got an oops exiting from gdb while attached to check_bacula which had > > segfaulted. I eventually fixed the bug in check_bacula, but take a loo= k at > > the oops below. child_reaper was waiting for check_bacula which is in = E > > state. It looks like pvproc got corrupted. > > > > I'll leave this in kdb until tomorrow just in case I left out something= . > > > > Roger > > > > > > pe (11)procfs: impossible type (11)procfs: impossible type (11)procfs: > > impossible type (11)procfs: impossible type (11)procfs: impossible type > > (11)procfs: impossible type (11)procfs: impossible type (11)procfs: > > impossible type (11)procfs: impossible type (11)procfs: impossible type > > (11)procfs: impossible type (11)procfs: impossible type (11)procfs: > > impossible type (11)procfs: impossible type (11)procfs: impossible type > > (11)procfs: impossible type (11)procfs: impossible type (11)ptrace_unli= nk: > > vpop_reclaim failed > > <4>------------[ cut here ]------------ > > <1>kernel BUG at kernel/exit.c:1343! 
> > <1>invalid operand: 0000 [#1] > > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd exportfs > > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport iptable_filter > > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd ehci_hcd usbcor= e > > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine dm_mod > > <4>CPU: 0 > > <4>EIP: 0060:[<c011da00>] Not tainted VLI > > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) > > <4>EIP is at wait_task_zombie+0x220/0x230 > > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: c0430a2c > > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: f7d11c98 > > <4>ds: 007b es: 007b ss: 0068 > > <4>Process child_reaper (pid: 2, threadinfo=3Df7d10000 task=3Dc1aeea80) > > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 00000000 0000000= 0 > > 00000000 > > <4> 00000001 0001183d 00011667 dd600600 00000000 00000286 f7d11d1= c > > c011de68 > > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 f7d10000 c3378a4= 0 > > f7d11d1c > > more> > > <4>Call Trace: > > <4> [<c0104a8f>] show_stack+0x7f/0xa0 > > <4> [<c0104c25>] show_registers+0x155/0x220 > > <4> [<c0104fac>] die+0xcc/0x190 > > <4> [<c01050f6>] do_trap+0x86/0xd0 > > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 > > <4> [<c010470b>] error_code+0x2b/0x30 > > <4> [<c011de68>] pproc_reap+0x228/0x2f0 > > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 > > <4> [<c020ea9f>] dpvproc_nocldwait_async_handler+0x13f/0x300 > > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 > > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 > > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 > > <4> [<c01f77c9>] do_ssisys+0x99/0x200 > > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 > > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 > > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 77 a0 ff ff e9= 24 > > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 00 00 <0f> 0b = 3f 05 > > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 > > <4> > > kdb> > > kdb> bt > > Stack traceback for pid 2 > > 
0xc1aeea80 2 0 1 0 R 0xc1aeec40 *child_reaper > > EBP EIP Function (args) > > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 (0xc8709020, 0x0, 0x0, > > 0xf7d11e4c, 0xf7d11e50) > > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, 0xf7d11e4c, > > 0xf7d11e50, 0x11666) > > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, 0xffffffff, 0x20, > > 0x11666, 0xf7d11e4c) > > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f (0xed3c9574= , > > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) > > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 (0xc1aeea80, 0x0, > > 0x40000001, 0x0, 0xc022f870) > > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 > > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 > > kdb> call print_task_struct 0xc8709020 > > state=3D0x20 > > flags=3D0x44c > > ptrace=3D0x0 > > lock_depth=3D-1 > > prio=3D116 > > static_prio=3D120 > > array=3D00000000 > > sleep_avg=3D899989756 > > interactive_credit=3D1 > > timestamp=3D269658653961877 > > activated=3D0x0 > > policy=3D0 > > &cpus_allowed=3D0xc870906c > > time_slice=3D49 > > first_time_slice=3D1 > > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 > > mm=3D00000000 > > active_mm=3D00000000 > > binfmt=3Dc04b3e68 > > exit_code=3D9 > > exit_signal=3D-1 > > pdeath_signal=3D0 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > personality=3D0x0 > > did_exec=3D0 > > pid=3D71274 > > epid=3D71274 > > ppid=3D71270 > > tgid=3D71271 > > cltnode=3D0 > > p_vproc=3D0xdd600600 > > p_vfparent=3D0x00000000 > > group_leader=3D0xf22f2550 > > &pids=3D0xc87090c4 > > set_child_tid 0x00000000 > > clear_child_tid 0x00000000 > > rt_priority=3D0x0 > > it_real_value=3D0x0 > > it_prof_value=3D0x0 > > it_virt_value=3D0x0 > > it_real_incr=3D0x0 > > it_prof_incr=3D0x0 > > it_virt_incr=3D0x0 > > utime=3D0 > > stime=3D0 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > nvcsw=3D3 > > nivcsw=3D0 > > sig_utime=3D0 > > sig_stime=3D0 > > cutime=3D0 > > cstime=3D0 > > sig_nvcsw=3D0 > 
> sig_nivcsw=3D0 > > cnvcsw=3D0 > > cnivcsw=3D0 > > start_time.tv_sec=3D270328 > > start_time.tv_nsec=3D615704896 > > min_flt=3D0 > > maj_flt=3D0 > > sig_min_flt=3D0 > > sig_maj_flt=3D0 > > cmin_flt=3D0 > > cmaj_flt=3D0 > > uid=3D0 > > euid=3D0 > > suid=3D0 > > fsuid=3D0 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > gid=3D0 > > egid=3D0 > > sgid=3D0 > > fsgid=3D0 > > group_info=3D0xeaf0b980 > > cap_effective=3D0xfffffeff > > cap_inheritable=3D0x0 > > cap_permitted=3D0xfffffeff > > keep_capabilities=3D0 > > user=3D0xc0431ae0 > > &rlim=3D0xc1af02c4 > > used_math=3D1 > > comm=3Dcheck_bacula > > locks=3D0 > > link_count=3D0 > > total_link_count=3D1 > > semvsem.undo_list=3Df4648200 > > fs=3D0x00000000 > > files=3D0x00000000 > > namespace=3D0x00000000 > > signal=3D0xc1af0240 > > sighand=3D0xf6c86580 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > &blocked=3D0xc8709484 > > &real_blocked=3D0xc870948c > > &pending=3D0xc8709494 > > sas_ss_sp=3D0x0 > > sas_ss_size=3D0x00000000 > > notifier_data=3D0x00000000 > > notifier_mask=3D0x00000000 > > security=3D0x00000000 > > audit_context=3D0x00000000 > > parent_exec_id=3D0x16 > > self_exec_id=3D0x16 > > journal_info=3D0x00000000 > > proc_dentry=3D0xcb5fe8d4 > > backing_dev_info=3D0x00000000 > > io_context=3D0x00000000 > > ptrace_message=3D0x0 > > last_siginfo=3D0x00000000 > > p_nodetime=3D0 > > p_ticks_delta=3D0 > > icsprio=3D0x0 > > execnode=3D0x00000000 > > node_context=3D1 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > rcopy_task_size=3D0 > > &mosix=3D0xc8709504 > > Function print_task_struct returned 0x0 > > kdb> btp 71274 > > Stack traceback for pid 71274 > > 0xc8709020 71274 71270 0 0 E 0xc87091e0 check_bacula > > EBP EIP Function (args) > > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, 0xc012523= f, > > 0xd8017e9c, 0x0) > > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, 0xd8016000) > > 0xd8017eb0 0xc011d6f5 
do_group_exit+0x35 (0x9, 0x0, 0x0, 0xd8016000, > > 0xd8016000) > > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, 0xd8017e= f8, > > 0xd8017fc4, 0x0, 0xc0218c59) > > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, 0xcc7be550, > > 0xc8709020) > > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 > > 0xc0103cf6 work_notifysig+0x13 > > kdb> btp 71270 > > Stack traceback for pid 71270 > > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb > > EBP EIP Function (args) > > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, 0xe2d7fecc, > > 0xf3eade68) > > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, 0xa0, 0x0, > > 0xf3eadedc) > > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, 0x0, 0x0) > > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, 0x80000000, 0x0) > > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 > > 0xc0103c55 sysenter_past_esp+0x52 > > kdb> btp 117331 > > Stack traceback for pid 117331 > > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash > > EBP EIP Function (args) > > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, 0x4, 0x1c= a53, > > 0xd5399e94) > > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, 0x24, > > 0xbffff038, 0xd5399edc) > > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, 0xbffff038, 0= x0) > > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, 0xa, 0x0) > > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 > > 0xc0103c55 sysenter_past_esp+0x52 > > kdb> > > kdb> call print_pvproc 0xdd600600 > > pvp_flag=3D0x63727076 > > pvp_wstate=3D0x1166a > > pvp_pproc=3D0x00000007 > > pvp_head_childl=3D0xdd60061c > > pvp_childl=3D0x00000000 > > pvp_head_pgrpl=3D0x00000000 > > pvp_pgrpl=3D0x00000000 > > pvp_sessionl=3D0x00083049 > > pvp_head_oclist=3D0x00000005 > > pvp_oclist=3D0xc8709020 > > pvp_ppid=3D0 > > pvp_oppid=3D0 > > pvp_sid=3D0 > > pvp_pgid=3D-489161216 > > pvp_pp_sid=3D0 > > pvp_pp_pgid=3D0 > > pvp_fromnode=3D0 > > pvp_tonode=3D71270 > > pvp_cttynode=3D71271 > > 
pvp_cttydev=3D0x1ca53 > > pvp_jobc=3D71271 > > pvp_pgrp_ldr_seqno=3D1 > > more> > > Only 'q' or 'Q' are processed at more prompt, input ignored > > pvp_pgrp_mem_seqno=3D-580909344 > > pvp_fork_sigmigarg=3D-580909332 > > pvp.ml.ml_flag=3D1 > > pvp.ml.ml_shr_count=3D-580909376 > > pvp.ml.ml_excl_count=3D0 > > pvp_loadlevel=3D-580909332 > > pvp_pin=3D0 > > pvp_localview=3D0 > > Function print_pvproc returned 0x0 > > kdb> > > > |
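For readers following the ssi_semexit oops: the Code: bytes and the kdb `id` listing read like a singly linked list walk (next pointer at offset 0, key at offset 4) that searches for the exiting pid (ebx = 0x15c87 = 89223), runs off the end of the list with ecx = 0, and then loads 0x4(%ecx) as an argument for printk. That would explain the fault address 00000004. The sketch below is only a reading of the disassembly, not the SSI source; `struct undo_entry` and `find_undo` are hypothetical names, and it shows the same walk with the "not found" case handled without touching the NULL cursor.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-process entry that ssi_semexit()
 * appears to search for. The field names are assumptions; only the
 * layout (next pointer first, key second) is taken from the
 * disassembly. */
struct undo_entry {
    struct undo_entry *next;
    int pid;
};

/* Walk the list looking for the entry that belongs to pid.
 * Returns the matching entry, or NULL if there is none. */
struct undo_entry *find_undo(struct undo_entry *head, int pid)
{
    struct undo_entry *cur;

    for (cur = head; cur != NULL; cur = cur->next) {
        if (cur->pid == pid)
            return cur;
    }

    /* The oops seems to correspond to reading cur->pid at this point,
     * with cur == NULL, to build a diagnostic message. Any message
     * printed here should use the pid argument instead. */
    return NULL;
}
```

A caller would treat a NULL result as "no entry for this pid" and log the pid it was looking for, rather than a field of the missing entry.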
From: Vladimir R. <one...@gm...> - 2006-05-24 21:44:48
Laura, Roger,

I've got the same problem trying to debug a multithreaded program with gdb: BUG_ON with exit_signal == -1 in wait_task_zombie().

I applied a patch for kernel/kernel/exit.c (rev. 1.14, 8 Dec 2005). Is it the correct one? Now I have dpvproc_nocldwait_async_handler() in an infinite loop, calling pvpop_reap() and receiving -EAGAIN as an error code.

Should wait_task_zombie() return (sometimes ;) ) p->pid instead of -EAGAIN?

Thanks
Vladimir

I can provide some info about the processes...
From: Laura R. <lau...@hp...> - 2006-05-24 22:27:16
Hi Vladimir,

wait_task_zombie() returns EAGAIN because the process's current parent is the debugger (gdb) instead of its original parent. The original parent needs to reap its child, so a __ptrace_unlink() is done to stop tracing the process and set the parent back to its original parent, and EAGAIN is returned so the reap can be done from the original parent. So __ptrace_unlink() must have failed, since otherwise it wouldn't go into that code path the second time in. If you can find out why __ptrace_unlink() is failing, that would be a good start.

laura
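The retry path described above can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions, not the kernel code: `struct task`, `model_ptrace_unlink`, `model_wait_task_zombie` and `model_reap` are all hypothetical names, the model returns -EAGAIN whether or not the unlink worked (an assumption made to reproduce the loop), and the retry cap exists only so the example terminates.

```c
#include <errno.h>

/* Simplified, hypothetical model of a task; not the kernel's
 * task_struct. */
struct task {
    int pid;
    int parent;       /* pid of the current parent (may be the tracer) */
    int real_parent;  /* pid of the original parent */
};

/* Stand-in for __ptrace_unlink(): hands the child back to its
 * original parent, or fails with unlink_err when that is non-zero. */
static int model_ptrace_unlink(struct task *p, int unlink_err)
{
    if (unlink_err)
        return unlink_err;
    p->parent = p->real_parent;
    return 0;
}

/* Stand-in for the wait_task_zombie() path: if the tracer is still
 * the parent, try to unlink and ask the caller to come back. */
int model_wait_task_zombie(struct task *p, int unlink_err)
{
    if (p->parent != p->real_parent) {
        (void)model_ptrace_unlink(p, unlink_err);
        return -EAGAIN;
    }
    return p->pid;  /* reaped by the original parent */
}

/* Caller that retries on -EAGAIN, capped at max_tries attempts. */
int model_reap(struct task *p, int unlink_err, int max_tries)
{
    int i, ret = -EAGAIN;

    for (i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = model_wait_task_zombie(p, unlink_err);
    return ret;
}
```

When the unlink succeeds, the second attempt returns the child's pid. When it keeps failing, every attempt returns -EAGAIN, and an uncapped caller would spin the way dpvproc_nocldwait_async_handler() is reported to.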
From: Vladimir R. <one...@gm...> - 2006-05-25 00:13:30
|
Laura,

That's right: __ptrace_unlink(p) fails every time inside
pvpop_rmv_child_from_parent() with -ESRCH because it can't find
process p in the PVP(w)->pvp_childl list. The process p seems to be
unlinked already; I saw that by tracing pvp_childl with kdb while
stopped inside pvpop_rmv_child_from_parent().

Vladimir

On 5/24/06, Laura Ramirez <lau...@hp...> wrote:
> Hi Vladimir,
>
> The reason why wait_task_zombie() returns EAGAIN is because
> its current parent is the debugger gdb instead of its original
> parent. The original parent needs to reap its child, so a
> __ptrace_unlink() is done to stop tracing the process and set
> the parent back to its original parent, then EAGAIN is returned so the
> reap can be done from the original parent. So __ptrace_unlink()
> must have failed since otherwise the second time in it wouldn't
> go into that code path. So if you can find out why the
> __ptrace_unlink() is failing, that would be a good start.
>
> laura
>
>
> Vladimir Razgulin wrote:
> > Laura, Roger,
> >
> > I've got the same problem trying to debug a multithreaded program with gdb:
> > BUG_ON with exit_signal == -1 in wait_task_zombie().
> >
> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec 2005) - Is
> > it a correct one? -
> > and now I have dpvproc_nocldwait_async_handler() in an infinite loop,
> > calling pvpop_reap() and receiving -EAGAIN as an error code.
> >
> > Should wait_task_zombie() return (sometimes ;) ) p->pid instead of -EAGAIN?
> > Thanks
> > Vladimir
> >
> > I can provide some info about the processes...
> >
> >
> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote:
> >>
> >> Hi Roger,
> >>
> >> A quick look at the code, there seems to be a comment
> >> about the ptrace vproc path, may need to be reworked for 2.6 merge.
> >> I don't quite remember what the issue was, it's obviously hitting
> >> BUG_ON with exit_signal == -1. Can you print the vproc and the
> >> pvproc?
> >> Below, you printed the pvproc using the vproc ptr which
> >> made it look corrupt, but it really isnt, it was just the wrong print
> >> call.
> >>
> >> laura
|
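The exchange above (a reap that returns -EAGAIN after handing the child back from the debugger, combined with an unlink that keeps failing with -ESRCH) can be sketched as a small user-space model. This is only an illustration under stated assumptions, not SSI kernel code: every `toy_` name, the two boolean fields, and the bounded retry count are hypothetical stand-ins for the behaviour described in the messages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state that matters in this thread. */
struct toy_task {
    int  pid;
    bool ptraced;         /* debugger is still the current parent */
    bool on_tracer_list;  /* tracer's child list still contains the task */
};

/* Toy unlink: fails with -ESRCH when the task is no longer on the
 * tracer's child list, and in that case leaves the task marked traced. */
int toy_ptrace_unlink(struct toy_task *t)
{
    if (!t->on_tracer_list)
        return -ESRCH;
    t->on_tracer_list = false;
    t->ptraced = false;
    return 0;
}

/* Toy reap: a traced zombie is unlinked and the caller is told to retry;
 * an untraced zombie is reaped and its pid returned. */
int toy_wait_task_zombie(struct toy_task *t)
{
    if (t->ptraced) {
        (void)toy_ptrace_unlink(t);  /* a failure here goes unnoticed */
        return -EAGAIN;
    }
    return t->pid;
}

/* Toy handler: retries the reap. Returns the number of attempts needed,
 * or -1 if the reap still returned -EAGAIN after max_tries attempts.
 * The bound is an assumption of this sketch so that it terminates. */
int toy_reap_with_retries(struct toy_task *t, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (toy_wait_task_zombie(t) != -EAGAIN)
            return attempt;
    }
    return -1;
}
```

In this model a task that is still on the tracer's list is reaped on the second attempt, while a task that has already been unlinked never leaves the -EAGAIN path, which matches the endless loop reported for dpvproc_nocldwait_async_handler().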
From: Roger T. <rog...@gm...> - 2006-05-25 04:59:10
|
Just as an experiment, maybe you can try re-enabling the following in
wait_task_zombie(), because __ptrace_unlink() temporarily releases the
tasklist_lock while in vpop_reclaim_child().

#ifndef CONFIG_VPROC
	/* With vprocs, processes cannot reap themselves, so -1
	 * isn't a valid check.
	 */
	if (unlikely(p->exit_signal == -1 && p->ptrace == 0)) {
		/*
		 * This can only happen in a race with a ptraced thread
		 * dying on another processor.
		 */
		return 0;
	}
#endif

Roger

On 5/24/06, Vladimir Razgulin <one...@gm...> wrote:
> Laura,
>
> That's right,
> __ptrace_unlink(p) fails every time inside
> pvpop_rmv_child_from_parent() with -ESRCH
> because it can't find process p in the PVP(w)->pvp_childl list.
> The process p seems to be unlinked already - I saw that tracing
> pvp_childl with kdb while stopping inside pvpop_rmv_child_from_parent().
>
> Vladimir
>
> On 5/24/06, Laura Ramirez <lau...@hp...> wrote:
> > Hi Vladimir,
> >
> > The reason why wait_task_zombie() returns EAGAIN is because
> > its current parent is the debugger gdb instead of its original
> > parent. The original parent needs to reap its child, so a
> > __ptrace_unlink() is done to stop tracing the process and set
> > the parent back to its original parent, returns EAGAIN so the
> > reap can be done from the original parent. So __ptrace_unlink()
> > must have failed since otherwise the second time in it wouldn't
> > go into that code path. So if you can find out why the
> > __ptrace_unlink() is failing, that would be a good start.
> >
> > laura
> >
> >
> > Vladimir Razgulin wrote:
> > > Laura, Roger,
> > >
> > > I've got the same problem trying to debug a multithreaded program with gdb:
> > > BUG_ON with exit_signal == -1 in wait_task_zombie().
> > >
> > > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec 2005) - Is
> > > it a correct one? -
> > > and now I have dpvproc_nocldwait_async_handler() in an infinite loop,
> > > calling pvpop_reap() and receiving -EAGAIN as an error code.
|
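The effect of the check Roger proposes re-enabling can be sketched with a small decision function. This is a toy model under assumptions, not the SSI source: the `toy_` names, the enum, and the ordering of the checks are hypothetical. Only the condition `exit_signal == -1 && ptrace == 0` and the BUG_ON on `exit_signal == -1` come from the messages in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes for a zombie in this toy model. */
enum toy_reap_action {
    TOY_UNLINK_AND_RETRY,  /* still traced: hand back to original parent */
    TOY_SKIP,              /* guard hit: the quoted "return 0" */
    TOY_BUG,               /* stands in for BUG_ON(exit_signal == -1) */
    TOY_REAP               /* normal reap */
};

/* Hypothetical subset of the task fields shown in the kdb dump. */
struct toy_zombie {
    int exit_signal;  /* -1 in the reported oops */
    int ptrace;       /* 0x0 in the reported oops */
};

/* Decide what the reaper does with a zombie. guard_enabled selects
 * whether the quoted check is compiled in (assumed ordering). */
enum toy_reap_action toy_classify(const struct toy_zombie *z, bool guard_enabled)
{
    if (z->ptrace != 0)
        return TOY_UNLINK_AND_RETRY;
    if (z->exit_signal == -1)
        return guard_enabled ? TOY_SKIP : TOY_BUG;
    return TOY_REAP;
}
```

With the values from the reported task dump (exit_signal=-1, ptrace=0x0) the model hits the BUG path when the guard is compiled out and returns early when it is enabled, which is the experiment being suggested.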
From: Vladimir R. <one...@gm...> - 2006-05-30 21:21:18
|
Laura,

Thank you for ptrace_reap2.patch.
I've tried it, and it fixed the original problem.
However, now I have something that looks like a race condition.
Again, I'm debugging a multithreaded program:
gdb (pid=73961) has started ios_mon (pid=73976), which started three other
threads (pids=73979, 73980, 73981).
The thread 73979 was terminated successfully (thanks to ptrace_reap2.patch)
and I've got the following:

-----------------------------------------------------------------
0xf5778bb0    73927    73921  0    0   S  0xf5778d70  bash
0xdfd746b0    73961    73927  0    3   D  0xdfd74870  gdb
0xdf990630    73976    73961  0    1   D  0xdf9907f0  ios_mon
0xdf991130    73980    73961  0    2   E  0xdf9912f0  ios_mon
0xdfeaf330    73981    73961  0    0   Z  0xdfeaf4f0  ios_mon
0xf576c030    74134    72761  0    1   D  0xf576c1f0  bash
-----------------------------------------------------------------

gdb (73961) wants to do VPROC_LOCK_EXCL for
vproc=0xc3a35800, pvproc=0xc3a35824, which is owned by ios_mon (73976)

----------------------------------------------------------------
Stack traceback for pid 73961
0xdfd746b0    73961    73927  0    3   D  0xdfd74870  gdb
EBP        EIP        Function (args)
0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, 0xc0121dd0, 0xf3cafd04)
0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79)
0xf3e59b50 0xc03dc9a6 __down_failed+0xa
           0xc0227b04 .text.lock.dvp_pvpops+0x56
0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, 0xc3a35a00, 0x5, 0x0, 0x0)
0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, 0x0, 0x0, 0x0)
0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, 0x0, 0x0, 0x0)
0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, 0xf3e59cc4, 0x0, 0x0)
0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 (0xdf991130, 0x0, 0x0, 0xf3e59e8c, 0xf3e59ed4)
0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, 0xf3e59ed4, 0x120e9)
0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, 0x120e9, 0xf3e59e8c)
0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, 0xbfeec4ec, 0xf3e59ed4)
0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, 0xbfeec4ec, 0x0)
0xf3e59f9c 0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, 0x80000001, 0x0)
0xf3e59fb4 0xc0129df5 sys_waitpid+0x25
           0xc0106ae1 syscall_call+0x7
-----------------------------------------------------------------------
md c3a35824
0xc3a35824 00817063 00000001 df990630 00000000   cp......0.......
0xc3a35834 00000000 c3a4a600 00000000 f6431400   ..............C.
0xc3a35844 c3a35a00 00000000 000120e9 00000000   .Z....... ......
0xc3a35854 000120c7 000120f8 000120c7 000120e9   . ... ... ... ..
0xc3a35864 00000001 00000001 00000001 08800001   ................
0xc3a35874 00000001 00000001 dead4ead 00000000   .........N......
0xc3a35884 ffffffff 00000001 df990630 00000001   ........0.......
                             ^^^^^^^^
0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630   .N.. ....]+.0...
-----------------------------------------------------------------------

At the same time ios_mon (73976) tries to lock
vproc=0xc3a35a00, pvproc=0xc3a35a24, which is already locked by gdb (73961)

-----------------------------------------------------------------------
Stack traceback for pid 73976
0xdf990630    73976    73961  0    1   D  0xdf9907f0  ios_mon
EBP        EIP        Function (args)
0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, 0xc0121dd0, 0xc3a35a98)
0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc)
0xf3d89ea0 0xc03dc9a6 __down_failed+0xa
           0xc0227e6c .text.lock.dvp_pvpops+0x3be
0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800)
0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, 0x120fc)
0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, 0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c)
0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, 0xdfeaf330, 0xf3d89f88, 0x0)
0xf3d89f88 0xc0129638 do_exit+0x1e8 (0xdf990630, 0xf5e28584, 0xf5e28080, 0x7f, 0x0)
0xf3d89fa8 0xc0129879 do_group_exit+0x39 (0x7f00)
0xf3d89fb4 0xc0129915 sys_exit_group+0x15
           0xc0106ae1 syscall_call+0x7
[1]kdb>
-----------------------------------------------------------------------
md c3a35a24
0xc3a35a24 00083069 00000005 df991130 00000000   i0......0.......
0xc3a35a34 00000000 00000000 c3a35800 00000000   .........X......
0xc3a35a44 00000000 00000000 000120e9 000120f8   ......... ... ..
0xc3a35a54 000120c7 000120f8 000120c7 000120f8   . ... ... ... ..
0xc3a35a64 00000001 00000001 00000001 08800001   ................
0xc3a35a74 00000000 00000001 dead4ead 00000000   .........N......
0xc3a35a84 ffffffff 00000001 dfd746b0 00000001   .........F......
                             ^^^^^^^^
0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0   .N..p...p....F..
----------------------------------------------------------------------

and 0xc3a4a600 here is the vproc for pid=73981.

I can provide more information about the issue if necessary.
Do you think that work should be serialized through the nocldwait queue?

Thanks,
Vladimir

On 5/26/06, Laura Ramirez <lau...@hp...> wrote:
>
> Hi Vladimir,
>
> Looking at the code more closely, I believe the "ptrace_reap.patch"
> I gave you is NOT correct. Please remove that patch.
>
> Attached is a different patch "ptrace_reap2.patch". In the original
> code, the parent who is the gdb process is not getting a SIGCHLD,
> instead the process just gets queued to the nocldwait. I believe, the
> correct path should be the gdb parent gets the SIGCHLD, goes to do the
> reap, sees its ptraced and does the ptrace_unlink(). Assuming the
> ptrace_unlink() works, it should return -EAGAIN, do the
> PVPOP_REPORT_STATE() to its original parent, sees its original parent is
> not reaping children and gets queued to the nocldwait queue, which
> should do reap successfully this time and not return -EAGAIN. With the
> attached patch, I'm hoping that's going to happen.
>
> Apply with -p0, let me know if it works or where it deviated from the
> code path i stated above.
> > thanks, > > laura > > > > Vladimir Razgulin wrote: > > Laura, > > following your advice I got a snapshot of the moment the process > > becomes a zombie and a work item is added into nocldwait_async_queue: > > > > Please, note, that > > the process id is 74226, it's a thread process created by 74223 under gdb, > > and its ppid and oppid are different - see attached info > > > > Thank you for your help > > Vladimir > > > > Do you think I can CC: our conversation to ssic-linux-devel? > > > > > > ------------------------------------------------------------------------------------------------------------------ > > > > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd > > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash > > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb > > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon > > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon > > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon > > ------------------------------------------------------------------------------------------------------------------ > > > > Stack traceback for pid 74226 > > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > > EBP EIP Function (args) > > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, > > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) > > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, > > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) > > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 (0xf6be1800, > > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) > > 0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, 0x4, > > 0xec599fbc, 0x0) > > 0xec599fac 0xc0129648 do_exit+0x1e8 > > 0xec599fb4 0xc012983f sys_exit+0xf > > 0xc0106ae1 syscall_call+0x7 > > ---------------------------------------------------------------- > > call print_task_struct 0xdfeff230 > > > > state=0x0 > > flags=0x802044 > > ptrace=0x1f1 > > lock_depth=-1 > > prio=116 > > static_prio=120 > > array=c383055c > > 
sleep_avg=933888889 > > timestamp=4295805885000000 > > activated=0x0 > > policy=0 > > &cpus_allowed=0xdfeff280 > > time_slice=17 > > first_time_slice=1 > > tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c > > mm=00000000 > > active_mm=f5e42980 > > binfmt=c04dad74 > > exit_code=0 > > exit_signal=-1 > > pdeath_signal=0 > > personality=0x400000 > > did_exec=0 > > pid=74226 > > epid=74226 > > ppid=74210 > > tgid=74223 > > cltnode=0 > > p_vproc=0xf6be1800 > > p_vfparent=0x00000000 > > group_leader=0xc3bbb930 > > &pids=0xdfeff2d8 > > set_child_tid 0x00000000 > > clear_child_tid 0x00000000 > > rt_priority=0x0 > > utime=0 > > stime=0 > > nvcsw=11 > > nivcsw=0 > > sig_utime=0 > > sig_stime=0 > > cutime=0 > > cstime=0 > > sig_nvcsw=0 > > sig_nivcsw=0 > > cnvcsw=0 > > cnivcsw=0 > > start_time.tv_sec=1138 > > start_time.tv_nsec=381796336 > > min_flt=4 > > maj_flt=0 > > sig_min_flt=0 > > sig_maj_flt=0 > > cmin_flt=0 > > cmaj_flt=0 > > uid=0 > > euid=0 > > suid=0 > > fsuid=0 > > gid=0 > > egid=0 > > sgid=0 > > fsgid=0 > > group_info=0xf502c080 > > cap_effective=0xfffffeff > > cap_inheritable=0x0 > > cap_permitted=0xfffffeff > > keep_capabilities=0 > > user=0xc0454c00 > > &rlim=0xdffdd5d4 > > comm=ios_mon > > locks=0 > > link_count=0 > > total_link_count=0 > > semvsem.undo_list=c3b4aa20 > > fs=0x00000000 > > files=0x00000000 > > namespace=0x00000000 > > signal=0xdffdd500 > > sighand=0xf5e29900 > > &blocked=0xdfeff694 > > &real_blocked=0xdfeff69c > > &pending=0xdfeff6a4 > > sas_ss_sp=0x0 > > sas_ss_size=0x00000000 > > notifier_data=0x00000000 > > notifier_mask=0x00000000 > > security=0x00000000 > > audit_context=0x00000000 > > parent_exec_id=0x12 > > self_exec_id=0x12 > > journal_info=0x00000000 > > proc_dentry=0xecf027a0 > > backing_dev_info=0x00000000 > > io_context=0x00000000 > > ptrace_message=0x0 > > last_siginfo=0x00000000 > > p_nodetime=0 > > p_ticks_delta=0 > > icsprio=0x0 > > execnode=0x00000000 > > node_context=1 > > rcopy_task_size=0 > > &mosix=0xdfeff764 > > 
---------------------------------------------------------- > > call print_vproc f6be1800 > > > > vp_magic=0x63727076 (should be 0x63727076) > > vp_pid=74226 > > vp_ref_cnt=8 > > vp_data=0xf6be1824 > > vp_hashfwd=0xc3a60200 > > vp_hashbwd=0x00000000 > > Function print_vproc returned 0x19 > > ---------------------------------------------------------------- > > call print_pvproc f6be1824 > > > > pvp_flag=0x83069 > > pvp_wstate=0x5 > > pvp_pproc=0xdfeff230 > > pvp_head_childl=0x00000000 > > pvp_childl=0xf6ec5e00 > > pvp_head_pgrpl=0x00000000 > > pvp_pgrpl=0xf6ec5e00 > > pvp_sessionl=0x00000000 > > pvp_head_oclist=0x00000000 > > pvp_oclist=0x00000000 > > pvp_ppid=74210 > > pvp_oppid=74223 > > pvp_sid=73171 > > pvp_pgid=74223 > > pvp_pp_sid=73171 > > pvp_pp_pgid=74223 > > pvp_fromnode=1 > > pvp_tonode=1 > > pvp_cttynode=1 > > pvp_cttydev=0x8800000 > > pvp_jobc=0 > > pvp_pgrp_ldr_seqno=0 > > pvp_pgrp_mem_seqno=0 > > pvp_fork_sigmigarg=0 > > pvp.ml.ml_flag=0 > > pvp.ml.ml_shr_count=0 > > pvp.ml.ml_excl_count=0 > > pvp_loadlevel=0 > > pvp_pin=0 > > pvp_localview=0 > > Function print_pvproc returned 0x13 > > ---------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: > >> > >> Hi Vladimir, > >> > >> maybe we need to take a step back, I was focused on why the > >> ptrace_unlink() call was failing, which was because the child > >> had already gotten removed from the parent list. Which is why > >> I created the following patch, which only removed the child from > >> the parent list if the child wasnt being ptraced (ie pvp_oppid == 0). > >> So the ! is correct for that logic. > >> > >> But maybe we need to look at why this process is being reaped by > >> the nocld_wait_daemon, instead of the gdb parent. > >> > >> Can you do a " call print_task_struct < address>" of the zombie > >> processes? 
> >> > >> thanks > >> > >> laura > >> > >> Vladimir Razgulin wrote: > >> > Laura, > >> > The patch you sent doesn't resolve the issue: sometimes everything > >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole system > >> > dies. > >> > I took a look at the code in the patch:: > >> > > >> > + error = 0; > >> > + if (!PVP(vc)->pvp_oppid) { > >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); > >> > + ptrace = 1; > >> > + } > >> > > >> > Should it be without "!", liike > >> > + if ( PVP(vc)->pvp_oppid) { > >> > > >> > Thanks > >> > Vladimir > >> > > >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> > >> >> I believe the __ptrace_unlink is failing because the child > >> >> gets removed from the parent in the beginning of > >> >> dpvproc_nocldwait_async_handler(), which is the right thing > >> >> to do if the process isn't being ptraced. > >> >> > >> >> Attach is a patch that addresses this problem. Can you please > >> >> apply and test. (use -p0 to apply) > >> >> > >> >> laura > >> >> > >> >> > >> >> Vladimir Razgulin wrote: > >> >> > Laura, > >> >> > > >> >> > That's right, > >> >> > __ptrace_unlink(p) fails every time inside > >> >> > pvpop_rmv_child_from_parent() with -ESRCH > >> >> > because it can't find process p in the PVP(w)->pvp_childl list. > >> >> > The process p seems to be unlinked alredy - I saw that tracing > >> >> > pvp_childl with kdb while stopping inside > >> >> pvpop_rmv_child_from_parent(). > >> >> > > >> >> > Vladimir > >> >> > > >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> Hi Vladimir, > >> >> >> > >> >> >> The reason why wait_task_zombie() returns EAGAIN is because > >> >> >> its current parent is the debugger gdb instead of its original > >> >> >> parent. 
The original parent needs to reap its child, so a > >> >> >> __ptrace_unlink() is done to stop tracing the process and set > >> >> >> the parent back to it original parent, returns EAGAIN so the > >> >> >> reap can be done from the original parent. So __ptrace_unlink() > >> >> >> must have failed since otherwise the second time in it wouldn't > >> >> >> go into that code path. So if you can find out why the > >> >> >> __ptrace_unlink() is failing, that would be a good start. > >> >> >> > >> >> >> laura > >> >> >> > >> >> >> > >> >> >> Vladimir Razgulin wrote: > >> >> >> > Laura, Roger, > >> >> >> > > >> >> >> > I've got the same problem trying to debug a multithreaded program > >> >> >> with gdb: > >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). > >> >> >> > > >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec 2005) > >> >> - Is > >> >> >> > it a correct one? - > >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an infinite > >> >> loop, > >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error code. > >> >> >> > > >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid istead of > >> >> >> -EAGAIN? > >> >> >> > Thanks > >> >> >> > Vladimir > >> >> >> > > >> >> >> > I can provide some info about the processes... > >> >> >> > > >> >> >> > > >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> > >> >> >> >> Hi Roger, > >> >> >> >> > >> >> >> >> A quick look at the code, there seems to be a comment > >> >> >> >> about the ptrace vproc path, may need to be reworked for 2.6 > >> merge. > >> >> >> >> I dont quite remember what the issue was, it obviously hitting > >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and the > >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr which > >> >> >> >> made it look corrupt, but it really isnt, it was just the wrong > >> >> print > >> >> >> >> call. 
> >> >> >> >> > >> >> >> >> laura > >> >> >> >> > >> >> >> >> Roger Tsang wrote: > >> >> >> >> > Laura, > >> >> >> >> > > >> >> >> >> > I got an oops exiting from gdb while attached to check_bacula > >> >> >> which had > >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, but > >> >> take a > >> >> >> >> look at > >> >> >> >> > the oops below. child_reaper was waiting for check_bacula > >> >> which is > >> >> >> >> in E > >> >> >> >> > state. It looks like pvproc got corrupted. > >> >> >> >> > > >> >> >> >> > I'll leave this in kdb until tomorrow just in case I left out > >> >> >> >> something. > >> >> >> >> > > >> >> >> >> > Roger > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible type > >> >> >> (11)procfs: > >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> impossible type > >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> (11)procfs: > >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> impossible type > >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> (11)procfs: > >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> impossible type > >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> (11)ptrace_unlink: > >> >> >> >> > vpop_reclaim failed > >> >> >> >> > <4>------------[ cut here ]------------ > >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! 
> >> >> >> >> > <1>invalid operand: 0000 [#1] > >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd > >> >> exportfs > >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport > >> >> iptable_filter > >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd > >> ehci_hcd > >> >> >> >> usbcore > >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine > >> >> dm_mod > >> >> >> >> > <4>CPU: 0 > >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI > >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) > >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 > >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: > >> c0430a2c > >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: > >> f7d11c98 > >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 > >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 > >> >> task=c1aeea80) > >> >> >> >> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 > >> 00000000 > >> >> >> >> 00000000 > >> >> >> >> > 00000000 > >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 > >> 00000286 > >> >> >> >> f7d11d1c > >> >> >> >> > c011de68 > >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 > >> f7d10000 > >> >> >> >> c3378a40 > >> >> >> >> > f7d11d1c > >> >> >> >> > more> > >> >> >> >> > <4>Call Trace: > >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 > >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 > >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 > >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 > >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 > >> >> >> >> > <4> [<c010470b>] error_code+0x2b/0x30 > >> >> >> >> > <4> [<c011de68>] pproc_reap+0x228/0x2f0 > >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 > >> >> >> >> > <4> [<c020ea9f>] dpvproc_nocldwait_async_handler+0x13f/0x300 > >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 > >> >> >> >> > <4> [<c022f9d5>] 
initproc_postroot_init+0x155/0x230 > >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 > >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 > >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 > >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 > >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 77 a0 > >> >> ff ff > >> >> >> >> e9 24 > >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 00 00 > >> >> <0f> > >> >> >> >> 0b 3f 05 > >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 > >> >> >> >> > <4> > >> >> >> >> > kdb> > >> >> >> >> > kdb> bt > >> >> >> >> > Stack traceback for pid 2 > >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 > >> >> *child_reaper > >> >> >> >> > EBP EIP Function (args) > >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 (0xc8709020, 0x0, > >> >> 0x0, > >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) > >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, > >> >> 0xf7d11e4c, > >> >> >> >> > 0xf7d11e50, 0x11666) > >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, > >> 0xffffffff, > >> >> >> 0x20, > >> >> >> >> > 0x11666, 0xf7d11e4c) > >> >> >> >> > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f > >> >> >> >> (0xed3c9574, > >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) > >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 > >> >> >> (0xc1aeea80, 0x0, > >> >> >> >> > 0x40000001, 0x0, 0xc022f870) > >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 > >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 > >> >> >> >> > kdb> call print_task_struct 0xc8709020 > >> >> >> >> > state=0x20 > >> >> >> >> > flags=0x44c > >> >> >> >> > ptrace=0x0 > >> >> >> >> > lock_depth=-1 > >> >> >> >> > prio=116 > >> >> >> >> > static_prio=120 > >> >> >> >> > array=00000000 > >> >> >> >> > sleep_avg=899989756 > >> >> >> >> > interactive_credit=1 > >> >> >> >> > timestamp=269658653961877 > >> >> 
>> >> > activated=0x0 > >> >> >> >> > policy=0 > >> >> >> >> > &cpus_allowed=0xc870906c > >> >> >> >> > time_slice=49 > >> >> >> >> > first_time_slice=1 > >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 > >> >> >> >> > mm=00000000 > >> >> >> >> > active_mm=00000000 > >> >> >> >> > binfmt=c04b3e68 > >> >> >> >> > exit_code=9 > >> >> >> >> > exit_signal=-1 > >> >> >> >> > pdeath_signal=0 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> > personality=0x0 > >> >> >> >> > did_exec=0 > >> >> >> >> > pid=71274 > >> >> >> >> > epid=71274 > >> >> >> >> > ppid=71270 > >> >> >> >> > tgid=71271 > >> >> >> >> > cltnode=0 > >> >> >> >> > p_vproc=0xdd600600 > >> >> >> >> > p_vfparent=0x00000000 > >> >> >> >> > group_leader=0xf22f2550 > >> >> >> >> > &pids=0xc87090c4 > >> >> >> >> > set_child_tid 0x00000000 > >> >> >> >> > clear_child_tid 0x00000000 > >> >> >> >> > rt_priority=0x0 > >> >> >> >> > it_real_value=0x0 > >> >> >> >> > it_prof_value=0x0 > >> >> >> >> > it_virt_value=0x0 > >> >> >> >> > it_real_incr=0x0 > >> >> >> >> > it_prof_incr=0x0 > >> >> >> >> > it_virt_incr=0x0 > >> >> >> >> > utime=0 > >> >> >> >> > stime=0 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> > nvcsw=3 > >> >> >> >> > nivcsw=0 > >> >> >> >> > sig_utime=0 > >> >> >> >> > sig_stime=0 > >> >> >> >> > cutime=0 > >> >> >> >> > cstime=0 > >> >> >> >> > sig_nvcsw=0 > >> >> >> >> > sig_nivcsw=0 > >> >> >> >> > cnvcsw=0 > >> >> >> >> > cnivcsw=0 > >> >> >> >> > start_time.tv_sec=270328 > >> >> >> >> > start_time.tv_nsec=615704896 > >> >> >> >> > min_flt=0 > >> >> >> >> > maj_flt=0 > >> >> >> >> > sig_min_flt=0 > >> >> >> >> > sig_maj_flt=0 > >> >> >> >> > cmin_flt=0 > >> >> >> >> > cmaj_flt=0 > >> >> >> >> > uid=0 > >> >> >> >> > euid=0 > >> >> >> >> > suid=0 > >> >> >> >> > fsuid=0 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input 
ignored > >> >> >> >> > gid=0 > >> >> >> >> > egid=0 > >> >> >> >> > sgid=0 > >> >> >> >> > fsgid=0 > >> >> >> >> > group_info=0xeaf0b980 > >> >> >> >> > cap_effective=0xfffffeff > >> >> >> >> > cap_inheritable=0x0 > >> >> >> >> > cap_permitted=0xfffffeff > >> >> >> >> > keep_capabilities=0 > >> >> >> >> > user=0xc0431ae0 > >> >> >> >> > &rlim=0xc1af02c4 > >> >> >> >> > used_math=1 > >> >> >> >> > comm=check_bacula > >> >> >> >> > locks=0 > >> >> >> >> > link_count=0 > >> >> >> >> > total_link_count=1 > >> >> >> >> > semvsem.undo_list=f4648200 > >> >> >> >> > fs=0x00000000 > >> >> >> >> > files=0x00000000 > >> >> >> >> > namespace=0x00000000 > >> >> >> >> > signal=0xc1af0240 > >> >> >> >> > sighand=0xf6c86580 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> > &blocked=0xc8709484 > >> >> >> >> > &real_blocked=0xc870948c > >> >> >> >> > &pending=0xc8709494 > >> >> >> >> > sas_ss_sp=0x0 > >> >> >> >> > sas_ss_size=0x00000000 > >> >> >> >> > notifier_data=0x00000000 > >> >> >> >> > notifier_mask=0x00000000 > >> >> >> >> > security=0x00000000 > >> >> >> >> > audit_context=0x00000000 > >> >> >> >> > parent_exec_id=0x16 > >> >> >> >> > self_exec_id=0x16 > >> >> >> >> > journal_info=0x00000000 > >> >> >> >> > proc_dentry=0xcb5fe8d4 > >> >> >> >> > backing_dev_info=0x00000000 > >> >> >> >> > io_context=0x00000000 > >> >> >> >> > ptrace_message=0x0 > >> >> >> >> > last_siginfo=0x00000000 > >> >> >> >> > p_nodetime=0 > >> >> >> >> > p_ticks_delta=0 > >> >> >> >> > icsprio=0x0 > >> >> >> >> > execnode=0x00000000 > >> >> >> >> > node_context=1 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> > rcopy_task_size=0 > >> >> >> >> > &mosix=0xc8709504 > >> >> >> >> > Function print_task_struct returned 0x0 > >> >> >> >> > kdb> btp 71274 > >> >> >> >> > Stack traceback for pid 71274 > >> >> >> >> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 > >> >> check_bacula > 
>> >> >> >> > EBP EIP Function (args) > >> >> >> >> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, > >> >> >> >> 0xc012523f, > >> >> >> >> > 0xd8017e9c, 0x0) > >> >> >> >> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, > >> 0xd8016000) > >> >> >> >> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, > >> >> 0xd8016000, > >> >> >> >> > 0xd8016000) > >> >> >> >> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, > >> >> >> >> 0xd8017ef8, > >> >> >> >> > 0xd8017fc4, 0x0, 0xc0218c59) > >> >> >> >> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, > >> >> 0xcc7be550, > >> >> >> >> > 0xc8709020) > >> >> >> >> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 > >> >> >> >> > 0xc0103cf6 work_notifysig+0x13 > >> >> >> >> > kdb> btp 71270 > >> >> >> >> > Stack traceback for pid 71270 > >> >> >> >> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb > >> >> >> >> > EBP EIP Function (args) > >> >> >> >> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, > >> >> 0xe2d7fecc, > >> >> >> >> > 0xf3eade68) > >> >> >> >> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, > >> 0xa0, > >> >> >> 0x0, > >> >> >> >> > 0xf3eadedc) > >> >> >> >> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, > >> >> >> 0x0, 0x0) > >> >> >> >> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, > >> >> 0x80000000, 0x0) > >> >> >> >> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> > kdb> btp 117331 > >> >> >> >> > Stack traceback for pid 117331 > >> >> >> >> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash > >> >> >> >> > EBP EIP Function (args) > >> >> >> >> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, > >> >> 0x4, > >> >> >> >> 0x1ca53, > >> >> >> >> > 0xd5399e94) > >> >> >> >> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, > >> >> 0x24, > >> >> >> >> > 0xbffff038, 0xd5399edc) > >> >> >> >> > 0xd5399f88 0xc011dacc do_wait+0xbc 
(0xffffffff, 0xe, 0x0, > >> >> >> >> 0xbffff038, 0x0) > >> >> >> >> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, > >> >> >> 0xa, 0x0) > >> >> >> >> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> > kdb> > >> >> >> >> > kdb> call print_pvproc 0xdd600600 > >> >> >> >> > pvp_flag=0x63727076 > >> >> >> >> > pvp_wstate=0x1166a > >> >> >> >> > pvp_pproc=0x00000007 > >> >> >> >> > pvp_head_childl=0xdd60061c > >> >> >> >> > pvp_childl=0x00000000 > >> >> >> >> > pvp_head_pgrpl=0x00000000 > >> >> >> >> > pvp_pgrpl=0x00000000 > >> >> >> >> > pvp_sessionl=0x00083049 > >> >> >> >> > pvp_head_oclist=0x00000005 > >> >> >> >> > pvp_oclist=0xc8709020 > >> >> >> >> > pvp_ppid=0 > >> >> >> >> > pvp_oppid=0 > >> >> >> >> > pvp_sid=0 > >> >> >> >> > pvp_pgid=-489161216 > >> >> >> >> > pvp_pp_sid=0 > >> >> >> >> > pvp_pp_pgid=0 > >> >> >> >> > pvp_fromnode=0 > >> >> >> >> > pvp_tonode=71270 > >> >> >> >> > pvp_cttynode=71271 > >> >> >> >> > pvp_cttydev=0x1ca53 > >> >> >> >> > pvp_jobc=71271 > >> >> >> >> > pvp_pgrp_ldr_seqno=1 > >> >> >> >> > more> > >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> > pvp_pgrp_mem_seqno=-580909344 > >> >> >> >> > pvp_fork_sigmigarg=-580909332 > >> >> >> >> > pvp.ml.ml_flag=1 > >> >> >> >> > pvp.ml.ml_shr_count=-580909376 > >> >> >> >> > pvp.ml.ml_excl_count=0 > >> >> >> >> > pvp_loadlevel=-580909332 > >> >> >> >> > pvp_pin=0 > >> >> >> >> > pvp_localview=0 > >> >> >> >> > Function print_pvproc returned 0x0 > >> >> >> >> > kdb>
|
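The two stack tracebacks in the message above (gdb, pid 73961, and ios_mon, pid 73976) show each process holding one vproc lock while sleeping on the other, which is the classic two-lock ordering deadlock. As an illustration only, here is a minimal user-space sketch of that pattern with pthreads, using one common general remedy: always taking the two locks in a fixed order (lowest address first). This is not the OpenSSI code and not what any patch in this thread does; `demo_vproc`, `demo_lock_pair` and the other `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a vproc: a lock plus a counter to update. */
struct demo_vproc {
    pthread_mutex_t lock;
    int relink_count;
};

/* Take both locks in a fixed global order (lowest address first), so two
 * callers passing the same pair in opposite order cannot deadlock. */
static void demo_lock_pair(struct demo_vproc *a, struct demo_vproc *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct demo_vproc *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void demo_unlock_pair(struct demo_vproc *a, struct demo_vproc *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

struct demo_args {
    struct demo_vproc *first;
    struct demo_vproc *second;
};

/* Each worker names the pair in its own order, like the reaping path and
 * the exiting path in the tracebacks, but the locking order is the same. */
static void *demo_worker(void *p)
{
    struct demo_args *args = p;
    for (int i = 0; i < 1000; i++) {
        demo_lock_pair(args->first, args->second);
        args->first->relink_count++;
        args->second->relink_count++;
        demo_unlock_pair(args->first, args->second);
    }
    return NULL;
}

/* Runs both workers to completion; returns the total number of updates,
 * or -1 if a thread could not be created. */
static int demo_run(void)
{
    struct demo_vproc child, parent;
    struct demo_args reap_path = { &child, &parent };
    struct demo_args exit_path = { &parent, &child };
    pthread_t t1, t2;
    int total;

    pthread_mutex_init(&child.lock, NULL);
    pthread_mutex_init(&parent.lock, NULL);
    child.relink_count = 0;
    parent.relink_count = 0;

    if (pthread_create(&t1, NULL, demo_worker, &reap_path) != 0)
        return -1;
    if (pthread_create(&t2, NULL, demo_worker, &exit_path) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    total = child.relink_count + parent.relink_count;
    pthread_mutex_destroy(&child.lock);
    pthread_mutex_destroy(&parent.lock);
    return total;
}
```

If `demo_lock_pair` instead locked its arguments in the order given, the two workers could each grab one lock and wait forever for the other, which matches what the kdb output suggests. Build with `-pthread` and call `demo_run()` from a `main()` to try it.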
From: Laura R. <lau...@hp...> - 2006-05-30 23:12:12
Attachments:
thread_deadlock.patch
|
Hi Vladimir,

Good to hear that the second patch fixed the original problem.

From looking at the stack traces, it does appear to be a deadlock. Pid 73961 is reaping pid 73980 and wants to reassign it to its original parent, 73976. It currently holds the vproc lock for 73980 and wants the vproc lock for 73976. However, 73976 is currently exiting, and it's trying to reassign its children to the next thread (i.e. 73980). So it holds its own 73976 lock and wants the vproc lock for 73980, and the two are now deadlocked.

Besides the deadlock bug, maybe the real problem is 73976 reassigning the children to 73980, which has already run through its exit code and will never clean up the list again.

Attached is a patch that checks the state of the thread; if it's a zombie, it reassigns the children to the init process instead.

laura

Vladimir Razgulin wrote:
> Laura,
>
> Thank you for ptrace2_reap.patch
> I've tried it, and it fixed the original problem.
> However, now I have something looking like a racing condition.
>
> Again, I'm debugging a multithreaded program:
>
> gdb (pid=73961) has started ios_mon (pid=73976), which started three other
> threads (pids=73979, 73980, 73981).
> The thread 73979 was terminated successfully (thanks to ptrace2_reap.patch) > and I've got the following: > > ----------------------------------------------------------------- > 0xf5778bb0 73927 73921 0 0 S 0xf5778d70 bash > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > 0xdf991130 73980 73961 0 2 E 0xdf9912f0 ios_mon > 0xdfeaf330 73981 73961 0 0 Z 0xdfeaf4f0 ios_mon > 0xf576c030 74134 72761 0 1 D 0xf576c1f0 bash > ----------------------------------------------------------------- > > gdb (73961) wants to do VPROC_LOCK_EXCL for vproc=0xc3a35800, > pvproc=0xc3a35824, > which is owned by ios_mon (73976) > > ---------------------------------------------------------------- > Stack traceback for pid 73961 > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > EBP EIP Function (args) > 0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, > 0xc0121dd0, 0xf3cafd04) > 0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79) > 0xf3e59b50 0xc03dc9a6 __down_failed+0xa > 0xc0227b04 .text.lock.dvp_pvpops+0x56 > 0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, > 0xc3a35a00, 0x5, 0x0, 0x0) > 0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, > 0x0, 0x0, 0x0) > 0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, > 0x0, 0x0, 0x0) > 0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, > 0xf3e59cc4, 0x0, 0x0) > 0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 (0xdf991130, 0x0, 0x0, > 0xf3e59e8c, 0xf3e59ed4) > 0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, > 0xf3e59ed4, 0x120e9) > 0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, > 0x120e9, 0xf3e59e8c) > 0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, > 0xbfeec4ec, 0xf3e59ed4) > 0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, > 0xbfeec4ec, 0x0) > 0xf3e59f9c 0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, > 0x80000001, 0x0) > 0xf3e59fb4 0xc0129df5 
sys_waitpid+0x25 > 0xc0106ae1 syscall_call+0x7 > ----------------------------------------------------------------------- > md c3a35824 > > 0xc3a35824 00817063 00000001 df990630 00000000 cp......0....... > 0xc3a35834 00000000 c3a4a600 00000000 f6431400 ..............C. > 0xc3a35844 c3a35a00 00000000 000120e9 00000000 .Z....... ...... > 0xc3a35854 000120c7 000120f8 000120c7 000120e9 . ... ... ... .. > 0xc3a35864 00000001 00000001 00000001 08800001 ................ > 0xc3a35874 00000001 00000001 dead4ead 00000000 .........N...... > 0xc3a35884 ffffffff 00000001 df990630 00000001 ........0....... > ^^^^^^^^ > 0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630 .N.. ....]+.0... > ----------------------------------------------------------------------- > > At the same time ios_mon (73976) tries to lock vproc=0xc3a35a00, > pvproc=0xc3a35a24, > which is already locked by gdb (73961) > > ----------------------------------------------------------------------- > > Stack traceback for pid 73976 > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > EBP EIP Function (args) > 0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > 0xc0121dd0, 0xc3a35a98) > 0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc) > 0xf3d89ea0 0xc03dc9a6 __down_failed+0xa > 0xc0227e6c .text.lock.dvp_pvpops+0x3be > 0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800) > 0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, > 0x120fc) > 0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, > 0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c) > 0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, > 0xdfeaf330, 0xf3d89f88, 0x0) > 0xf3d89f88 0xc0129638 do_exit+0x1e8 (0xdf990630, 0xf5e28584, > 0xf5e28080, 0x7f, 0x0) > 0xf3d89fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > 0xf3d89fb4 0xc0129915 sys_exit_group+0x15 > 0xc0106ae1 syscall_call+0x7 > [1]kdb> > 
----------------------------------------------------------------------- > > md c3a35a24 > > 0xc3a35a24 00083069 00000005 df991130 00000000 i0......0....... > 0xc3a35a34 00000000 00000000 c3a35800 00000000 .........X...... > 0xc3a35a44 00000000 00000000 000120e9 000120f8 ......... ... .. > 0xc3a35a54 000120c7 000120f8 000120c7 000120f8 . ... ... ... .. > 0xc3a35a64 00000001 00000001 00000001 08800001 ................ > 0xc3a35a74 00000000 00000001 dead4ead 00000000 .........N...... > 0xc3a35a84 ffffffff 00000001 dfd746b0 00000001 .........F...... > ^^^^^^^^ > 0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0 .N..p...p....F.. > ---------------------------------------------------------------------- > > and 0xc3a4a600 here is vproc for pid=73981 > > I can provide more information about the issue if it's necesary. > Do you think that work should be serialized throught nocldwait queue? > > Thanks, > Vladimir > > > On 5/26/06, Laura Ramirez <lau...@hp...> wrote: >> >> Hi Vladimir, >> >> Looking at the code more closely, I believe the "ptrace_reap.patch" >> I gave you is NOT correct. Please remove that patch. >> >> Attached is a different patch "ptrace_reap2.patch". In the original >> code, the parent who is the gdb process is not getting a SIGCHLD, >> instead the process just gets queued to the nocldwait. I believe, the >> correct path should be the gdb parent gets the SIGCHLD, goes to do the >> reap, sees its ptraced and does the ptrace_unlink(). Assuming the >> ptrace_unlink() works, it should return -EAGAIN, do the >> PVPOP_REPORT_STATE() to its original parent, sees its original parent is >> not reaping children and gets queued to the nocldwait queue, which >> should do reap successfully this time and not return -EAGAIN. With the >> attached patch, I'm hoping that's going to happen. >> >> Apply with -p0, let me know if it works or where it deviated from the >> code path i stated above. 
>> >> thanks, >> >> laura >> >> >> >> Vladimir Razgulin wrote: >> > Laura, >> > following your advice I got a snapshot of the moment the process >> > becomes a zombie and a work item is added into nocldwait_async_queue: >> > >> > Please, note, that >> > the process id is 74226, it's a thread process created by 74223 >> under gdb, >> > and its ppid and oppid are different - see attached info >> > >> > Thank you for your help >> > Vladimir >> > >> > Do you think I can CC: our conversation to ssic-linux-devel? >> > >> > >> > >> ------------------------------------------------------------------------------------------------------------------ >> >> > >> > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd >> > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash >> > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb >> > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon >> > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon >> > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon >> > >> ------------------------------------------------------------------------------------------------------------------ >> >> > >> > Stack traceback for pid 74226 >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon >> > EBP EIP Function (args) >> > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, >> > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) >> > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, >> > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) >> > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 (0xf6be1800, >> > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) >> > 0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, 0x4, >> > 0xec599fbc, 0x0) >> > 0xec599fac 0xc0129648 do_exit+0x1e8 >> > 0xec599fb4 0xc012983f sys_exit+0xf >> > 0xc0106ae1 syscall_call+0x7 >> > ---------------------------------------------------------------- >> > call print_task_struct 0xdfeff230 >> > >> > state=0x0 >> > flags=0x802044 >> > ptrace=0x1f1 >> > 
lock_depth=-1 >> > prio=116 >> > static_prio=120 >> > array=c383055c >> > sleep_avg=933888889 >> > timestamp=4295805885000000 >> > activated=0x0 >> > policy=0 >> > &cpus_allowed=0xdfeff280 >> > time_slice=17 >> > first_time_slice=1 >> > tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c >> > mm=00000000 >> > active_mm=f5e42980 >> > binfmt=c04dad74 >> > exit_code=0 >> > exit_signal=-1 >> > pdeath_signal=0 >> > personality=0x400000 >> > did_exec=0 >> > pid=74226 >> > epid=74226 >> > ppid=74210 >> > tgid=74223 >> > cltnode=0 >> > p_vproc=0xf6be1800 >> > p_vfparent=0x00000000 >> > group_leader=0xc3bbb930 >> > &pids=0xdfeff2d8 >> > set_child_tid 0x00000000 >> > clear_child_tid 0x00000000 >> > rt_priority=0x0 >> > utime=0 >> > stime=0 >> > nvcsw=11 >> > nivcsw=0 >> > sig_utime=0 >> > sig_stime=0 >> > cutime=0 >> > cstime=0 >> > sig_nvcsw=0 >> > sig_nivcsw=0 >> > cnvcsw=0 >> > cnivcsw=0 >> > start_time.tv_sec=1138 >> > start_time.tv_nsec=381796336 >> > min_flt=4 >> > maj_flt=0 >> > sig_min_flt=0 >> > sig_maj_flt=0 >> > cmin_flt=0 >> > cmaj_flt=0 >> > uid=0 >> > euid=0 >> > suid=0 >> > fsuid=0 >> > gid=0 >> > egid=0 >> > sgid=0 >> > fsgid=0 >> > group_info=0xf502c080 >> > cap_effective=0xfffffeff >> > cap_inheritable=0x0 >> > cap_permitted=0xfffffeff >> > keep_capabilities=0 >> > user=0xc0454c00 >> > &rlim=0xdffdd5d4 >> > comm=ios_mon >> > locks=0 >> > link_count=0 >> > total_link_count=0 >> > semvsem.undo_list=c3b4aa20 >> > fs=0x00000000 >> > files=0x00000000 >> > namespace=0x00000000 >> > signal=0xdffdd500 >> > sighand=0xf5e29900 >> > &blocked=0xdfeff694 >> > &real_blocked=0xdfeff69c >> > &pending=0xdfeff6a4 >> > sas_ss_sp=0x0 >> > sas_ss_size=0x00000000 >> > notifier_data=0x00000000 >> > notifier_mask=0x00000000 >> > security=0x00000000 >> > audit_context=0x00000000 >> > parent_exec_id=0x12 >> > self_exec_id=0x12 >> > journal_info=0x00000000 >> > proc_dentry=0xecf027a0 >> > backing_dev_info=0x00000000 >> > io_context=0x00000000 >> > ptrace_message=0x0 >> > 
last_siginfo=0x00000000 >> > p_nodetime=0 >> > p_ticks_delta=0 >> > icsprio=0x0 >> > execnode=0x00000000 >> > node_context=1 >> > rcopy_task_size=0 >> > &mosix=0xdfeff764 >> > ---------------------------------------------------------- >> > call print_vproc f6be1800 >> > >> > vp_magic=0x63727076 (should be 0x63727076) >> > vp_pid=74226 >> > vp_ref_cnt=8 >> > vp_data=0xf6be1824 >> > vp_hashfwd=0xc3a60200 >> > vp_hashbwd=0x00000000 >> > Function print_vproc returned 0x19 >> > ---------------------------------------------------------------- >> > call print_pvproc f6be1824 >> > >> > pvp_flag=0x83069 >> > pvp_wstate=0x5 >> > pvp_pproc=0xdfeff230 >> > pvp_head_childl=0x00000000 >> > pvp_childl=0xf6ec5e00 >> > pvp_head_pgrpl=0x00000000 >> > pvp_pgrpl=0xf6ec5e00 >> > pvp_sessionl=0x00000000 >> > pvp_head_oclist=0x00000000 >> > pvp_oclist=0x00000000 >> > pvp_ppid=74210 >> > pvp_oppid=74223 >> > pvp_sid=73171 >> > pvp_pgid=74223 >> > pvp_pp_sid=73171 >> > pvp_pp_pgid=74223 >> > pvp_fromnode=1 >> > pvp_tonode=1 >> > pvp_cttynode=1 >> > pvp_cttydev=0x8800000 >> > pvp_jobc=0 >> > pvp_pgrp_ldr_seqno=0 >> > pvp_pgrp_mem_seqno=0 >> > pvp_fork_sigmigarg=0 >> > pvp.ml.ml_flag=0 >> > pvp.ml.ml_shr_count=0 >> > pvp.ml.ml_excl_count=0 >> > pvp_loadlevel=0 >> > pvp_pin=0 >> > pvp_localview=0 >> > Function print_pvproc returned 0x13 >> > ---------------------------------------------------------------- >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> Hi Vladimir, >> >> >> >> maybe we need to take a step back, I was focused on why the >> >> ptrace_unlink() call was failing, which was because the child >> >> had already gotten removed from the parent list. Which is why >> >> I created the following patch, which only removed the child from >> >> the parent list if the child wasnt being ptraced (ie pvp_oppid == 0). >> >> So the ! is correct for that logic. 
>> >> >> >> But maybe we need to look at why this process is being reaped by >> >> the nocld_wait_daemon, instead of the gdb parent. >> >> >> >> Can you do a " call print_task_struct < address>" of the zombie >> >> processes? >> >> >> >> thanks >> >> >> >> laura >> >> >> >> Vladimir Razgulin wrote: >> >> > Laura, >> >> > The patch you sent doesn't resolve the issue: sometimes everything >> >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole >> system >> >> > dies. >> >> > I took a look at the code in the patch:: >> >> > >> >> > + error = 0; >> >> > + if (!PVP(vc)->pvp_oppid) { >> >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); >> >> > + ptrace = 1; >> >> > + } >> >> > >> >> > Should it be without "!", liike >> >> > + if ( PVP(vc)->pvp_oppid) { >> >> > >> >> > Thanks >> >> > Vladimir >> >> > >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> >> I believe the __ptrace_unlink is failing because the child >> >> >> gets removed from the parent in the beginning of >> >> >> dpvproc_nocldwait_async_handler(), which is the right thing >> >> >> to do if the process isn't being ptraced. >> >> >> >> >> >> Attach is a patch that addresses this problem. Can you please >> >> >> apply and test. (use -p0 to apply) >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> >> > Laura, >> >> >> > >> >> >> > That's right, >> >> >> > __ptrace_unlink(p) fails every time inside >> >> >> > pvpop_rmv_child_from_parent() with -ESRCH >> >> >> > because it can't find process p in the PVP(w)->pvp_childl list. >> >> >> > The process p seems to be unlinked alredy - I saw that tracing >> >> >> > pvp_childl with kdb while stopping inside >> >> >> pvpop_rmv_child_from_parent(). 
>> >> >> > >> >> >> > Vladimir >> >> >> > >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> Hi Vladimir, >> >> >> >> >> >> >> >> The reason why wait_task_zombie() returns EAGAIN is because >> >> >> >> its current parent is the debugger gdb instead of its original >> >> >> >> parent. The original parent needs to reap its child, so a >> >> >> >> __ptrace_unlink() is done to stop tracing the process and set >> >> >> >> the parent back to it original parent, returns EAGAIN so the >> >> >> >> reap can be done from the original parent. So __ptrace_unlink() >> >> >> >> must have failed since otherwise the second time in it wouldn't >> >> >> >> go into that code path. So if you can find out why the >> >> >> >> __ptrace_unlink() is failing, that would be a good start. >> >> >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> >> >> > Laura, Roger, >> >> >> >> > >> >> >> >> > I've got the same problem trying to debug a multithreaded >> program >> >> >> >> with gdb: >> >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). >> >> >> >> > >> >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec >> 2005) >> >> >> - Is >> >> >> >> > it a correct one? - >> >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an >> infinite >> >> >> loop, >> >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error code. >> >> >> >> > >> >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid >> istead of >> >> >> >> -EAGAIN? >> >> >> >> > Thanks >> >> >> >> > Vladimir >> >> >> >> > >> >> >> >> > I can provide some info about the processes... >> >> >> >> > >> >> >> >> > >> >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> >> >> >> >> >> Hi Roger, >> >> >> >> >> >> >> >> >> >> A quick look at the code, there seems to be a comment >> >> >> >> >> about the ptrace vproc path, may need to be reworked for 2.6 >> >> merge. 
>> >> >> >> >> I dont quite remember what the issue was, it obviously >> hitting >> >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and >> the >> >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr >> which >> >> >> >> >> made it look corrupt, but it really isnt, it was just the >> wrong >> >> >> print >> >> >> >> >> call. >> >> >> >> >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> >> Roger Tsang wrote: >> >> >> >> >> > Laura, >> >> >> >> >> > >> >> >> >> >> > I got an oops exiting from gdb while attached to >> check_bacula >> >> >> >> which had >> >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, >> but >> >> >> take a >> >> >> >> >> look at >> >> >> >> >> > the oops below. child_reaper was waiting for check_bacula >> >> >> which is >> >> >> >> >> in E >> >> >> >> >> > state. It looks like pvproc got corrupted. >> >> >> >> >> > >> >> >> >> >> > I'll leave this in kdb until tomorrow just in case I >> left out >> >> >> >> >> something. >> >> >> >> >> > >> >> >> >> >> > Roger >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible type >> >> >> >> (11)procfs: >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> impossible type >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> (11)procfs: >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> impossible type >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> (11)procfs: >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> impossible type >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> >> >> (11)ptrace_unlink: >> >> >> >> >> > vpop_reclaim failed >> >> >> >> >> > <4>------------[ cut here ]------------ >> >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! 
>> >> >> >> >> > <1>invalid operand: 0000 [#1] >> >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd >> >> >> exportfs >> >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport >> >> >> iptable_filter >> >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd >> >> ehci_hcd >> >> >> >> >> usbcore >> >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine >> >> >> dm_mod >> >> >> >> >> > <4>CPU: 0 >> >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI >> >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) >> >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 >> >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: >> >> c0430a2c >> >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: >> >> f7d11c98 >> >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 >> >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 >> >> >> task=c1aeea80) >> >> >> >> >> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 >> >> 00000000 >> >> >> >> >> 00000000 >> >> >> >> >> > 00000000 >> >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 >> >> 00000286 >> >> >> >> >> f7d11d1c >> >> >> >> >> > c011de68 >> >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 >> >> f7d10000 >> >> >> >> >> c3378a40 >> >> >> >> >> > f7d11d1c >> >> >> >> >> > more> >> >> >> >> >> > <4>Call Trace: >> >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 >> >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 >> >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 >> >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 >> >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 >> >> >> >> >> > <4> [<c010470b>] error_code+0x2b/0x30 >> >> >> >> >> > <4> [<c011de68>] pproc_reap+0x228/0x2f0 >> >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 >> >> >> >> >> > <4> [<c020ea9f>] >> dpvproc_nocldwait_async_handler+0x13f/0x300 >> >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 >> 
>> >> >> >> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 >> >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 >> >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 >> >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 >> >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 >> >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 >> 77 a0 >> >> >> ff ff >> >> >> >> >> e9 24 >> >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 >> 00 00 >> >> >> <0f> >> >> >> >> >> 0b 3f 05 >> >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 >> >> >> >> >> > <4> >> >> >> >> >> > kdb> >> >> >> >> >> > kdb> bt >> >> >> >> >> > Stack traceback for pid 2 >> >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 >> >> >> *child_reaper >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 >> (0xc8709020, 0x0, >> >> >> 0x0, >> >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) >> >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, >> >> >> 0xf7d11e4c, >> >> >> >> >> > 0xf7d11e50, 0x11666) >> >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, >> >> 0xffffffff, >> >> >> >> 0x20, >> >> >> >> >> > 0x11666, 0xf7d11e4c) >> >> >> >> >> > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f >> >> >> >> >> (0xed3c9574, >> >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) >> >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 >> >> >> >> (0xc1aeea80, 0x0, >> >> >> >> >> > 0x40000001, 0x0, 0xc022f870) >> >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 >> >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 >> >> >> >> >> > kdb> call print_task_struct 0xc8709020 >> >> >> >> >> > state=0x20 >> >> >> >> >> > flags=0x44c >> >> >> >> >> > ptrace=0x0 >> >> >> >> >> > lock_depth=-1 >> >> >> >> >> > prio=116 >> >> >> >> >> > static_prio=120 >> >> >> >> >> > array=00000000 >> >> >> >> >> > sleep_avg=899989756 
>> >> >> >> >> > interactive_credit=1 >> >> >> >> >> > timestamp=269658653961877 >> >> >> >> >> > activated=0x0 >> >> >> >> >> > policy=0 >> >> >> >> >> > &cpus_allowed=0xc870906c >> >> >> >> >> > time_slice=49 >> >> >> >> >> > first_time_slice=1 >> >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 >> >> >> >> >> > mm=00000000 >> >> >> >> >> > active_mm=00000000 >> >> >> >> >> > binfmt=c04b3e68 >> >> >> >> >> > exit_code=9 >> >> >> >> >> > exit_signal=-1 >> >> >> >> >> > pdeath_signal=0 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > personality=0x0 >> >> >> >> >> > did_exec=0 >> >> >> >> >> > pid=71274 >> >> >> >> >> > epid=71274 >> >> >> >> >> > ppid=71270 >> >> >> >> >> > tgid=71271 >> >> >> >> >> > cltnode=0 >> >> >> >> >> > p_vproc=0xdd600600 >> >> >> >> >> > p_vfparent=0x00000000 >> >> >> >> >> > group_leader=0xf22f2550 >> >> >> >> >> > &pids=0xc87090c4 >> >> >> >> >> > set_child_tid 0x00000000 >> >> >> >> >> > clear_child_tid 0x00000000 >> >> >> >> >> > rt_priority=0x0 >> >> >> >> >> > it_real_value=0x0 >> >> >> >> >> > it_prof_value=0x0 >> >> >> >> >> > it_virt_value=0x0 >> >> >> >> >> > it_real_incr=0x0 >> >> >> >> >> > it_prof_incr=0x0 >> >> >> >> >> > it_virt_incr=0x0 >> >> >> >> >> > utime=0 >> >> >> >> >> > stime=0 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > nvcsw=3 >> >> >> >> >> > nivcsw=0 >> >> >> >> >> > sig_utime=0 >> >> >> >> >> > sig_stime=0 >> >> >> >> >> > cutime=0 >> >> >> >> >> > cstime=0 >> >> >> >> >> > sig_nvcsw=0 >> >> >> >> >> > sig_nivcsw=0 >> >> >> >> >> > cnvcsw=0 >> >> >> >> >> > cnivcsw=0 >> >> >> >> >> > start_time.tv_sec=270328 >> >> >> >> >> > start_time.tv_nsec=615704896 >> >> >> >> >> > min_flt=0 >> >> >> >> >> > maj_flt=0 >> >> >> >> >> > sig_min_flt=0 >> >> >> >> >> > sig_maj_flt=0 >> >> >> >> >> > cmin_flt=0 >> >> >> >> >> > cmaj_flt=0 >> >> >> >> >> > uid=0 >> >> >> >> >> > 
euid=0 >> >> >> >> >> > suid=0 >> >> >> >> >> > fsuid=0 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > gid=0 >> >> >> >> >> > egid=0 >> >> >> >> >> > sgid=0 >> >> >> >> >> > fsgid=0 >> >> >> >> >> > group_info=0xeaf0b980 >> >> >> >> >> > cap_effective=0xfffffeff >> >> >> >> >> > cap_inheritable=0x0 >> >> >> >> >> > cap_permitted=0xfffffeff >> >> >> >> >> > keep_capabilities=0 >> >> >> >> >> > user=0xc0431ae0 >> >> >> >> >> > &rlim=0xc1af02c4 >> >> >> >> >> > used_math=1 >> >> >> >> >> > comm=check_bacula >> >> >> >> >> > locks=0 >> >> >> >> >> > link_count=0 >> >> >> >> >> > total_link_count=1 >> >> >> >> >> > semvsem.undo_list=f4648200 >> >> >> >> >> > fs=0x00000000 >> >> >> >> >> > files=0x00000000 >> >> >> >> >> > namespace=0x00000000 >> >> >> >> >> > signal=0xc1af0240 >> >> >> >> >> > sighand=0xf6c86580 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > &blocked=0xc8709484 >> >> >> >> >> > &real_blocked=0xc870948c >> >> >> >> >> > &pending=0xc8709494 >> >> >> >> >> > sas_ss_sp=0x0 >> >> >> >> >> > sas_ss_size=0x00000000 >> >> >> >> >> > notifier_data=0x00000000 >> >> >> >> >> > notifier_mask=0x00000000 >> >> >> >> >> > security=0x00000000 >> >> >> >> >> > audit_context=0x00000000 >> >> >> >> >> > parent_exec_id=0x16 >> >> >> >> >> > self_exec_id=0x16 >> >> >> >> >> > journal_info=0x00000000 >> >> >> >> >> > proc_dentry=0xcb5fe8d4 >> >> >> >> >> > backing_dev_info=0x00000000 >> >> >> >> >> > io_context=0x00000000 >> >> >> >> >> > ptrace_message=0x0 >> >> >> >> >> > last_siginfo=0x00000000 >> >> >> >> >> > p_nodetime=0 >> >> >> >> >> > p_ticks_delta=0 >> >> >> >> >> > icsprio=0x0 >> >> >> >> >> > execnode=0x00000000 >> >> >> >> >> > node_context=1 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > rcopy_task_size=0 >> >> >> >> >> > &mosix=0xc8709504 >> >> >> >> >> 
> Function print_task_struct returned 0x0 >> >> >> >> >> > kdb> btp 71274 >> >> >> >> >> > Stack traceback for pid 71274 >> >> >> >> >> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 >> >> >> check_bacula >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, >> 0xf22f4dc0, >> >> >> >> >> 0xc012523f, >> >> >> >> >> > 0xd8017e9c, 0x0) >> >> >> >> >> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, >> >> 0xd8016000) >> >> >> >> >> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, >> >> >> 0xd8016000, >> >> >> >> >> > 0xd8016000) >> >> >> >> >> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db >> (0xd8017f18, >> >> >> >> >> 0xd8017ef8, >> >> >> >> >> > 0xd8017fc4, 0x0, 0xc0218c59) >> >> >> >> >> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, >> >> >> 0xcc7be550, >> >> >> >> >> > 0xc8709020) >> >> >> >> >> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 >> >> >> >> >> > 0xc0103cf6 work_notifysig+0x13 >> >> >> >> >> > kdb> btp 71270 >> >> >> >> >> > Stack traceback for pid 71270 >> >> >> >> >> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, >> >> >> 0xe2d7fecc, >> >> >> >> >> > 0xf3eade68) >> >> >> >> >> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, >> >> 0xa0, >> >> >> >> 0x0, >> >> >> >> >> > 0xf3eadedc) >> >> >> >> >> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, >> 0x0, >> >> >> >> 0x0, 0x0) >> >> >> >> >> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, >> >> >> 0x80000000, 0x0) >> >> >> >> >> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 >> >> >> >> >> > kdb> btp 117331 >> >> >> >> >> > Stack traceback for pid 117331 >> >> >> >> >> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, >> 0xffffffff, >> 
>> >> 0x4, >> >> >> >> >> 0x1ca53, >> >> >> >> >> > 0xd5399e94) >> >> >> >> >> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, >> 0xffffffff, >> >> >> 0x24, >> >> >> >> >> > 0xbffff038, 0xd5399edc) >> >> >> >> >> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, >> >> >> >> >> 0xbffff038, 0x0) >> >> >> >> >> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, >> 0xbffff038, >> >> >> >> 0xa, 0x0) >> >> >> >> >> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 >> >> >> >> >> > kdb> >> >> >> >> >> > kdb> call print_pvproc 0xdd600600 >> >> >> >> >> > pvp_flag=0x63727076 >> >> >> >> >> > pvp_wstate=0x1166a >> >> >> >> >> > pvp_pproc=0x00000007 >> >> >> >> >> > pvp_head_childl=0xdd60061c >> >> >> >> >> > pvp_childl=0x00000000 >> >> >> >> >> > pvp_head_pgrpl=0x00000000 >> >> >> >> >> > pvp_pgrpl=0x00000000 >> >> >> >> >> > pvp_sessionl=0x00083049 >> >> >> >> >> > pvp_head_oclist=0x00000005 >> >> >> >> >> > pvp_oclist=0xc8709020 >> >> >> >> >> > pvp_ppid=0 >> >> >> >> >> > pvp_oppid=0 >> >> >> >> >> > pvp_sid=0 >> >> >> >> >> > pvp_pgid=-489161216 >> >> >> >> >> > pvp_pp_sid=0 >> >> >> >> >> > pvp_pp_pgid=0 >> >> >> >> >> > pvp_fromnode=0 >> >> >> >> >> > pvp_tonode=71270 >> >> >> >> >> > pvp_cttynode=71271 >> >> >> >> >> > pvp_cttydev=0x1ca53 >> >> >> >> >> > pvp_jobc=71271 >> >> >> >> >> > pvp_pgrp_ldr_seqno=1 >> >> >> >> >> > more> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored >> >> >> >> >> > pvp_pgrp_mem_seqno=-580909344 >> >> >> >> >> > pvp_fork_sigmigarg=-580909332 >> >> >> >> >> > pvp.ml.ml_flag=1 >> >> >> >> >> > pvp.ml.ml_shr_count=-580909376 >> >> >> >> >> > pvp.ml.ml_excl_count=0 >> >> >> >> >> > pvp_loadlevel=-580909332 >> >> >> >> >> > pvp_pin=0 >> >> >> >> >> > pvp_localview=0 >> >> >> >> >> > Function print_pvproc returned 0x0 >> >> >> >> >> > kdb> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ------------------------------------------------------- >> >> 
|
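The -EAGAIN retry path discussed in the quoted messages can be pictured with a small toy model. This is only a sketch under stated assumptions, not the SSI kernel code: the struct and the toy_* functions are hypothetical names invented for illustration, and the model only mirrors the behaviour described in the thread. A traced zombie is first handed back to its original parent and the reaper is told to retry; if the unlink step keeps failing, every retry returns -EAGAIN again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a zombie task; not the kernel's task_struct. */
struct toy_task {
    int pid;
    int traced;        /* nonzero while the debugger is the current parent */
    int unlink_fails;  /* nonzero simulates a failing __ptrace_unlink() */
};

/*
 * Toy version of the reap step: returns the pid once the task is reaped,
 * or -EAGAIN when the caller has to retry via the original parent.
 */
static int toy_wait_task_zombie(struct toy_task *p)
{
    assert(p != NULL);
    if (p->traced) {
        if (!p->unlink_fails)
            p->traced = 0;  /* child handed back to its original parent */
        return -EAGAIN;     /* ask the caller to come back */
    }
    return p->pid;
}

/* Bounded retry loop; an unbounded one would spin forever on a failing unlink. */
static int toy_reap(struct toy_task *p, int max_tries)
{
    int i, ret = -EAGAIN;

    for (i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = toy_wait_task_zombie(p);
    return ret;
}
```

With a working unlink, the second call to toy_wait_task_zombie() returns the pid. With unlink_fails set, toy_reap() gives up after max_tries and still reports -EAGAIN, which corresponds to the endless loop reported for dpvproc_nocldwait_async_handler().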
From: Vladimir R. <one...@gm...> - 2006-05-31 15:28:38
|
Laura,

with thread_deadlock.patch I still get a deadlock between the threads and a frozen gdb - see the enclosed info.

Thanks
Vladimir

0xf6526830 73134 72318 0 0 S 0xf65269f0 sshd
0xf6527330 73149 73134 0 0 S 0xf65274f0 bash
0xdfc060b0 73213 73149 0 2 S 0xdfc06270 gdb
0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon
0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon
-------------------------------------------------------------------------
btp 73246
Stack traceback for pid 73246
0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon
EBP        EIP        Function (args)
0xdf995e10 0xc03dce4c schedule+0x3fc (0xf645a3b0, 0x1, 0xf645a3b0, 0xc0121dd0, 0xdffea698)
0xdf995e48 0xc03dc835 __down+0x75 (0xdffea600, 0xdffea600)
0xdf995e58 0xc03dc9e6 __down_failed+0xa
           0xc0227b9b .text.lock.dvp_pvpops+0xad
0xdf995efc 0xc02214d8 pvpop_reassign_child+0x178 (0xdffea600, 0x10, 0x0, 0xdf994000, 0xdf994000)
0xdf995f28 0xc02263f2 pvpop_reassign_original_parent+0x132 (0xdffeaa00, 0x11e22)
0xdf995f38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffeaa00, 0x11e22, 0xf65cf980, 0xdf995f4c, 0xdf995f4c)
0xdf995f5c 0xc0129272 exit_notify+0xc2 (0xf645a3b0, 0xf645a3b0, 0xdf990bb0, 0xdf995f88, 0x0)
0xdf995f88 0xc0129638 do_exit+0x1e8 (0xf645a3b0, 0xc3a80684, 0xc3a80180, 0x7f, 0x0)
0xdf995fa8 0xc0129879 do_group_exit+0x39 (0x7f00)
0xdf995fb4 0xc0129915 sys_exit_group+0x15
           0xc0106ae1 syscall_call+0x7
-------------------------------------------------------------------------
btp 73250
Stack traceback for pid 73250
0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon
EBP        EIP        Function (args)
0xf4fc5d54 0xc03dce4c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, 0xc0121dd0, 0xdffeaa98)
0xf4fc5d8c 0xc03dc835 __down+0x75 (0x79, 0xdffeaafc)
0xf4fc5d9c 0xc03dc9e6 __down_failed+0xa
           0xc0227eac .text.lock.dvp_pvpops+0x3be
0xf4fc5dfc 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 (0xdffeaa00, 0xdffeae00, 0x0, 0x0, 0xf4fc4000)
0xf4fc5e28 0xc02263e1 pvpop_reassign_original_parent+0x121 (0xdffea600, 0x11e1e)
0xf4fc5e38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffea600, 0x11e1e, 0xf5067200, 0xf4fc5e4c, 0xf4fc5e4c)
0xf4fc5e5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xf65ce840, 0xf4fc5e78, 0x8, 0x1)
0xf4fc5e88 0xc0129638 do_exit+0x1e8 (0x0, 0x0, 0x0, 0x9, 0xf4fc4000)
0xf4fc5ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xc3a80180, 0xf4fc4000)
0xf4fc5ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xf4fc5f10, 0xf4fc5ef0, 0xf4fc5fbc, 0x0, 0xf65cf980)
0xf4fc5f9c 0xc0106850 do_signal+0x70 (0x3b970910, 0x805f2ec, 0x805f2ec, 0x407679b8)
0xf4fc5fb4 0xc0106987 do_notify_resume+0x57
           0xc0106b72 work_notifysig+0x13
-------------------------------------------------------------------------
md 0xdffeaa00
0xdffeaa00 63727076 00011e1e 00000012 dffeaa24 vprc........$...
0xdffeaa10 00000001 dead4ead 00000000 00000000 .....N..........
0xdffeaa20 00000000 00817063 00000001 f645a3b0 ....cp........E.
0xdffeaa30 00000000 00000000 dffea600 00000000 ................
0xdffeaa40 c3a4d200 00000000 00000000 00011dfd ................
0xdffeaa50 00000000 00011dbd 00011e1e 00011dbd ................
0xdffeaa60 00011dfd 00000001 00000001 00000001 ................
0xdffeaa70 08800001 00000001 00000001 dead4ead .............N..
-------------------------------------------------------------------------
md 0xdffea600
0xdffea600 63727076 00011e22 00000009 dffea624 vprc".......$...
0xdffea610 00000001 dead4ead 00000000 00000000 .....N..........
0xdffea620 00000000 00083061 00000001 df990630 ....a0......0...
0xdffea630 00000000 dffeaa00 00000000 dffeaa00 ................
0xdffea640 00000000 00000000 00000000 00011dfd ................
0xdffea650 00000001 00011dbd 00011e1e 00011dbd ................
0xdffea660 00011e1e 00000001 00000001 00000001 ................
0xdffea670 08800001 00000000 00000001 dead4ead .............N..
-------------------------------------------------------------------------
call print_vproc 0xdffeaa00
vp_magic=0x63727076 (should be 0x63727076)
vp_pid=73246
vp_ref_cnt=18
vp_data=0xdffeaa24
vp_hashfwd=0x00000000
vp_hashbwd=0x00000000
Function print_vproc returned 0x19
-------------------------------------------------------------------------
call print_pvproc 0xdffeaa24
pvp_flag=0x817063
pvp_wstate=0x1
pvp_pproc=0xf645a3b0
pvp_head_childl=0x00000000
pvp_childl=0x00000000
pvp_head_pgrpl=0xdffea600
pvp_pgrpl=0x00000000
pvp_sessionl=0xc3a4d200
pvp_head_oclist=0x00000000
pvp_oclist=0x00000000
pvp_ppid=73213
pvp_oppid=0
pvp_sid=73149
pvp_pgid=73246
pvp_pp_sid=73149
pvp_pp_pgid=73213
pvp_fromnode=1
pvp_tonode=1
pvp_cttynode=1
pvp_cttydev=0x8800001
pvp_jobc=1
pvp_pgrp_ldr_seqno=0
pvp_pgrp_mem_seqno=0
pvp_fork_sigmigarg=0
pvp.ml.ml_flag=2
pvp.ml.ml_shr_count=1
pvp.ml.ml_excl_count=0
pvp_loadlevel=0
pvp_pin=0
pvp_localview=0
Function print_pvproc returned 0x13
-------------------------------------------------------------------------
call print_vproc 0xdffea600
vp_magic=0x63727076 (should be 0x63727076)
vp_pid=73250
vp_ref_cnt=9
vp_data=0xdffea624
vp_hashfwd=0x00000000
vp_hashbwd=0x00000000
Function print_vproc returned 0x19
-------------------------------------------------------------------------
call print_pvproc 0xdffea624
pvp_flag=0x83061
pvp_wstate=0x1
pvp_pproc=0xdf990630
pvp_head_childl=0x00000000
pvp_childl=0xdffeaa00
pvp_head_pgrpl=0x00000000
pvp_pgrpl=0xdffeaa00
pvp_sessionl=0x00000000
pvp_head_oclist=0x00000000
pvp_oclist=0x00000000
pvp_ppid=73213
pvp_oppid=1
pvp_sid=73149
pvp_pgid=73246
pvp_pp_sid=73149
pvp_pp_pgid=73246
pvp_fromnode=1
pvp_tonode=1
pvp_cttynode=1
pvp_cttydev=0x8800001
pvp_jobc=0
pvp_pgrp_ldr_seqno=0
pvp_pgrp_mem_seqno=0
pvp_fork_sigmigarg=0
pvp.ml.ml_flag=2
pvp.ml.ml_shr_count=1
pvp.ml.ml_excl_count=0
pvp_loadlevel=0
pvp_pin=0
pvp_localview=0
Function print_pvproc returned 0x13
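
In the two backtraces above, each exiting thread appears to be blocked in __down() on the other thread's vproc, which is the classic two-lock ordering problem. One common way to avoid that pattern is to always take the two locks in a fixed global order, for example lower address first. The following is a minimal single-threaded sketch of that idea only; struct toy_vproc and the toy_* helpers are hypothetical names, the "lock" is a plain flag rather than a real semaphore, and nothing here is taken from the SSI sources or from the patches in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a vproc with its lock; not the SSI structure. */
struct toy_vproc {
    int pid;
    int held;  /* 1 while "locked" in this single-threaded model */
};

static void toy_lock(struct toy_vproc *v)
{
    assert(!v->held);  /* a real lock would block here instead */
    v->held = 1;
}

static void toy_unlock(struct toy_vproc *v)
{
    assert(v->held);
    v->held = 0;
}

/*
 * Take both locks in one global order (lower address first), no matter
 * which vproc the caller considers "its own". Returns the one locked first.
 */
static struct toy_vproc *toy_lock_pair(struct toy_vproc *a, struct toy_vproc *b)
{
    struct toy_vproc *first = a, *second = b;

    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        first = b;
        second = a;
    }
    toy_lock(first);
    toy_lock(second);
    return first;
}

static void toy_unlock_pair(struct toy_vproc *a, struct toy_vproc *b)
{
    toy_unlock(a);
    toy_unlock(b);
}
```

Because toy_lock_pair(x, y) and toy_lock_pair(y, x) acquire the locks in the same order, two callers working on the same pair from opposite sides cannot each end up holding one lock while waiting for the other. Whether such an ordering is practical in the vproc code is a separate question that this sketch does not answer.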
-------------------------------------------------------------------------

On 5/30/06, Laura Ramirez <lau...@hp...> wrote:
>
> Hi Vladimir,
>
> Good to hear that the second patch fixed the original problem.
>
> From looking at the stack traces, it does appear to be a
> deadlock. Pid 73961 is reaping pid 73980. It wants to
> reassign it to its original parent 73976. It currently has
> the vproc lock for 73980 and wants the vproc lock for 73976.
> However, 73976 is currently exiting, and it's trying to reassign
> its children to the next thread (i.e. 73980). So it currently has
> its own 73976 lock and wants the vproc lock for 73980, and so
> it's now deadlocked. Besides the deadlock bug, maybe the real
> problem is 73976 reassigning the children to 73980, which has
> already run through its exit code and will never clean up the
> list again.
>
> Attached is a patch that checks the state of the thread; if
> it's a zombie, then it reassigns it to the init process.
>
> laura
>
>
> Vladimir Razgulin wrote:
> > Laura,
> >
> > Thank you for ptrace2_reap.patch
> > I've tried it, and it fixed the original problem.
> > However, now I have something that looks like a race condition.
> >
> > Again, I'm debugging a multithreaded program:
> >
> > gdb (pid=73961) has started ios_mon (pid=73976), which started three other
> > threads (pids=73979, 73980, 73981).
> > The thread 73979 was terminated successfully (thanks to ptrace2_reap.patch) > > and I've got the following: > > > > ----------------------------------------------------------------- > > 0xf5778bb0 73927 73921 0 0 S 0xf5778d70 bash > > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > > 0xdf991130 73980 73961 0 2 E 0xdf9912f0 ios_mon > > 0xdfeaf330 73981 73961 0 0 Z 0xdfeaf4f0 ios_mon > > 0xf576c030 74134 72761 0 1 D 0xf576c1f0 bash > > ----------------------------------------------------------------- > > > > gdb (73961) wants to do VPROC_LOCK_EXCL for vproc=0xc3a35800, > > pvproc=0xc3a35824, > > which is owned by ios_mon (73976) > > > > ---------------------------------------------------------------- > > Stack traceback for pid 73961 > > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > > EBP EIP Function (args) > > 0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, > > 0xc0121dd0, 0xf3cafd04) > > 0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79) > > 0xf3e59b50 0xc03dc9a6 __down_failed+0xa > > 0xc0227b04 .text.lock.dvp_pvpops+0x56 > > 0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, > > 0xc3a35a00, 0x5, 0x0, 0x0) > > 0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, > > 0x0, 0x0, 0x0) > > 0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, > > 0x0, 0x0, 0x0) > > 0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, > > 0xf3e59cc4, 0x0, 0x0) > > 0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 (0xdf991130, 0x0, 0x0, > > 0xf3e59e8c, 0xf3e59ed4) > > 0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, > > 0xf3e59ed4, 0x120e9) > > 0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, > > 0x120e9, 0xf3e59e8c) > > 0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, > > 0xbfeec4ec, 0xf3e59ed4) > > 0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, > > 0xbfeec4ec, 0x0) > > 0xf3e59f9c 
0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, > > 0x80000001, 0x0) > > 0xf3e59fb4 0xc0129df5 sys_waitpid+0x25 > > 0xc0106ae1 syscall_call+0x7 > > ----------------------------------------------------------------------- > > md c3a35824 > > > > 0xc3a35824 00817063 00000001 df990630 00000000 cp......0....... > > 0xc3a35834 00000000 c3a4a600 00000000 f6431400 ..............C. > > 0xc3a35844 c3a35a00 00000000 000120e9 00000000 .Z....... ...... > > 0xc3a35854 000120c7 000120f8 000120c7 000120e9 . ... ... ... .. > > 0xc3a35864 00000001 00000001 00000001 08800001 ................ > > 0xc3a35874 00000001 00000001 dead4ead 00000000 .........N...... > > 0xc3a35884 ffffffff 00000001 df990630 00000001 ........0....... > > ^^^^^^^^ > > 0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630 .N.. ....]+.0... > > ----------------------------------------------------------------------- > > > > At the same time ios_mon (73976) tries to lock vproc=0xc3a35a00, > > pvproc=0xc3a35a24, > > which is already locked by gdb (73961) > > > > ----------------------------------------------------------------------- > > > > Stack traceback for pid 73976 > > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > > EBP EIP Function (args) > > 0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > > 0xc0121dd0, 0xc3a35a98) > > 0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc) > > 0xf3d89ea0 0xc03dc9a6 __down_failed+0xa > > 0xc0227e6c .text.lock.dvp_pvpops+0x3be > > 0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > > (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800) > > 0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, > > 0x120fc) > > 0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, > > 0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c) > > 0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, > > 0xdfeaf330, 0xf3d89f88, 0x0) > > 0xf3d89f88 0xc0129638 do_exit+0x1e8 (0xdf990630, 0xf5e28584, > > 0xf5e28080, 0x7f, 0x0) > > 0xf3d89fa8 
0xc0129879 do_group_exit+0x39 (0x7f00) > > 0xf3d89fb4 0xc0129915 sys_exit_group+0x15 > > 0xc0106ae1 syscall_call+0x7 > > [1]kdb> > > ----------------------------------------------------------------------- > > > > md c3a35a24 > > > > 0xc3a35a24 00083069 00000005 df991130 00000000 i0......0....... > > 0xc3a35a34 00000000 00000000 c3a35800 00000000 .........X...... > > 0xc3a35a44 00000000 00000000 000120e9 000120f8 ......... ... .. > > 0xc3a35a54 000120c7 000120f8 000120c7 000120f8 . ... ... ... .. > > 0xc3a35a64 00000001 00000001 00000001 08800001 ................ > > 0xc3a35a74 00000000 00000001 dead4ead 00000000 .........N...... > > 0xc3a35a84 ffffffff 00000001 dfd746b0 00000001 .........F...... > > ^^^^^^^^ > > 0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0 .N..p...p....F.. > > ---------------------------------------------------------------------- > > > > and 0xc3a4a600 here is vproc for pid=73981 > > > > I can provide more information about the issue if it's necessary. > > Do you think that work should be serialized through the nocldwait queue? > > > > Thanks, > > Vladimir > > > > > > On 5/26/06, Laura Ramirez <lau...@hp...> wrote: > >> > >> Hi Vladimir, > >> > >> Looking at the code more closely, I believe the "ptrace_reap.patch" > >> I gave you is NOT correct. Please remove that patch. > >> > >> Attached is a different patch "ptrace_reap2.patch". In the original > >> code, the parent who is the gdb process is not getting a SIGCHLD, > >> instead the process just gets queued to the nocldwait. I believe the > >> correct path should be the gdb parent gets the SIGCHLD, goes to do the > >> reap, sees it's ptraced and does the ptrace_unlink(). Assuming the > >> ptrace_unlink() works, it should return -EAGAIN, do the > >> PVPOP_REPORT_STATE() to its original parent, sees its original parent is > >> not reaping children and gets queued to the nocldwait queue, which > >> should do reap successfully this time and not return -EAGAIN. 
With the > >> attached patch, I'm hoping that's going to happen. > >> > >> Apply with -p0, let me know if it works or where it deviated from the > >> code path i stated above. > >> > >> thanks, > >> > >> laura > >> > >> > >> > >> Vladimir Razgulin wrote: > >> > Laura, > >> > following your advice I got a snapshot of the moment the process > >> > becomes a zombie and a work item is added into nocldwait_async_queue: > >> > > >> > Please, note, that > >> > the process id is 74226, it's a thread process created by 74223 > >> under gdb, > >> > and its ppid and oppid are different - see attached info > >> > > >> > Thank you for your help > >> > Vladimir > >> > > >> > Do you think I can CC: our conversation to ssic-linux-devel? > >> > > >> > > >> > > >> ------------------------------------------------------------------------------------------------------------------ > >> > >> > > >> > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd > >> > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash > >> > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb > >> > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon > >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > >> > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon > >> > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon > >> > > >> ------------------------------------------------------------------------------------------------------------------ > >> > >> > > >> > Stack traceback for pid 74226 > >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > >> > EBP EIP Function (args) > >> > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, > >> > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) > >> > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, > >> > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) > >> > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 (0xf6be1800, > >> > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) > >> > 0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, 0x4, > >> > 0xec599fbc, 0x0) > >> > 
0xec599fac 0xc0129648 do_exit+0x1e8 > >> > 0xec599fb4 0xc012983f sys_exit+0xf > >> > 0xc0106ae1 syscall_call+0x7 > >> > ---------------------------------------------------------------- > >> > call print_task_struct 0xdfeff230 > >> > > >> > state=0x0 > >> > flags=0x802044 > >> > ptrace=0x1f1 > >> > lock_depth=-1 > >> > prio=116 > >> > static_prio=120 > >> > array=c383055c > >> > sleep_avg=933888889 > >> > timestamp=4295805885000000 > >> > activated=0x0 > >> > policy=0 > >> > &cpus_allowed=0xdfeff280 > >> > time_slice=17 > >> > first_time_slice=1 > >> > tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c > >> > mm=00000000 > >> > active_mm=f5e42980 > >> > binfmt=c04dad74 > >> > exit_code=0 > >> > exit_signal=-1 > >> > pdeath_signal=0 > >> > personality=0x400000 > >> > did_exec=0 > >> > pid=74226 > >> > epid=74226 > >> > ppid=74210 > >> > tgid=74223 > >> > cltnode=0 > >> > p_vproc=0xf6be1800 > >> > p_vfparent=0x00000000 > >> > group_leader=0xc3bbb930 > >> > &pids=0xdfeff2d8 > >> > set_child_tid 0x00000000 > >> > clear_child_tid 0x00000000 > >> > rt_priority=0x0 > >> > utime=0 > >> > stime=0 > >> > nvcsw=11 > >> > nivcsw=0 > >> > sig_utime=0 > >> > sig_stime=0 > >> > cutime=0 > >> > cstime=0 > >> > sig_nvcsw=0 > >> > sig_nivcsw=0 > >> > cnvcsw=0 > >> > cnivcsw=0 > >> > start_time.tv_sec=1138 > >> > start_time.tv_nsec=381796336 > >> > min_flt=4 > >> > maj_flt=0 > >> > sig_min_flt=0 > >> > sig_maj_flt=0 > >> > cmin_flt=0 > >> > cmaj_flt=0 > >> > uid=0 > >> > euid=0 > >> > suid=0 > >> > fsuid=0 > >> > gid=0 > >> > egid=0 > >> > sgid=0 > >> > fsgid=0 > >> > group_info=0xf502c080 > >> > cap_effective=0xfffffeff > >> > cap_inheritable=0x0 > >> > cap_permitted=0xfffffeff > >> > keep_capabilities=0 > >> > user=0xc0454c00 > >> > &rlim=0xdffdd5d4 > >> > comm=ios_mon > >> > locks=0 > >> > link_count=0 > >> > total_link_count=0 > >> > semvsem.undo_list=c3b4aa20 > >> > fs=0x00000000 > >> > files=0x00000000 > >> > namespace=0x00000000 > >> > signal=0xdffdd500 > >> > sighand=0xf5e29900 > >> 
> &blocked=0xdfeff694 > >> > &real_blocked=0xdfeff69c > >> > &pending=0xdfeff6a4 > >> > sas_ss_sp=0x0 > >> > sas_ss_size=0x00000000 > >> > notifier_data=0x00000000 > >> > notifier_mask=0x00000000 > >> > security=0x00000000 > >> > audit_context=0x00000000 > >> > parent_exec_id=0x12 > >> > self_exec_id=0x12 > >> > journal_info=0x00000000 > >> > proc_dentry=0xecf027a0 > >> > backing_dev_info=0x00000000 > >> > io_context=0x00000000 > >> > ptrace_message=0x0 > >> > last_siginfo=0x00000000 > >> > p_nodetime=0 > >> > p_ticks_delta=0 > >> > icsprio=0x0 > >> > execnode=0x00000000 > >> > node_context=1 > >> > rcopy_task_size=0 > >> > &mosix=0xdfeff764 > >> > ---------------------------------------------------------- > >> > call print_vproc f6be1800 > >> > > >> > vp_magic=0x63727076 (should be 0x63727076) > >> > vp_pid=74226 > >> > vp_ref_cnt=8 > >> > vp_data=0xf6be1824 > >> > vp_hashfwd=0xc3a60200 > >> > vp_hashbwd=0x00000000 > >> > Function print_vproc returned 0x19 > >> > ---------------------------------------------------------------- > >> > call print_pvproc f6be1824 > >> > > >> > pvp_flag=0x83069 > >> > pvp_wstate=0x5 > >> > pvp_pproc=0xdfeff230 > >> > pvp_head_childl=0x00000000 > >> > pvp_childl=0xf6ec5e00 > >> > pvp_head_pgrpl=0x00000000 > >> > pvp_pgrpl=0xf6ec5e00 > >> > pvp_sessionl=0x00000000 > >> > pvp_head_oclist=0x00000000 > >> > pvp_oclist=0x00000000 > >> > pvp_ppid=74210 > >> > pvp_oppid=74223 > >> > pvp_sid=73171 > >> > pvp_pgid=74223 > >> > pvp_pp_sid=73171 > >> > pvp_pp_pgid=74223 > >> > pvp_fromnode=1 > >> > pvp_tonode=1 > >> > pvp_cttynode=1 > >> > pvp_cttydev=0x8800000 > >> > pvp_jobc=0 > >> > pvp_pgrp_ldr_seqno=0 > >> > pvp_pgrp_mem_seqno=0 > >> > pvp_fork_sigmigarg=0 > >> > pvp.ml.ml_flag=0 > >> > pvp.ml.ml_shr_count=0 > >> > pvp.ml.ml_excl_count=0 > >> > pvp_loadlevel=0 > >> > pvp_pin=0 > >> > pvp_localview=0 > >> > Function print_pvproc returned 0x13 > >> > ---------------------------------------------------------------- > >> > > >> > > >> > > >> > > 
>> > > >> > > >> > > >> > > >> > > >> > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: > >> >> > >> >> Hi Vladimir, > >> >> > >> >> Maybe we need to take a step back. I was focused on why the > >> >> ptrace_unlink() call was failing, which was because the child > >> >> had already gotten removed from the parent list. Which is why > >> >> I created the following patch, which only removed the child from > >> >> the parent list if the child wasn't being ptraced (i.e. pvp_oppid == 0). > >> >> So the ! is correct for that logic. > >> >> > >> >> But maybe we need to look at why this process is being reaped by > >> >> the nocld_wait_daemon, instead of the gdb parent. > >> >> > >> >> Can you do a "call print_task_struct <address>" of the zombie > >> >> processes? > >> >> > >> >> thanks > >> >> > >> >> laura > >> >> > >> >> Vladimir Razgulin wrote: > >> >> > Laura, > >> >> > The patch you sent doesn't resolve the issue: sometimes everything > >> >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole > >> system > >> >> > dies. > >> >> > I took a look at the code in the patch: > >> >> > > >> >> > + error = 0; > >> >> > + if (!PVP(vc)->pvp_oppid) { > >> >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); > >> >> > + ptrace = 1; > >> >> > + } > >> >> > > >> >> > Should it be without "!", like > >> >> > + if ( PVP(vc)->pvp_oppid) { > >> >> > > >> >> > Thanks > >> >> > Vladimir > >> >> > > >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> > >> >> >> I believe the __ptrace_unlink is failing because the child > >> >> >> gets removed from the parent in the beginning of > >> >> >> dpvproc_nocldwait_async_handler(), which is the right thing > >> >> >> to do if the process isn't being ptraced. > >> >> >> > >> >> >> Attached is a patch that addresses this problem. Can you please > >> >> >> apply and test. 
(use -p0 to apply) > >> >> >> > >> >> >> laura > >> >> >> > >> >> >> > >> >> >> Vladimir Razgulin wrote: > >> >> >> > Laura, > >> >> >> > > >> >> >> > That's right, > >> >> >> > __ptrace_unlink(p) fails every time inside > >> >> >> > pvpop_rmv_child_from_parent() with -ESRCH > >> >> >> > because it can't find process p in the PVP(w)->pvp_childl list. > >> >> >> > The process p seems to be unlinked already - I saw that tracing > >> >> >> > pvp_childl with kdb while stopping inside > >> >> >> pvpop_rmv_child_from_parent(). > >> >> >> > > >> >> >> > Vladimir > >> >> >> > > >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> Hi Vladimir, > >> >> >> >> > >> >> >> >> The reason why wait_task_zombie() returns EAGAIN is because > >> >> >> >> its current parent is the debugger gdb instead of its original > >> >> >> >> parent. The original parent needs to reap its child, so a > >> >> >> >> __ptrace_unlink() is done to stop tracing the process and set > >> >> >> >> the parent back to its original parent, and it returns EAGAIN so the > >> >> >> >> reap can be done from the original parent. So __ptrace_unlink() > >> >> >> >> must have failed since otherwise the second time in it wouldn't > >> >> >> >> go into that code path. So if you can find out why the > >> >> >> >> __ptrace_unlink() is failing, that would be a good start. > >> >> >> >> > >> >> >> >> laura > >> >> >> >> > >> >> >> >> > >> >> >> >> Vladimir Razgulin wrote: > >> >> >> >> > Laura, Roger, > >> >> >> >> > > >> >> >> >> > I've got the same problem trying to debug a multithreaded > >> program > >> >> >> >> with gdb: > >> >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). > >> >> >> >> > > >> >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec > >> 2005) > >> >> >> - Is > >> >> >> >> > it a correct one? 
- > >> >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an > >> infinite > >> >> >> loop, > >> >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error code. > >> >> >> >> > > >> >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid > >> instead of > >> >> >> >> -EAGAIN? > >> >> >> >> > Thanks > >> >> >> >> > Vladimir > >> >> >> >> > > >> >> >> >> > I can provide some info about the processes... > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> >> > >> >> >> >> >> Hi Roger, > >> >> >> >> >> > >> >> >> >> >> From a quick look at the code, there seems to be a comment > >> >> >> >> >> about the ptrace vproc path; it may need to be reworked for the 2.6 > >> >> merge. > >> >> >> >> >> I don't quite remember what the issue was, it's obviously > >> hitting > >> >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and > >> the > >> >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr > >> which > >> >> >> >> >> made it look corrupt, but it really isn't, it was just the > >> wrong > >> >> >> print > >> >> >> >> >> call. > >> >> >> >> >> > >> >> >> >> >> laura > >> >> >> >> >> > >> >> >> >> >> Roger Tsang wrote: > >> >> >> >> >> > Laura, > >> >> >> >> >> > > >> >> >> >> >> > I got an oops exiting from gdb while attached to > >> check_bacula > >> >> >> >> which had > >> >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, > >> but > >> >> >> take a > >> >> >> >> >> look at > >> >> >> >> >> > the oops below. child_reaper was waiting for check_bacula > >> >> >> which is > >> >> >> >> >> in E > >> >> >> >> >> > state. It looks like pvproc got corrupted. > >> >> >> >> >> > > >> >> >> >> >> > I'll leave this in kdb until tomorrow just in case I > >> left out > >> >> >> >> >> something. 
> >> >> >> >> >> > > >> >> >> >> >> > Roger > >> >> >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> (11)procfs: > >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> impossible type > >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> (11)procfs: > >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> impossible type > >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> (11)procfs: > >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> impossible type > >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> >> (11)ptrace_unlink: > >> >> >> >> >> > vpop_reclaim failed > >> >> >> >> >> > <4>------------[ cut here ]------------ > >> >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! > >> >> >> >> >> > <1>invalid operand: 0000 [#1] > >> >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate loop nfsd > >> >> >> exportfs > >> >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport > >> >> >> iptable_filter > >> >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd > >> >> ehci_hcd > >> >> >> >> >> usbcore > >> >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 via_rhine > >> >> >> dm_mod > >> >> >> >> >> > <4>CPU: 0 > >> >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI > >> >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) > >> >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 > >> >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: > >> >> c0430a2c > >> >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: > >> >> f7d11c98 > >> >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 > >> >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 > >> >> >> task=c1aeea80) > >> >> >> >> >> > <4>Stack: c8709020 00000000 
f7d11cc0 00000000 00000000 > >> >> 00000000 > >> >> >> >> >> 00000000 > >> >> >> >> >> > 00000000 > >> >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 > >> >> 00000286 > >> >> >> >> >> f7d11d1c > >> >> >> >> >> > c011de68 > >> >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 > >> >> f7d10000 > >> >> >> >> >> c3378a40 > >> >> >> >> >> > f7d11d1c > >> >> >> >> >> > more> > >> >> >> >> >> > <4>Call Trace: > >> >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 > >> >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 > >> >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 > >> >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 > >> >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 > >> >> >> >> >> > <4> [<c010470b>] error_code+0x2b/0x30 > >> >> >> >> >> > <4> [<c011de68>] pproc_reap+0x228/0x2f0 > >> >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 > >> >> >> >> >> > <4> [<c020ea9f>] > >> dpvproc_nocldwait_async_handler+0x13f/0x300 > >> >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 > >> >> >> >> >> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 > >> >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 > >> >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 > >> >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 > >> >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 > >> >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 > >> 77 a0 > >> >> >> ff ff > >> >> >> >> >> e9 24 > >> >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 > >> 00 00 > >> >> >> <0f> > >> >> >> >> >> 0b 3f 05 > >> >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 > >> >> >> >> >> > <4> > >> >> >> >> >> > kdb> > >> >> >> >> >> > kdb> bt > >> >> >> >> >> > Stack traceback for pid 2 > >> >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 > >> >> >> *child_reaper > >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 
> >> (0xc8709020, 0x0, > >> >> >> 0x0, > >> >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) > >> >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, > >> >> >> 0xf7d11e4c, > >> >> >> >> >> > 0xf7d11e50, 0x11666) > >> >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, > >> >> 0xffffffff, > >> >> >> >> 0x20, > >> >> >> >> >> > 0x11666, 0xf7d11e4c) > >> >> >> >> >> > 0xf7d11efc 0xc020ea9f dpvproc_nocldwait_async_handler+0x13f > >> >> >> >> >> (0xed3c9574, > >> >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) > >> >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 > >> >> >> >> (0xc1aeea80, 0x0, > >> >> >> >> >> > 0x40000001, 0x0, 0xc022f870) > >> >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 > >> >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 > >> >> >> >> >> > kdb> call print_task_struct 0xc8709020 > >> >> >> >> >> > state=0x20 > >> >> >> >> >> > flags=0x44c > >> >> >> >> >> > ptrace=0x0 > >> >> >> >> >> > lock_depth=-1 > >> >> >> >> >> > prio=116 > >> >> >> >> >> > static_prio=120 > >> >> >> >> >> > array=00000000 > >> >> >> >> >> > sleep_avg=899989756 > >> >> >> >> >> > interactive_credit=1 > >> >> >> >> >> > timestamp=269658653961877 > >> >> >> >> >> > activated=0x0 > >> >> >> >> >> > policy=0 > >> >> >> >> >> > &cpus_allowed=0xc870906c > >> >> >> >> >> > time_slice=49 > >> >> >> >> >> > first_time_slice=1 > >> >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 > >> >> >> >> >> > mm=00000000 > >> >> >> >> >> > active_mm=00000000 > >> >> >> >> >> > binfmt=c04b3e68 > >> >> >> >> >> > exit_code=9 > >> >> >> >> >> > exit_signal=-1 > >> >> >> >> >> > pdeath_signal=0 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > personality=0x0 > >> >> >> >> >> > did_exec=0 > >> >> >> >> >> > pid=71274 > >> >> >> >> >> > epid=71274 > >> >> >> >> >> > ppid=71270 > >> >> >> >> >> > tgid=71271 > >> >> >> >> >> > cltnode=0 > 
>> >> >> >> >> > p_vproc=0xdd600600 > >> >> >> >> >> > p_vfparent=0x00000000 > >> >> >> >> >> > group_leader=0xf22f2550 > >> >> >> >> >> > &pids=0xc87090c4 > >> >> >> >> >> > set_child_tid 0x00000000 > >> >> >> >> >> > clear_child_tid 0x00000000 > >> >> >> >> >> > rt_priority=0x0 > >> >> >> >> >> > it_real_value=0x0 > >> >> >> >> >> > it_prof_value=0x0 > >> >> >> >> >> > it_virt_value=0x0 > >> >> >> >> >> > it_real_incr=0x0 > >> >> >> >> >> > it_prof_incr=0x0 > >> >> >> >> >> > it_virt_incr=0x0 > >> >> >> >> >> > utime=0 > >> >> >> >> >> > stime=0 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > nvcsw=3 > >> >> >> >> >> > nivcsw=0 > >> >> >> >> >> > sig_utime=0 > >> >> >> >> >> > sig_stime=0 > >> >> >> >> >> > cutime=0 > >> >> >> >> >> > cstime=0 > >> >> >> >> >> > sig_nvcsw=0 > >> >> >> >> >> > sig_nivcsw=0 > >> >> >> >> >> > cnvcsw=0 > >> >> >> >> >> > cnivcsw=0 > >> >> >> >> >> > start_time.tv_sec=270328 > >> >> >> >> >> > start_time.tv_nsec=615704896 > >> >> >> >> >> > min_flt=0 > >> >> >> >> >> > maj_flt=0 > >> >> >> >> >> > sig_min_flt=0 > >> >> >> >> >> > sig_maj_flt=0 > >> >> >> >> >> > cmin_flt=0 > >> >> >> >> >> > cmaj_flt=0 > >> >> >> >> >> > uid=0 > >> >> >> >> >> > euid=0 > >> >> >> >> >> > suid=0 > >> >> >> >> >> > fsuid=0 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > gid=0 > >> >> >> >> >> > egid=0 > >> >> >> >> >> > sgid=0 > >> >> >> >> >> > fsgid=0 > >> >> >> >> >> > group_info=0xeaf0b980 > >> >> >> >> >> > cap_effective=0xfffffeff > >> >> >> >> >> > cap_inheritable=0x0 > >> >> >> >> >> > cap_permitted=0xfffffeff > >> >> >> >> >> > keep_capabilities=0 > >> >> >> >> >> > user=0xc0431ae0 > >> >> >> >> >> > &rlim=0xc1af02c4 > >> >> >> >> >> > used_math=1 > >> >> >> >> >> > comm=check_bacula > >> >> >> >> >> > locks=0 > >> >> >> >> >> > link_count=0 > >> >> >> >> >> > total_link_count=1 > >> >> >> >> >> 
> semvsem.undo_list=f4648200 > >> >> >> >> >> > fs=0x00000000 > >> >> >> >> >> > files=0x00000000 > >> >> >> >> >> > namespace=0x00000000 > >> >> >> >> >> > signal=0xc1af0240 > >> >> >> >> >> > sighand=0xf6c86580 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > &blocked=0xc8709484 > >> >> >> >> >> > &real_blocked=0xc870948c > >> >> >> >> >> > &pending=0xc8709494 > >> >> >> >> >> > sas_ss_sp=0x0 > >> >> >> >> >> > sas_ss_size=0x00000000 > >> >> >> >> >> > notifier_data=0x00000000 > >> >> >> >> >> > notifier_mask=0x00000000 > >> >> >> >> >> > security=0x00000000 > >> >> >> >> >> > audit_context=0x00000000 > >> >> >> >> >> > parent_exec_id=0x16 > >> >> >> >> >> > self_exec_id=0x16 > >> >> >> >> >> > journal_info=0x00000000 > >> >> >> >> >> > proc_dentry=0xcb5fe8d4 > >> >> >> >> >> > backing_dev_info=0x00000000 > >> >> >> >> >> > io_context=0x00000000 > >> >> >> >> >> > ptrace_message=0x0 > >> >> >> >> >> > last_siginfo=0x00000000 > >> >> >> >> >> > p_nodetime=0 > >> >> >> >> >> > p_ticks_delta=0 > >> >> >> >> >> > icsprio=0x0 > >> >> >> >> >> > execnode=0x00000000 > >> >> >> >> >> > node_context=1 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > rcopy_task_size=0 > >> >> >> >> >> > &mosix=0xc8709504 > >> >> >> >> >> > Function print_task_struct returned 0x0 > >> >> >> >> >> > kdb> btp 71274 > >> >> >> >> >> > Stack traceback for pid 71274 > >> >> >> >> >> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 > >> >> >> check_bacula > >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, > >> 0xf22f4dc0, > >> >> >> >> >> 0xc012523f, > >> >> >> >> >> > 0xd8017e9c, 0x0) > >> >> >> >> >> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, > >> >> 0xd8016000) > >> >> >> >> >> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, > >> >> >> 0xd8016000, > >> >> >> >> >> > 
0xd8016000) > >> >> >> >> >> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db > >> (0xd8017f18, > >> >> >> >> >> 0xd8017ef8, > >> >> >> >> >> > 0xd8017fc4, 0x0, 0xc0218c59) > >> >> >> >> >> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, > >> >> >> 0xcc7be550, > >> >> >> >> >> > 0xc8709020) > >> >> >> >> >> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 > >> >> >> >> >> > 0xc0103cf6 work_notifysig+0x13 > >> >> >> >> >> > kdb> btp 71270 > >> >> >> >> >> > Stack traceback for pid 71270 > >> >> >> >> >> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb > >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, > >> >> >> 0xe2d7fecc, > >> >> >> >> >> > 0xf3eade68) > >> >> >> >> >> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, > >> >> 0xa0, > >> >> >> >> 0x0, > >> >> >> >> >> > 0xf3eadedc) > >> >> >> >> >> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, > >> 0x0, > >> >> >> >> 0x0, 0x0) > >> >> >> >> >> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, > >> >> >> 0x80000000, 0x0) > >> >> >> >> >> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> >> > kdb> btp 117331 > >> >> >> >> >> > Stack traceback for pid 117331 > >> >> >> >> >> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash > >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, > >> 0xffffffff, > >> >> >> 0x4, > >> >> >> >> >> 0x1ca53, > >> >> >> >> >> > 0xd5399e94) > >> >> >> >> >> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, > >> 0xffffffff, > >> >> >> 0x24, > >> >> >> >> >> > 0xbffff038, 0xd5399edc) > >> >> >> >> >> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, > >> >> >> >> >> 0xbffff038, 0x0) > >> >> >> >> >> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, > >> 0xbffff038, > >> >> >> >> 0xa, 0x0) > >> >> >> >> >> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> >> > 
0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> >> > kdb> > >> >> >> >> >> > kdb> call print_pvproc 0xdd600600 > >> >> >> >> >> > pvp_flag=0x63727076 > >> >> >> >> >> > pvp_wstate=0x1166a > >> >> >> >> >> > pvp_pproc=0x00000007 > >> >> >> >> >> > pvp_head_childl=0xdd60061c > >> >> >> >> >> > pvp_childl=0x00000000 > >> >> >> >> >> > pvp_head_pgrpl=0x00000000 > >> >> >> >> >> > pvp_pgrpl=0x00000000 > >> >> >> >> >> > pvp_sessionl=0x00083049 > >> >> >> >> >> > pvp_head_oclist=0x00000005 > >> >> >> >> >> > pvp_oclist=0xc8709020 > >> >> >> >> >> > pvp_ppid=0 > >> >> >> >> >> > pvp_oppid=0 > >> >> >> >> >> > pvp_sid=0 > >> >> >> >> >> > pvp_pgid=-489161216 > >> >> >> >> >> > pvp_pp_sid=0 > >> >> >> >> >> > pvp_pp_pgid=0 > >> >> >> >> >> > pvp_fromnode=0 > >> >> >> >> >> > pvp_tonode=71270 > >> >> >> >> >> > pvp_cttynode=71271 > >> >> >> >> >> > pvp_cttydev=0x1ca53 > >> >> >> >> >> > pvp_jobc=71271 > >> >> >> >> >> > pvp_pgrp_ldr_seqno=1 > >> >> >> >> >> > more> > >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input ignored > >> >> >> >> >> > pvp_pgrp_mem_seqno=-580909344 > >> >> >> >> >> > pvp_fork_sigmigarg=-580909332 > >> >> >> >> >> > pvp.ml.ml_flag=1 > >> >> >> >> >> > pvp.ml.ml_shr_count=-580909376 > >> >> >> >> >> > pvp.ml.ml_excl_count=0 > >> >> >> >> >> > pvp_loadlevel=-580909332 > >> >> >> >> >> > pvp_pin=0 > >> >> >> >> >> > pvp_localview=0 > >> >> >> >> >> > Function print_pvproc returned 0x0 > >> >> >> >> >> > kdb> > >> >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> ------------------------------------------------------- > >> >> >> >> >> This SF.net email is sponsored by: Splunk Inc. Do you grep > >> >> >> through log > >> >> >> >> >> files > >> >> >> >> >> for problems? Stop! Download the new AJAX search engine > >> that > >> >> >> makes > >> >> >> >> >> searching your log files as easy as surfing the web. > >> DOWNLOAD > >> >> >> >> SPLUNK! 
|
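Note on the deadlock reported above: the two tracebacks (gdb holding the vproc lock for one thread and waiting on 0xc3a35800, while ios_mon holds that one and waits on 0xc3a35a00) look like a lock-ordering inversion between two tasks. The following is only a minimal user-space sketch of that pattern and of one common way to avoid it, namely always taking a pair of locks in a fixed order (lowest address first). It is not the OpenSSI code and not what any of the attached patches do; `fake_vproc`, `lock_pair`, `unlock_pair` and `run_opposite_order_workers` are hypothetical names made up for the illustration, and POSIX threads stand in for the kernel semaphores.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vproc; NOT the OpenSSI structure. */
struct fake_vproc {
    pthread_mutex_t lock;
    long relinks;                /* only touched with the lock held */
};

struct worker_args {
    struct fake_vproc *mine;     /* entry this thread starts from */
    struct fake_vproc *other;    /* second entry it must also lock */
    long iterations;
};

/* Take both locks in one global order (lowest address first), so two
 * threads that name the pair in opposite order cannot wait on each other. */
static void lock_pair(struct fake_vproc *a, struct fake_vproc *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct fake_vproc *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    if (a != b)
        pthread_mutex_lock(&b->lock);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(struct fake_vproc *a, struct fake_vproc *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

static void *worker(void *p)
{
    struct worker_args *w = p;
    for (long i = 0; i < w->iterations; i++) {
        lock_pair(w->mine, w->other);
        w->mine->relinks++;
        w->other->relinks++;
        unlock_pair(w->mine, w->other);
    }
    return NULL;
}

/* Run two threads that name the same pair in opposite order and return
 * the per-entry update count once both have finished. */
long run_opposite_order_workers(long iterations)
{
    struct fake_vproc x, y;
    struct worker_args a1 = { &x, &y, iterations };
    struct worker_args a2 = { &y, &x, iterations };
    pthread_t t1, t2;

    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.relinks = 0;
    y.relinks = 0;

    pthread_create(&t1, NULL, worker, &a1);
    pthread_create(&t2, NULL, worker, &a2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    assert(x.relinks == y.relinks);
    pthread_mutex_destroy(&x.lock);
    pthread_mutex_destroy(&y.lock);
    return x.relinks;
}
```

Calling `run_opposite_order_workers(n)` from a small `main` (built with `-pthread`) should return `2 * n`, because each thread updates both entries once per iteration. If `lock_pair` instead locked its first argument and then its second without sorting, the same two threads could block each other the way the two tracebacks show.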
From: Laura R. <lau...@hp...> - 2006-05-31 19:11:54
Attachments:
thread_deadlock2.patch
|
Hi Vladimir, Please remove "thread_deadlock.patch" and apply the attached patch instead. thanks, laura Vladimir Razgulin wrote: > Laura, > > with thread_deadlock.patch I still get a deadlock between the threads > and a frozen gdb - see enclosed info. > > Thanks > Vladimir > > 0xf6526830 73134 72318 0 0 S 0xf65269f0 sshd > 0xf6527330 73149 73134 0 0 S 0xf65274f0 bash > 0xdfc060b0 73213 73149 0 2 S 0xdfc06270 gdb > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > ------------------------------------------------------------------------- > btp 73246 > > Stack traceback for pid 73246 > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > EBP EIP Function (args) > 0xdf995e10 0xc03dce4c schedule+0x3fc (0xf645a3b0, 0x1, 0xf645a3b0, > 0xc0121dd0, 0xdffea698) > 0xdf995e48 0xc03dc835 __down+0x75 (0xdffea600, 0xdffea600) > 0xdf995e58 0xc03dc9e6 __down_failed+0xa > 0xc0227b9b .text.lock.dvp_pvpops+0xad > 0xdf995efc 0xc02214d8 pvpop_reassign_child+0x178 (0xdffea600, 0x10, > 0x0, 0xdf994000, 0xdf994000) > 0xdf995f28 0xc02263f2 pvpop_reassign_original_parent+0x132 (0xdffeaa00, > 0x11e22) > 0xdf995f38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffeaa00, > 0x11e22, 0xf65cf980, 0xdf995f4c, 0xdf995f4c) > 0xdf995f5c 0xc0129272 exit_notify+0xc2 (0xf645a3b0, 0xf645a3b0, > 0xdf990bb0, 0xdf995f88, 0x0) > 0xdf995f88 0xc0129638 do_exit+0x1e8 (0xf645a3b0, 0xc3a80684, > 0xc3a80180, 0x7f, 0x0) > 0xdf995fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > 0xdf995fb4 0xc0129915 sys_exit_group+0x15 > 0xc0106ae1 syscall_call+0x7 > ------------------------------------------------------------------------- > btp 73250 > > Stack traceback for pid 73250 > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > EBP EIP Function (args) > 0xf4fc5d54 0xc03dce4c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > 0xc0121dd0, 0xdffeaa98) > 0xf4fc5d8c 0xc03dc835 __down+0x75 (0x79, 0xdffeaafc) > 0xf4fc5d9c 0xc03dc9e6 __down_failed+0xa > 0xc0227eac .text.lock.dvp_pvpops+0x3be > 
0xf4fc5dfc 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > (0xdffeaa00, 0xdffeae00, 0x0, 0x0, 0xf4fc4000) > 0xf4fc5e28 0xc02263e1 pvpop_reassign_original_parent+0x121 (0xdffea600, > 0x11e1e) > 0xf4fc5e38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffea600, > 0x11e1e, 0xf5067200, 0xf4fc5e4c, 0xf4fc5e4c) > 0xf4fc5e5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xf65ce840, > 0xf4fc5e78, 0x8, 0x1) > 0xf4fc5e88 0xc0129638 do_exit+0x1e8 (0x0, 0x0, 0x0, 0x9, 0xf4fc4000) > 0xf4fc5ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xc3a80180, > 0xf4fc4000) > 0xf4fc5ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xf4fc5f10, > 0xf4fc5ef0, 0xf4fc5fbc, 0x0, 0xf65cf980) > 0xf4fc5f9c 0xc0106850 do_signal+0x70 (0x3b970910, 0x805f2ec, > 0x805f2ec, 0x407679b8) > 0xf4fc5fb4 0xc0106987 do_notify_resume+0x57 > 0xc0106b72 work_notifysig+0x13 > ------------------------------------------------------------------------- > md 0xdffeaa00 > > 0xdffeaa00 63727076 00011e1e 00000012 dffeaa24 vprc........$... > 0xdffeaa10 00000001 dead4ead 00000000 00000000 .....N.......... > 0xdffeaa20 00000000 00817063 00000001 f645a3b0 ....cp........E. > 0xdffeaa30 00000000 00000000 dffea600 00000000 ................ > 0xdffeaa40 c3a4d200 00000000 00000000 00011dfd ................ > 0xdffeaa50 00000000 00011dbd 00011e1e 00011dbd ................ > 0xdffeaa60 00011dfd 00000001 00000001 00000001 ................ > 0xdffeaa70 08800001 00000001 00000001 dead4ead .............N.. > ------------------------------------------------------------------------- > md 0xdffea600 > > 0xdffea600 63727076 00011e22 00000009 dffea624 vprc".......$... > 0xdffea610 00000001 dead4ead 00000000 00000000 .....N.......... > 0xdffea620 00000000 00083061 00000001 df990630 ....a0......0... > 0xdffea630 00000000 dffeaa00 00000000 dffeaa00 ................ > 0xdffea640 00000000 00000000 00000000 00011dfd ................ > 0xdffea650 00000001 00011dbd 00011e1e 00011dbd ................ 
> 0xdffea660 00011e1e 00000001 00000001 00000001 ................ > 0xdffea670 08800001 00000000 00000001 dead4ead .............N.. > ------------------------------------------------------------------------- > call print_vproc 0xdffeaa00 > > vp_magic=0x63727076 (should be 0x63727076) > vp_pid=73246 > vp_ref_cnt=18 > vp_data=0xdffeaa24 > vp_hashfwd=0x00000000 > vp_hashbwd=0x00000000 > Function print_vproc returned 0x19 > ------------------------------------------------------------------------- > call print_pvproc 0xdffeaa24 > > pvp_flag=0x817063 > pvp_wstate=0x1 > pvp_pproc=0xf645a3b0 > pvp_head_childl=0x00000000 > pvp_childl=0x00000000 > pvp_head_pgrpl=0xdffea600 > pvp_pgrpl=0x00000000 > pvp_sessionl=0xc3a4d200 > pvp_head_oclist=0x00000000 > pvp_oclist=0x00000000 > pvp_ppid=73213 > pvp_oppid=0 > pvp_sid=73149 > pvp_pgid=73246 > pvp_pp_sid=73149 > pvp_pp_pgid=73213 > pvp_fromnode=1 > pvp_tonode=1 > pvp_cttynode=1 > pvp_cttydev=0x8800001 > pvp_jobc=1 > pvp_pgrp_ldr_seqno=0 > pvp_pgrp_mem_seqno=0 > pvp_fork_sigmigarg=0 > pvp.ml.ml_flag=2 > pvp.ml.ml_shr_count=1 > pvp.ml.ml_excl_count=0 > pvp_loadlevel=0 > pvp_pin=0 > pvp_localview=0 > Function print_pvproc returned 0x13 > ------------------------------------------------------------------------- > call print_vproc 0xdffea600 > > vp_magic=0x63727076 (should be 0x63727076) > vp_pid=73250 > vp_ref_cnt=9 > vp_data=0xdffea624 > vp_hashfwd=0x00000000 > vp_hashbwd=0x00000000 > Function print_vproc returned 0x19 > ------------------------------------------------------------------------- > call print_pvproc 0xdffea624 > > pvp_flag=0x83061 > pvp_wstate=0x1 > pvp_pproc=0xdf990630 > pvp_head_childl=0x00000000 > pvp_childl=0xdffeaa00 > pvp_head_pgrpl=0x00000000 > pvp_pgrpl=0xdffeaa00 > pvp_sessionl=0x00000000 > pvp_head_oclist=0x00000000 > pvp_oclist=0x00000000 > pvp_ppid=73213 > pvp_oppid=1 > pvp_sid=73149 > pvp_pgid=73246 > pvp_pp_sid=73149 > pvp_pp_pgid=73246 > pvp_fromnode=1 > pvp_tonode=1 > pvp_cttynode=1 > 
pvp_cttydev=0x8800001 > pvp_jobc=0 > pvp_pgrp_ldr_seqno=0 > pvp_pgrp_mem_seqno=0 > pvp_fork_sigmigarg=0 > pvp.ml.ml_flag=2 > pvp.ml.ml_shr_count=1 > pvp.ml.ml_excl_count=0 > pvp_loadlevel=0 > pvp_pin=0 > pvp_localview=0 > Function print_pvproc returned 0x13 > ------------------------------------------------------------------------- > > > > > > On 5/30/06, Laura Ramirez <lau...@hp...> wrote: >> >> Hi Vladimir, >> >> Good to hear that the second patch fixed the original problem. >> >> From looking at the stack traces, it does appear to be a >> deadlock. Pid 73961 is reaping pid 73980. It wants to >> reassign it to its original parent 73976. It currently has >> the vproc lock for 73980 and wants the vproc lock for 73976. >> However, 73976 is currently exiting, and it's trying to reassign >> its children to the next thread (i.e. 73980). So it currently has >> its own 73976 lock and wants the vproc lock for 73980, and so >> it's now deadlocked. Besides the deadlock bug, maybe the real >> problem is 73976 reassigning the children to 73980, which has >> already run through its exit code and will never clean up the >> list again. >> >> Attached is a patch that checks the state of the thread; if >> it's a zombie, then it reassigns it to the init process. >> >> laura >> >> >> Vladimir Razgulin wrote: >> > Laura, >> > >> > Thank you for ptrace2_reap.patch >> > I've tried it, and it fixed the original problem. >> > However, now I have something looking like a race condition. >> > >> > Again, I'm debugging a multithreaded program: >> > >> > gdb (pid=73961) has started ios_mon (pid=73976), which started three >> other >> > threads (pids=73979, 73980, 73981). 
>> > The thread 73979 was terminated successfully (thanks to >> ptrace2_reap.patch) >> > and I've got the following: >> > >> > ----------------------------------------------------------------- >> > 0xf5778bb0 73927 73921 0 0 S 0xf5778d70 bash >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon >> > 0xdf991130 73980 73961 0 2 E 0xdf9912f0 ios_mon >> > 0xdfeaf330 73981 73961 0 0 Z 0xdfeaf4f0 ios_mon >> > 0xf576c030 74134 72761 0 1 D 0xf576c1f0 bash >> > ----------------------------------------------------------------- >> > >> > gdb (73961) wants to do VPROC_LOCK_EXCL for vproc=0xc3a35800, >> > pvproc=0xc3a35824, >> > which is owned by ios_mon (73976) >> > >> > ---------------------------------------------------------------- >> > Stack traceback for pid 73961 >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb >> > EBP EIP Function (args) >> > 0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, >> > 0xc0121dd0, 0xf3cafd04) >> > 0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79) >> > 0xf3e59b50 0xc03dc9a6 __down_failed+0xa >> > 0xc0227b04 .text.lock.dvp_pvpops+0x56 >> > 0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, >> > 0xc3a35a00, 0x5, 0x0, 0x0) >> > 0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, >> > 0x0, 0x0, 0x0) >> > 0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, >> > 0x0, 0x0, 0x0) >> > 0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, >> > 0xf3e59cc4, 0x0, 0x0) >> > 0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 (0xdf991130, 0x0, 0x0, >> > 0xf3e59e8c, 0xf3e59ed4) >> > 0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, >> > 0xf3e59ed4, 0x120e9) >> > 0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, >> > 0x120e9, 0xf3e59e8c) >> > 0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, >> > 0xbfeec4ec, 0xf3e59ed4) >> > 0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, 
>> > 0xbfeec4ec, 0x0) >> > 0xf3e59f9c 0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, >> > 0x80000001, 0x0) >> > 0xf3e59fb4 0xc0129df5 sys_waitpid+0x25 >> > 0xc0106ae1 syscall_call+0x7 >> > ----------------------------------------------------------------------- >> > md c3a35824 >> > >> > 0xc3a35824 00817063 00000001 df990630 00000000 cp......0....... >> > 0xc3a35834 00000000 c3a4a600 00000000 f6431400 ..............C. >> > 0xc3a35844 c3a35a00 00000000 000120e9 00000000 .Z....... ...... >> > 0xc3a35854 000120c7 000120f8 000120c7 000120e9 . ... ... ... .. >> > 0xc3a35864 00000001 00000001 00000001 08800001 ................ >> > 0xc3a35874 00000001 00000001 dead4ead 00000000 .........N...... >> > 0xc3a35884 ffffffff 00000001 df990630 00000001 ........0....... >> > ^^^^^^^^ >> > 0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630 .N.. ....]+.0... >> > ----------------------------------------------------------------------- >> > >> > At the same time ios_mon (73976) tries to lock vproc=0xc3a35a00, >> > pvproc=0xc3a35a24, >> > which is already locked by gdb (73961) >> > >> > ----------------------------------------------------------------------- >> > >> > Stack traceback for pid 73976 >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon >> > EBP EIP Function (args) >> > 0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, >> > 0xc0121dd0, 0xc3a35a98) >> > 0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc) >> > 0xf3d89ea0 0xc03dc9a6 __down_failed+0xa >> > 0xc0227e6c .text.lock.dvp_pvpops+0x3be >> > 0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 >> > (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800) >> > 0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, >> > 0x120fc) >> > 0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, >> > 0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c) >> > 0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, >> > 0xdfeaf330, 0xf3d89f88, 0x0) >> > 0xf3d89f88 0xc0129638 
do_exit+0x1e8 (0xdf990630, 0xf5e28584, >> > 0xf5e28080, 0x7f, 0x0) >> > 0xf3d89fa8 0xc0129879 do_group_exit+0x39 (0x7f00) >> > 0xf3d89fb4 0xc0129915 sys_exit_group+0x15 >> > 0xc0106ae1 syscall_call+0x7 >> > [1]kdb> >> > ----------------------------------------------------------------------- >> > >> > md c3a35a24 >> > >> > 0xc3a35a24 00083069 00000005 df991130 00000000 i0......0....... >> > 0xc3a35a34 00000000 00000000 c3a35800 00000000 .........X...... >> > 0xc3a35a44 00000000 00000000 000120e9 000120f8 ......... ... .. >> > 0xc3a35a54 000120c7 000120f8 000120c7 000120f8 . ... ... ... .. >> > 0xc3a35a64 00000001 00000001 00000001 08800001 ................ >> > 0xc3a35a74 00000000 00000001 dead4ead 00000000 .........N...... >> > 0xc3a35a84 ffffffff 00000001 dfd746b0 00000001 .........F...... >> > ^^^^^^^^ >> > 0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0 .N..p...p....F.. >> > ---------------------------------------------------------------------- >> > >> > and 0xc3a4a600 here is vproc for pid=73981 >> > >> > I can provide more information about the issue if it's necesary. >> > Do you think that work should be serialized throught nocldwait queue? >> > >> > Thanks, >> > Vladimir >> > >> > >> > On 5/26/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> Hi Vladimir, >> >> >> >> Looking at the code more closely, I believe the "ptrace_reap.patch" >> >> I gave you is NOT correct. Please remove that patch. >> >> >> >> Attached is a different patch "ptrace_reap2.patch". In the original >> >> code, the parent who is the gdb process is not getting a SIGCHLD, >> >> instead the process just gets queued to the nocldwait. I believe, the >> >> correct path should be the gdb parent gets the SIGCHLD, goes to do the >> >> reap, sees its ptraced and does the ptrace_unlink(). 
Assuming the >> >> ptrace_unlink() works, it should return -EAGAIN, do the >> >> PVPOP_REPORT_STATE() to its original parent, see that its original >> parent is >> >> not reaping children and get queued to the nocldwait queue, which >> >> should do the reap successfully this time and not return -EAGAIN. With >> the >> >> attached patch, I'm hoping that's going to happen. >> >> >> >> Apply with -p0, let me know if it works or where it deviated from the >> >> code path I stated above. >> >> >> >> thanks, >> >> >> >> laura >> >> >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> > Laura, >> >> > following your advice I got a snapshot of the moment the process >> >> > becomes a zombie and a work item is added into >> nocldwait_async_queue: >> >> > >> >> > Please note that >> >> > the process id is 74226, it's a thread process created by 74223 >> >> under gdb, >> >> > and its ppid and oppid are different - see attached info >> >> > >> >> > Thank you for your help >> >> > Vladimir >> >> > >> >> > Do you think I can CC: our conversation to ssic-linux-devel? 
>> >> > >> >> > >> >> > >> >> >> ------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> > >> >> > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd >> >> > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash >> >> > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb >> >> > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon >> >> > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon >> >> > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon >> >> > >> >> >> ------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> > >> >> > Stack traceback for pid 74226 >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon >> >> > EBP EIP Function (args) >> >> > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, >> >> > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) >> >> > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, >> >> > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) >> >> > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 >> (0xf6be1800, >> >> > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) >> >> > 0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, >> 0x4, >> >> > 0xec599fbc, 0x0) >> >> > 0xec599fac 0xc0129648 do_exit+0x1e8 >> >> > 0xec599fb4 0xc012983f sys_exit+0xf >> >> > 0xc0106ae1 syscall_call+0x7 >> >> > ---------------------------------------------------------------- >> >> > call print_task_struct 0xdfeff230 >> >> > >> >> > state=0x0 >> >> > flags=0x802044 >> >> > ptrace=0x1f1 >> >> > lock_depth=-1 >> >> > prio=116 >> >> > static_prio=120 >> >> > array=c383055c >> >> > sleep_avg=933888889 >> >> > timestamp=4295805885000000 >> >> > activated=0x0 >> >> > policy=0 >> >> > &cpus_allowed=0xdfeff280 >> >> > time_slice=17 >> >> > first_time_slice=1 >> >> > tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c >> >> > mm=00000000 >> >> > active_mm=f5e42980 >> >> > 
binfmt=c04dad74 >> >> > exit_code=0 >> >> > exit_signal=-1 >> >> > pdeath_signal=0 >> >> > personality=0x400000 >> >> > did_exec=0 >> >> > pid=74226 >> >> > epid=74226 >> >> > ppid=74210 >> >> > tgid=74223 >> >> > cltnode=0 >> >> > p_vproc=0xf6be1800 >> >> > p_vfparent=0x00000000 >> >> > group_leader=0xc3bbb930 >> >> > &pids=0xdfeff2d8 >> >> > set_child_tid 0x00000000 >> >> > clear_child_tid 0x00000000 >> >> > rt_priority=0x0 >> >> > utime=0 >> >> > stime=0 >> >> > nvcsw=11 >> >> > nivcsw=0 >> >> > sig_utime=0 >> >> > sig_stime=0 >> >> > cutime=0 >> >> > cstime=0 >> >> > sig_nvcsw=0 >> >> > sig_nivcsw=0 >> >> > cnvcsw=0 >> >> > cnivcsw=0 >> >> > start_time.tv_sec=1138 >> >> > start_time.tv_nsec=381796336 >> >> > min_flt=4 >> >> > maj_flt=0 >> >> > sig_min_flt=0 >> >> > sig_maj_flt=0 >> >> > cmin_flt=0 >> >> > cmaj_flt=0 >> >> > uid=0 >> >> > euid=0 >> >> > suid=0 >> >> > fsuid=0 >> >> > gid=0 >> >> > egid=0 >> >> > sgid=0 >> >> > fsgid=0 >> >> > group_info=0xf502c080 >> >> > cap_effective=0xfffffeff >> >> > cap_inheritable=0x0 >> >> > cap_permitted=0xfffffeff >> >> > keep_capabilities=0 >> >> > user=0xc0454c00 >> >> > &rlim=0xdffdd5d4 >> >> > comm=ios_mon >> >> > locks=0 >> >> > link_count=0 >> >> > total_link_count=0 >> >> > semvsem.undo_list=c3b4aa20 >> >> > fs=0x00000000 >> >> > files=0x00000000 >> >> > namespace=0x00000000 >> >> > signal=0xdffdd500 >> >> > sighand=0xf5e29900 >> >> > &blocked=0xdfeff694 >> >> > &real_blocked=0xdfeff69c >> >> > &pending=0xdfeff6a4 >> >> > sas_ss_sp=0x0 >> >> > sas_ss_size=0x00000000 >> >> > notifier_data=0x00000000 >> >> > notifier_mask=0x00000000 >> >> > security=0x00000000 >> >> > audit_context=0x00000000 >> >> > parent_exec_id=0x12 >> >> > self_exec_id=0x12 >> >> > journal_info=0x00000000 >> >> > proc_dentry=0xecf027a0 >> >> > backing_dev_info=0x00000000 >> >> > io_context=0x00000000 >> >> > ptrace_message=0x0 >> >> > last_siginfo=0x00000000 >> >> > p_nodetime=0 >> >> > p_ticks_delta=0 >> >> > icsprio=0x0 >> >> > 
execnode=0x00000000 >> >> > node_context=1 >> >> > rcopy_task_size=0 >> >> > &mosix=0xdfeff764 >> >> > ---------------------------------------------------------- >> >> > call print_vproc f6be1800 >> >> > >> >> > vp_magic=0x63727076 (should be 0x63727076) >> >> > vp_pid=74226 >> >> > vp_ref_cnt=8 >> >> > vp_data=0xf6be1824 >> >> > vp_hashfwd=0xc3a60200 >> >> > vp_hashbwd=0x00000000 >> >> > Function print_vproc returned 0x19 >> >> > ---------------------------------------------------------------- >> >> > call print_pvproc f6be1824 >> >> > >> >> > pvp_flag=0x83069 >> >> > pvp_wstate=0x5 >> >> > pvp_pproc=0xdfeff230 >> >> > pvp_head_childl=0x00000000 >> >> > pvp_childl=0xf6ec5e00 >> >> > pvp_head_pgrpl=0x00000000 >> >> > pvp_pgrpl=0xf6ec5e00 >> >> > pvp_sessionl=0x00000000 >> >> > pvp_head_oclist=0x00000000 >> >> > pvp_oclist=0x00000000 >> >> > pvp_ppid=74210 >> >> > pvp_oppid=74223 >> >> > pvp_sid=73171 >> >> > pvp_pgid=74223 >> >> > pvp_pp_sid=73171 >> >> > pvp_pp_pgid=74223 >> >> > pvp_fromnode=1 >> >> > pvp_tonode=1 >> >> > pvp_cttynode=1 >> >> > pvp_cttydev=0x8800000 >> >> > pvp_jobc=0 >> >> > pvp_pgrp_ldr_seqno=0 >> >> > pvp_pgrp_mem_seqno=0 >> >> > pvp_fork_sigmigarg=0 >> >> > pvp.ml.ml_flag=0 >> >> > pvp.ml.ml_shr_count=0 >> >> > pvp.ml.ml_excl_count=0 >> >> > pvp_loadlevel=0 >> >> > pvp_pin=0 >> >> > pvp_localview=0 >> >> > Function print_pvproc returned 0x13 >> >> > ---------------------------------------------------------------- >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> >> Hi Vladimir, >> >> >> >> >> >> maybe we need to take a step back, I was focused on why the >> >> >> ptrace_unlink() call was failing, which was because the child >> >> >> had already gotten removed from the parent list. Which is why >> >> >> I created the following patch, which only removed the child from >> >> >> the parent list if the child wasnt being ptraced (ie pvp_oppid >> == 0). 
>> >> >> So the ! is correct for that logic. >> >> >> >> >> >> But maybe we need to look at why this process is being reaped by >> >> >> the nocld_wait_daemon, instead of the gdb parent. >> >> >> >> >> >> Can you do a " call print_task_struct < address>" of the zombie >> >> >> processes? >> >> >> >> >> >> thanks >> >> >> >> >> >> laura >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> >> > Laura, >> >> >> > The patch you sent doesn't resolve the issue: sometimes >> everything >> >> >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole >> >> system >> >> >> > dies. >> >> >> > I took a look at the code in the patch: >> >> >> > >> >> >> > + error = 0; >> >> >> > + if (!PVP(vc)->pvp_oppid) { >> >> >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); >> >> >> > + ptrace = 1; >> >> >> > + } >> >> >> > >> >> >> > Should it be without "!", like >> >> >> > + if ( PVP(vc)->pvp_oppid) { >> >> >> > >> >> >> > Thanks >> >> >> > Vladimir >> >> >> > >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> >> >> >> I believe the __ptrace_unlink is failing because the child >> >> >> >> gets removed from the parent in the beginning of >> >> >> >> dpvproc_nocldwait_async_handler(), which is the right thing >> >> >> >> to do if the process isn't being ptraced. >> >> >> >> >> >> >> >> Attached is a patch that addresses this problem. Can you please >> >> >> >> apply and test? (use -p0 to apply) >> >> >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> >> >> > Laura, >> >> >> >> > >> >> >> >> > That's right, >> >> >> >> > __ptrace_unlink(p) fails every time inside >> >> >> >> > pvpop_rmv_child_from_parent() with -ESRCH >> >> >> >> > because it can't find process p in the PVP(w)->pvp_childl >> list. >> >> >> >> > The process p seems to be unlinked already - I saw that tracing >> >> >> >> > pvp_childl with kdb while stopping inside >> >> >> >> pvpop_rmv_child_from_parent(). 
>> >> >> >> > >> >> >> >> > Vladimir >> >> >> >> > >> >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> Hi Vladimir, >> >> >> >> >> >> >> >> >> >> The reason why wait_task_zombie() returns -EAGAIN is that >> >> >> >> >> its current parent is the debugger gdb instead of its >> original >> >> >> >> >> parent. The original parent needs to reap its child, so a >> >> >> >> >> __ptrace_unlink() is done to stop tracing the process and set >> >> >> >> >> the parent back to its original parent, and -EAGAIN is returned so the >> >> >> >> >> reap can be done from the original parent. So >> __ptrace_unlink() >> >> >> >> >> must have failed since otherwise the second time in it >> wouldn't >> >> >> >> >> go into that code path. So if you can find out why the >> >> >> >> >> __ptrace_unlink() is failing, that would be a good start. >> >> >> >> >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Vladimir Razgulin wrote: >> >> >> >> >> > Laura, Roger, >> >> >> >> >> > >> >> >> >> >> > I've got the same problem trying to debug a multithreaded >> >> program >> >> >> >> >> with gdb: >> >> >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). >> >> >> >> >> > >> >> >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec >> >> 2005) >> >> >> >> - Is >> >> >> >> >> > it a correct one? - >> >> >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an >> >> infinite >> >> >> >> loop, >> >> >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error >> code. >> >> >> >> >> > >> >> >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid >> >> instead of >> >> >> >> >> -EAGAIN? >> >> >> >> >> > Thanks >> >> >> >> >> > Vladimir >> >> >> >> >> > >> >> >> >> >> > I can provide some info about the processes... 
>> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: >> >> >> >> >> >> >> >> >> >> >> >> Hi Roger, >> >> >> >> >> >> >> >> >> >> >> >> A quick look at the code, there seems to be a comment >> >> >> >> >> >> about the ptrace vproc path, may need to be reworked >> for 2.6 >> >> >> merge. >> >> >> >> >> >> I dont quite remember what the issue was, it obviously >> >> hitting >> >> >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and >> >> the >> >> >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr >> >> which >> >> >> >> >> >> made it look corrupt, but it really isnt, it was just the >> >> wrong >> >> >> >> print >> >> >> >> >> >> call. >> >> >> >> >> >> >> >> >> >> >> >> laura >> >> >> >> >> >> >> >> >> >> >> >> Roger Tsang wrote: >> >> >> >> >> >> > Laura, >> >> >> >> >> >> > >> >> >> >> >> >> > I got an oops exiting from gdb while attached to >> >> check_bacula >> >> >> >> >> which had >> >> >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, >> >> but >> >> >> >> take a >> >> >> >> >> >> look at >> >> >> >> >> >> > the oops below. child_reaper was waiting for >> check_bacula >> >> >> >> which is >> >> >> >> >> >> in E >> >> >> >> >> >> > state. It looks like pvproc got corrupted. >> >> >> >> >> >> > >> >> >> >> >> >> > I'll leave this in kdb until tomorrow just in case I >> >> left out >> >> >> >> >> >> something. 
>> >> >> >> >> >> > >> >> >> >> >> >> > Roger >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible >> type >> >> >> >> >> (11)procfs: >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> >> impossible type >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> >> (11)procfs: >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> >> impossible type >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> >> (11)procfs: >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: >> >> >> >> >> impossible type >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type >> >> >> >> >> >> (11)ptrace_unlink: >> >> >> >> >> >> > vpop_reclaim failed >> >> >> >> >> >> > <4>------------[ cut here ]------------ >> >> >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! >> >> >> >> >> >> > <1>invalid operand: 0000 [#1] >> >> >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate >> loop nfsd >> >> >> >> exportfs >> >> >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport >> >> >> >> iptable_filter >> >> >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd >> >> >> ehci_hcd >> >> >> >> >> >> usbcore >> >> >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 >> via_rhine >> >> >> >> dm_mod >> >> >> >> >> >> > <4>CPU: 0 >> >> >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI >> >> >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) >> >> >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 >> >> >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: >> >> >> c0430a2c >> >> >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: >> >> >> f7d11c98 >> >> >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 >> >> >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 >> >> >> >> 
task=c1aeea80) >> >> >> >> >> >> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 >> >> >> 00000000 >> >> >> >> >> >> 00000000 >> >> >> >> >> >> > 00000000 >> >> >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 >> >> >> 00000286 >> >> >> >> >> >> f7d11d1c >> >> >> >> >> >> > c011de68 >> >> >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 >> >> >> f7d10000 >> >> >> >> >> >> c3378a40 >> >> >> >> >> >> > f7d11d1c >> >> >> >> >> >> > more> >> >> >> >> >> >> > <4>Call Trace: >> >> >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 >> >> >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 >> >> >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 >> >> >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 >> >> >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 >> >> >> >> >> >> > <4> [<c010470b>] error_code+0x2b/0x30 >> >> >> >> >> >> > <4> [<c011de68>] pproc_reap+0x228/0x2f0 >> >> >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 >> >> >> >> >> >> > <4> [<c020ea9f>] >> >> dpvproc_nocldwait_async_handler+0x13f/0x300 >> >> >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 >> >> >> >> >> >> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 >> >> >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 >> >> >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 >> >> >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 >> >> >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 >> >> >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 >> >> 77 a0 >> >> >> >> ff ff >> >> >> >> >> >> e9 24 >> >> >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 >> >> 00 00 >> >> >> >> <0f> >> >> >> >> >> >> 0b 3f 05 >> >> >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 >> >> >> >> >> >> > <4> >> >> >> >> >> >> > kdb> >> >> >> >> >> >> > kdb> bt >> >> >> >> >> >> > Stack traceback for pid 2 >> >> >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 >> >> >> >> *child_reaper 
>> >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 >> >> (0xc8709020, 0x0, >> >> >> >> 0x0, >> >> >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) >> >> >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, >> >> >> >> 0xf7d11e4c, >> >> >> >> >> >> > 0xf7d11e50, 0x11666) >> >> >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, >> >> >> 0xffffffff, >> >> >> >> >> 0x20, >> >> >> >> >> >> > 0x11666, 0xf7d11e4c) >> >> >> >> >> >> > 0xf7d11efc 0xc020ea9f >> dpvproc_nocldwait_async_handler+0x13f >> >> >> >> >> >> (0xed3c9574, >> >> >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) >> >> >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 >> >> >> >> >> (0xc1aeea80, 0x0, >> >> >> >> >> >> > 0x40000001, 0x0, 0xc022f870) >> >> >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 >> >> >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 >> >> >> >> >> >> > kdb> call print_task_struct 0xc8709020 >> >> >> >> >> >> > state=0x20 >> >> >> >> >> >> > flags=0x44c >> >> >> >> >> >> > ptrace=0x0 >> >> >> >> >> >> > lock_depth=-1 >> >> >> >> >> >> > prio=116 >> >> >> >> >> >> > static_prio=120 >> >> >> >> >> >> > array=00000000 >> >> >> >> >> >> > sleep_avg=899989756 >> >> >> >> >> >> > interactive_credit=1 >> >> >> >> >> >> > timestamp=269658653961877 >> >> >> >> >> >> > activated=0x0 >> >> >> >> >> >> > policy=0 >> >> >> >> >> >> > &cpus_allowed=0xc870906c >> >> >> >> >> >> > time_slice=49 >> >> >> >> >> >> > first_time_slice=1 >> >> >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 >> >> >> >> >> >> > mm=00000000 >> >> >> >> >> >> > active_mm=00000000 >> >> >> >> >> >> > binfmt=c04b3e68 >> >> >> >> >> >> > exit_code=9 >> >> >> >> >> >> > exit_signal=-1 >> >> >> >> >> >> > pdeath_signal=0 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > personality=0x0 >> >> >> >> >> >> > 
did_exec=0 >> >> >> >> >> >> > pid=71274 >> >> >> >> >> >> > epid=71274 >> >> >> >> >> >> > ppid=71270 >> >> >> >> >> >> > tgid=71271 >> >> >> >> >> >> > cltnode=0 >> >> >> >> >> >> > p_vproc=0xdd600600 >> >> >> >> >> >> > p_vfparent=0x00000000 >> >> >> >> >> >> > group_leader=0xf22f2550 >> >> >> >> >> >> > &pids=0xc87090c4 >> >> >> >> >> >> > set_child_tid 0x00000000 >> >> >> >> >> >> > clear_child_tid 0x00000000 >> >> >> >> >> >> > rt_priority=0x0 >> >> >> >> >> >> > it_real_value=0x0 >> >> >> >> >> >> > it_prof_value=0x0 >> >> >> >> >> >> > it_virt_value=0x0 >> >> >> >> >> >> > it_real_incr=0x0 >> >> >> >> >> >> > it_prof_incr=0x0 >> >> >> >> >> >> > it_virt_incr=0x0 >> >> >> >> >> >> > utime=0 >> >> >> >> >> >> > stime=0 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > nvcsw=3 >> >> >> >> >> >> > nivcsw=0 >> >> >> >> >> >> > sig_utime=0 >> >> >> >> >> >> > sig_stime=0 >> >> >> >> >> >> > cutime=0 >> >> >> >> >> >> > cstime=0 >> >> >> >> >> >> > sig_nvcsw=0 >> >> >> >> >> >> > sig_nivcsw=0 >> >> >> >> >> >> > cnvcsw=0 >> >> >> >> >> >> > cnivcsw=0 >> >> >> >> >> >> > start_time.tv_sec=270328 >> >> >> >> >> >> > start_time.tv_nsec=615704896 >> >> >> >> >> >> > min_flt=0 >> >> >> >> >> >> > maj_flt=0 >> >> >> >> >> >> > sig_min_flt=0 >> >> >> >> >> >> > sig_maj_flt=0 >> >> >> >> >> >> > cmin_flt=0 >> >> >> >> >> >> > cmaj_flt=0 >> >> >> >> >> >> > uid=0 >> >> >> >> >> >> > euid=0 >> >> >> >> >> >> > suid=0 >> >> >> >> >> >> > fsuid=0 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > gid=0 >> >> >> >> >> >> > egid=0 >> >> >> >> >> >> > sgid=0 >> >> >> >> >> >> > fsgid=0 >> >> >> >> >> >> > group_info=0xeaf0b980 >> >> >> >> >> >> > cap_effective=0xfffffeff >> >> >> >> >> >> > cap_inheritable=0x0 >> >> >> >> >> >> > cap_permitted=0xfffffeff >> >> >> >> >> >> > keep_capabilities=0 >> >> >> >> >> >> > 
user=0xc0431ae0 >> >> >> >> >> >> > &rlim=0xc1af02c4 >> >> >> >> >> >> > used_math=1 >> >> >> >> >> >> > comm=check_bacula >> >> >> >> >> >> > locks=0 >> >> >> >> >> >> > link_count=0 >> >> >> >> >> >> > total_link_count=1 >> >> >> >> >> >> > semvsem.undo_list=f4648200 >> >> >> >> >> >> > fs=0x00000000 >> >> >> >> >> >> > files=0x00000000 >> >> >> >> >> >> > namespace=0x00000000 >> >> >> >> >> >> > signal=0xc1af0240 >> >> >> >> >> >> > sighand=0xf6c86580 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > &blocked=0xc8709484 >> >> >> >> >> >> > &real_blocked=0xc870948c >> >> >> >> >> >> > &pending=0xc8709494 >> >> >> >> >> >> > sas_ss_sp=0x0 >> >> >> >> >> >> > sas_ss_size=0x00000000 >> >> >> >> >> >> > notifier_data=0x00000000 >> >> >> >> >> >> > notifier_mask=0x00000000 >> >> >> >> >> >> > security=0x00000000 >> >> >> >> >> >> > audit_context=0x00000000 >> >> >> >> >> >> > parent_exec_id=0x16 >> >> >> >> >> >> > self_exec_id=0x16 >> >> >> >> >> >> > journal_info=0x00000000 >> >> >> >> >> >> > proc_dentry=0xcb5fe8d4 >> >> >> >> >> >> > backing_dev_info=0x00000000 >> >> >> >> >> >> > io_context=0x00000000 >> >> >> >> >> >> > ptrace_message=0x0 >> >> >> >> >> >> > last_siginfo=0x00000000 >> >> >> >> >> >> > p_nodetime=0 >> >> >> >> >> >> > p_ticks_delta=0 >> >> >> >> >> >> > icsprio=0x0 >> >> >> >> >> >> > execnode=0x00000000 >> >> >> >> >> >> > node_context=1 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > rcopy_task_size=0 >> >> >> >> >> >> > &mosix=0xc8709504 >> >> >> >> >> >> > Function print_task_struct returned 0x0 >> >> >> >> >> >> > kdb> btp 71274 >> >> >> >> >> >> > Stack traceback for pid 71274 >> >> >> >> >> >> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 >> >> >> >> check_bacula >> >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> >> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, >> >> 
0xf22f4dc0, >> >> >> >> >> >> 0xc012523f, >> >> >> >> >> >> > 0xd8017e9c, 0x0) >> >> >> >> >> >> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, >> >> >> 0xd8016000) >> >> >> >> >> >> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, >> >> >> >> 0xd8016000, >> >> >> >> >> >> > 0xd8016000) >> >> >> >> >> >> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db >> >> (0xd8017f18, >> >> >> >> >> >> 0xd8017ef8, >> >> >> >> >> >> > 0xd8017fc4, 0x0, 0xc0218c59) >> >> >> >> >> >> > 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, >> >> >> >> 0xcc7be550, >> >> >> >> >> >> > 0xc8709020) >> >> >> >> >> >> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 >> >> >> >> >> >> > 0xc0103cf6 work_notifysig+0x13 >> >> >> >> >> >> > kdb> btp 71270 >> >> >> >> >> >> > Stack traceback for pid 71270 >> >> >> >> >> >> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 >> gdb >> >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> >> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, >> >> >> >> 0xe2d7fecc, >> >> >> >> >> >> > 0xf3eade68) >> >> >> >> >> >> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, >> 0x1166a, >> >> >> 0xa0, >> >> >> >> >> 0x0, >> >> >> >> >> >> > 0xf3eadedc) >> >> >> >> >> >> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, >> >> 0x0, >> >> >> >> >> 0x0, 0x0) >> >> >> >> >> >> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, >> >> >> >> 0x80000000, 0x0) >> >> >> >> >> >> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 >> >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 >> >> >> >> >> >> > kdb> btp 117331 >> >> >> >> >> >> > Stack traceback for pid 117331 >> >> >> >> >> >> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 >> bash >> >> >> >> >> >> > EBP EIP Function (args) >> >> >> >> >> >> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, >> >> 0xffffffff, >> >> >> >> 0x4, >> >> >> >> >> >> 0x1ca53, >> >> >> >> >> >> > 0xd5399e94) >> >> >> >> >> >> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, >> >> 0xffffffff, >> >> >> >> 0x24, 
>> >> >> >> >> >> > 0xbffff038, 0xd5399edc) >> >> >> >> >> >> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, >> 0x0, >> >> >> >> >> >> 0xbffff038, 0x0) >> >> >> >> >> >> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, >> >> 0xbffff038, >> >> >> >> >> 0xa, 0x0) >> >> >> >> >> >> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 >> >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 >> >> >> >> >> >> > kdb> >> >> >> >> >> >> > kdb> call print_pvproc 0xdd600600 >> >> >> >> >> >> > pvp_flag=0x63727076 >> >> >> >> >> >> > pvp_wstate=0x1166a >> >> >> >> >> >> > pvp_pproc=0x00000007 >> >> >> >> >> >> > pvp_head_childl=0xdd60061c >> >> >> >> >> >> > pvp_childl=0x00000000 >> >> >> >> >> >> > pvp_head_pgrpl=0x00000000 >> >> >> >> >> >> > pvp_pgrpl=0x00000000 >> >> >> >> >> >> > pvp_sessionl=0x00083049 >> >> >> >> >> >> > pvp_head_oclist=0x00000005 >> >> >> >> >> >> > pvp_oclist=0xc8709020 >> >> >> >> >> >> > pvp_ppid=0 >> >> >> >> >> >> > pvp_oppid=0 >> >> >> >> >> >> > pvp_sid=0 >> >> >> >> >> >> > pvp_pgid=-489161216 >> >> >> >> >> >> > pvp_pp_sid=0 >> >> >> >> >> >> > pvp_pp_pgid=0 >> >> >> >> >> >> > pvp_fromnode=0 >> >> >> >> >> >> > pvp_tonode=71270 >> >> >> >> >> >> > pvp_cttynode=71271 >> >> >> >> >> >> > pvp_cttydev=0x1ca53 >> >> >> >> >> >> > pvp_jobc=71271 >> >> >> >> >> >> > pvp_pgrp_ldr_seqno=1 >> >> >> >> >> >> > more> >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input >> ignored >> >> >> >> >> >> > pvp_pgrp_mem_seqno=-580909344 >> >> >> >> >> >> > pvp_fork_sigmigarg=-580909332 >> >> >> >> >> >> > pvp.ml.ml_flag=1 >> >> >> >> >> >> > pvp.ml.ml_shr_count=-580909376 >> >> >> >> >> >> > pvp.ml.ml_excl_count=0 >> >> >> >> >> >> > pvp_loadlevel=-580909332 >> >> >> >> >> >> > pvp_pin=0 >> >> >> >> >> >> > pvp_localview=0 >> >> >> >> >> >> > Function print_pvproc returned 0x0 >> >> >> >> >> >> > kdb> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ------------------------------------------------------- >> >> >> >> >> >> 
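The `call print_pvproc 0xdd600600` output quoted above looks corrupt because, as Laura's reply quoted later in this thread points out, the pvproc was printed using the vproc pointer. The printed pvp_flag, 0x63727076, is the value that print_vproc elsewhere in this thread labels vp_magic ("should be 0x63727076"). A minimal Python sketch of that reading follows: it relabels the first four printed values as vproc header fields. The header order (vp_magic, vp_pid, vp_ref_cnt, vp_data) is an assumption inferred from the print_vproc and md output in this thread, not taken from the SSI source, and the helper names are made up for illustration.

```python
import struct

# Magic value that print_vproc reports as "should be 0x63727076" in this thread.
VP_MAGIC = 0x63727076

# First four fields printed by `call print_pvproc 0xdd600600` in the quoted session.
PRINTED_AS_PVPROC = [
    ("pvp_flag", 0x63727076),
    ("pvp_wstate", 0x1166A),
    ("pvp_pproc", 0x00000007),
    ("pvp_head_childl", 0xDD60061C),
]

# ASSUMPTION: order of the leading vproc words, inferred from the print_vproc
# and `md` output in the thread (hypothetical, not checked against SSI source).
VPROC_HEADER = ("vp_magic", "vp_pid", "vp_ref_cnt", "vp_data")


def relabel_as_vproc(printed):
    """Map the misread pvproc fields onto the assumed vproc header names."""
    return {name: value for name, (_, value) in zip(VPROC_HEADER, printed)}


def describe(addr, printed):
    """Return the relabelled header and the distance from addr to vp_data."""
    header = relabel_as_vproc(printed)
    offset = header["vp_data"] - addr
    return header, offset


header, offset = describe(0xDD600600, PRINTED_AS_PVPROC)
print("magic matches:", header["vp_magic"] == VP_MAGIC)
print("magic as little-endian bytes:", struct.pack("<I", header["vp_magic"]))
print("vp_pid:", header["vp_pid"])
print("vp_ref_cnt:", header["vp_ref_cnt"])
print("vp_data: 0x%08x (vproc + 0x%x)" % (header["vp_data"], offset))
```

If the assumption holds, the fourth value, 0xdd60061c, would be vp_data, that is, the address print_pvproc should have been given. In this dump that is 0x1c past the vproc address; the 2006 dumps later in the thread show vp_data at vproc + 0x24 instead, so the offset should be read from vp_data rather than hard-coded.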
From: Vladimir R. <one...@gm...> - 2006-06-01 15:02:21
|
Laura,

with thread_deadlock2.patch I still have two zombies and sleeping gdb,
but it doesn't look like a deadlock anymore:

-----------------------------------------------------------------------
0xdf9c47b0 73119 73081 0 1 S 0xdf9c4970 bash
0xdfa87230 73165 73119 0 2 S 0xdfa873f0 gdb
0xf6561130 73204 73165 0 1 Z 0xf65612f0 ios_mon
0xdff44730 73208 1 0 3 Z 0xdff448f0 ios_mon
-----------------------------------------------------------------------
Stack traceback for pid 73165
0xdfa87230 73165 73119 0 2 S 0xdfa873f0 gdb
EBP        EIP        Function (args)
0xc3b87f88 0xc03dce4c schedule+0x3fc (0xc3b87fa4, 0x8274e00, 0x8, 0x10000, 0x0)
0xc3b87fb4 0xc0104590 sys_rt_sigsuspend+0xe0
           0xc0106ae1 syscall_call+0x7
-----------------------------------------------------------------------
Stack traceback for pid 73204
0xf6561130 73204 73165 0 1 Z 0xf65612f0 ios_mon
EBP        EIP        Function (args)
0xf502ff5c 0xc03dce4c schedule+0x3fc (0xf6561130, 0xf6561130, 0xdfee3130, 0xf502ff88, 0x0)
0xf502ff88 0xc0129682 do_exit+0x232 (0xf6561130, 0xdfc0ab84, 0xdfc0a680, 0x7f, 0x0)
0xf502ffa8 0xc0129879 do_group_exit+0x39 (0x7f00)
0xf502ffb4 0xc0129915 sys_exit_group+0x15
           0xc0106ae1 syscall_call+0x7
-----------------------------------------------------------------------
Stack traceback for pid 73208
0xdff44730 73208 1 0 3 Z 0xdff448f0 ios_mon
EBP        EIP        Function (args)
0xc3b19e5c 0xc03dce4c schedule+0x3fc (0xdff44730, 0x1, 0xc3b19e78, 0x8, 0x1)
0xc3b19e88 0xc0129682 do_exit+0x232 (0x0, 0x0, 0x0, 0x9, 0xc3b18000)
0xc3b19ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xdfc0a680, 0xc3b18000)
0xc3b19ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xc3b19f10, 0xc3b19ef0, 0xc3b19fbc, 0x0, 0xf5dcf080)
0xc3b19f9c 0xc0106850 do_signal+0x70 (0x3b83f640, 0x805f2ec, 0x805f2ec, 0x407679b8)
0xc3b19fb4 0xc0106987 do_notify_resume+0x57
           0xc0106b72 work_notifysig+0x13
-----------------------------------------------------------------------
call print_task_struct f6561130
state=0x20 flags=0x80200c ptrace=0x1f1 lock_depth=-1 prio=116
static_prio=120 array=00000000 sleep_avg=900000000 timestamp=4294907435000000 activated=0x0 policy=0 &cpus_allowed=0xf6561180 time_slice=11 first_time_slice=0 tasks.next 0xc044ec9c, tasks.prev 0xdfa8728c mm=00000000 active_mm=00000000 binfmt=c04d9d74 exit_code=32512 exit_signal=17 pdeath_signal=0 personality=0x400000 did_exec=1 pid=73204 epid=73204 ppid=73165 tgid=73204 cltnode=0 p_vproc=0xf5f07e00 p_vfparent=0x00000000 group_leader=0xf6561130 &pids=0xf65611d8 set_child_tid 0x00000000 clear_child_tid 0x00000000 rt_priority=0x0 utime=4 stime=4 nvcsw=34 nivcsw=0 sig_utime=0 sig_stime=0 cutime=0 cstime=0 sig_nvcsw=24 sig_nivcsw=0 cnvcsw=12 cnivcsw=0 start_time.tv_sec=219 start_time.tv_nsec=579684304 min_flt=798 maj_flt=0 sig_min_flt=20 sig_maj_flt=0 cmin_flt=4 cmaj_flt=0 uid=0 euid=0 suid=0 fsuid=0 gid=0 egid=0 sgid=0 fsgid=0 group_info=0xdfcf9a80 cap_effective=0xfffffeff cap_inheritable=0x0 cap_permitted=0xfffffeff keep_capabilities=0 user=0xc0453c00 &rlim=0xdfc3f2d4 comm=ios_mon locks=0 link_count=0 total_link_count=0 semvsem.undo_list=f4c96660 fs=0x00000000 files=0x00000000 namespace=0x00000000 signal=0xdfc3f200 sighand=0xdfc0a680 &blocked=0xf6561594 &real_blocked=0xf656159c &pending=0xf65615a4 sas_ss_sp=0x0 sas_ss_size=0x00000000 notifier_data=0x00000000 notifier_mask=0x00000000 security=0x00000000 audit_context=0x00000000 parent_exec_id=0x10 self_exec_id=0x12 journal_info=0x00000000 proc_dentry=0xf51e6170 backing_dev_info=0x00000000 io_context=0x00000000 ptrace_message=0x11df9 last_siginfo=0x00000000 p_nodetime=0 p_ticks_delta=0 icsprio=0x0 execnode=0x00000000 node_context=1 rcopy_task_size=0 &mosix=0xf6561664 Function print_task_struct returned 0x15 -------------------------------------------------------------------- call print_vproc f5f07e00 vp_magic=0x63727076 (should be 0x63727076) vp_pid=73204 vp_ref_cnt=17 vp_data=0xf5f07e24 vp_hashfwd=0xdffe0000 vp_hashbwd=0x00000000 Function print_vproc returned 0x19 
-------------------------------------------------------------------- call print_pvproc f5f07e24 pvp_flag=0x81706b pvp_wstate=0x5 pvp_pproc=0xf6561130 pvp_head_childl=0x00000000 pvp_childl=0x00000000 pvp_head_pgrpl=0xdfdad000 pvp_pgrpl=0x00000000 pvp_sessionl=0xdffc4200 pvp_head_oclist=0x00000000 pvp_oclist=0x00000000 pvp_ppid=73165 pvp_oppid=0 pvp_sid=73119 pvp_pgid=73204 pvp_pp_sid=73119 pvp_pp_pgid=73165 pvp_fromnode=1 pvp_tonode=1 pvp_cttynode=1 pvp_cttydev=0x8800000 pvp_jobc=0 pvp_pgrp_ldr_seqno=0 pvp_pgrp_mem_seqno=0 pvp_fork_sigmigarg=0 pvp.ml.ml_flag=0 pvp.ml.ml_shr_count=0 pvp.ml.ml_excl_count=0 pvp_loadlevel=0 pvp_pin=0 pvp_localview=0 Function print_pvproc returned 0x13 -------------------------------------------------------------------- -------------------------------------------------------------------- call print_task_struct dff44730 state=0x20 flags=0x80244c ptrace=0x0 lock_depth=-1 prio=116 static_prio=120 array=00000000 sleep_avg=900000000 timestamp=4294907435000000 activated=0x0 policy=0 &cpus_allowed=0xdff44780 time_slice=23 first_time_slice=1 tasks.next 0xc044ec9c, tasks.prev 0xf656ce0c mm=00000000 active_mm=00000000 binfmt=c04d9d74 exit_code=32512 exit_signal=-1 pdeath_signal=0 personality=0x400000 did_exec=0 pid=73208 epid=73208 ppid=1 tgid=73204 cltnode=0 p_vproc=0xdfdad000 p_vfparent=0x00000000 group_leader=0xf6561130 &pids=0xdff447d8 set_child_tid 0x00000000 clear_child_tid 0x00000000 rt_priority=0x0 utime=0 stime=1 nvcsw=16 nivcsw=0 sig_utime=0 sig_stime=0 cutime=0 cstime=0 sig_nvcsw=24 sig_nivcsw=0 cnvcsw=12 cnivcsw=0 start_time.tv_sec=219 start_time.tv_nsec=857633304 min_flt=2 maj_flt=0 sig_min_flt=20 sig_maj_flt=0 cmin_flt=4 cmaj_flt=0 uid=0 euid=0 suid=0 fsuid=0 gid=0 egid=0 sgid=0 fsgid=0 group_info=0xdfcf9a80 cap_effective=0xfffffeff cap_inheritable=0x0 cap_permitted=0xfffffeff keep_capabilities=0 user=0xc0453c00 &rlim=0xdfc3f2d4 comm=ios_mon locks=0 link_count=0 total_link_count=0 semvsem.undo_list=f4c96660 fs=0x00000000 
files=0x00000000 namespace=0x00000000 signal=0xdfc3f200 sighand=0xdfc0a680 &blocked=0xdff44b94 &real_blocked=0xdff44b9c &pending=0xdff44ba4 sas_ss_sp=0x0 sas_ss_size=0x00000000 notifier_data=0x00000000 notifier_mask=0x00000000 security=0x00000000 audit_context=0x00000000 parent_exec_id=0x12 self_exec_id=0x12 journal_info=0x00000000 proc_dentry=0xf643dc20 backing_dev_info=0x00000000 io_context=0x00000000 ptrace_message=0x11df7 last_siginfo=0x00000000 p_nodetime=0 p_ticks_delta=0 icsprio=0x0 execnode=0x00000000 node_context=1 rcopy_task_size=0 &mosix=0xdff44c64 Function print_task_struct returned 0x15 -------------------------------------------------------------------- call print_vproc dfdad000 vp_magic=0x63727076 (should be 0x63727076) vp_pid=73208 vp_ref_cnt=7 vp_data=0xdfdad024 vp_hashfwd=0xf5f00400 vp_hashbwd=0x00000000 Function print_vproc returned 0x19 -------------------------------------------------------------------- call print_pvproc dfdad024 pvp_flag=0x83069 pvp_wstate=0x5 pvp_pproc=0xdff44730 pvp_head_childl=0x00000000 pvp_childl=0xf5e65e00 pvp_head_pgrpl=0x00000000 pvp_pgrpl=0xf5f07e00 pvp_sessionl=0x00000000 pvp_head_oclist=0x00000000 pvp_oclist=0x00000000 pvp_ppid=1 pvp_oppid=0 pvp_sid=73119 pvp_pgid=73204 pvp_pp_sid=73119 pvp_pp_pgid=73204 pvp_fromnode=1 pvp_tonode=1 pvp_cttynode=1 pvp_cttydev=0x8800000 pvp_jobc=0 pvp_pgrp_ldr_seqno=0 pvp_pgrp_mem_seqno=0 pvp_fork_sigmigarg=0 pvp.ml.ml_flag=0 pvp.ml.ml_shr_count=0 pvp.ml.ml_excl_count=0 pvp_loadlevel=0 pvp_pin=0 pvp_localview=0 Function print_pvproc returned 0x13 -------------------------------------------------------------------- On 5/31/06, Laura Ramirez <lau...@hp...> wrote: > Hi Vladimir > > Please remove "thread_deadlock.patch" and apply the attached > patch instead. > > thanks, > > laura > > Vladimir Razgulin wrote: > > Laura, > > > > with thread_deadlock.patch I still get deadlock between the threads > > and a frozen gdb - see enclosed ionfo. 
> > > > Thanks > > Vladimir > > > > 0xf6526830 73134 72318 0 0 S 0xf65269f0 sshd > > 0xf6527330 73149 73134 0 0 S 0xf65274f0 bash > > 0xdfc060b0 73213 73149 0 2 S 0xdfc06270 gdb > > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > > ------------------------------------------------------------------------- > > btp 73246 > > > > Stack traceback for pid 73246 > > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > > EBP EIP Function (args) > > 0xdf995e10 0xc03dce4c schedule+0x3fc (0xf645a3b0, 0x1, 0xf645a3b0, > > 0xc0121dd0, 0xdffea698) > > 0xdf995e48 0xc03dc835 __down+0x75 (0xdffea600, 0xdffea600) > > 0xdf995e58 0xc03dc9e6 __down_failed+0xa > > 0xc0227b9b .text.lock.dvp_pvpops+0xad > > 0xdf995efc 0xc02214d8 pvpop_reassign_child+0x178 (0xdffea600, 0x10, > > 0x0, 0xdf994000, 0xdf994000) > > 0xdf995f28 0xc02263f2 pvpop_reassign_original_parent+0x132 (0xdffeaa00, > > 0x11e22) > > 0xdf995f38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffeaa00, > > 0x11e22, 0xf65cf980, 0xdf995f4c, 0xdf995f4c) > > 0xdf995f5c 0xc0129272 exit_notify+0xc2 (0xf645a3b0, 0xf645a3b0, > > 0xdf990bb0, 0xdf995f88, 0x0) > > 0xdf995f88 0xc0129638 do_exit+0x1e8 (0xf645a3b0, 0xc3a80684, > > 0xc3a80180, 0x7f, 0x0) > > 0xdf995fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > > 0xdf995fb4 0xc0129915 sys_exit_group+0x15 > > 0xc0106ae1 syscall_call+0x7 > > ------------------------------------------------------------------------- > > btp 73250 > > > > Stack traceback for pid 73250 > > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > > EBP EIP Function (args) > > 0xf4fc5d54 0xc03dce4c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > > 0xc0121dd0, 0xdffeaa98) > > 0xf4fc5d8c 0xc03dc835 __down+0x75 (0x79, 0xdffeaafc) > > 0xf4fc5d9c 0xc03dc9e6 __down_failed+0xa > > 0xc0227eac .text.lock.dvp_pvpops+0x3be > > 0xf4fc5dfc 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > > (0xdffeaa00, 0xdffeae00, 0x0, 0x0, 0xf4fc4000) > > 0xf4fc5e28 0xc02263e1 
pvpop_reassign_original_parent+0x121 (0xdffea600, > > 0x11e1e) > > 0xf4fc5e38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffea600, > > 0x11e1e, 0xf5067200, 0xf4fc5e4c, 0xf4fc5e4c) > > 0xf4fc5e5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xf65ce840, > > 0xf4fc5e78, 0x8, 0x1) > > 0xf4fc5e88 0xc0129638 do_exit+0x1e8 (0x0, 0x0, 0x0, 0x9, 0xf4fc4000) > > 0xf4fc5ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xc3a80180, > > 0xf4fc4000) > > 0xf4fc5ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xf4fc5f10, > > 0xf4fc5ef0, 0xf4fc5fbc, 0x0, 0xf65cf980) > > 0xf4fc5f9c 0xc0106850 do_signal+0x70 (0x3b970910, 0x805f2ec, > > 0x805f2ec, 0x407679b8) > > 0xf4fc5fb4 0xc0106987 do_notify_resume+0x57 > > 0xc0106b72 work_notifysig+0x13 > > ------------------------------------------------------------------------- > > md 0xdffeaa00 > > > > 0xdffeaa00 63727076 00011e1e 00000012 dffeaa24 vprc........$... > > 0xdffeaa10 00000001 dead4ead 00000000 00000000 .....N.......... > > 0xdffeaa20 00000000 00817063 00000001 f645a3b0 ....cp........E. > > 0xdffeaa30 00000000 00000000 dffea600 00000000 ................ > > 0xdffeaa40 c3a4d200 00000000 00000000 00011dfd ................ > > 0xdffeaa50 00000000 00011dbd 00011e1e 00011dbd ................ > > 0xdffeaa60 00011dfd 00000001 00000001 00000001 ................ > > 0xdffeaa70 08800001 00000001 00000001 dead4ead .............N.. > > ------------------------------------------------------------------------- > > md 0xdffea600 > > > > 0xdffea600 63727076 00011e22 00000009 dffea624 vprc".......$... > > 0xdffea610 00000001 dead4ead 00000000 00000000 .....N.......... > > 0xdffea620 00000000 00083061 00000001 df990630 ....a0......0... > > 0xdffea630 00000000 dffeaa00 00000000 dffeaa00 ................ > > 0xdffea640 00000000 00000000 00000000 00011dfd ................ > > 0xdffea650 00000001 00011dbd 00011e1e 00011dbd ................ > > 0xdffea660 00011e1e 00000001 00000001 00000001 ................ 
> > 0xdffea670 08800001 00000000 00000001 dead4ead .............N.. > > ------------------------------------------------------------------------- > > call print_vproc 0xdffeaa00 > > > > vp_magic=0x63727076 (should be 0x63727076) > > vp_pid=73246 > > vp_ref_cnt=18 > > vp_data=0xdffeaa24 > > vp_hashfwd=0x00000000 > > vp_hashbwd=0x00000000 > > Function print_vproc returned 0x19 > > ------------------------------------------------------------------------- > > call print_pvproc 0xdffeaa24 > > > > pvp_flag=0x817063 > > pvp_wstate=0x1 > > pvp_pproc=0xf645a3b0 > > pvp_head_childl=0x00000000 > > pvp_childl=0x00000000 > > pvp_head_pgrpl=0xdffea600 > > pvp_pgrpl=0x00000000 > > pvp_sessionl=0xc3a4d200 > > pvp_head_oclist=0x00000000 > > pvp_oclist=0x00000000 > > pvp_ppid=73213 > > pvp_oppid=0 > > pvp_sid=73149 > > pvp_pgid=73246 > > pvp_pp_sid=73149 > > pvp_pp_pgid=73213 > > pvp_fromnode=1 > > pvp_tonode=1 > > pvp_cttynode=1 > > pvp_cttydev=0x8800001 > > pvp_jobc=1 > > pvp_pgrp_ldr_seqno=0 > > pvp_pgrp_mem_seqno=0 > > pvp_fork_sigmigarg=0 > > pvp.ml.ml_flag=2 > > pvp.ml.ml_shr_count=1 > > pvp.ml.ml_excl_count=0 > > pvp_loadlevel=0 > > pvp_pin=0 > > pvp_localview=0 > > Function print_pvproc returned 0x13 > > ------------------------------------------------------------------------- > > call print_vproc 0xdffea600 > > > > vp_magic=0x63727076 (should be 0x63727076) > > vp_pid=73250 > > vp_ref_cnt=9 > > vp_data=0xdffea624 > > vp_hashfwd=0x00000000 > > vp_hashbwd=0x00000000 > > Function print_vproc returned 0x19 > > ------------------------------------------------------------------------- > > call print_pvproc 0xdffea624 > > > > pvp_flag=0x83061 > > pvp_wstate=0x1 > > pvp_pproc=0xdf990630 > > pvp_head_childl=0x00000000 > > pvp_childl=0xdffeaa00 > > pvp_head_pgrpl=0x00000000 > > pvp_pgrpl=0xdffeaa00 > > pvp_sessionl=0x00000000 > > pvp_head_oclist=0x00000000 > > pvp_oclist=0x00000000 > > pvp_ppid=73213 > > pvp_oppid=1 > > pvp_sid=73149 > > pvp_pgid=73246 > > pvp_pp_sid=73149 > > 
pvp_pp_pgid=73246
> > pvp_fromnode=1
> > pvp_tonode=1
> > pvp_cttynode=1
> > pvp_cttydev=0x8800001
> > pvp_jobc=0
> > pvp_pgrp_ldr_seqno=0
> > pvp_pgrp_mem_seqno=0
> > pvp_fork_sigmigarg=0
> > pvp.ml.ml_flag=2
> > pvp.ml.ml_shr_count=1
> > pvp.ml.ml_excl_count=0
> > pvp_loadlevel=0
> > pvp_pin=0
> > pvp_localview=0
> > Function print_pvproc returned 0x13
> > -------------------------------------------------------------------------
> >
> > On 5/30/06, Laura Ramirez <lau...@hp...> wrote:
> >>
> >> Hi Vladimir,
> >>
> >> Good to hear that the second patch fixed the original problem.
> >>
> >> From looking at the stack traces, it does appear to be a
> >> deadlock. Pid 73961 is reaping pid 73980. It wants to
> >> reassign it to its original parent 73976. It currently has
> >> the vproc lock for 73980 and wants the vproc lock for 73976.
> >> However, 73976 is currently exiting, and it's trying to reassign
> >> its children to the next thread (i.e. 73980). So it currently has
> >> its own 73976 lock and wants the vproc lock for 73980, and so
> >> it's now deadlocked. Besides the deadlock bug, maybe the real
> >> problem is 73976 reassigning the children to 73980, which has
> >> already run through its exit code and will never clean up the
> >> list again.
> >>
> >> Attached is a patch that checks the state of the thread; if
> >> it's a zombie, then it reassigns it to the init process.
> >>
> >> laura
> >>
> >> Vladimir Razgulin wrote:
> >> > Laura,
> >> >
> >> > Thank you for ptrace2_reap.patch
> >> > I've tried it, and it fixed the original problem.
> >> > However, now I have something looking like a race condition.
> >> >
> >> > Again, I'm debugging a multithreaded program:
> >> >
> >> > gdb (pid=73961) has started ios_mon (pid=73976), which started three other
> >> > threads (pids=73979, 73980, 73981).
> >> > The thread 73979 was terminated successfully (thanks to > >> ptrace2_reap.patch) > >> > and I've got the following: > >> > > >> > ----------------------------------------------------------------- > >> > 0xf5778bb0 73927 73921 0 0 S 0xf5778d70 bash > >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > >> > 0xdf991130 73980 73961 0 2 E 0xdf9912f0 ios_mon > >> > 0xdfeaf330 73981 73961 0 0 Z 0xdfeaf4f0 ios_mon > >> > 0xf576c030 74134 72761 0 1 D 0xf576c1f0 bash > >> > ----------------------------------------------------------------- > >> > > >> > gdb (73961) wants to do VPROC_LOCK_EXCL for vproc=0xc3a35800, > >> > pvproc=0xc3a35824, > >> > which is owned by ios_mon (73976) > >> > > >> > ---------------------------------------------------------------- > >> > Stack traceback for pid 73961 > >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > >> > EBP EIP Function (args) > >> > 0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, > >> > 0xc0121dd0, 0xf3cafd04) > >> > 0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79) > >> > 0xf3e59b50 0xc03dc9a6 __down_failed+0xa > >> > 0xc0227b04 .text.lock.dvp_pvpops+0x56 > >> > 0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, > >> > 0xc3a35a00, 0x5, 0x0, 0x0) > >> > 0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, > >> > 0x0, 0x0, 0x0) > >> > 0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, > >> > 0x0, 0x0, 0x0) > >> > 0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, > >> > 0xf3e59cc4, 0x0, 0x0) > >> > 0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 (0xdf991130, 0x0, 0x0, > >> > 0xf3e59e8c, 0xf3e59ed4) > >> > 0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, > >> > 0xf3e59ed4, 0x120e9) > >> > 0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, > >> > 0x120e9, 0xf3e59e8c) > >> > 0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, > >> > 
0xbfeec4ec, 0xf3e59ed4) > >> > 0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, > >> > 0xbfeec4ec, 0x0) > >> > 0xf3e59f9c 0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, > >> > 0x80000001, 0x0) > >> > 0xf3e59fb4 0xc0129df5 sys_waitpid+0x25 > >> > 0xc0106ae1 syscall_call+0x7 > >> > ----------------------------------------------------------------------- > >> > md c3a35824 > >> > > >> > 0xc3a35824 00817063 00000001 df990630 00000000 cp......0....... > >> > 0xc3a35834 00000000 c3a4a600 00000000 f6431400 ..............C. > >> > 0xc3a35844 c3a35a00 00000000 000120e9 00000000 .Z....... ...... > >> > 0xc3a35854 000120c7 000120f8 000120c7 000120e9 . ... ... ... .. > >> > 0xc3a35864 00000001 00000001 00000001 08800001 ................ > >> > 0xc3a35874 00000001 00000001 dead4ead 00000000 .........N...... > >> > 0xc3a35884 ffffffff 00000001 df990630 00000001 ........0....... > >> > ^^^^^^^^ > >> > 0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630 .N.. ....]+.0... > >> > ----------------------------------------------------------------------- > >> > > >> > At the same time ios_mon (73976) tries to lock vproc=0xc3a35a00, > >> > pvproc=0xc3a35a24, > >> > which is already locked by gdb (73961) > >> > > >> > ----------------------------------------------------------------------- > >> > > >> > Stack traceback for pid 73976 > >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > >> > EBP EIP Function (args) > >> > 0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > >> > 0xc0121dd0, 0xc3a35a98) > >> > 0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc) > >> > 0xf3d89ea0 0xc03dc9a6 __down_failed+0xa > >> > 0xc0227e6c .text.lock.dvp_pvpops+0x3be > >> > 0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > >> > (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800) > >> > 0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, > >> > 0x120fc) > >> > 0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, > >> > 
0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c) > >> > 0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, > >> > 0xdfeaf330, 0xf3d89f88, 0x0) > >> > 0xf3d89f88 0xc0129638 do_exit+0x1e8 (0xdf990630, 0xf5e28584, > >> > 0xf5e28080, 0x7f, 0x0) > >> > 0xf3d89fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > >> > 0xf3d89fb4 0xc0129915 sys_exit_group+0x15 > >> > 0xc0106ae1 syscall_call+0x7 > >> > [1]kdb> > >> > ----------------------------------------------------------------------- > >> > > >> > md c3a35a24 > >> > > >> > 0xc3a35a24 00083069 00000005 df991130 00000000 i0......0....... > >> > 0xc3a35a34 00000000 00000000 c3a35800 00000000 .........X...... > >> > 0xc3a35a44 00000000 00000000 000120e9 000120f8 ......... ... .. > >> > 0xc3a35a54 000120c7 000120f8 000120c7 000120f8 . ... ... ... .. > >> > 0xc3a35a64 00000001 00000001 00000001 08800001 ................ > >> > 0xc3a35a74 00000000 00000001 dead4ead 00000000 .........N...... > >> > 0xc3a35a84 ffffffff 00000001 dfd746b0 00000001 .........F...... > >> > ^^^^^^^^ > >> > 0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0 .N..p...p....F.. > >> > ---------------------------------------------------------------------- > >> > > >> > and 0xc3a4a600 here is vproc for pid=73981 > >> > > >> > I can provide more information about the issue if it's necesary. > >> > Do you think that work should be serialized throught nocldwait queue? > >> > > >> > Thanks, > >> > Vladimir > >> > > >> > > >> > On 5/26/06, Laura Ramirez <lau...@hp...> wrote: > >> >> > >> >> Hi Vladimir, > >> >> > >> >> Looking at the code more closely, I believe the "ptrace_reap.patch" > >> >> I gave you is NOT correct. Please remove that patch. > >> >> > >> >> Attached is a different patch "ptrace_reap2.patch". In the original > >> >> code, the parent who is the gdb process is not getting a SIGCHLD, > >> >> instead the process just gets queued to the nocldwait. 
I believe the
> >> >> correct path should be: the gdb parent gets the SIGCHLD, goes to do the
> >> >> reap, sees it's ptraced and does the ptrace_unlink(). Assuming the
> >> >> ptrace_unlink() works, it should return -EAGAIN, do the
> >> >> PVPOP_REPORT_STATE() to its original parent, see that its original parent is
> >> >> not reaping children, and get queued to the nocldwait queue, which
> >> >> should do the reap successfully this time and not return -EAGAIN. With the
> >> >> attached patch, I'm hoping that's going to happen.
> >> >>
> >> >> Apply with -p0, and let me know if it works or where it deviated from the
> >> >> code path I stated above.
> >> >>
> >> >> thanks,
> >> >>
> >> >> laura
> >> >>
> >> >> Vladimir Razgulin wrote:
> >> >> > Laura,
> >> >> > following your advice I got a snapshot of the moment the process
> >> >> > becomes a zombie and a work item is added into nocldwait_async_queue:
> >> >> >
> >> >> > Please note that
> >> >> > the process id is 74226, it's a thread process created by 74223 under gdb,
> >> >> > and its ppid and oppid are different - see attached info
> >> >> >
> >> >> > Thank you for your help
> >> >> > Vladimir
> >> >> >
> >> >> > Do you think I can CC: our conversation to ssic-linux-devel?
> >> >> > > >> >> > > >> >> > > >> >> > >> ------------------------------------------------------------------------------------------------------------------ > >> > >> >> > >> >> > > >> >> > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd > >> >> > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash > >> >> > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb > >> >> > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon > >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > >> >> > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon > >> >> > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon > >> >> > > >> >> > >> ------------------------------------------------------------------------------------------------------------------ > >> > >> >> > >> >> > > >> >> > Stack traceback for pid 74226 > >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > >> >> > EBP EIP Function (args) > >> >> > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, > >> >> > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) > >> >> > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, > >> >> > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) > >> >> > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 > >> (0xf6be1800, > >> >> > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) > >> >> > 0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, > >> 0x4, > >> >> > 0xec599fbc, 0x0) > >> >> > 0xec599fac 0xc0129648 do_exit+0x1e8 > >> >> > 0xec599fb4 0xc012983f sys_exit+0xf > >> >> > 0xc0106ae1 syscall_call+0x7 > >> >> > ---------------------------------------------------------------- > >> >> > call print_task_struct 0xdfeff230 > >> >> > > >> >> > state=0x0 > >> >> > flags=0x802044 > >> >> > ptrace=0x1f1 > >> >> > lock_depth=-1 > >> >> > prio=116 > >> >> > static_prio=120 > >> >> > array=c383055c > >> >> > sleep_avg=933888889 > >> >> > timestamp=4295805885000000 > >> >> > activated=0x0 > >> >> > policy=0 > >> >> > &cpus_allowed=0xdfeff280 > >> >> > time_slice=17 > >> >> > first_time_slice=1 > >> >> > 
tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c > >> >> > mm=00000000 > >> >> > active_mm=f5e42980 > >> >> > binfmt=c04dad74 > >> >> > exit_code=0 > >> >> > exit_signal=-1 > >> >> > pdeath_signal=0 > >> >> > personality=0x400000 > >> >> > did_exec=0 > >> >> > pid=74226 > >> >> > epid=74226 > >> >> > ppid=74210 > >> >> > tgid=74223 > >> >> > cltnode=0 > >> >> > p_vproc=0xf6be1800 > >> >> > p_vfparent=0x00000000 > >> >> > group_leader=0xc3bbb930 > >> >> > &pids=0xdfeff2d8 > >> >> > set_child_tid 0x00000000 > >> >> > clear_child_tid 0x00000000 > >> >> > rt_priority=0x0 > >> >> > utime=0 > >> >> > stime=0 > >> >> > nvcsw=11 > >> >> > nivcsw=0 > >> >> > sig_utime=0 > >> >> > sig_stime=0 > >> >> > cutime=0 > >> >> > cstime=0 > >> >> > sig_nvcsw=0 > >> >> > sig_nivcsw=0 > >> >> > cnvcsw=0 > >> >> > cnivcsw=0 > >> >> > start_time.tv_sec=1138 > >> >> > start_time.tv_nsec=381796336 > >> >> > min_flt=4 > >> >> > maj_flt=0 > >> >> > sig_min_flt=0 > >> >> > sig_maj_flt=0 > >> >> > cmin_flt=0 > >> >> > cmaj_flt=0 > >> >> > uid=0 > >> >> > euid=0 > >> >> > suid=0 > >> >> > fsuid=0 > >> >> > gid=0 > >> >> > egid=0 > >> >> > sgid=0 > >> >> > fsgid=0 > >> >> > group_info=0xf502c080 > >> >> > cap_effective=0xfffffeff > >> >> > cap_inheritable=0x0 > >> >> > cap_permitted=0xfffffeff > >> >> > keep_capabilities=0 > >> >> > user=0xc0454c00 > >> >> > &rlim=0xdffdd5d4 > >> >> > comm=ios_mon > >> >> > locks=0 > >> >> > link_count=0 > >> >> > total_link_count=0 > >> >> > semvsem.undo_list=c3b4aa20 > >> >> > fs=0x00000000 > >> >> > files=0x00000000 > >> >> > namespace=0x00000000 > >> >> > signal=0xdffdd500 > >> >> > sighand=0xf5e29900 > >> >> > &blocked=0xdfeff694 > >> >> > &real_blocked=0xdfeff69c > >> >> > &pending=0xdfeff6a4 > >> >> > sas_ss_sp=0x0 > >> >> > sas_ss_size=0x00000000 > >> >> > notifier_data=0x00000000 > >> >> > notifier_mask=0x00000000 > >> >> > security=0x00000000 > >> >> > audit_context=0x00000000 > >> >> > parent_exec_id=0x12 > >> >> > self_exec_id=0x12 > >> >> > 
journal_info=0x00000000 > >> >> > proc_dentry=0xecf027a0 > >> >> > backing_dev_info=0x00000000 > >> >> > io_context=0x00000000 > >> >> > ptrace_message=0x0 > >> >> > last_siginfo=0x00000000 > >> >> > p_nodetime=0 > >> >> > p_ticks_delta=0 > >> >> > icsprio=0x0 > >> >> > execnode=0x00000000 > >> >> > node_context=1 > >> >> > rcopy_task_size=0 > >> >> > &mosix=0xdfeff764 > >> >> > ---------------------------------------------------------- > >> >> > call print_vproc f6be1800 > >> >> > > >> >> > vp_magic=0x63727076 (should be 0x63727076) > >> >> > vp_pid=74226 > >> >> > vp_ref_cnt=8 > >> >> > vp_data=0xf6be1824 > >> >> > vp_hashfwd=0xc3a60200 > >> >> > vp_hashbwd=0x00000000 > >> >> > Function print_vproc returned 0x19 > >> >> > ---------------------------------------------------------------- > >> >> > call print_pvproc f6be1824 > >> >> > > >> >> > pvp_flag=0x83069 > >> >> > pvp_wstate=0x5 > >> >> > pvp_pproc=0xdfeff230 > >> >> > pvp_head_childl=0x00000000 > >> >> > pvp_childl=0xf6ec5e00 > >> >> > pvp_head_pgrpl=0x00000000 > >> >> > pvp_pgrpl=0xf6ec5e00 > >> >> > pvp_sessionl=0x00000000 > >> >> > pvp_head_oclist=0x00000000 > >> >> > pvp_oclist=0x00000000 > >> >> > pvp_ppid=74210 > >> >> > pvp_oppid=74223 > >> >> > pvp_sid=73171 > >> >> > pvp_pgid=74223 > >> >> > pvp_pp_sid=73171 > >> >> > pvp_pp_pgid=74223 > >> >> > pvp_fromnode=1 > >> >> > pvp_tonode=1 > >> >> > pvp_cttynode=1 > >> >> > pvp_cttydev=0x8800000 > >> >> > pvp_jobc=0 > >> >> > pvp_pgrp_ldr_seqno=0 > >> >> > pvp_pgrp_mem_seqno=0 > >> >> > pvp_fork_sigmigarg=0 > >> >> > pvp.ml.ml_flag=0 > >> >> > pvp.ml.ml_shr_count=0 > >> >> > pvp.ml.ml_excl_count=0 > >> >> > pvp_loadlevel=0 > >> >> > pvp_pin=0 > >> >> > pvp_localview=0 > >> >> > Function print_pvproc returned 0x13 > >> >> > ---------------------------------------------------------------- > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> > >> >> >> Hi 
Vladimir, > >> >> >> > >> >> >> maybe we need to take a step back, I was focused on why the > >> >> >> ptrace_unlink() call was failing, which was because the child > >> >> >> had already gotten removed from the parent list. Which is why > >> >> >> I created the following patch, which only removed the child from > >> >> >> the parent list if the child wasnt being ptraced (ie pvp_oppid > >> == 0). > >> >> >> So the ! is correct for that logic. > >> >> >> > >> >> >> But maybe we need to look at why this process is being reaped by > >> >> >> the nocld_wait_daemon, instead of the gdb parent. > >> >> >> > >> >> >> Can you do a " call print_task_struct < address>" of the zombie > >> >> >> processes? > >> >> >> > >> >> >> thanks > >> >> >> > >> >> >> laura > >> >> >> > >> >> >> Vladimir Razgulin wrote: > >> >> >> > Laura, > >> >> >> > The patch you sent doesn't resolve the issue: sometimes > >> everything > >> >> >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole > >> >> system > >> >> >> > dies. > >> >> >> > I took a look at the code in the patch:: > >> >> >> > > >> >> >> > + error = 0; > >> >> >> > + if (!PVP(vc)->pvp_oppid) { > >> >> >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); > >> >> >> > + ptrace = 1; > >> >> >> > + } > >> >> >> > > >> >> >> > Should it be without "!", liike > >> >> >> > + if ( PVP(vc)->pvp_oppid) { > >> >> >> > > >> >> >> > Thanks > >> >> >> > Vladimir > >> >> >> > > >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> > >> >> >> >> I believe the __ptrace_unlink is failing because the child > >> >> >> >> gets removed from the parent in the beginning of > >> >> >> >> dpvproc_nocldwait_async_handler(), which is the right thing > >> >> >> >> to do if the process isn't being ptraced. > >> >> >> >> > >> >> >> >> Attach is a patch that addresses this problem. Can you please > >> >> >> >> apply and test. 
(use -p0 to apply) > >> >> >> >> > >> >> >> >> laura > >> >> >> >> > >> >> >> >> > >> >> >> >> Vladimir Razgulin wrote: > >> >> >> >> > Laura, > >> >> >> >> > > >> >> >> >> > That's right, > >> >> >> >> > __ptrace_unlink(p) fails every time inside > >> >> >> >> > pvpop_rmv_child_from_parent() with -ESRCH > >> >> >> >> > because it can't find process p in the PVP(w)->pvp_childl > >> list. > >> >> >> >> > The process p seems to be unlinked alredy - I saw that tracing > >> >> >> >> > pvp_childl with kdb while stopping inside > >> >> >> >> pvpop_rmv_child_from_parent(). > >> >> >> >> > > >> >> >> >> > Vladimir > >> >> >> >> > > >> >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> >> Hi Vladimir, > >> >> >> >> >> > >> >> >> >> >> The reason why wait_task_zombie() returns EAGAIN is because > >> >> >> >> >> its current parent is the debugger gdb instead of its > >> original > >> >> >> >> >> parent. The original parent needs to reap its child, so a > >> >> >> >> >> __ptrace_unlink() is done to stop tracing the process and set > >> >> >> >> >> the parent back to it original parent, returns EAGAIN so the > >> >> >> >> >> reap can be done from the original parent. So > >> __ptrace_unlink() > >> >> >> >> >> must have failed since otherwise the second time in it > >> wouldn't > >> >> >> >> >> go into that code path. So if you can find out why the > >> >> >> >> >> __ptrace_unlink() is failing, that would be a good start. > >> >> >> >> >> > >> >> >> >> >> laura > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> Vladimir Razgulin wrote: > >> >> >> >> >> > Laura, Roger, > >> >> >> >> >> > > >> >> >> >> >> > I've got the same problem trying to debug a multithreaded > >> >> program > >> >> >> >> >> with gdb: > >> >> >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). > >> >> >> >> >> > > >> >> >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec > >> >> 2005) > >> >> >> >> - Is > >> >> >> >> >> > it a correct one? 
- > >> >> >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an > >> >> infinite > >> >> >> >> loop, > >> >> >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error > >> code. > >> >> >> >> >> > > >> >> >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid > >> >> instead of > >> >> >> >> >> -EAGAIN? > >> >> >> >> >> > Thanks > >> >> >> >> >> > Vladimir > >> >> >> >> >> > > >> >> >> >> >> > I can provide some info about the processes... > >> >> >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: > >> >> >> >> >> >> > >> >> >> >> >> >> Hi Roger, > >> >> >> >> >> >> > >> >> >> >> >> >> From a quick look at the code, there seems to be a comment > >> >> >> >> >> >> saying the ptrace vproc path may need to be reworked > >> for 2.6 > >> >> >> merge. > >> >> >> >> >> >> I don't quite remember what the issue was, it's obviously > >> >> hitting > >> >> >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and > >> >> the > >> >> >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr > >> >> which > >> >> >> >> >> >> made it look corrupt, but it really isn't, it was just the > >> >> wrong > >> >> >> >> print > >> >> >> >> >> >> call. > >> >> >> >> >> >> > >> >> >> >> >> >> laura > >> >> >> >> >> >> > >> >> >> >> >> >> Roger Tsang wrote: > >> >> >> >> >> >> > Laura, > >> >> >> >> >> >> > > >> >> >> >> >> >> > I got an oops exiting from gdb while attached to > >> >> check_bacula > >> >> >> >> >> which had > >> >> >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, > >> >> but > >> >> >> >> take a > >> >> >> >> >> >> look at > >> >> >> >> >> >> > the oops below. child_reaper was waiting for > >> check_bacula > >> >> >> >> which is > >> >> >> >> >> >> in E > >> >> >> >> >> >> > state. It looks like pvproc got corrupted. > >> >> >> >> >> >> > > >> >> >> >> >> >> > I'll leave this in kdb until tomorrow just in case I > >> >> left out > >> >> >> >> >> >> something.
> >> >> >> >> >> >> > > >> >> >> >> >> >> > Roger > >> >> >> >> >> >> > > >> >> >> >> >> >> > > >> >> >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible > >> type > >> >> >> >> >> (11)procfs: > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> >> impossible type > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> (11)procfs: > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> >> impossible type > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> (11)procfs: > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > >> >> >> >> >> impossible type > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > >> >> >> >> >> >> (11)ptrace_unlink: > >> >> >> >> >> >> > vpop_reclaim failed > >> >> >> >> >> >> > <4>------------[ cut here ]------------ > >> >> >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! 
> >> >> >> >> >> >> > <1>invalid operand: 0000 [#1] > >> >> >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate > >> loop nfsd > >> >> >> >> exportfs > >> >> >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport > >> >> >> >> iptable_filter > >> >> >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd > >> >> >> ehci_hcd > >> >> >> >> >> >> usbcore > >> >> >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 > >> via_rhine > >> >> >> >> dm_mod > >> >> >> >> >> >> > <4>CPU: 0 > >> >> >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI > >> >> >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) > >> >> >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 > >> >> >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: > >> >> >> c0430a2c > >> >> >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: > >> >> >> f7d11c98 > >> >> >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 > >> >> >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 > >> >> >> >> task=c1aeea80) > >> >> >> >> >> >> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 > >> >> >> 00000000 > >> >> >> >> >> >> 00000000 > >> >> >> >> >> >> > 00000000 > >> >> >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 > >> >> >> 00000286 > >> >> >> >> >> >> f7d11d1c > >> >> >> >> >> >> > c011de68 > >> >> >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 > >> >> >> f7d10000 > >> >> >> >> >> >> c3378a40 > >> >> >> >> >> >> > f7d11d1c > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > <4>Call Trace: > >> >> >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 > >> >> >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 > >> >> >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 > >> >> >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 > >> >> >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 > >> >> >> >> >> >> > <4> [<c010470b>] error_code+0x2b/0x30 > >> >> >> >> >> >> > <4> [<c011de68>] 
pproc_reap+0x228/0x2f0 > >> >> >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 > >> >> >> >> >> >> > <4> [<c020ea9f>] > >> >> dpvproc_nocldwait_async_handler+0x13f/0x300 > >> >> >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 > >> >> >> >> >> >> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 > >> >> >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 > >> >> >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 > >> >> >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 > >> >> >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 > >> >> >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 > >> >> 77 a0 > >> >> >> >> ff ff > >> >> >> >> >> >> e9 24 > >> >> >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 > >> >> 00 00 > >> >> >> >> <0f> > >> >> >> >> >> >> 0b 3f 05 > >> >> >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 > >> >> >> >> >> >> > <4> > >> >> >> >> >> >> > kdb> > >> >> >> >> >> >> > kdb> bt > >> >> >> >> >> >> > Stack traceback for pid 2 > >> >> >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 > >> >> >> >> *child_reaper > >> >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 > >> >> (0xc8709020, 0x0, > >> >> >> >> 0x0, > >> >> >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) > >> >> >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, > >> >> >> >> 0xf7d11e4c, > >> >> >> >> >> >> > 0xf7d11e50, 0x11666) > >> >> >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, > >> >> >> 0xffffffff, > >> >> >> >> >> 0x20, > >> >> >> >> >> >> > 0x11666, 0xf7d11e4c) > >> >> >> >> >> >> > 0xf7d11efc 0xc020ea9f > >> dpvproc_nocldwait_async_handler+0x13f > >> >> >> >> >> >> (0xed3c9574, > >> >> >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 0x8) > >> >> >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 > >> >> >> >> >> (0xc1aeea80, 0x0, > >> >> >> >> >> >> > 0x40000001, 0x0, 
0xc022f870) > >> >> >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 > >> >> >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 > >> >> >> >> >> >> > kdb> call print_task_struct 0xc8709020 > >> >> >> >> >> >> > state=0x20 > >> >> >> >> >> >> > flags=0x44c > >> >> >> >> >> >> > ptrace=0x0 > >> >> >> >> >> >> > lock_depth=-1 > >> >> >> >> >> >> > prio=116 > >> >> >> >> >> >> > static_prio=120 > >> >> >> >> >> >> > array=00000000 > >> >> >> >> >> >> > sleep_avg=899989756 > >> >> >> >> >> >> > interactive_credit=1 > >> >> >> >> >> >> > timestamp=269658653961877 > >> >> >> >> >> >> > activated=0x0 > >> >> >> >> >> >> > policy=0 > >> >> >> >> >> >> > &cpus_allowed=0xc870906c > >> >> >> >> >> >> > time_slice=49 > >> >> >> >> >> >> > first_time_slice=1 > >> >> >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 > >> >> >> >> >> >> > mm=00000000 > >> >> >> >> >> >> > active_mm=00000000 > >> >> >> >> >> >> > binfmt=c04b3e68 > >> >> >> >> >> >> > exit_code=9 > >> >> >> >> >> >> > exit_signal=-1 > >> >> >> >> >> >> > pdeath_signal=0 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > personality=0x0 > >> >> >> >> >> >> > did_exec=0 > >> >> >> >> >> >> > pid=71274 > >> >> >> >> >> >> > epid=71274 > >> >> >> >> >> >> > ppid=71270 > >> >> >> >> >> >> > tgid=71271 > >> >> >> >> >> >> > cltnode=0 > >> >> >> >> >> >> > p_vproc=0xdd600600 > >> >> >> >> >> >> > p_vfparent=0x00000000 > >> >> >> >> >> >> > group_leader=0xf22f2550 > >> >> >> >> >> >> > &pids=0xc87090c4 > >> >> >> >> >> >> > set_child_tid 0x00000000 > >> >> >> >> >> >> > clear_child_tid 0x00000000 > >> >> >> >> >> >> > rt_priority=0x0 > >> >> >> >> >> >> > it_real_value=0x0 > >> >> >> >> >> >> > it_prof_value=0x0 > >> >> >> >> >> >> > it_virt_value=0x0 > >> >> >> >> >> >> > it_real_incr=0x0 > >> >> >> >> >> >> > it_prof_incr=0x0 > >> >> >> >> >> >> > it_virt_incr=0x0 > >> >> >> >> >> >> > utime=0 > >> 
>> >> >> >> >> > stime=0 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > nvcsw=3 > >> >> >> >> >> >> > nivcsw=0 > >> >> >> >> >> >> > sig_utime=0 > >> >> >> >> >> >> > sig_stime=0 > >> >> >> >> >> >> > cutime=0 > >> >> >> >> >> >> > cstime=0 > >> >> >> >> >> >> > sig_nvcsw=0 > >> >> >> >> >> >> > sig_nivcsw=0 > >> >> >> >> >> >> > cnvcsw=0 > >> >> >> >> >> >> > cnivcsw=0 > >> >> >> >> >> >> > start_time.tv_sec=270328 > >> >> >> >> >> >> > start_time.tv_nsec=615704896 > >> >> >> >> >> >> > min_flt=0 > >> >> >> >> >> >> > maj_flt=0 > >> >> >> >> >> >> > sig_min_flt=0 > >> >> >> >> >> >> > sig_maj_flt=0 > >> >> >> >> >> >> > cmin_flt=0 > >> >> >> >> >> >> > cmaj_flt=0 > >> >> >> >> >> >> > uid=0 > >> >> >> >> >> >> > euid=0 > >> >> >> >> >> >> > suid=0 > >> >> >> >> >> >> > fsuid=0 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > gid=0 > >> >> >> >> >> >> > egid=0 > >> >> >> >> >> >> > sgid=0 > >> >> >> >> >> >> > fsgid=0 > >> >> >> >> >> >> > group_info=0xeaf0b980 > >> >> >> >> >> >> > cap_effective=0xfffffeff > >> >> >> >> >> >> > cap_inheritable=0x0 > >> >> >> >> >> >> > cap_permitted=0xfffffeff > >> >> >> >> >> >> > keep_capabilities=0 > >> >> >> >> >> >> > user=0xc0431ae0 > >> >> >> >> >> >> > &rlim=0xc1af02c4 > >> >> >> >> >> >> > used_math=1 > >> >> >> >> >> >> > comm=check_bacula > >> >> >> >> >> >> > locks=0 > >> >> >> >> >> >> > link_count=0 > >> >> >> >> >> >> > total_link_count=1 > >> >> >> >> >> >> > semvsem.undo_list=f4648200 > >> >> >> >> >> >> > fs=0x00000000 > >> >> >> >> >> >> > files=0x00000000 > >> >> >> >> >> >> > namespace=0x00000000 > >> >> >> >> >> >> > signal=0xc1af0240 > >> >> >> >> >> >> > sighand=0xf6c86580 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > &blocked=0xc8709484 > >> 
>> >> >> >> >> > &real_blocked=0xc870948c > >> >> >> >> >> >> > &pending=0xc8709494 > >> >> >> >> >> >> > sas_ss_sp=0x0 > >> >> >> >> >> >> > sas_ss_size=0x00000000 > >> >> >> >> >> >> > notifier_data=0x00000000 > >> >> >> >> >> >> > notifier_mask=0x00000000 > >> >> >> >> >> >> > security=0x00000000 > >> >> >> >> >> >> > audit_context=0x00000000 > >> >> >> >> >> >> > parent_exec_id=0x16 > >> >> >> >> >> >> > self_exec_id=0x16 > >> >> >> >> >> >> > journal_info=0x00000000 > >> >> >> >> >> >> > proc_dentry=0xcb5fe8d4 > >> >> >> >> >> >> > backing_dev_info=0x00000000 > >> >> >> >> >> >> > io_context=0x00000000 > >> >> >> >> >> >> > ptrace_message=0x0 > >> >> >> >> >> >> > last_siginfo=0x00000000 > >> >> >> >> >> >> > p_nodetime=0 > >> >> >> >> >> >> > p_ticks_delta=0 > >> >> >> >> >> >> > icsprio=0x0 > >> >> >> >> >> >> > execnode=0x00000000 > >> >> >> >> >> >> > node_context=1 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > rcopy_task_size=0 > >> >> >> >> >> >> > &mosix=0xc8709504 > >> >> >> >> >> >> > Function print_task_struct returned 0x0 > >> >> >> >> >> >> > kdb> btp 71274 > >> >> >> >> >> >> > Stack traceback for pid 71274 > >> >> >> >> >> >> > 0xc8709020 71274 71270 0 0 E 0xc87091e0 > >> >> >> >> check_bacula > >> >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> >> > 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, > >> >> 0xf22f4dc0, > >> >> >> >> >> >> 0xc012523f, > >> >> >> >> >> >> > 0xd8017e9c, 0x0) > >> >> >> >> >> >> > 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, > >> >> >> 0xd8016000) > >> >> >> >> >> >> > 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, > >> >> >> >> 0xd8016000, > >> >> >> >> >> >> > 0xd8016000) > >> >> >> >> >> >> > 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db > >> >> (0xd8017f18, > >> >> >> >> >> >> 0xd8017ef8, > >> >> >> >> >> >> > 0xd8017fc4, 0x0, 0xc0218c59) > >> >> >> >> >> >> > 0xd8017fa4 0xc0103a20 
do_signal+0x70 (0x0, 0xcc7be550, > >> >> >> >> 0xcc7be550, > >> >> >> >> >> >> > 0xc8709020) > >> >> >> >> >> >> > 0xd8017fbc 0xc0103b57 do_notify_resume+0x57 > >> >> >> >> >> >> > 0xc0103cf6 work_notifysig+0x13 > >> >> >> >> >> >> > kdb> btp 71270 > >> >> >> >> >> >> > Stack traceback for pid 71270 > >> >> >> >> >> >> > 0xcc7be550 71270 117331 0 0 R 0xcc7be710 > >> gdb > >> >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> >> > 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, > >> >> >> >> 0xe2d7fecc, > >> >> >> >> >> >> > 0xf3eade68) > >> >> >> >> >> >> > 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, > >> 0x1166a, > >> >> >> 0xa0, > >> >> >> >> >> 0x0, > >> >> >> >> >> >> > 0xf3eadedc) > >> >> >> >> >> >> > 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, > >> >> 0x0, > >> >> >> >> >> 0x0, 0x0) > >> >> >> >> >> >> > 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, > >> >> >> >> 0x80000000, 0x0) > >> >> >> >> >> >> > 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> >> >> > kdb> btp 117331 > >> >> >> >> >> >> > Stack traceback for pid 117331 > >> >> >> >> >> >> > 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 > >> bash > >> >> >> >> >> >> > EBP EIP Function (args) > >> >> >> >> >> >> > 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, > >> >> 0xffffffff, > >> >> >> >> 0x4, > >> >> >> >> >> >> 0x1ca53, > >> >> >> >> >> >> > 0xd5399e94) > >> >> >> >> >> >> > 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, > >> >> 0xffffffff, > >> >> >> >> 0x24, > >> >> >> >> >> >> > 0xbffff038, 0xd5399edc) > >> >> >> >> >> >> > 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, > >> 0x0, > >> >> >> >> >> >> 0xbffff038, 0x0) > >> >> >> >> >> >> > 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, > >> >> 0xbffff038, > >> >> >> >> >> 0xa, 0x0) > >> >> >> >> >> >> > 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25 > >> >> >> >> >> >> > 0xc0103c55 sysenter_past_esp+0x52 > >> >> >> >> >> >> > kdb> > >> >> >> >> 
>> >> > kdb> call print_pvproc 0xdd600600 > >> >> >> >> >> >> > pvp_flag=0x63727076 > >> >> >> >> >> >> > pvp_wstate=0x1166a > >> >> >> >> >> >> > pvp_pproc=0x00000007 > >> >> >> >> >> >> > pvp_head_childl=0xdd60061c > >> >> >> >> >> >> > pvp_childl=0x00000000 > >> >> >> >> >> >> > pvp_head_pgrpl=0x00000000 > >> >> >> >> >> >> > pvp_pgrpl=0x00000000 > >> >> >> >> >> >> > pvp_sessionl=0x00083049 > >> >> >> >> >> >> > pvp_head_oclist=0x00000005 > >> >> >> >> >> >> > pvp_oclist=0xc8709020 > >> >> >> >> >> >> > pvp_ppid=0 > >> >> >> >> >> >> > pvp_oppid=0 > >> >> >> >> >> >> > pvp_sid=0 > >> >> >> >> >> >> > pvp_pgid=-489161216 > >> >> >> >> >> >> > pvp_pp_sid=0 > >> >> >> >> >> >> > pvp_pp_pgid=0 > >> >> >> >> >> >> > pvp_fromnode=0 > >> >> >> >> >> >> > pvp_tonode=71270 > >> >> >> >> >> >> > pvp_cttynode=71271 > >> >> >> >> >> >> > pvp_cttydev=0x1ca53 > >> >> >> >> >> >> > pvp_jobc=71271 > >> >> >> >> >> >> > pvp_pgrp_ldr_seqno=1 > >> >> >> >> >> >> > more> > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > >> ignored > >> >> >> >> >> >> > pvp_pgrp_mem_seqno=-580909344 > >> >> >> >> >> >> > pvp_fork_sigmigarg=-580909332 > >> >> >> >> >> >> > pvp.ml.ml_flag=1 > >> >> >> >> >> >> > pvp.ml.ml_shr_count=-580909376 > >> >> >> >> >> >> > pvp.ml.ml_excl_count=0 > >> >> >> >> >> >> > pvp_loadlevel=-580909332 > >> >> >> >> >> >> > pvp_pin=0 > >> >> >> >> >> >> > pvp_localview=0 > >> >> >> >> >> >> > Function print_pvproc returned 0x0 > >> >> >> >> >> >> > kdb>
|
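The reap path discussed in the message above can be modelled in user space. What follows is only a toy sketch under stated assumptions, not the OpenSSI kernel code: the `toy_proc` struct and the `toy_unlink`, `toy_reap` and `toy_handler` functions are hypothetical names invented for illustration. It tries to show why a traced zombie keeps yielding -EAGAIN once it has already been taken off its parent's child list (the unlink step fails with -ESRCH), and what the `!pvp_oppid` check in the quoted patch is meant to change. As reported in the thread, that patch alone did not resolve the hang, so this models the intent only.

```c
#include <errno.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct toy_proc {
    int pid;
    int ppid;            /* current parent (the debugger while traced) */
    int oppid;           /* original parent while traced, 0 if not traced */
    int on_parent_list;  /* 1 while still linked on the parent's child list */
    int reaped;          /* set to 1 once the zombie has been collected */
};

/* Models the unlink step: it can only hand the child back to its original
 * parent if the child is still on the current parent's child list. */
static int toy_unlink(struct toy_proc *p)
{
    if (!p->on_parent_list)
        return -ESRCH;
    p->ppid = p->oppid;
    p->oppid = 0;
    return 0;
}

/* Models one reap attempt. A traced child is first handed back to its
 * original parent and the caller is told to retry with -EAGAIN. */
static int toy_reap(struct toy_proc *p, int reaper_pid)
{
    if (p->oppid != 0) {
        (void)toy_unlink(p);  /* on failure the state is left unchanged */
        return -EAGAIN;
    }
    if (p->ppid != reaper_pid)
        return -ECHILD;
    p->reaped = 1;
    return p->pid;
}

/* Models the async handler. Unpatched, it removes the child from the parent
 * list up front; patched, it does so only when the child is not traced.
 * Returns the number of attempts needed, or a negative errno. */
static int toy_handler(struct toy_proc *p, int reaper_pid, int patched,
                       int max_tries)
{
    int tries;

    if (!patched || p->oppid == 0)
        p->on_parent_list = 0;

    for (tries = 1; tries <= max_tries; tries++) {
        int rc = toy_reap(p, reaper_pid);
        if (rc == -EAGAIN)
            continue;
        return rc < 0 ? rc : tries;
    }
    return -EAGAIN;  /* still being told to retry after max_tries */
}
```

In this model a traced child (`oppid != 0`) passed to `toy_handler` with `patched == 0` exhausts its retries and ends with -EAGAIN, while with `patched == 1` the first attempt restores the original parent and the second attempt reaps the child.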
From: Roger T. <rog...@gm...> - 2006-06-06 02:55:55
|
Hi,

I updated my repository with the patches Laura just checked in, but I am getting these errors. Maybe the patch is incomplete?

CC cluster/ssi/vproc/dvp_pvpops.o
CC cluster/ssi/vproc/dvp_pvpsops.o
CC cluster/ssi/vproc/dvp_vpops.o
CC cluster/ssi/util/rcopy.o
cluster/ssi/vproc/dvp_vpops.c: In function `vpop_cleanup_vproc_relations':
cluster/ssi/vproc/dvp_vpops.c:798: error: `SIGCHILD' undeclared (first use in this function)
cluster/ssi/vproc/dvp_vpops.c:798: error: (Each undeclared identifier is reported only once
cluster/ssi/vproc/dvp_vpops.c:798: error: for each function it appears in.)
CC cluster/ssi/util/gfs_mount.o
make[3]: *** [cluster/ssi/vproc/dvp_vpops.o] Error 1
make[3]: *** Waiting for unfinished jobs....
CC cluster/ssi/util/nfs_mount.o
CC cluster/ssi/util/ssipty.o
make[2]: *** [cluster/ssi/vproc] Error 2
make[2]: *** Waiting for unfinished jobs....
LD cluster/ssi/util/built-in.o
make[1]: *** [cluster/ssi] Error 2
make: *** [cluster] Error 2

Roger

On 6/1/06, Vladimir Razgulin <one...@gm...> wrote: > Laura, > with thread_deadlock2.patch > I still have two zombies and sleeping gdb, > but it doesn't look like a deadlock anymore: > > ----------------------------------------------------------------------- > 0xdf9c47b0 73119 73081 0 1 S 0xdf9c4970 bash > 0xdfa87230 73165 73119 0 2 S 0xdfa873f0 gdb > 0xf6561130 73204 73165 0 1 Z 0xf65612f0 ios_mon > 0xdff44730 73208 1 0 3 Z 0xdff448f0 ios_mon > > ----------------------------------------------------------------------- > Stack traceback for pid 73165 > 0xdfa87230 73165 73119 0 2 S 0xdfa873f0 gdb > EBP EIP Function (args) > 0xc3b87f88 0xc03dce4c schedule+0x3fc (0xc3b87fa4, 0x8274e00, 0x8, 0x10000, 0x0) > 0xc3b87fb4 0xc0104590 sys_rt_sigsuspend+0xe0 > 0xc0106ae1 syscall_call+0x7 > ----------------------------------------------------------------------- > Stack traceback for pid 73204 > 0xf6561130 73204 73165 0 1 Z 0xf65612f0 ios_mon > EBP EIP Function (args) > 0xf502ff5c 0xc03dce4c schedule+0x3fc (0xf6561130,
0xf6561130, > 0xdfee3130, 0xf502ff88, 0x0) > 0xf502ff88 0xc0129682 do_exit+0x232 (0xf6561130, 0xdfc0ab84, > 0xdfc0a680, 0x7f, 0x0) > 0xf502ffa8 0xc0129879 do_group_exit+0x39 (0x7f00) > 0xf502ffb4 0xc0129915 sys_exit_group+0x15 > 0xc0106ae1 syscall_call+0x7 > > ----------------------------------------------------------------------- > Stack traceback for pid 73208 > 0xdff44730 73208 1 0 3 Z 0xdff448f0 ios_mon > EBP EIP Function (args) > 0xc3b19e5c 0xc03dce4c schedule+0x3fc (0xdff44730, 0x1, 0xc3b19e78, 0x8, 0x1) > 0xc3b19e88 0xc0129682 do_exit+0x232 (0x0, 0x0, 0x0, 0x9, 0xc3b18000) > 0xc3b19ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xdfc0a680, 0xc3b18000) > 0xc3b19ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xc3b19f10, > 0xc3b19ef0, 0xc3b19fbc, 0x0, 0xf5dcf080) > 0xc3b19f9c 0xc0106850 do_signal+0x70 (0x3b83f640, 0x805f2ec, > 0x805f2ec, 0x407679b8) > 0xc3b19fb4 0xc0106987 do_notify_resume+0x57 > 0xc0106b72 work_notifysig+0x13 > ----------------------------------------------------------------------- > call print_task_struct f6561130 > > state=0x20 > flags=0x80200c > ptrace=0x1f1 > lock_depth=-1 > prio=116 > static_prio=120 > array=00000000 > sleep_avg=900000000 > timestamp=4294907435000000 > activated=0x0 > policy=0 > &cpus_allowed=0xf6561180 > time_slice=11 > first_time_slice=0 > tasks.next 0xc044ec9c, tasks.prev 0xdfa8728c > mm=00000000 > active_mm=00000000 > binfmt=c04d9d74 > exit_code=32512 > exit_signal=17 > pdeath_signal=0 > personality=0x400000 > did_exec=1 > pid=73204 > epid=73204 > ppid=73165 > tgid=73204 > cltnode=0 > p_vproc=0xf5f07e00 > p_vfparent=0x00000000 > group_leader=0xf6561130 > &pids=0xf65611d8 > set_child_tid 0x00000000 > clear_child_tid 0x00000000 > rt_priority=0x0 > utime=4 > stime=4 > nvcsw=34 > nivcsw=0 > sig_utime=0 > sig_stime=0 > cutime=0 > cstime=0 > sig_nvcsw=24 > sig_nivcsw=0 > cnvcsw=12 > cnivcsw=0 > start_time.tv_sec=219 > start_time.tv_nsec=579684304 > min_flt=798 > maj_flt=0 > sig_min_flt=20 > sig_maj_flt=0 > cmin_flt=4 > 
cmaj_flt=0 > uid=0 > euid=0 > suid=0 > fsuid=0 > gid=0 > egid=0 > sgid=0 > fsgid=0 > group_info=0xdfcf9a80 > cap_effective=0xfffffeff > cap_inheritable=0x0 > cap_permitted=0xfffffeff > keep_capabilities=0 > user=0xc0453c00 > &rlim=0xdfc3f2d4 > comm=ios_mon > locks=0 > link_count=0 > total_link_count=0 > semvsem.undo_list=f4c96660 > fs=0x00000000 > files=0x00000000 > namespace=0x00000000 > signal=0xdfc3f200 > sighand=0xdfc0a680 > &blocked=0xf6561594 > &real_blocked=0xf656159c > &pending=0xf65615a4 > sas_ss_sp=0x0 > sas_ss_size=0x00000000 > notifier_data=0x00000000 > notifier_mask=0x00000000 > security=0x00000000 > audit_context=0x00000000 > parent_exec_id=0x10 > self_exec_id=0x12 > journal_info=0x00000000 > proc_dentry=0xf51e6170 > backing_dev_info=0x00000000 > io_context=0x00000000 > ptrace_message=0x11df9 > last_siginfo=0x00000000 > p_nodetime=0 > p_ticks_delta=0 > icsprio=0x0 > execnode=0x00000000 > node_context=1 > rcopy_task_size=0 > &mosix=0xf6561664 > Function print_task_struct returned 0x15 > -------------------------------------------------------------------- > call print_vproc f5f07e00 > > vp_magic=0x63727076 (should be 0x63727076) > vp_pid=73204 > vp_ref_cnt=17 > vp_data=0xf5f07e24 > vp_hashfwd=0xdffe0000 > vp_hashbwd=0x00000000 > Function print_vproc returned 0x19 > -------------------------------------------------------------------- > call print_pvproc f5f07e24 > > pvp_flag=0x81706b > pvp_wstate=0x5 > pvp_pproc=0xf6561130 > pvp_head_childl=0x00000000 > pvp_childl=0x00000000 > pvp_head_pgrpl=0xdfdad000 > pvp_pgrpl=0x00000000 > pvp_sessionl=0xdffc4200 > pvp_head_oclist=0x00000000 > pvp_oclist=0x00000000 > pvp_ppid=73165 > pvp_oppid=0 > pvp_sid=73119 > pvp_pgid=73204 > pvp_pp_sid=73119 > pvp_pp_pgid=73165 > pvp_fromnode=1 > pvp_tonode=1 > pvp_cttynode=1 > pvp_cttydev=0x8800000 > pvp_jobc=0 > pvp_pgrp_ldr_seqno=0 > pvp_pgrp_mem_seqno=0 > pvp_fork_sigmigarg=0 > pvp.ml.ml_flag=0 > pvp.ml.ml_shr_count=0 > pvp.ml.ml_excl_count=0 > pvp_loadlevel=0 > pvp_pin=0 > 
pvp_localview=0 > Function print_pvproc returned 0x13 > -------------------------------------------------------------------- > -------------------------------------------------------------------- > call print_task_struct dff44730 > > state=0x20 > flags=0x80244c > ptrace=0x0 > lock_depth=-1 > prio=116 > static_prio=120 > array=00000000 > sleep_avg=900000000 > timestamp=4294907435000000 > activated=0x0 > policy=0 > &cpus_allowed=0xdff44780 > time_slice=23 > first_time_slice=1 > tasks.next 0xc044ec9c, tasks.prev 0xf656ce0c > mm=00000000 > active_mm=00000000 > binfmt=c04d9d74 > exit_code=32512 > exit_signal=-1 > pdeath_signal=0 > personality=0x400000 > did_exec=0 > pid=73208 > epid=73208 > ppid=1 > tgid=73204 > cltnode=0 > p_vproc=0xdfdad000 > p_vfparent=0x00000000 > group_leader=0xf6561130 > &pids=0xdff447d8 > set_child_tid 0x00000000 > clear_child_tid 0x00000000 > rt_priority=0x0 > utime=0 > stime=1 > nvcsw=16 > nivcsw=0 > sig_utime=0 > sig_stime=0 > cutime=0 > cstime=0 > sig_nvcsw=24 > sig_nivcsw=0 > cnvcsw=12 > cnivcsw=0 > start_time.tv_sec=219 > start_time.tv_nsec=857633304 > min_flt=2 > maj_flt=0 > sig_min_flt=20 > sig_maj_flt=0 > cmin_flt=4 > cmaj_flt=0 > uid=0 > euid=0 > suid=0 > fsuid=0 > gid=0 > egid=0 > sgid=0 > fsgid=0 > group_info=0xdfcf9a80 > cap_effective=0xfffffeff > cap_inheritable=0x0 > cap_permitted=0xfffffeff > keep_capabilities=0 > user=0xc0453c00 > &rlim=0xdfc3f2d4 > comm=ios_mon > locks=0 > link_count=0 > total_link_count=0 > semvsem.undo_list=f4c96660 > fs=0x00000000 > files=0x00000000 > namespace=0x00000000 > signal=0xdfc3f200 > sighand=0xdfc0a680 > &blocked=0xdff44b94 > &real_blocked=0xdff44b9c > &pending=0xdff44ba4 > sas_ss_sp=0x0 > sas_ss_size=0x00000000 > notifier_data=0x00000000 > notifier_mask=0x00000000 > security=0x00000000 > audit_context=0x00000000 > parent_exec_id=0x12 > self_exec_id=0x12 > journal_info=0x00000000 > proc_dentry=0xf643dc20 > backing_dev_info=0x00000000 > io_context=0x00000000 > ptrace_message=0x11df7 > 
last_siginfo=0x00000000 > p_nodetime=0 > p_ticks_delta=0 > icsprio=0x0 > execnode=0x00000000 > node_context=1 > rcopy_task_size=0 > &mosix=0xdff44c64 > Function print_task_struct returned 0x15 > -------------------------------------------------------------------- > call print_vproc dfdad000 > > vp_magic=0x63727076 (should be 0x63727076) > vp_pid=73208 > vp_ref_cnt=7 > vp_data=0xdfdad024 > vp_hashfwd=0xf5f00400 > vp_hashbwd=0x00000000 > Function print_vproc returned 0x19 > -------------------------------------------------------------------- > call print_pvproc dfdad024 > > pvp_flag=0x83069 > pvp_wstate=0x5 > pvp_pproc=0xdff44730 > pvp_head_childl=0x00000000 > pvp_childl=0xf5e65e00 > pvp_head_pgrpl=0x00000000 > pvp_pgrpl=0xf5f07e00 > pvp_sessionl=0x00000000 > pvp_head_oclist=0x00000000 > pvp_oclist=0x00000000 > pvp_ppid=1 > pvp_oppid=0 > pvp_sid=73119 > pvp_pgid=73204 > pvp_pp_sid=73119 > pvp_pp_pgid=73204 > pvp_fromnode=1 > pvp_tonode=1 > pvp_cttynode=1 > pvp_cttydev=0x8800000 > pvp_jobc=0 > pvp_pgrp_ldr_seqno=0 > pvp_pgrp_mem_seqno=0 > pvp_fork_sigmigarg=0 > pvp.ml.ml_flag=0 > pvp.ml.ml_shr_count=0 > pvp.ml.ml_excl_count=0 > pvp_loadlevel=0 > pvp_pin=0 > pvp_localview=0 > Function print_pvproc returned 0x13 > -------------------------------------------------------------------- > > > On 5/31/06, Laura Ramirez <lau...@hp...> wrote: > > Hi Vladimir > > > > Please remove "thread_deadlock.patch" and apply the attached > > patch instead. > > > > thanks, > > > > laura > > > > Vladimir Razgulin wrote: > > > Laura, > > > > > > with thread_deadlock.patch I still get a deadlock between the threads > > > and a frozen gdb - see enclosed info.
> > > > > > Thanks > > > Vladimir > > > > > > 0xf6526830 73134 72318 0 0 S 0xf65269f0 sshd > > > 0xf6527330 73149 73134 0 0 S 0xf65274f0 bash > > > 0xdfc060b0 73213 73149 0 2 S 0xdfc06270 gdb > > > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > > > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > > > ------------------------------------------------------------------------- > > > btp 73246 > > > > > > Stack traceback for pid 73246 > > > 0xf645a3b0 73246 73213 0 1 D 0xf645a570 ios_mon > > > EBP EIP Function (args) > > > 0xdf995e10 0xc03dce4c schedule+0x3fc (0xf645a3b0, 0x1, 0xf645a3b0, > > > 0xc0121dd0, 0xdffea698) > > > 0xdf995e48 0xc03dc835 __down+0x75 (0xdffea600, 0xdffea600) > > > 0xdf995e58 0xc03dc9e6 __down_failed+0xa > > > 0xc0227b9b .text.lock.dvp_pvpops+0xad > > > 0xdf995efc 0xc02214d8 pvpop_reassign_child+0x178 (0xdffea600, 0x10, > > > 0x0, 0xdf994000, 0xdf994000) > > > 0xdf995f28 0xc02263f2 pvpop_reassign_original_parent+0x132 (0xdffeaa00, > > > 0x11e22) > > > 0xdf995f38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffeaa00, > > > 0x11e22, 0xf65cf980, 0xdf995f4c, 0xdf995f4c) > > > 0xdf995f5c 0xc0129272 exit_notify+0xc2 (0xf645a3b0, 0xf645a3b0, > > > 0xdf990bb0, 0xdf995f88, 0x0) > > > 0xdf995f88 0xc0129638 do_exit+0x1e8 (0xf645a3b0, 0xc3a80684, > > > 0xc3a80180, 0x7f, 0x0) > > > 0xdf995fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > > > 0xdf995fb4 0xc0129915 sys_exit_group+0x15 > > > 0xc0106ae1 syscall_call+0x7 > > > ------------------------------------------------------------------------- > > > btp 73250 > > > > > > Stack traceback for pid 73250 > > > 0xdf990630 73250 73213 0 1 D 0xdf9907f0 ios_mon > > > EBP EIP Function (args) > > > 0xf4fc5d54 0xc03dce4c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > > > 0xc0121dd0, 0xdffeaa98) > > > 0xf4fc5d8c 0xc03dc835 __down+0x75 (0x79, 0xdffeaafc) > > > 0xf4fc5d9c 0xc03dc9e6 __down_failed+0xa > > > 0xc0227eac .text.lock.dvp_pvpops+0x3be > > > 0xf4fc5dfc 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > > > 
(0xdffeaa00, 0xdffeae00, 0x0, 0x0, 0xf4fc4000) > > > 0xf4fc5e28 0xc02263e1 pvpop_reassign_original_parent+0x121 (0xdffea600, > > > 0x11e1e) > > > 0xf4fc5e38 0xc022d308 vpop_reassign_original_parent+0x18 (0xdffea600, > > > 0x11e1e, 0xf5067200, 0xf4fc5e4c, 0xf4fc5e4c) > > > 0xf4fc5e5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xf65ce840, > > > 0xf4fc5e78, 0x8, 0x1) > > > 0xf4fc5e88 0xc0129638 do_exit+0x1e8 (0x0, 0x0, 0x0, 0x9, 0xf4fc4000) > > > 0xf4fc5ea8 0xc0129879 do_group_exit+0x39 (0x9, 0x0, 0x0, 0xc3a80180, > > > 0xf4fc4000) > > > 0xf4fc5ed4 0xc0134313 get_signal_to_deliver+0x2a3 (0xf4fc5f10, > > > 0xf4fc5ef0, 0xf4fc5fbc, 0x0, 0xf65cf980) > > > 0xf4fc5f9c 0xc0106850 do_signal+0x70 (0x3b970910, 0x805f2ec, > > > 0x805f2ec, 0x407679b8) > > > 0xf4fc5fb4 0xc0106987 do_notify_resume+0x57 > > > 0xc0106b72 work_notifysig+0x13 > > > ------------------------------------------------------------------------- > > > md 0xdffeaa00 > > > > > > 0xdffeaa00 63727076 00011e1e 00000012 dffeaa24 vprc........$... > > > 0xdffeaa10 00000001 dead4ead 00000000 00000000 .....N.......... > > > 0xdffeaa20 00000000 00817063 00000001 f645a3b0 ....cp........E. > > > 0xdffeaa30 00000000 00000000 dffea600 00000000 ................ > > > 0xdffeaa40 c3a4d200 00000000 00000000 00011dfd ................ > > > 0xdffeaa50 00000000 00011dbd 00011e1e 00011dbd ................ > > > 0xdffeaa60 00011dfd 00000001 00000001 00000001 ................ > > > 0xdffeaa70 08800001 00000001 00000001 dead4ead .............N.. > > > ------------------------------------------------------------------------- > > > md 0xdffea600 > > > > > > 0xdffea600 63727076 00011e22 00000009 dffea624 vprc".......$... > > > 0xdffea610 00000001 dead4ead 00000000 00000000 .....N.......... > > > 0xdffea620 00000000 00083061 00000001 df990630 ....a0......0... > > > 0xdffea630 00000000 dffeaa00 00000000 dffeaa00 ................ > > > 0xdffea640 00000000 00000000 00000000 00011dfd ................ 
> > > 0xdffea650 00000001 00011dbd 00011e1e 00011dbd ................ > > > 0xdffea660 00011e1e 00000001 00000001 00000001 ................ > > > 0xdffea670 08800001 00000000 00000001 dead4ead .............N.. > > > ------------------------------------------------------------------------- > > > call print_vproc 0xdffeaa00 > > > > > > vp_magic=0x63727076 (should be 0x63727076) > > > vp_pid=73246 > > > vp_ref_cnt=18 > > > vp_data=0xdffeaa24 > > > vp_hashfwd=0x00000000 > > > vp_hashbwd=0x00000000 > > > Function print_vproc returned 0x19 > > > ------------------------------------------------------------------------- > > > call print_pvproc 0xdffeaa24 > > > > > > pvp_flag=0x817063 > > > pvp_wstate=0x1 > > > pvp_pproc=0xf645a3b0 > > > pvp_head_childl=0x00000000 > > > pvp_childl=0x00000000 > > > pvp_head_pgrpl=0xdffea600 > > > pvp_pgrpl=0x00000000 > > > pvp_sessionl=0xc3a4d200 > > > pvp_head_oclist=0x00000000 > > > pvp_oclist=0x00000000 > > > pvp_ppid=73213 > > > pvp_oppid=0 > > > pvp_sid=73149 > > > pvp_pgid=73246 > > > pvp_pp_sid=73149 > > > pvp_pp_pgid=73213 > > > pvp_fromnode=1 > > > pvp_tonode=1 > > > pvp_cttynode=1 > > > pvp_cttydev=0x8800001 > > > pvp_jobc=1 > > > pvp_pgrp_ldr_seqno=0 > > > pvp_pgrp_mem_seqno=0 > > > pvp_fork_sigmigarg=0 > > > pvp.ml.ml_flag=2 > > > pvp.ml.ml_shr_count=1 > > > pvp.ml.ml_excl_count=0 > > > pvp_loadlevel=0 > > > pvp_pin=0 > > > pvp_localview=0 > > > Function print_pvproc returned 0x13 > > > ------------------------------------------------------------------------- > > > call print_vproc 0xdffea600 > > > > > > vp_magic=0x63727076 (should be 0x63727076) > > > vp_pid=73250 > > > vp_ref_cnt=9 > > > vp_data=0xdffea624 > > > vp_hashfwd=0x00000000 > > > vp_hashbwd=0x00000000 > > > Function print_vproc returned 0x19 > > > ------------------------------------------------------------------------- > > > call print_pvproc 0xdffea624 > > > > > > pvp_flag=0x83061 > > > pvp_wstate=0x1 > > > pvp_pproc=0xdf990630 > > > pvp_head_childl=0x00000000 > > > 
pvp_childl=0xdffeaa00 > > > pvp_head_pgrpl=0x00000000 > > > pvp_pgrpl=0xdffeaa00 > > > pvp_sessionl=0x00000000 > > > pvp_head_oclist=0x00000000 > > > pvp_oclist=0x00000000 > > > pvp_ppid=73213 > > > pvp_oppid=1 > > > pvp_sid=73149 > > > pvp_pgid=73246 > > > pvp_pp_sid=73149 > > > pvp_pp_pgid=73246 > > > pvp_fromnode=1 > > > pvp_tonode=1 > > > pvp_cttynode=1 > > > pvp_cttydev=0x8800001 > > > pvp_jobc=0 > > > pvp_pgrp_ldr_seqno=0 > > > pvp_pgrp_mem_seqno=0 > > > pvp_fork_sigmigarg=0 > > > pvp.ml.ml_flag=2 > > > pvp.ml.ml_shr_count=1 > > > pvp.ml.ml_excl_count=0 > > > pvp_loadlevel=0 > > > pvp_pin=0 > > > pvp_localview=0 > > > Function print_pvproc returned 0x13 > > > ------------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > On 5/30/06, Laura Ramirez <lau...@hp...> wrote: > > >> > > >> Hi Vladimir, > > >> > > >> Good to hear that the second patch fixed the original problem. > > >> > > >> From looking at the stack traces, it does appear to be a > > >> deadlock. Pid 73961 is reaping pid 73980. It wants to > > >> reassign it to its original parent 73976. It currently has > > >> the vproc lock for 73980 and wants the vproc lock for 73976. > > >> However, 73976 is currently exiting, and it's trying to reassign > > >> its children to the next thread (i.e. 73980). So it currently has > > >> its own 73976 lock and wants the vproc lock for 73980, and so > > >> it's now deadlocked. Besides the deadlock bug, maybe the real > > >> problem is 73976 reassigning the children to 73980, which has > > >> already run through its exit code and will never clean up the > > >> list again. > > >> > > >> Attached is a patch that checks the state of the thread and, if > > >> it's a zombie, reassigns it to the init process. > > >> > > >> laura > > >> > > >> > > >> Vladimir Razgulin wrote: > > >> > Laura, > > >> > > > >> > Thank you for ptrace2_reap.patch > > >> > I've tried it, and it fixed the original problem.
> > >> > However, now I have something looking like a racing condition. > > >> > > > >> > Again, I'm debugging a multithreaded program: > > >> > > > >> > gdb (pid=73961) has started ios_mon (pid=73976), which started three > > >> other > > >> > threads (pids=73979, 73980, 73981). > > >> > The thread 73979 was terminated successfully (thanks to > > >> ptrace2_reap.patch) > > >> > and I've got the following: > > >> > > > >> > ----------------------------------------------------------------- > > >> > 0xf5778bb0 73927 73921 0 0 S 0xf5778d70 bash > > >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > > >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > > >> > 0xdf991130 73980 73961 0 2 E 0xdf9912f0 ios_mon > > >> > 0xdfeaf330 73981 73961 0 0 Z 0xdfeaf4f0 ios_mon > > >> > 0xf576c030 74134 72761 0 1 D 0xf576c1f0 bash > > >> > ----------------------------------------------------------------- > > >> > > > >> > gdb (73961) wants to do VPROC_LOCK_EXCL for vproc=0xc3a35800, > > >> > pvproc=0xc3a35824, > > >> > which is owned by ios_mon (73976) > > >> > > > >> > ---------------------------------------------------------------- > > >> > Stack traceback for pid 73961 > > >> > 0xdfd746b0 73961 73927 0 3 D 0xdfd74870 gdb > > >> > EBP EIP Function (args) > > >> > 0xf3e59b08 0xc03dce0c schedule+0x3fc (0xdfd746b0, 0x1, 0xdfd746b0, > > >> > 0xc0121dd0, 0xf3cafd04) > > >> > 0xf3e59b40 0xc03dc7f5 __down+0x75 (0x0, 0x79) > > >> > 0xf3e59b50 0xc03dc9a6 __down_failed+0xa > > >> > 0xc0227b04 .text.lock.dvp_pvpops+0x56 > > >> > 0xf3e59bbc 0xc0221007 pvpop_add_child_to_parent+0x247 (0xc3a35800, > > >> > 0xc3a35a00, 0x5, 0x0, 0x0) > > >> > 0xf3e59c3c 0xc0221cc0 pvpop_reclaim_child+0x360 (0xc3a35a00, 0x120f8, > > >> > 0x0, 0x0, 0x0) > > >> > 0xf3e59c64 0xc022c3a2 vpop_reclaim_child+0x52 (0xc3a35a00, 0x120f8, > > >> > 0x0, 0x0, 0x0) > > >> > 0xf3e59c94 0xc012f835 __ptrace_unlink+0x75 (0xdf991130, 0x120f8, > > >> > 0xf3e59cc4, 0x0, 0x0) > > >> > 0xf3e59cd4 0xc0129b68 wait_task_zombie+0x1c8 
(0xdf991130, 0x0, 0x0, > > >> > 0xf3e59e8c, 0xf3e59ed4) > > >> > 0xf3e59d20 0xc012a194 pproc_reap+0x324 (0xdf991130, 0x0, 0xf3e59e8c, > > >> > 0xf3e59ed4, 0x120e9) > > >> > 0xf3e59e30 0xc0220a8d pvpop_reap+0x26d (0xc3a35a00, 0xffffffff, 0x20, > > >> > 0x120e9, 0xf3e59e8c) > > >> > 0xf3e59eb0 0xc022b8db vpop_wait+0x69b (0xf6431400, 0xffffffff, 0xa1, > > >> > 0xbfeec4ec, 0xf3e59ed4) > > >> > 0xf3e59f80 0xc0129cec do_wait+0xbc (0xffffffff, 0x80000005, 0x0, > > >> > 0xbfeec4ec, 0x0) > > >> > 0xf3e59f9c 0xc0129dcc sys_wait4+0x3c (0xffffffff, 0xbfeec4ec, > > >> > 0x80000001, 0x0) > > >> > 0xf3e59fb4 0xc0129df5 sys_waitpid+0x25 > > >> > 0xc0106ae1 syscall_call+0x7 > > >> > ----------------------------------------------------------------------- > > >> > md c3a35824 > > >> > > > >> > 0xc3a35824 00817063 00000001 df990630 00000000 cp......0....... > > >> > 0xc3a35834 00000000 c3a4a600 00000000 f6431400 ..............C. > > >> > 0xc3a35844 c3a35a00 00000000 000120e9 00000000 .Z....... ...... > > >> > 0xc3a35854 000120c7 000120f8 000120c7 000120e9 . ... ... ... .. > > >> > 0xc3a35864 00000001 00000001 00000001 08800001 ................ > > >> > 0xc3a35874 00000001 00000001 dead4ead 00000000 .........N...... > > >> > 0xc3a35884 ffffffff 00000001 df990630 00000001 ........0....... > > >> > ^^^^^^^^ > > >> > 0xc3a35894 dead4ead f3e59b20 f42b5d04 df990630 .N.. ....]+.0... 
> > >> > ----------------------------------------------------------------------- > > >> > > > >> > At the same time ios_mon (73976) tries to lock vproc=0xc3a35a00, > > >> > pvproc=0xc3a35a24, > > >> > which is already locked by gdb (73961) > > >> > > > >> > ----------------------------------------------------------------------- > > >> > > > >> > Stack traceback for pid 73976 > > >> > 0xdf990630 73976 73961 0 1 D 0xdf9907f0 ios_mon > > >> > EBP EIP Function (args) > > >> > 0xf3d89e58 0xc03dce0c schedule+0x3fc (0xdf990630, 0x1, 0xdf990630, > > >> > 0xc0121dd0, 0xc3a35a98) > > >> > 0xf3d89e90 0xc03dc7f5 __down+0x75 (0x79, 0xc3a35afc) > > >> > 0xf3d89ea0 0xc03dc9a6 __down_failed+0xa > > >> > 0xc0227e6c .text.lock.dvp_pvpops+0x3be > > >> > 0xf3d89f00 0xc0225fe7 pvpop_add_to_originalchild_list+0x1e7 > > >> > (0xc3a35a00, 0xc3a4a600, 0x0, 0xf3d89f40, 0xf55b4800) > > >> > 0xf3d89f28 0xc02263b1 pvpop_reassign_original_parent+0xf1 (0xc3a35800, > > >> > 0x120fc) > > >> > 0xf3d89f38 0xc022d2c8 vpop_reassign_original_parent+0x18 (0xc3a35800, > > >> > 0x120fc, 0xf5e96380, 0xf3d89f4c, 0xf3d89f4c) > > >> > 0xf3d89f5c 0xc0129272 exit_notify+0xc2 (0xdf990630, 0xdf990630, > > >> > 0xdfeaf330, 0xf3d89f88, 0x0) > > >> > 0xf3d89f88 0xc0129638 do_exit+0x1e8 (0xdf990630, 0xf5e28584, > > >> > 0xf5e28080, 0x7f, 0x0) > > >> > 0xf3d89fa8 0xc0129879 do_group_exit+0x39 (0x7f00) > > >> > 0xf3d89fb4 0xc0129915 sys_exit_group+0x15 > > >> > 0xc0106ae1 syscall_call+0x7 > > >> > [1]kdb> > > >> > ----------------------------------------------------------------------- > > >> > > > >> > md c3a35a24 > > >> > > > >> > 0xc3a35a24 00083069 00000005 df991130 00000000 i0......0....... > > >> > 0xc3a35a34 00000000 00000000 c3a35800 00000000 .........X...... > > >> > 0xc3a35a44 00000000 00000000 000120e9 000120f8 ......... ... .. > > >> > 0xc3a35a54 000120c7 000120f8 000120c7 000120f8 . ... ... ... .. > > >> > 0xc3a35a64 00000001 00000001 00000001 08800001 ................ 
> > >> > 0xc3a35a74 00000000 00000001 dead4ead 00000000 .........N...... > > >> > 0xc3a35a84 ffffffff 00000001 dfd746b0 00000001 .........F...... > > >> > ^^^^^^^^ > > >> > 0xc3a35a94 dead4ead f3d89e70 f3d89e70 dfd746b0 .N..p...p....F.. > > >> > ---------------------------------------------------------------------- > > >> > > > >> > and 0xc3a4a600 here is the vproc for pid=73981 > > >> > > > >> > I can provide more information about the issue if it's necessary. > > >> > Do you think that work should be serialized through the nocldwait queue? > > >> > > > >> > Thanks, > > >> > Vladimir > > >> > > > >> > > > >> > On 5/26/06, Laura Ramirez <lau...@hp...> wrote: > > >> >> > > >> >> Hi Vladimir, > > >> >> > > >> >> Looking at the code more closely, I believe the "ptrace_reap.patch" > > >> >> I gave you is NOT correct. Please remove that patch. > > >> >> > > >> >> Attached is a different patch "ptrace_reap2.patch". In the original > > >> >> code, the parent, which is the gdb process, is not getting a SIGCHLD; > > >> >> instead the process just gets queued to the nocldwait queue. I believe the > > >> >> correct path should be: the gdb parent gets the SIGCHLD, goes to do the > > >> >> reap, sees it's ptraced and does the ptrace_unlink(). Assuming the > > >> >> ptrace_unlink() works, it should return -EAGAIN, do the > > >> >> PVPOP_REPORT_STATE() to its original parent, see that its original > > >> parent is > > >> >> not reaping children and get queued to the nocldwait queue, which > > >> >> should do the reap successfully this time and not return -EAGAIN. With > > >> the > > >> >> attached patch, I'm hoping that's going to happen. > > >> >> > > >> >> Apply with -p0, and let me know if it works or where it deviates from the > > >> >> code path I stated above.
> > >> >> > > >> >> thanks, > > >> >> > > >> >> laura > > >> >> > > >> >> > > >> >> > > >> >> Vladimir Razgulin wrote: > > >> >> > Laura, > > >> >> > following your advice I got a snapshot of the moment the process > > >> >> > becomes a zombie and a work item is added into > > >> nocldwait_async_queue: > > >> >> > > > >> >> > Please, note, that > > >> >> > the process id is 74226, it's a thread process created by 74223 > > >> >> under gdb, > > >> >> > and its ppid and oppid are different - see attached info > > >> >> > > > >> >> > Thank you for your help > > >> >> > Vladimir > > >> >> > > > >> >> > Do you think I can CC: our conversation to ssic-linux-devel? > > >> >> > > > >> >> > > > >> >> > > > >> >> > > >> ------------------------------------------------------------------------------------------------------------------ > > >> > > >> >> > > >> >> > > > >> >> > 0xdfd019b0 73995 72325 0 1 S 0xdfd01b70 sshd > > >> >> > 0xf674b130 74000 73995 0 2 S 0xf674b2f0 bash > > >> >> > 0xf5f078b0 74210 73171 0 2 S 0xf5f07a70 gdb > > >> >> > 0xc3bbb930 74223 74210 0 3 S 0xc3bbbaf0 ios_mon > > >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > > >> >> > 0xdfeff7b0 74227 74210 0 0 S 0xdfeff970 ios_mon > > >> >> > 0xf5dd9830 74229 74210 0 1 S 0xf5dd99f0 ios_mon > > >> >> > > > >> >> > > >> ------------------------------------------------------------------------------------------------------------------ > > >> > > >> >> > > >> >> > > > >> >> > Stack traceback for pid 74226 > > >> >> > 0xdfeff230 74226 74210 1 3 R 0xdfeff3f0 *ios_mon > > >> >> > EBP EIP Function (args) > > >> >> > 0xec599ef4 0xc0220530 dpvproc_nocldwait_async_queue (0xf6bf1a00, > > >> >> > 0xf6be1800, 0xc040e203, 0xf6ec5efc, 0xec599e84) > > >> >> > 0xc0224dd3 pvpop_report_state+0x463 (0xf6bf1a00, > > >> >> > 0xf6be1800, 0xec599f1c, 0xec599f20, 0x1) > > >> >> > 0xec599f5c 0xc022b3a4 vpop_cleanup_vproc_relations+0x2a4 > > >> (0xf6be1800, > > >> >> > 0xdfeff230, 0xf5e42980, 0xec599f70, 0xec599f70) > > >> >> > 
0xec599f80 0xc01292f7 exit_notify+0x137 (0xdfeff230, 0xc04048da, > > >> 0x4, > > >> >> > 0xec599fbc, 0x0) > > >> >> > 0xec599fac 0xc0129648 do_exit+0x1e8 > > >> >> > 0xec599fb4 0xc012983f sys_exit+0xf > > >> >> > 0xc0106ae1 syscall_call+0x7 > > >> >> > ---------------------------------------------------------------- > > >> >> > call print_task_struct 0xdfeff230 > > >> >> > > > >> >> > state=0x0 > > >> >> > flags=0x802044 > > >> >> > ptrace=0x1f1 > > >> >> > lock_depth=-1 > > >> >> > prio=116 > > >> >> > static_prio=120 > > >> >> > array=c383055c > > >> >> > sleep_avg=933888889 > > >> >> > timestamp=4295805885000000 > > >> >> > activated=0x0 > > >> >> > policy=0 > > >> >> > &cpus_allowed=0xdfeff280 > > >> >> > time_slice=17 > > >> >> > first_time_slice=1 > > >> >> > tasks.next 0xc044fc9c, tasks.prev 0xf5f0790c > > >> >> > mm=00000000 > > >> >> > active_mm=f5e42980 > > >> >> > binfmt=c04dad74 > > >> >> > exit_code=0 > > >> >> > exit_signal=-1 > > >> >> > pdeath_signal=0 > > >> >> > personality=0x400000 > > >> >> > did_exec=0 > > >> >> > pid=74226 > > >> >> > epid=74226 > > >> >> > ppid=74210 > > >> >> > tgid=74223 > > >> >> > cltnode=0 > > >> >> > p_vproc=0xf6be1800 > > >> >> > p_vfparent=0x00000000 > > >> >> > group_leader=0xc3bbb930 > > >> >> > &pids=0xdfeff2d8 > > >> >> > set_child_tid 0x00000000 > > >> >> > clear_child_tid 0x00000000 > > >> >> > rt_priority=0x0 > > >> >> > utime=0 > > >> >> > stime=0 > > >> >> > nvcsw=11 > > >> >> > nivcsw=0 > > >> >> > sig_utime=0 > > >> >> > sig_stime=0 > > >> >> > cutime=0 > > >> >> > cstime=0 > > >> >> > sig_nvcsw=0 > > >> >> > sig_nivcsw=0 > > >> >> > cnvcsw=0 > > >> >> > cnivcsw=0 > > >> >> > start_time.tv_sec=1138 > > >> >> > start_time.tv_nsec=381796336 > > >> >> > min_flt=4 > > >> >> > maj_flt=0 > > >> >> > sig_min_flt=0 > > >> >> > sig_maj_flt=0 > > >> >> > cmin_flt=0 > > >> >> > cmaj_flt=0 > > >> >> > uid=0 > > >> >> > euid=0 > > >> >> > suid=0 > > >> >> > fsuid=0 > > >> >> > gid=0 > > >> >> > egid=0 > > >> >> > sgid=0 
> > >> >> > fsgid=0 > > >> >> > group_info=0xf502c080 > > >> >> > cap_effective=0xfffffeff > > >> >> > cap_inheritable=0x0 > > >> >> > cap_permitted=0xfffffeff > > >> >> > keep_capabilities=0 > > >> >> > user=0xc0454c00 > > >> >> > &rlim=0xdffdd5d4 > > >> >> > comm=ios_mon > > >> >> > locks=0 > > >> >> > link_count=0 > > >> >> > total_link_count=0 > > >> >> > semvsem.undo_list=c3b4aa20 > > >> >> > fs=0x00000000 > > >> >> > files=0x00000000 > > >> >> > namespace=0x00000000 > > >> >> > signal=0xdffdd500 > > >> >> > sighand=0xf5e29900 > > >> >> > &blocked=0xdfeff694 > > >> >> > &real_blocked=0xdfeff69c > > >> >> > &pending=0xdfeff6a4 > > >> >> > sas_ss_sp=0x0 > > >> >> > sas_ss_size=0x00000000 > > >> >> > notifier_data=0x00000000 > > >> >> > notifier_mask=0x00000000 > > >> >> > security=0x00000000 > > >> >> > audit_context=0x00000000 > > >> >> > parent_exec_id=0x12 > > >> >> > self_exec_id=0x12 > > >> >> > journal_info=0x00000000 > > >> >> > proc_dentry=0xecf027a0 > > >> >> > backing_dev_info=0x00000000 > > >> >> > io_context=0x00000000 > > >> >> > ptrace_message=0x0 > > >> >> > last_siginfo=0x00000000 > > >> >> > p_nodetime=0 > > >> >> > p_ticks_delta=0 > > >> >> > icsprio=0x0 > > >> >> > execnode=0x00000000 > > >> >> > node_context=1 > > >> >> > rcopy_task_size=0 > > >> >> > &mosix=0xdfeff764 > > >> >> > ---------------------------------------------------------- > > >> >> > call print_vproc f6be1800 > > >> >> > > > >> >> > vp_magic=0x63727076 (should be 0x63727076) > > >> >> > vp_pid=74226 > > >> >> > vp_ref_cnt=8 > > >> >> > vp_data=0xf6be1824 > > >> >> > vp_hashfwd=0xc3a60200 > > >> >> > vp_hashbwd=0x00000000 > > >> >> > Function print_vproc returned 0x19 > > >> >> > ---------------------------------------------------------------- > > >> >> > call print_pvproc f6be1824 > > >> >> > > > >> >> > pvp_flag=0x83069 > > >> >> > pvp_wstate=0x5 > > >> >> > pvp_pproc=0xdfeff230 > > >> >> > pvp_head_childl=0x00000000 > > >> >> > pvp_childl=0xf6ec5e00 > > >> >> > 
pvp_head_pgrpl=0x00000000 > > >> >> > pvp_pgrpl=0xf6ec5e00 > > >> >> > pvp_sessionl=0x00000000 > > >> >> > pvp_head_oclist=0x00000000 > > >> >> > pvp_oclist=0x00000000 > > >> >> > pvp_ppid=74210 > > >> >> > pvp_oppid=74223 > > >> >> > pvp_sid=73171 > > >> >> > pvp_pgid=74223 > > >> >> > pvp_pp_sid=73171 > > >> >> > pvp_pp_pgid=74223 > > >> >> > pvp_fromnode=1 > > >> >> > pvp_tonode=1 > > >> >> > pvp_cttynode=1 > > >> >> > pvp_cttydev=0x8800000 > > >> >> > pvp_jobc=0 > > >> >> > pvp_pgrp_ldr_seqno=0 > > >> >> > pvp_pgrp_mem_seqno=0 > > >> >> > pvp_fork_sigmigarg=0 > > >> >> > pvp.ml.ml_flag=0 > > >> >> > pvp.ml.ml_shr_count=0 > > >> >> > pvp.ml.ml_excl_count=0 > > >> >> > pvp_loadlevel=0 > > >> >> > pvp_pin=0 > > >> >> > pvp_localview=0 > > >> >> > Function print_pvproc returned 0x13 > > >> >> > ---------------------------------------------------------------- > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > On 5/25/06, Laura Ramirez <lau...@hp...> wrote: > > >> >> >> > > >> >> >> Hi Vladimir, > > >> >> >> > > >> >> >> maybe we need to take a step back, I was focused on why the > > >> >> >> ptrace_unlink() call was failing, which was because the child > > >> >> >> had already gotten removed from the parent list. Which is why > > >> >> >> I created the following patch, which only removed the child from > > >> >> >> the parent list if the child wasnt being ptraced (ie pvp_oppid > > >> == 0). > > >> >> >> So the ! is correct for that logic. > > >> >> >> > > >> >> >> But maybe we need to look at why this process is being reaped by > > >> >> >> the nocld_wait_daemon, instead of the gdb parent. > > >> >> >> > > >> >> >> Can you do a " call print_task_struct < address>" of the zombie > > >> >> >> processes? 
> > >> >> >> > > >> >> >> thanks > > >> >> >> > > >> >> >> laura > > >> >> >> > > >> >> >> Vladimir Razgulin wrote: > > >> >> >> > Laura, > > >> >> >> > The patch you sent doesn't resolve the issue: sometimes > > >> everything > > >> >> >> > works, sometimes gdb hangs with 2 zombies, sometimes the whole > > >> >> system > > >> >> >> > dies. > > >> >> >> > I took a look at the code in the patch: > > >> >> >> > > > >> >> >> > + error = 0; > > >> >> >> > + if (!PVP(vc)->pvp_oppid) { > > >> >> >> > + error = PVPOP_RMV_CHILD_FROM_PARENT(vp, vc, FALSE); > > >> >> >> > + ptrace = 1; > > >> >> >> > + } > > >> >> >> > > > >> >> >> > Should it be without "!", like > > >> >> >> > + if ( PVP(vc)->pvp_oppid) { > > >> >> >> > > > >> >> >> > Thanks > > >> >> >> > Vladimir > > >> >> >> > > > >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > > >> >> >> >> > > >> >> >> >> I believe the __ptrace_unlink is failing because the child > > >> >> >> >> gets removed from the parent at the beginning of > > >> >> >> >> dpvproc_nocldwait_async_handler(), which is the right thing > > >> >> >> >> to do if the process isn't being ptraced. > > >> >> >> >> > > >> >> >> >> Attached is a patch that addresses this problem. Can you please > > >> >> >> >> apply and test? (Use -p0 to apply.) > > >> >> >> >> > > >> >> >> >> laura > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> Vladimir Razgulin wrote: > > >> >> >> >> > Laura, > > >> >> >> >> > > > >> >> >> >> > That's right, > > >> >> >> >> > __ptrace_unlink(p) fails every time inside > > >> >> >> >> > pvpop_rmv_child_from_parent() with -ESRCH > > >> >> >> >> > because it can't find process p in the PVP(w)->pvp_childl > > >> list. > > >> >> >> >> > The process p seems to be unlinked already - I saw that tracing > > >> >> >> >> > pvp_childl with kdb while stopping inside > > >> >> >> >> pvpop_rmv_child_from_parent().
> > >> >> >> >> > > > >> >> >> >> > Vladimir > > >> >> >> >> > > > >> >> >> >> > On 5/24/06, Laura Ramirez <lau...@hp...> wrote: > > >> >> >> >> >> Hi Vladimir, > > >> >> >> >> >> > > >> >> >> >> >> The reason wait_task_zombie() returns EAGAIN is that > > >> >> >> >> >> its current parent is the debugger gdb instead of its > > >> original > > >> >> >> >> >> parent. The original parent needs to reap its child, so a > > >> >> >> >> >> __ptrace_unlink() is done to stop tracing the process and set > > >> >> >> >> >> the parent back to its original parent, and EAGAIN is returned so the > > >> >> >> >> >> reap can be done from the original parent. So > > >> __ptrace_unlink() > > >> >> >> >> >> must have failed, since otherwise the second time through it > > >> wouldn't > > >> >> >> >> >> go into that code path. So if you can find out why the > > >> >> >> >> >> __ptrace_unlink() is failing, that would be a good start. > > >> >> >> >> >> > > >> >> >> >> >> laura > > >> >> >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> Vladimir Razgulin wrote: > > >> >> >> >> >> > Laura, Roger, > > >> >> >> >> >> > > > >> >> >> >> >> > I've got the same problem trying to debug a multithreaded > > >> >> program > > >> >> >> >> >> with gdb: > > >> >> >> >> >> > BUG_ON with exit_signal == -1 in wait_task_zombie(). > > >> >> >> >> >> > > > >> >> >> >> >> > I applied a patch for kernel/kernel/exit.c (rev. 1.14 8 Dec > > >> >> 2005) > > >> >> >> >> - Is > > >> >> >> >> >> > it a correct one? - > > >> >> >> >> >> > and now I have dpvproc_nocldwait_async_handler() in an > > >> >> infinite > > >> >> >> >> loop, > > >> >> >> >> >> > calling pvpop_reap() and receiving -EAGAIN as an error > > >> code. > > >> >> >> >> >> > > > >> >> >> >> >> > Should wait_task_zombie() return (sometimes ;) ) p->pid > > >> >> instead of > > >> >> >> >> >> -EAGAIN? > > >> >> >> >> >> > Thanks > > >> >> >> >> >> > Vladimir > > >> >> >> >> >> > > > >> >> >> >> >> > I can provide some info about the processes...
> > >> >> >> >> >> > > > >> >> >> >> >> > > > >> >> >> >> >> > On 12/6/05, Laura Ramirez <lau...@hp...> wrote: > > >> >> >> >> >> >> > > >> >> >> >> >> >> Hi Roger, > > >> >> >> >> >> >> > > >> >> >> >> >> >> A quick look at the code, there seems to be a comment > > >> >> >> >> >> >> about the ptrace vproc path, may need to be reworked > > >> for 2.6 > > >> >> >> merge. > > >> >> >> >> >> >> I dont quite remember what the issue was, it obviously > > >> >> hitting > > >> >> >> >> >> >> BUG_ON with exit_signal == -1. Can you print the vproc and > > >> >> the > > >> >> >> >> >> >> pvproc? Below, you printed the pvproc using the vproc ptr > > >> >> which > > >> >> >> >> >> >> made it look corrupt, but it really isnt, it was just the > > >> >> wrong > > >> >> >> >> print > > >> >> >> >> >> >> call. > > >> >> >> >> >> >> > > >> >> >> >> >> >> laura > > >> >> >> >> >> >> > > >> >> >> >> >> >> Roger Tsang wrote: > > >> >> >> >> >> >> > Laura, > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > I got an oops exiting from gdb while attached to > > >> >> check_bacula > > >> >> >> >> >> which had > > >> >> >> >> >> >> > segfaulted. I eventually fixed the bug in check_bacula, > > >> >> but > > >> >> >> >> take a > > >> >> >> >> >> >> look at > > >> >> >> >> >> >> > the oops below. child_reaper was waiting for > > >> check_bacula > > >> >> >> >> which is > > >> >> >> >> >> >> in E > > >> >> >> >> >> >> > state. It looks like pvproc got corrupted. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > I'll leave this in kdb until tomorrow just in case I > > >> >> left out > > >> >> >> >> >> >> something. 
> > >> >> >> >> >> >> > > > >> >> >> >> >> >> > Roger > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > pe (11)procfs: impossible type (11)procfs: impossible > > >> type > > >> >> >> >> >> (11)procfs: > > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > > >> >> >> >> >> impossible type > > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > > >> >> >> >> (11)procfs: > > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > > >> >> >> >> >> impossible type > > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > > >> >> >> >> (11)procfs: > > >> >> >> >> >> >> > impossible type (11)procfs: impossible type (11)procfs: > > >> >> >> >> >> impossible type > > >> >> >> >> >> >> > (11)procfs: impossible type (11)procfs: impossible type > > >> >> >> >> >> >> (11)ptrace_unlink: > > >> >> >> >> >> >> > vpop_reclaim failed > > >> >> >> >> >> >> > <4>------------[ cut here ]------------ > > >> >> >> >> >> >> > <1>kernel BUG at kernel/exit.c:1343! 
> > >> >> >> >> >> >> > <1>invalid operand: 0000 [#1] > > >> >> >> >> >> >> > <4>Modules linked in: nls_utf8 isofs zlib_inflate > > >> loop nfsd > > >> >> >> >> exportfs > > >> >> >> >> >> >> > ipt_MASQUERADE tun ipt_REJECT ipt_state ipt_multiport > > >> >> >> >> iptable_filter > > >> >> >> >> >> >> > iptable_nat ip_conntrack ip_tables binfmt_misc uhci_hcd > > >> >> >> ehci_hcd > > >> >> >> >> >> >> usbcore > > >> >> >> >> >> >> > floppy drbd bonding sata_via libata sk98lin r8169 > > >> via_rhine > > >> >> >> >> dm_mod > > >> >> >> >> >> >> > <4>CPU: 0 > > >> >> >> >> >> >> > <4>EIP: 0060:[<c011da00>] Not tainted VLI > > >> >> >> >> >> >> > <4>EFLAGS: 00010046 (2.6.10-bk7-ssi24) > > >> >> >> >> >> >> > <4>EIP is at wait_task_zombie+0x220/0x230 > > >> >> >> >> >> >> > <4>eax: 00000020 ebx: c8709020 ecx: c0430a2c edx: > > >> >> >> c0430a2c > > >> >> >> >> >> >> > <4>esi: 0001166a edi: f7d11e4c ebp: f7d11cd0 esp: > > >> >> >> f7d11c98 > > >> >> >> >> >> >> > <4>ds: 007b es: 007b ss: 0068 > > >> >> >> >> >> >> > <4>Process child_reaper (pid: 2, threadinfo=f7d10000 > > >> >> >> >> task=c1aeea80) > > >> >> >> >> >> >> > <4>Stack: c8709020 00000000 f7d11cc0 00000000 00000000 > > >> >> >> 00000000 > > >> >> >> >> >> >> 00000000 > > >> >> >> >> >> >> > 00000000 > > >> >> >> >> >> >> > <4> 00000001 0001183d 00011667 dd600600 00000000 > > >> >> >> 00000286 > > >> >> >> >> >> >> f7d11d1c > > >> >> >> >> >> >> > c011de68 > > >> >> >> >> >> >> > <4> c8709020 00000000 00000000 f7d11e4c f7d11e50 > > >> >> >> f7d10000 > > >> >> >> >> >> >> c3378a40 > > >> >> >> >> >> >> > f7d11d1c > > >> >> >> >> >> >> > more> > > >> >> >> >> >> >> > <4>Call Trace: > > >> >> >> >> >> >> > <4> [<c0104a8f>] show_stack+0x7f/0xa0 > > >> >> >> >> >> >> > <4> [<c0104c25>] show_registers+0x155/0x220 > > >> >> >> >> >> >> > <4> [<c0104fac>] die+0xcc/0x190 > > >> >> >> >> >> >> > <4> [<c01050f6>] do_trap+0x86/0xd0 > > >> >> >> >> >> >> > <4> [<c01053e8>] do_invalid_op+0xb8/0xd0 > > >> >> >> >> >> >> > <4> 
[<c010470b>] error_code+0x2b/0x30 > > >> >> >> >> >> >> > <4> [<c011de68>] pproc_reap+0x228/0x2f0 > > >> >> >> >> >> >> > <4> [<c020f327>] pvpop_reap+0x1d7/0x480 > > >> >> >> >> >> >> > <4> [<c020ea9f>] > > >> >> dpvproc_nocldwait_async_handler+0x13f/0x300 > > >> >> >> >> >> >> > <4> [<c01f2e31>] async_cleanup_task_structs+0x51/0x80 > > >> >> >> >> >> >> > <4> [<c022f9d5>] initproc_postroot_init+0x155/0x230 > > >> >> >> >> >> >> > <4> [<c01f92d2>] ssisys_cluster_initproc+0x12/0x20 > > >> >> >> >> >> >> > <4> [<c01f77c9>] do_ssisys+0x99/0x200 > > >> >> >> >> >> >> > <4> [<c01f797f>] sys_ssisys+0x4f/0x70 > > >> >> >> >> >> >> > <4> [<c0103c55>] sysenter_past_esp+0x52/0x75 > > >> >> >> >> >> >> > <4>Code: 04 89 02 8b 41 08 89 09 89 49 04 c6 41 0c 01 e8 > > >> >> 77 a0 > > >> >> >> >> ff ff > > >> >> >> >> >> >> e9 24 > > >> >> >> >> >> >> > ff ff ff 8b 73 10 85 f6 75 87 e9 24 fe ff ff 8d b6 00 00 > > >> >> 00 00 > > >> >> >> >> <0f> > > >> >> >> >> >> >> 0b 3f 05 > > >> >> >> >> >> >> > 53 a1 3d c0 e9 bd fe ff ff 8d 76 00 55 b9 25 00 00 > > >> >> >> >> >> >> > <4> > > >> >> >> >> >> >> > kdb> > > >> >> >> >> >> >> > kdb> bt > > >> >> >> >> >> >> > Stack traceback for pid 2 > > >> >> >> >> >> >> > 0xc1aeea80 2 0 1 0 R 0xc1aeec40 > > >> >> >> >> *child_reaper > > >> >> >> >> >> >> > EBP EIP Function (args) > > >> >> >> >> >> >> > 0xf7d11cd0 0xc011da00 wait_task_zombie+0x220 > > >> >> (0xc8709020, 0x0, > > >> >> >> >> 0x0, > > >> >> >> >> >> >> > 0xf7d11e4c, 0xf7d11e50) > > >> >> >> >> >> >> > 0xf7d11d1c 0xc011de68 pproc_reap+0x228 (0xc8709020, 0x0, > > >> >> >> >> 0xf7d11e4c, > > >> >> >> >> >> >> > 0xf7d11e50, 0x11666) > > >> >> >> >> >> >> > 0xf7d11e28 0xc020f327 pvpop_reap+0x1d7 (0xdd600600, > > >> >> >> 0xffffffff, > > >> >> >> >> >> 0x20, > > >> >> >> >> >> >> > 0x11666, 0xf7d11e4c) > > >> >> >> >> >> >> > 0xf7d11efc 0xc020ea9f > > >> dpvproc_nocldwait_async_handler+0x13f > > >> >> >> >> >> >> (0xed3c9574, > > >> >> >> >> >> >> > 0xf7d10000, 0xf7d10000, 0xc1aeea80, 
0x8) > > >> >> >> >> >> >> > 0xf7d11f18 0xc01f2e31 async_cleanup_task_structs+0x51 > > >> >> >> >> >> (0xc1aeea80, 0x0, > > >> >> >> >> >> >> > 0x40000001, 0x0, 0xc022f870) > > >> >> >> >> >> >> > 0xf7d11f58 0xc022f9d5 initproc_postroot_init+0x155 > > >> >> >> >> >> >> > 0xf7d11f60 0xc01f92d2 ssisys_cluster_initproc+0x12 > > >> >> >> >> >> >> > kdb> call print_task_struct 0xc8709020 > > >> >> >> >> >> >> > state=0x20 > > >> >> >> >> >> >> > flags=0x44c > > >> >> >> >> >> >> > ptrace=0x0 > > >> >> >> >> >> >> > lock_depth=-1 > > >> >> >> >> >> >> > prio=116 > > >> >> >> >> >> >> > static_prio=120 > > >> >> >> >> >> >> > array=00000000 > > >> >> >> >> >> >> > sleep_avg=899989756 > > >> >> >> >> >> >> > interactive_credit=1 > > >> >> >> >> >> >> > timestamp=269658653961877 > > >> >> >> >> >> >> > activated=0x0 > > >> >> >> >> >> >> > policy=0 > > >> >> >> >> >> >> > &cpus_allowed=0xc870906c > > >> >> >> >> >> >> > time_slice=49 > > >> >> >> >> >> >> > first_time_slice=1 > > >> >> >> >> >> >> > tasks.next 0xc042bb78, tasks.prev 0xcc7be5a8 > > >> >> >> >> >> >> > mm=00000000 > > >> >> >> >> >> >> > active_mm=00000000 > > >> >> >> >> >> >> > binfmt=c04b3e68 > > >> >> >> >> >> >> > exit_code=9 > > >> >> >> >> >> >> > exit_signal=-1 > > >> >> >> >> >> >> > pdeath_signal=0 > > >> >> >> >> >> >> > more> > > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > > >> ignored > > >> >> >> >> >> >> > personality=0x0 > > >> >> >> >> >> >> > did_exec=0 > > >> >> >> >> >> >> > pid=71274 > > >> >> >> >> >> >> > epid=71274 > > >> >> >> >> >> >> > ppid=71270 > > >> >> >> >> >> >> > tgid=71271 > > >> >> >> >> >> >> > cltnode=0 > > >> >> >> >> >> >> > p_vproc=0xdd600600 > > >> >> >> >> >> >> > p_vfparent=0x00000000 > > >> >> >> >> >> >> > group_leader=0xf22f2550 > > >> >> >> >> >> >> > &pids=0xc87090c4 > > >> >> >> >> >> >> > set_child_tid 0x00000000 > > >> >> >> >> >> >> > clear_child_tid 0x00000000 > > >> >> >> >> >> >> > rt_priority=0x0 > > >> >> >> >> >> >> > 
it_real_value=0x0 > > >> >> >> >> >> >> > it_prof_value=0x0 > > >> >> >> >> >> >> > it_virt_value=0x0 > > >> >> >> >> >> >> > it_real_incr=0x0 > > >> >> >> >> >> >> > it_prof_incr=0x0 > > >> >> >> >> >> >> > it_virt_incr=0x0 > > >> >> >> >> >> >> > utime=0 > > >> >> >> >> >> >> > stime=0 > > >> >> >> >> >> >> > more> > > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > > >> ignored > > >> >> >> >> >> >> > nvcsw=3 > > >> >> >> >> >> >> > nivcsw=0 > > >> >> >> >> >> >> > sig_utime=0 > > >> >> >> >> >> >> > sig_stime=0 > > >> >> >> >> >> >> > cutime=0 > > >> >> >> >> >> >> > cstime=0 > > >> >> >> >> >> >> > sig_nvcsw=0 > > >> >> >> >> >> >> > sig_nivcsw=0 > > >> >> >> >> >> >> > cnvcsw=0 > > >> >> >> >> >> >> > cnivcsw=0 > > >> >> >> >> >> >> > start_time.tv_sec=270328 > > >> >> >> >> >> >> > start_time.tv_nsec=615704896 > > >> >> >> >> >> >> > min_flt=0 > > >> >> >> >> >> >> > maj_flt=0 > > >> >> >> >> >> >> > sig_min_flt=0 > > >> >> >> >> >> >> > sig_maj_flt=0 > > >> >> >> >> >> >> > cmin_flt=0 > > >> >> >> >> >> >> > cmaj_flt=0 > > >> >> >> >> >> >> > uid=0 > > >> >> >> >> >> >> > euid=0 > > >> >> >> >> >> >> > suid=0 > > >> >> >> >> >> >> > fsuid=0 > > >> >> >> >> >> >> > more> > > >> >> >> >> >> >> > Only 'q' or 'Q' are processed at more prompt, input > > >> ignored > > >> >> >> >> >> >> > gid=0 > > >> >> >> >> >> >> > egid=0 > > >> >> >> >> >> >> > sgid=0 > > >> >> >> >> >> >> > fsgid=0 > > >> >> >> >> >> >> > group_info=0xeaf0b980 > > >> >> >> >> >> >> > cap_effective=0xfffffeff > > >> >> >> >> >> >> > cap_inheritable=0x0 > > >> >> >> >> >> >> > cap_permitted=0xfffffeff > > >> >> >> >> >> >> > keep_capabilities=0 > > >> >> >> >> >> >> > user=0xc0431ae0 > > >> >> >> >> >> >> > &rlim=0xc1af02c4 > > >> >> >> >> >> >> > used_math=1 > > >> >> >> >> >> >> > comm=check_bacula > > >> >> >> >> >> >> > locks=0 > > >> >> >> >> >> >> > link_count=0 > > >> >> >> >> >> >> > total_link_count=1 > > >> >> >> >> >> >> > semvsem.undo_list=f4648200 > > >> 
> fs=0x00000000
> files=0x00000000
> namespace=0x00000000
> signal=0xc1af0240
> sighand=0xf6c86580
> more>
> Only 'q' or 'Q' are processed at more prompt, input ignored
> &blocked=0xc8709484
> &real_blocked=0xc870948c
> &pending=0xc8709494
> sas_ss_sp=0x0
> sas_ss_size=0x00000000
> notifier_data=0x00000000
> notifier_mask=0x00000000
> security=0x00000000
> audit_context=0x00000000
> parent_exec_id=0x16
> self_exec_id=0x16
> journal_info=0x00000000
> proc_dentry=0xcb5fe8d4
> backing_dev_info=0x00000000
> io_context=0x00000000
> ptrace_message=0x0
> last_siginfo=0x00000000
> p_nodetime=0
> p_ticks_delta=0
> icsprio=0x0
> execnode=0x00000000
> node_context=1
> more>
> Only 'q' or 'Q' are processed at more prompt, input ignored
> rcopy_task_size=0
> &mosix=0xc8709504
> Function print_task_struct returned 0x0
> kdb> btp 71274
> Stack traceback for pid 71274
> 0xc8709020 71274 71270 0 0 E 0xc87091e0 check_bacula
> EBP EIP Function (args)
> 0xd8017e70 0xc03bb2e9 schedule+0x2a9 (0xc8709020, 0xf22f4dc0, 0xc012523f, 0xd8017e9c, 0x0)
> 0xd8017e9c 0xc011d516 do_exit+0x246 (0xc8709020, 0x9, 0xd8016000)
> 0xd8017eb0 0xc011d6f5 do_group_exit+0x35 (0x9, 0x0, 0x0, 0xd8016000, 0xd8016000)
> 0xd8017edc 0xc0126f2b get_signal_to_deliver+0x1db (0xd8017f18, 0xd8017ef8, 0xd8017fc4, 0x0, 0xc0218c59)
> 0xd8017fa4 0xc0103a20 do_signal+0x70 (0x0, 0xcc7be550, 0xcc7be550, 0xc8709020)
> 0xd8017fbc 0xc0103b57 do_notify_resume+0x57
> 0xc0103cf6 work_notifysig+0x13
> kdb> btp 71270
> Stack traceback for pid 71270
> 0xcc7be550 71270 117331 0 0 R 0xcc7be710 gdb
> EBP EIP Function (args)
> 0xf3eade2c 0xc03bb2e9 schedule+0x2a9 (0x0, 0x0, 0x292, 0xe2d7fecc, 0xf3eade68)
> 0xf3eadeb8 0xc0218dbe vpop_wait+0x12e (0xdfe0c200, 0x1166a, 0xa0, 0x0, 0xf3eadedc)
> 0xf3eadf88 0xc011dacc do_wait+0xbc (0x1166a, 0x80000004, 0x0, 0x0, 0x0)
> 0xf3eadfa4 0xc011dbac sys_wait4+0x3c (0x1166a, 0x0, 0x80000000, 0x0)
> 0xf3eadfbc 0xc011dbd5 sys_waitpid+0x25
> 0xc0103c55 sysenter_past_esp+0x52
> kdb> btp 117331
> Stack traceback for pid 117331
> 0xf6e0ba80 117331 104395 0 0 S 0xf6e0bc40 bash
> EBP EIP Function (args)
> 0xd5399e2c 0xc03bb2e9 schedule+0x2a9 (0xe608ac00, 0xffffffff, 0x4, 0x1ca53, 0xd5399e94)
> 0xd5399eb8 0xc0218dbe vpop_wait+0x12e (0xed5e4200, 0xffffffff, 0x24, 0xbffff038, 0xd5399edc)
> 0xd5399f88 0xc011dacc do_wait+0xbc (0xffffffff, 0xe, 0x0, 0xbffff038, 0x0)
> 0xd5399fa4 0xc011dbac sys_wait4+0x3c (0xffffffff, 0xbffff038, 0xa, 0x0)
> 0xd5399fbc 0xc011dbd5 sys_waitpid+0x25
> 0xc0103c55 sysenter_past_esp+0x52
> kdb>
> kdb> call print_pvproc 0xdd600600
> pvp_flag=0x63727076
> pvp_wstate=0x1166a
> pvp_pproc=0x00000007
> pvp_head_childl=0xdd60061c
> pvp_childl=0x00000000
> pvp_head_pgrpl=0x00000000
> pvp_pgrpl=0x00000000
> pvp_sessionl=0x00083049
> pvp_head_oclist=0x00000005
> pvp_oclist=0xc8709020
> pvp_ppid=0
> pvp_oppid=0
> pvp_sid=0
> pvp_pgid=-489161216
> pvp_pp_sid=0
> pvp_pp_pgid=0
> pvp_fromnode=0
> pvp_tonode=71270
> pvp_cttynode=71271
> pvp_cttydev=0x1ca53
> pvp_jobc=71271
> pvp_pgrp_ldr_seqno=1
> more>
> Only 'q' or 'Q' are processed at more prompt, input ignored
> pvp_pgrp_mem_seqno=-580909344
> pvp_fork_sigmigarg=-580909332
> pvp.ml.ml_flag=1
> pvp.ml.ml_shr_count=-580909376
> pvp.ml.ml_excl_count=0
> pvp_loadlevel=-580909332
> pvp_pin=0
> pvp_localview=0
> Function print_pvproc returned 0x0
> kdb>