From: Mark H. <ma...@os...> - 2004-05-06 22:41:27
Jon,

Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill a running application in a certain order. While running the tipc benchmark program, we can hang the computer by killing the client while it has its 32 processes running. However, to get the hang, I first have to run some management port accesses, which stall due to congestion.

After doing some tracing, I have narrowed it down to an exiting process spinning while trying to get the node lock. Our assumption is that some other process has accidentally failed to release the lock, although it's not obvious where. I have included the register and stack dump from the SysRq-P console command:

    SysRq : Show Regs

    Pid: 2001, comm: client_tipc_tp
    EIP: 0060:[<f8a913d9>] CPU: 0
    EIP is at .text.lock.link+0xd7/0x3ce [tipc]
     EFLAGS: 00000286   Not tainted  (2.6.6-rc3)
    EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013
    ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b
    CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0
    Call Trace:
     [<c0126a38>] __do_softirq+0xb8/0xc0
     [<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc]
     [<c015b3a1>] __pte_chain_free+0x81/0x90
     [<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc]
     [<f8a9af73>] port_abort_peer+0x83/0x90 [tipc]
     [<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc]
     [<f8aa7ae2>] release+0x72/0x130 [tipc]
     [<c0378ff9>] sock_release+0x99/0xf0
     [<c0379a16>] sock_close+0x36/0x50
     [<c016740d>] __fput+0x12d/0x140
     [<c0165857>] filp_close+0x57/0x90
     [<c0123adc>] put_files_struct+0x7c/0xf0
     [<c0124b1c>] do_exit+0x26c/0x600
     [<c012cc05>] __dequeue_signal+0xf5/0x1b0
     [<c0125057>] do_group_exit+0x107/0x190
     [<c012cced>] dequeue_signal+0x2d/0x90
     [<c012f14c>] get_signal_to_deliver+0x28c/0x590
     [<c0105286>] do_signal+0xb6/0xf0
     [<c037a736>] sys_send+0x36/0x40
     [<c037af8e>] sys_socketcall+0x12e/0x240
     [<c010531b>] do_notify_resume+0x5b/0x5d
     [<c010554a>] work_notifysig+0x13/0x15

You can see that the process is trying to exit: do_exit() closes the socket, and release() -> tipc_deleteport() -> port_abort_peer() ends up waiting on the lock.
Using a disassembly of link.o, I have traced the EIP to the spin_lock_bh(&node->lock) call in link_lock_select.

Any ideas on this?

Mark.

-- 
Mark Haverkamp <ma...@os...>