From: Mark H. <ma...@os...> - 2004-05-06 22:41:27
Jon,

Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill
a running application in a certain order.

While running the tipc benchmark program we can get tipc to hang the
computer by killing the client while it has the 32 processes running.
To get the hang, though, I also have to have attempted some management
port accesses that are stalled due to congestion. After doing some
tracing, I have narrowed it down to an exiting process spinning while
trying to get the node lock. Our assumption is that some other process
has accidentally failed to release the lock, although it's not obvious
where. I have included the stack dump from the sysrq P console command.

SysRq : Show Regs

Pid: 2001, comm: client_tipc_tp
EIP: 0060:[<f8a913d9>] CPU: 0
EIP is at .text.lock.link+0xd7/0x3ce [tipc]
 EFLAGS: 00000286 Not tainted (2.6.6-rc3)
EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013
ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0
Call Trace:
 [<c0126a38>] __do_softirq+0xb8/0xc0
 [<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc]
 [<c015b3a1>] __pte_chain_free+0x81/0x90
 [<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc]
 [<f8a9af73>] port_abort_peer+0x83/0x90 [tipc]
 [<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc]
 [<f8aa7ae2>] release+0x72/0x130 [tipc]
 [<c0378ff9>] sock_release+0x99/0xf0
 [<c0379a16>] sock_close+0x36/0x50
 [<c016740d>] __fput+0x12d/0x140
 [<c0165857>] filp_close+0x57/0x90
 [<c0123adc>] put_files_struct+0x7c/0xf0
 [<c0124b1c>] do_exit+0x26c/0x600
 [<c012cc05>] __dequeue_signal+0xf5/0x1b0
 [<c0125057>] do_group_exit+0x107/0x190
 [<c012cced>] dequeue_signal+0x2d/0x90
 [<c012f14c>] get_signal_to_deliver+0x28c/0x590
 [<c0105286>] do_signal+0xb6/0xf0
 [<c037a736>] sys_send+0x36/0x40
 [<c037af8e>] sys_socketcall+0x12e/0x240
 [<c010531b>] do_notify_resume+0x5b/0x5d
 [<c010554a>] work_notifysig+0x13/0x15

You can see that the process is trying to exit. I have traced the EIP
to the spin_lock_bh(&node->lock) in link_lock_select from a disassembly
of link.o.

Any ideas on this?

Mark.
--
Mark Haverkamp <ma...@os...>
From: Jon M. <jon...@er...> - 2004-05-06 23:37:40
I think your analysis is correct, but I don't know where this omission
happens. The problem I am trying to solve may be related: when I run
parallel links between two nodes, communication on one link sometimes
seems to stop under heavy load.

With a little luck we are looking for the same bug.

/Jon
From: Mark H. <ma...@os...> - 2004-05-11 22:24:14
On Thu, 2004-05-06 at 16:37, Jon Maloy wrote:
> I think your analysis is correct, but I don't know where this omission
> happens. The problem I am trying to solve may be related: when I run
> parallel links between two nodes, communication on one link sometimes
> seems to stop under heavy load.
>
> With a little luck we are looking for the same bug.
>
> /Jon

Jon,

I was thinking about this today and it occurred to me that spin_lock_bh
doesn't prevent interrupts from happening. If true, we can get into a
deadlock situation when a CPU holds the node lock and an ethernet
interrupt arrives, causing tipc_recv_msg to get called. One of the
first things that tipc_recv_msg does is try to get the node lock. This
seems to be a possible explanation for the spin hang on the node lock.

Does this make sense to you?

Mark.
--
Mark Haverkamp <ma...@os...>
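For illustration, the deadlock Mark is describing looks roughly like the
sketch below. The lock and function names are placeholders, not the actual
TIPC code; the point is only that spin_lock_bh() disables bottom halves but
not hard interrupts, so a handler that takes the same lock on the same CPU
spins forever.

    /* Sketch of the suspected deadlock -- names are placeholders, not TIPC code. */
    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static spinlock_t node_lock = SPIN_LOCK_UNLOCKED;  /* 2.6.6-era initializer */

    /* Process context, e.g. an exiting task tearing down a port: */
    static void exit_path(void)
    {
            spin_lock_bh(&node_lock);   /* stops softirqs/tasklets, NOT hard irqs */
            /* ... NIC raises a hard interrupt on this CPU right here ... */
            spin_unlock_bh(&node_lock);
    }

    /* Hard interrupt context, e.g. the ethernet driver handing the frame
     * to the receive routine (2.6.6-era handler signature): */
    static irqreturn_t rx_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
            spin_lock(&node_lock);      /* spins forever: the holder is the task
                                         * this interrupt preempted on this CPU */
            /* ... deliver the packet ... */
            spin_unlock(&node_lock);
            return IRQ_HANDLED;
    }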
From: Jon M. <jon...@er...> - 2004-05-12 14:15:58
Mark,

I think you can easily verify your hypothesis by replacing
spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and
then just discard the packet and continue the loop if it fails.
(If this happens the packet will be retransmitted anyway.)

It is worth a try.

/Jon
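A rough sketch of the change Jon is suggesting, assuming a receive loop of
roughly this shape; the struct layout and the node_for_buf()/process_buf()
helpers are hypothetical stand-ins, not the real tipc_recv_msg() internals.
spin_trylock_bh() returns nonzero only when the lock was acquired, so a
failure simply drops the buffer.

    /* Sketch only -- the struct and helpers below are hypothetical stand-ins. */
    #include <linux/spinlock.h>
    #include <linux/skbuff.h>

    struct node {                        /* stand-in for the TIPC node structure */
            spinlock_t lock;
    };

    extern struct node *node_for_buf(struct sk_buff *buf);        /* hypothetical */
    extern void process_buf(struct node *n, struct sk_buff *buf); /* hypothetical */

    static void recv_loop_sketch(struct sk_buff_head *queue)
    {
            struct sk_buff *buf;

            while ((buf = skb_dequeue(queue)) != NULL) {
                    struct node *n = node_for_buf(buf);

                    if (!spin_trylock_bh(&n->lock)) {
                            /* Lock busy (possibly held by the context we
                             * interrupted): drop instead of spinning; the
                             * link layer will retransmit the packet. */
                            kfree_skb(buf);
                            continue;
                    }
                    process_buf(n, buf);
                    spin_unlock_bh(&n->lock);
            }
    }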
From: Mark H. <ma...@os...> - 2004-05-12 14:21:12
On Wed, 2004-05-12 at 07:15, Jon Maloy wrote:
> Mark,
> I think you can easily verify your hypothesis by replacing
> spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and
> then just discard the packet and continue the loop if it fails.
> (If this happens the packet will be retransmitted anyway.)
> It is worth a try.

I'll give it a try.

Out of curiosity, why are you using the *_bh versions of spinlock?

--
Mark Haverkamp <ma...@os...>
From: Jon M. <jon...@er...> - 2004-05-12 14:55:10
Well, we do have bottom halves (timer and signal tasklets) that may
interfere with message sending/reception if executed, even on UPs.
I am sure there are cases when the _bh() version is not really needed,
but I wanted to keep it safe when I developed it. Something we can look
into later, maybe.

/jon
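As a generic illustration of the distinction (not TIPC-specific code):
spin_lock_bh() only keeps softirq/tasklet code off the lock on the local
CPU, while the _irqsave variant also excludes hard interrupt handlers,
which is what matters if the same lock can be taken from interrupt context.

    #include <linux/spinlock.h>

    static spinlock_t lock = SPIN_LOCK_UNLOCKED;

    /* Protects against tasklets/timers (softirqs) on this CPU,
     * but not against a hard interrupt handler taking 'lock': */
    static void bh_protected(void)
    {
            spin_lock_bh(&lock);
            /* ... critical section ... */
            spin_unlock_bh(&lock);
    }

    /* Also protects against hard interrupts on this CPU: */
    static void irq_protected(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&lock, flags);
            /* ... critical section ... */
            spin_unlock_irqrestore(&lock, flags);
    }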
From: Jon M. <jon...@er...> - 2004-05-11 23:22:49
I am not sure. In your first analysis of the problem you found that it
was a dying process that was causing the processor to hang while trying
to grab the node lock, which was then presumably held by the
tipc_recv_msg interrupt, or a tasklet. That is the opposite scenario of
what you describe here.

Also, this was no different in the old implementation: we shared the
same lock between user level/tasklet code and interrupts without any
problems. If you are right, why don't we have deadlocks all the time
then?

I think I will re-read Rusty Lynch's "unreliable guide" once more...

Regards
/Jon
From: Mark H. <ma...@os...> - 2004-05-12 14:18:49
On Tue, 2004-05-11 at 16:22, Jon Maloy wrote:
> I am not sure. In your first analysis of the problem you found that it
> was a dying process that was causing the processor to hang while trying
> to grab the node lock, which was then presumably held by the
> tipc_recv_msg interrupt, or a tasklet.
> That is the opposite scenario of what you describe here.

I see the hang both ways.

> Also, this was no different in the old implementation: we shared the
> same lock between user level/tasklet code and interrupts without any
> problems. If you are right, why don't we have deadlocks all the time
> then?

That is a good point, but it does seem a possibility.

> I think I will re-read Rusty Lynch's "unreliable guide" once more...

That is Rusty Russell.

--
Mark Haverkamp <ma...@os...>