[Tipc-discussion] Possible spin lock deadlock

I have code running on 4 nodes using multicast to distribute messages
between the nodes.  After some hours of sending and receiving, one or
more of my nodes will hang.  The last time, 3 of the 4 machines were hung
and I was able to get a dump from one of them.  It seems to indicate
that there may be a spin lock deadlock in buf_safe_discard, which shows
up twice in this stack dump.  It looks like the first buf_safe_discard
gets interrupted while holding the lock.  The second buf_safe_discard
seems to be called from link_recv_proto_msg (the return address shown in
tipc_recv_msg is just after the call to link_recv_proto_msg).

SysRq : Show Regs

Pid: 1599, comm:         event_server
EIP: 0060:[<f8e69cdf>] CPU: 0
EIP is at buf_safe_discard+0x6f/0x270 [tipc]
 EFLAGS: 00000246    Not tainted  (2.6.7-rc2)
EAX: ef329bf8 EBX: ef328f50 ECX: 0b6b03c9 EDX: 00000000
ESI: ef328f94 EDI: ef326f50 EBP: efb9db48 DS: 007b ES: 007b
CR0: 8005003b CR2: 4206f5e0 CR3: 35326000 CR4: 000006c0
 [<c01032d5>] show_regs+0x145/0x170
 [<c026b541>] __handle_sysrq+0x71/0x100
 [<c02824bc>] receive_chars+0x12c/0x280
 [<c02829c6>] serial8250_interrupt+0x176/0x1d0
 [<c010785b>] handle_IRQ_event+0x3b/0x70
 [<c0107cc1>] do_IRQ+0xe1/0x230
 [<c0105cd0>] common_interrupt+0x18/0x20
 [<f8e50398>] tipc_recv_msg+0x788/0x8a0 [tipc]
 [<f8e6e2f9>] recv_msg+0x39/0x50 [tipc]
 [<c037e052>] netif_receive_skb+0x172/0x1b0
 [<c037e114>] process_backlog+0x84/0x120
 [<c037e230>] net_rx_action+0x80/0x120
 [<c0126068>] __do_softirq+0xb8/0xc0
 [<c01260a5>] do_softirq+0x35/0x40
 [<f8e69d23>] buf_safe_discard+0xb3/0x270 [tipc]
 [<f8e66723>] nameseq_deliver+0x83/0x420 [tipc]
 [<f8e66ced>] bcast_port_recv+0x4d/0x80 [tipc]
 [<f8e67e65>] tipc_forward_buf2nameseq+0x1c5/0x270 [tipc]
 [<f8e681eb>] tipc_multicast+0x2db/0x4e0 [tipc]
 [<f8e6b20a>] send_msg+0x18a/0x210 [tipc]
 [<c03747ce>] sock_sendmsg+0x8e/0xb0
 [<c0375dc1>] sys_sendto+0xe1/0x100
 [<c03766ba>] sys_socketcall+0x17a/0x240
 [<c0105363>] syscall_call+0x7/0xb
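
To illustrate the pattern I suspect, here is a minimal user-space sketch.
It is not TIPC code: demo_lock, simulated_discard and simulated_softirq
are hypothetical stand-ins, and I have not confirmed that buf_safe_discard
takes its lock this way.  The nested path uses a single trylock instead of
spinning so that the demo terminates; a real spin_lock() at that point
would spin forever, because the lock holder is the interrupted code on the
same CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive kernel spinlock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Counts nested attempts that found the lock already held. */
static int nested_blocked;

/* Try once instead of spinning, so the demonstration terminates. */
static bool demo_trylock(void)
{
    return !atomic_flag_test_and_set(&demo_lock);
}

static void demo_unlock(void)
{
    atomic_flag_clear(&demo_lock);
}

/* Stand-in for the receive path running in softirq context. */
static void simulated_softirq(void)
{
    if (!demo_trylock()) {
        /* A real spin_lock() would spin here forever: the holder is the
         * interrupted code on this CPU, which cannot run to unlock. */
        nested_blocked++;
        return;
    }
    demo_unlock();
}

/* Stand-in for the send path discarding a buffer in process context. */
static void simulated_discard(bool softirq_fires_while_locked)
{
    if (!demo_trylock())
        return;
    if (softirq_fires_while_locked)
        simulated_softirq();    /* re-enters while the lock is held */
    demo_unlock();
}
```

Calling simulated_discard(true) leaves nested_blocked at 1, which marks the
point where the real code would hang.  In the kernel, the usual way to
protect a lock that is also taken from softirq context is to acquire it
with spin_lock_bh() (or spin_lock_irqsave()) in process context, so the
softirq cannot run on that CPU while the lock is held.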

-- 
Mark Haverkamp <ma...@os...>
