From: Mark H. <ma...@os...> - 2004-05-12 18:24:38
Here is what looks to be happening with the spin lock deadlock. I replaced all the spin_lock_bh calls with a wrapper that tries to get the lock for a while and then prints out a debug message if it can't get the lock.

As an experiment, I changed the spin_lock_bh in link_wakeup_ports to a trylock and exited if it couldn't get the lock. I am now not able to get the deadlock.

CPU 0:
release --
    tipc_delete_port (get port lock) --
        port_abort_peer --
            port_send_proto_msg --
                net_route_msg --
                    link_send (get node lock) -- (hung spinning)

CPU 1:
common_interrupt --
    do_softirq --
        net_rx_action --
            netif_receive_skb --
                recv_msg (tipc eth) --
                    tipc_recv_msg (get node lock) --
                        link_wakeup_ports (get port lock) -- (hung spinning)

Stack dumps:

port lock timeout
Call Trace:
 [<f8a837ab>] link_wakeup_ports+0x9b/0x230 [tipc]
 [<f8a87c2e>] tipc_recv_msg+0x7fe/0x8c0 [tipc]
 [<c014949d>] __kmalloc+0x19d/0x250
 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc]
 [<c0375af2>] netif_receive_skb+0x172/0x1b0
 [<c0375bb4>] process_backlog+0x84/0x120
 [<c0375cd0>] net_rx_action+0x80/0x120
 [<c0124bc8>] __do_softirq+0xb8/0xc0
 [<c0124c05>] do_softirq+0x35/0x40
 [<c0107ce5>] do_IRQ+0x175/0x230
 [<c0105ce0>] common_interrupt+0x18/0x20
 [<c0221c91>] copy_from_user+0x1/0x80
 [<f8a866bf>] link_send_sections_long+0x30f/0xb30 [tipc]
 [<c0221694>] __delay+0x14/0x20
 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc]
 [<f8a860f5>] link_send_sections_fast+0x5b5/0x870 [tipc]
 [<c011b12a>] __wake_up_common+0x3a/0x60
 [<f8a97bf2>] tipc_send+0x92/0x9d0 [tipc]
 [<c011d736>] __mmdrop+0x36/0x50
 [<c03f15b7>] schedule+0x467/0x7a0
 [<f8aa33e6>] recv_msg+0x2b6/0x560 [tipc]
 [<f8aa2d90>] send_packet+0x90/0x180 [tipc]
 [<c011b0d0>] default_wake_function+0x0/0x20
 [<c036c83e>] sock_sendmsg+0x8e/0xb0
 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc]
 [<c01435ba>] buffered_rmqueue+0xfa/0x220
 [<c036c61a>] sockfd_lookup+0x1a/0x80
 [<c036dd61>] sys_sendto+0xe1/0x100
 [<c0128f62>] del_timer_sync+0x42/0x140
 [<c036d109>] sock_poll+0x29/0x30
 [<c017884b>] do_pollfd+0x5b/0xa0
 [<c036ddb6>] sys_send+0x36/0x40
 [<c036e60e>] sys_socketcall+0x12e/0x240
 [<c0105373>] syscall_call+0x7/0xb

&node->lock lock timeout
Call Trace:
 [<f8a8549a>] link_send+0xda/0x2a0 [tipc]
 [<f8a92cee>] net_route_msg+0x41e/0x43d [tipc]
 [<f8a949c2>] port_send_proto_msg+0x1a2/0x2a0 [tipc]
 [<f8a95983>] port_abort_peer+0x83/0x90 [tipc]
 [<f8a9458f>] tipc_deleteport+0x19f/0x280 [tipc]
 [<f8aa25b2>] release+0x72/0x130 [tipc]
 [<c036c76b>] sock_release+0x7b/0xc0
 [<c036d176>] sock_close+0x36/0x50
 [<c016315a>] __fput+0x10a/0x120
 [<c0161597>] filp_close+0x57/0x90
 [<c0121dbc>] put_files_struct+0x7c/0xf0
 [<c0122d5a>] do_exit+0x23a/0x5a0
 [<c012aa35>] __dequeue_signal+0xf5/0x1b0
 [<c0123240>] do_group_exit+0xe0/0x150
 [<c012ab1d>] dequeue_signal+0x2d/0x90
 [<c012cbef>] get_signal_to_deliver+0x26f/0x510
 [<c0105136>] do_signal+0xb6/0xf0
 [<c036ddb6>] sys_send+0x36/0x40
 [<c036e60e>] sys_socketcall+0x12e/0x240
 [<c01051cb>] do_notify_resume+0x5b/0x5d
 [<c01053be>] work_notifysig+0x13/0x15

--
Mark Haverkamp <ma...@os...>
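[Editor's note: the scenario above is the classic ABBA ordering problem: the send/close path takes the port lock and then the node lock, while the receive path takes the node lock and then the port lock. The sketch below is purely illustrative -- simplified lock and function names, not the actual TIPC code -- and shows how a spin_trylock_bh on the inner lock, as in Mark's experiment, breaks the cycle by backing out instead of spinning forever.]

/* Illustrative sketch only: simplified stand-ins for the TIPC locks,
 * not the real link_send()/link_wakeup_ports() code. */
#include <linux/spinlock.h>

static spinlock_t node_lock = SPIN_LOCK_UNLOCKED;   /* 2.6-era static initializer */
static spinlock_t port_lock = SPIN_LOCK_UNLOCKED;

/* CPU 0 path: port lock first, then node lock (the send/abort side). */
static void send_side(void)
{
	spin_lock_bh(&port_lock);
	spin_lock_bh(&node_lock);      /* spins forever if CPU 1 holds it and waits for port_lock */
	/* ... build and route the protocol message ... */
	spin_unlock_bh(&node_lock);
	spin_unlock_bh(&port_lock);
}

/* CPU 1 path: node lock first, then port lock (the receive side).
 * Using a trylock here and giving up breaks the ABBA cycle; the
 * pending ports are simply woken at the next message reception. */
static void receive_side(void)
{
	spin_lock_bh(&node_lock);
	if (spin_trylock_bh(&port_lock)) {
		/* ... wake up waiting ports ... */
		spin_unlock_bh(&port_lock);
	}
	spin_unlock_bh(&node_lock);
}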
From: Jon M. <jon...@er...> - 2004-05-12 19:10:00
Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done.

Thanks /jon

Mark Haverkamp wrote:

> Here is what looks to be happening with the spin lock deadlock. I
> replaced all the spin_lock_bh calls with a wrapper that tries to get the
> lock for a while then prints out a debug message if it can't get the
> lock.
>
> As an experiment, I changed the spin_lock_bh in link_wakeup_ports to a
> trylock and exited if it couldn't get the lock. I am now not able to
> get the deadlock.
>
> [...]
From: Mark H. <ma...@os...> - 2004-05-12 20:26:01
On Wed, 2004-05-12 at 12:09, Jon Maloy wrote:
> Yeah, that's a classical one. I also think your solution is ok; the
> pending ports will be awakened at next message reception, so
> no harm done.
>
> Thanks /jon

OK, here is what I plan to check in. Take a look. I fixed a few other things that seemed wrong. See comments by each diff.

Mark.

p.s. here is another problem I've been seeing: If I do management port access during congestion, the I/O never completes even when the congestion is over.

----------------------------------------
cvs diff -u media.c reg.c link.c
--------------------------------------------------

I think that this should be an unlock since it is locked a few lines above.

Index: media.c
===================================================================
RCS file: /cvsroot/tipc/source/unstable/net/tipc/media.c,v
retrieving revision 1.13
diff -u -r1.13 media.c
--- media.c	6 May 2004 15:35:31 -0000	1.13
+++ media.c	12 May 2004 20:10:30 -0000
@@ -221,7 +221,7 @@
 		bearer_schedule_unlocked(this, link);
 		res = 0;
 	}
-	spin_lock_bh(&this->publ.lock);
+	spin_unlock_bh(&this->publ.lock);
 	return res;
 }

---------------------------------------------------

Daniel found this one. ref_lock_deref can grab an invalid object pointer if the lock is released too soon. This waits until the ref number is changed, so it won't match in ref_lock_deref after it gets the lock.

Index: reg.c
===================================================================
RCS file: /cvsroot/tipc/source/unstable/net/tipc/reg.c,v
retrieving revision 1.8
diff -u -r1.8 reg.c
--- reg.c	5 May 2004 18:39:37 -0000	1.8
+++ reg.c	12 May 2004 20:10:31 -0000
@@ -167,7 +166,6 @@
 	assert(entry->object != 0);
 	assert(entry->data.reference == ref_nb);
 	entry->object = 0;
-	spin_unlock_bh(&entry->lock);
 	if (ref_table.first_free == 0)
 		ref_table.first_free = index;
 	else
@@ -175,6 +174,7 @@
 		ref_table.entries[ref_table.last_free].data.next_plus_upper |= index;
 	ref_table.last_free = index;
 	entry->data.next_plus_upper = (ref_nb & ~index_mask) + index_mask + 1;
+	spin_unlock_bh(&entry->lock);
 	write_unlock_bh(&ref_lock);
 }

------------------------------------------------

link_schedule_port took the global port_lock/port lock in a different order than everywhere else. Added the spin_trylock_bh in link_wakeup_ports. Added an unlock in link_recv_fragment on an error exit. Fixed a couple of places where checking the this pointer looked wrong.

Index: link.c
===================================================================
RCS file: /cvsroot/tipc/source/unstable/net/tipc/link.c,v
retrieving revision 1.24
diff -u -r1.24 link.c
--- link.c	7 May 2004 23:16:03 -0000	1.24
+++ link.c	12 May 2004 20:10:32 -0000
@@ -440,10 +440,14 @@
 int link_schedule_port(struct link *this, uint origport,uint sz)
 {
-	struct port *port = port_lock_deref(origport);
-	if (!port)
-		return TIPC_CONGESTION;
+	struct port *port;
+	spin_lock_bh(&port_lock);
+	port = port_lock_deref(origport);
+	if (!port) {
+		spin_unlock_bh(&port_lock);
+		return TIPC_CONGESTION;
+	}
 	if (!port->wakeup)
 		goto exit;
 	if (!list_empty(&port->wait_list))
@@ -453,8 +457,8 @@
 	port->waiting_pkts = 1 + sz/link_max_pkt(this);
 	list_add_tail(&port->wait_list, &this->waiting_ports);
 exit:
-	spin_unlock_bh(&port_lock);
 	spin_unlock_bh(port->publ.lock);
+	spin_unlock_bh(&port_lock);
 	return TIPC_CONGESTION;
 }

@@ -467,7 +471,8 @@
 		win = 100000;
 	if (win <= 0)
 		return;
-	spin_lock_bh(&port_lock);
+	if (!spin_trylock_bh(&port_lock))
+		return;
 	if (link_congested(this))
 		goto exit;
 	list_for_each_entry_safe(port, tp, &this->waiting_ports, wait_list) {
@@ -2365,6 +2370,7 @@
 	if (msg_size(imsg) > TIPC_MAX_MSG_SIZE + LONG_H_SIZE){
 		msg_print(CONS,fragm,"<REC<Oversized: ");
 		buf_discard(fbuf);
+		spin_unlock_bh(&this->owner->lock);
 		return;
 	}
 	buf = buf_acquire(msg_size(imsg));
@@ -2623,7 +2629,7 @@
 	struct link *this;
 	read_lock_bh(&net_lock);
 	this = link_find_by_name(name);
-	if (!this){
+	if (this){
 		this->exp_msg_count = LINK_BLOCKED;
 		spin_unlock_bh(&this->owner->lock);
 		res = TIPC_OK;
@@ -2640,7 +2646,7 @@
 	struct link *this;
 	read_lock_bh(&net_lock);
 	this = link_find_by_name(name);
-	if (!this){
+	if (this){
 		this->exp_msg_count = 0;
 		spin_unlock_bh(&this->owner->lock);
 		res = TIPC_OK;

--
Mark Haverkamp <ma...@os...>
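[Editor's note: the reg.c change above is about not dropping the per-entry lock until the entry has been invalidated. A stripped-down sketch of the idea follows; the structures and function names are hypothetical stand-ins, not the real TIPC reference table, and the entry lock is assumed to have been initialized with spin_lock_init() elsewhere.]

/* Illustrative sketch of the reg.c fix: keep the entry spinlock held
 * until the reference number has been made stale, so a concurrent
 * deref that was spinning on the lock sees a mismatch and fails
 * instead of returning a freed object. Hypothetical types only. */
#include <linux/spinlock.h>

struct my_ref_entry {
	spinlock_t lock;
	void *object;
	unsigned int reference;
};

static void *my_ref_deref(struct my_ref_entry *e, unsigned int ref)
{
	spin_lock_bh(&e->lock);
	if (e->object && e->reference == ref)
		return e->object;          /* caller now holds e->lock */
	spin_unlock_bh(&e->lock);
	return NULL;
}

static void my_ref_discard(struct my_ref_entry *e, unsigned int next_ref)
{
	spin_lock_bh(&e->lock);
	e->object = NULL;
	e->reference = next_ref;   /* invalidate BEFORE unlocking, so a waiter
				    * in my_ref_deref() cannot match the old ref */
	spin_unlock_bh(&e->lock);
}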
From: Jon M. <jon...@er...> - 2004-05-12 21:09:49
Hi,
All your corrections look ok. The (!this) bug was a little embarrassing...
/jon

Mark Haverkamp wrote:

> OK, here is what I plan to check in. Take a look. I fixed a few other
> things that seemed wrong. See comments by each diff.
>
> Mark.
>
> p.s. here is another problem I've been seeing: If I do management port
> access during congestion, the I/O never completes even when the
> congestion is over.

This may be related to message priorities. When the internal manager tries to respond to a message during congestion he cannot be blocked, as we can do with sockets/processes, so I set its priority to TIPC_NON_REJECTABLE. Is it possible that tipc_send2name() etc. still return TIPC_CONGESTION in such cases? This should not happen, but if the importance priority for some reason does not make it into the message, or is not handled properly along the data path, who knows?

Hint: Check the return values of tipc_send2XX() in mng_respond(), that may give a clue.

> ----------------------------------------
> cvs diff -u media.c reg.c link.c
> --------------------------------------------------
>
> I think that this should be an unlock since it is locked a few lines above.
>
> Index: media.c
> ===================================================================
> RCS file: /cvsroot/tipc/source/unstable/net/tipc/media.c,v
> retrieving revision 1.13
> diff -u -r1.13 media.c
> --- media.c	6 May 2004 15:35:31 -0000	1.13
> +++ media.c	12 May 2004 20:10:30 -0000
> @@ -221,7 +221,7 @@
>  		bearer_schedule_unlocked(this, link);
>  		res = 0;
>  	}
> -	spin_lock_bh(&this->publ.lock);
> +	spin_unlock_bh(&this->publ.lock);
>  	return res;
>  }

Definitely a bug.

> ---------------------------------------------------
>
> Daniel found this one. ref_lock_deref can grab an invalid object pointer
> if the lock is released too soon. This waits until the ref number is
> changed, so it won't match in ref_lock_deref after it gets the lock.
>
> Index: reg.c
> ===================================================================
> RCS file: /cvsroot/tipc/source/unstable/net/tipc/reg.c,v
> retrieving revision 1.8
> diff -u -r1.8 reg.c
> --- reg.c	5 May 2004 18:39:37 -0000	1.8
> +++ reg.c	12 May 2004 20:10:31 -0000
> @@ -167,7 +166,6 @@
>  	assert(entry->object != 0);
>  	assert(entry->data.reference == ref_nb);
>  	entry->object = 0;
> -	spin_unlock_bh(&entry->lock);
>  	if (ref_table.first_free == 0)
>  		ref_table.first_free = index;
>  	else
> @@ -175,6 +174,7 @@
>  		ref_table.entries[ref_table.last_free].data.next_plus_upper |= index;
>  	ref_table.last_free = index;
>  	entry->data.next_plus_upper = (ref_nb & ~index_mask) + index_mask + 1;
> +	spin_unlock_bh(&entry->lock);
>  	write_unlock_bh(&ref_lock);
>  }

Also a bug.

> ------------------------------------------------
>
> link_schedule_port took the global port_lock/port lock in a different
> order than everywhere else. Added the spin_trylock_bh in
> link_wakeup_ports. Added an unlock in link_recv_fragment on an error
> exit. Fixed a couple of places where checking the this pointer looked
> wrong.
>
> Index: link.c
> ===================================================================
> [...]
From: Jon M. <jon...@er...> - 2004-05-12 21:11:44
PS: There were more comments further down in my response, just in case you may have missed them.
/jon

Mark Haverkamp wrote:

> OK, here is what I plan to check in. Take a look. I fixed a few other
> things that seemed wrong. See comments by each diff.
>
> [...]
From: Mark H. <ma...@os...> - 2004-05-13 15:11:09
On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote:
> On Wed, 2004-05-12 at 12:09, Jon Maloy wrote:
> > Yeah, that's a classical one. I also think your solution is ok; the
> > pending ports will be awakened at next message reception, so
> > no harm done.
> >
> > Thanks /jon

I found another one. I got another hang yesterday after the current deadlock fix. I re-added my spinlock debug code and found out that we're getting a deadlock between the node lock and the tipc_port lock. It looks like the port timeout handler is running on one CPU and a recv_msg is running on the other. I suppose as a workaround, we could make all the spin lock access in wakeup be conditional, but that will probably just make the problem show up somewhere else. There should probably be an analysis of code paths to determine how the locks interact with each other. I have noticed that there is at least one place where three locks are required. This can cause problems like we've seen, when different code paths need multiple locks, unless there is some sort of back-off method to ensure no deadlocks.

For instance, if a function needs both the node lock and tipc_port lock, something like this could be done:

- - - - - -
again:
    spinlock(node_lock);
    spintrylock(tipc_port_lock)
    if failed to acquire, then release node_lock and goto again.
- - - - - - - -

Unfortunately, if one of the locks was acquired before the function was entered, releasing that lock may allow some state to change that the caller was assuming wouldn't change, and that could cause problems too. So I'm not sure that this method could work in the link_wakeup_ports function. The node lock is held on entry, and if the tipc_port lock can't be acquired, you probably can't just release the node lock and wait for it again, since that could invalidate the tipc_recv_msg function's assumptions about what the node lock was protecting.
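[Editor's note: a minimal C rendering of the back-off pattern Mark sketches above, with generic lock names rather than the real TIPC locks. It is subject to exactly the caveat he raises: the retry only works if the caller does not depend on state the outer lock was protecting across the drop-and-retake.]

/* Illustrative only: retry/back-off acquisition of two locks. */
#include <linux/spinlock.h>

static spinlock_t node_lock = SPIN_LOCK_UNLOCKED;       /* 2.6-era initializer */
static spinlock_t tipc_port_lock = SPIN_LOCK_UNLOCKED;

static void do_work_needing_both_locks(void)
{
again:
	spin_lock_bh(&node_lock);
	if (!spin_trylock_bh(&tipc_port_lock)) {
		/* Back off: drop the outer lock so the other CPU can make
		 * progress, then retry from the top. Anything the node lock
		 * was protecting may have changed by the time we reacquire it. */
		spin_unlock_bh(&node_lock);
		goto again;
	}

	/* ... work that needs both locks ... */

	spin_unlock_bh(&tipc_port_lock);
	spin_unlock_bh(&node_lock);
}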
Anyway, here is the raw trace from the debug:

&node->lock lock timeout
Call Trace:
 [<f8a8543a>] link_send+0xda/0x2a0 [tipc]
 [<f8a92c8e>] net_route_msg+0x41e/0x43d [tipc]
 [<f8a94962>] port_send_proto_msg+0x1a2/0x2a0 [tipc]
 [<f8aa1b10>] k_signal+0x60/0x170 [tipc]
 [<f8a956be>] port_timeout+0xbe/0x200 [tipc]
 [<f8aa1ccc>] timeout_receiver+0xac/0x110 [tipc]
 [<f8aa1a9f>] receive_signal+0x14f/0x160 [tipc]
 [<c011ab51>] scheduler_tick+0x111/0x690
 [<c0124ec3>] tasklet_action+0x73/0xc0
 [<c0124bc8>] __do_softirq+0xb8/0xc0
 [<c0124c05>] do_softirq+0x35/0x40
 [<c011526d>] smp_apic_timer_interrupt+0xdd/0x150
 [<c0105d62>] apic_timer_interrupt+0x1a/0x20
 [<c01119c0>] delay_pit+0x20/0x30
 [<c0221694>] __delay+0x14/0x20
 [<f8a8685f>] link_send_sections_long+0x50f/0xb30 [tipc]
 [<c01152d1>] smp_apic_timer_interrupt+0x141/0x150
 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc]
 [<f8a86095>] link_send_sections_fast+0x5b5/0x870 [tipc]
 [<f8a97b92>] tipc_send+0x92/0x9d0 [tipc]
 [<c03f14cf>] schedule+0x37f/0x7a0
 [<f8aa3386>] recv_msg+0x2b6/0x560 [tipc]
 [<f8aa2d30>] send_packet+0x90/0x180 [tipc]
 [<c011b0d0>] default_wake_function+0x0/0x20
 [<c036c83e>] sock_sendmsg+0x8e/0xb0
 [<f8a8784e>] tipc_recv_msg+0x47e/0x8c0 [tipc]
 [<c01435ba>] buffered_rmqueue+0xfa/0x220
 [<c036c61a>] sockfd_lookup+0x1a/0x80
 [<c036dd61>] sys_sendto+0xe1/0x100
 [<c0128f62>] del_timer_sync+0x42/0x140
 [<c036d109>] sock_poll+0x29/0x30
 [<c017884b>] do_pollfd+0x5b/0xa0
 [<c036ddb6>] sys_send+0x36/0x40
 [<c036e60e>] sys_socketcall+0x12e/0x240
 [<c0105373>] syscall_call+0x7/0xb

port->publ lock timeout
Call Trace:
 [<f8a8383a>] link_wakeup_ports+0x12a/0x1d0 [tipc]
 [<f8a87bce>] tipc_recv_msg+0x7fe/0x8c0 [tipc]
 [<c014949d>] __kmalloc+0x19d/0x250
 [<f8aa5d59>] recv_msg+0x39/0x50 [tipc]
 [<c0375af2>] netif_receive_skb+0x172/0x1b0
 [<c0375bb4>] process_backlog+0x84/0x120
 [<c0375cd0>] net_rx_action+0x80/0x120
 [<c0124bc8>] __do_softirq+0xb8/0xc0
 [<c0124c05>] do_softirq+0x35/0x40
 [<c0107ce5>] do_IRQ+0x175/0x230
 [<c0105ea5>] nmi_stack_correct+0x1e/0x2e
 [<c0105ce0>] common_interrupt+0x18/0x20

--
Mark Haverkamp <ma...@os...>
From: Daniel M. <da...@os...> - 2004-05-13 16:12:44
On Thu, 2004-05-13 at 08:10, Mark Haverkamp wrote:
> I found another one. I got another hang yesterday after the current
> deadlock fix. I re-added my spinlock debug code and found out that
> we're getting a deadlock between the node lock and the tipc_port lock.
> It looks like the port timeout handler is running on one CPU and a
> recv_msg is running on the other. I suppose as a workaround, we could
> make all the spin lock access in wakeup be conditional, but that will
> probably just make the problem show up somewhere else. There should
> probably be an analysis of code paths and determine how the locks
> interact with each other.

I agree. The locking hierarchy should be documented. Even if this is just comments in a source file, it needs to be done. It should also include exactly what the lock is protecting.

> I have noticed that there is at least one place where three locks are
> required. This can cause problems like we've seen when different code
> paths need multiple locks unless there is some sort of back off method
> to insure no deadlocks.

We also need to analyze whether the multiple locks are actually giving us any performance and/or parallelism benefits. If we have to take multiple locks in a common path, that might be causing worse performance and more deadlock potential.

Daniel
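[Editor's note: the documentation Daniel asks for could be as small as a block comment in a header. The example below is hypothetical; the ordering shown is only inferred from this thread (receive path: node lock then port lock; link_schedule_port: global port_lock then per-port lock), not taken from the actual TIPC sources.]

/*
 * Hypothetical locking-hierarchy comment of the kind suggested above.
 * Ordering is inferred from this discussion only.
 *
 * Lock ordering (take top-down, release in reverse):
 *
 *   1. net_lock         (rwlock)   - protects node and bearer tables
 *   2. node->lock       (spinlock) - protects one node's links and queues
 *   3. port_lock        (spinlock) - protects the global port list
 *   4. port->publ.lock  (spinlock) - protects one port's state
 *
 * General principle (per Jon): avoid holding a node lock and a port
 * lock at the same time; the send path should drop the port lock
 * before routing a message into a link.
 */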
From: Jon M. <jon...@er...> - 2004-05-13 18:10:12
See below.
/jon

Daniel McNeil wrote:

> On Thu, 2004-05-13 at 08:10, Mark Haverkamp wrote:
> > [...] There should probably be an analysis of code paths and determine
> > how the locks interact with each other.
>
> I agree. The locking hierarchy should be documented. Even if this is
> just comments in a source file, it needs to be done. It should also
> include exactly what the lock is protecting.

Agree completely. I will try to do my part.

> > I have noticed that there is at least one place where three locks are
> > required. This can cause problems like we've seen when different code
> > paths need multiple locks unless there is some sort of back off method
> > to insure no deadlocks.
>
> We also need to analyze if the multiple locks are actually giving us
> any performance and/or parallelism benefits. If we have to take
> multiple locks in a common path, that might be causing worse
> performance and more deadlock potential.

We could experiment on that. The one-big-lock approach that we had earlier was heavily criticized, and caused a lot of problems, but so far measurements suggest it was more efficient when running the benchmark between two nodes. When we start to run real systems I am pretty convinced that the finer granularity will pay off, as we will have thousands of ports and dozens of nodes being accessed in parallel.

In the previous version we had seven lock/unlock operations per message transmitted:
Send side: 1x big_lock/unlock when the message was created and sent, plus 2x buffer locks when the buffer was allocated and released.
Receive side: ditto, plus 1x grab/release of big_lock when the message is read to user space.
I don't think it is possible to make it much better.

In the current version we have ten:
Send side: 1x net_lock (read) + 1x node lock + 2x buffer lock. (No tipc_port lock!)
Receive side: 1x net_lock (read) + 1x node lock + 1x tipc_port lock + 1x tipc_port lock when the message is read (in "advance_queue()") + 2x buffer lock.

I don't think an increase from seven to ten lock accesses adds any significant overhead to the data-phase path, while the advantages are obvious. The real problem is not performance, but potential deadlock situations in other parts of the code. Those can only be solved by avoiding, as far as possible, holding the different locks simultaneously, as I suggested in my previous mail. There may be one or two more cases I have missed, but if we find any, this is the solution we should look for.

Regards /Jon

> Daniel
From: Jon M. <jon...@er...> - 2004-05-13 17:27:47
Hi Mark,
The general principle I had in mind when designing this was to not keep a node lock and the port lock at the same time, but in some cases this looks unavoidable, such as in the wakeup function you corrected yesterday. In other cases I have just not been aware of the potential problem, even when it is easy to fix.

The current problem is in fact one of the latter type, as it is completely unnecessary to call port_send_protocol_msg() with the port lock held. What we need here is a small redesign to resolve the whole problem. Everywhere port_send_protocol_msg() is called (in port_timeout(), port_recv_protocol_msg() and a few other places) we should:

1) Build the message (we rename port_send_protocol_msg() to e.g. port_build_protocol_msg() and let it return an sk_buff*).
2) Update the port's sequence number, when applicable.
3) Release the port lock.
4) Send the buffer through net_route_msg().

There is one call where this is difficult, in port_abort_self(), since that is itself called with the lock held, but on the other hand it only sends a message to the port itself, not to any link, so no node lock will be taken.

This should resolve the deadlock problem between port and node locks once and for all, I hope. The rest is about proper documentation.

Regards /Jon

Mark Haverkamp wrote:

> I found another one. I got another hang yesterday after the current
> deadlock fix. I re-added my spinlock debug code and found out that
> we're getting a deadlock between the node lock and the tipc_port lock.
> It looks like the port timeout handler is running on one CPU and a
> recv_msg is running on the other.
>
> [...]
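[Editor's note: a sketch of the four-step restructuring Jon proposes above. All types and helper names are hypothetical stand-ins, not the actual TIPC port.c code; the point is only the ordering -- build the message and bump the sequence number under the port lock, drop the lock, then route the buffer (which may take a node lock) with no port lock held.]

/* Illustrative only: hypothetical names, not the real TIPC code. */
#include <linux/skbuff.h>
#include <linux/spinlock.h>

struct my_port {                       /* stand-in for struct port */
	spinlock_t lock;               /* assumed initialized elsewhere */
	u32 sent_seqno;
};

static struct sk_buff *my_build_protocol_msg(struct my_port *p)
{
	/* Placeholder for the proposed port_build_protocol_msg(). */
	return alloc_skb(64, GFP_ATOMIC);
}

static void my_route_msg(struct sk_buff *buf)
{
	/* Placeholder: the real code would hand the buffer to net_route_msg(),
	 * which may take a node lock. Here we just free it. */
	kfree_skb(buf);
}

static void my_port_timeout(struct my_port *p)
{
	struct sk_buff *buf;

	spin_lock_bh(&p->lock);
	buf = my_build_protocol_msg(p);   /* step 1: build under the port lock */
	p->sent_seqno++;                  /* step 2: update the sequence number */
	spin_unlock_bh(&p->lock);         /* step 3: release the port lock */

	if (buf)
		my_route_msg(buf);        /* step 4: route with no port lock held */
}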
From: Jon M. <jon...@er...> - 2004-05-13 19:46:45
Just to avoid misunderstandings: I am not working on the changes I suggested. I am struggling with trying to understand the usage of netdevice notifications, one of the issues Stephen H. called a showstopper in his review.

Btw, does anybody know if NETDEV_DOWN/UP means that the carrier disappeared/reappeared (I hope so), or does it simply mean that somebody did "ifconfig ethX down"? (I think so.) Is there any description of how this is supposed to be used?

/Jon

Jon Maloy wrote:

> Hi Mark,
> The general principle I had in mind when designing this was to not keep
> a node lock and the port lock at the same time, but in some cases this
> looks unavoidable, such as in the wakeup function you corrected
> yesterday.
>
> [...]
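[Editor's note: Jon's question can also be probed empirically. The sketch below is a standalone test module with hypothetical names, assuming the 2.6-era notifier signature where the callback's third argument is the struct net_device itself (later kernels wrap it in netdev_notifier_info). As generally understood, NETDEV_UP/NETDEV_DOWN track administrative state (dev_open()/dev_close(), i.e. "ifconfig ethX up/down"), while carrier transitions typically show up as NETDEV_CHANGE and can be checked with netif_carrier_ok().]

/* Minimal netdevice-notifier test module (illustrative only). */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/notifier.h>

static int my_netdev_event(struct notifier_block *nb,
			   unsigned long event, void *ptr)
{
	struct net_device *dev = (struct net_device *)ptr;

	switch (event) {
	case NETDEV_UP:
		printk(KERN_INFO "netdev-test: %s administratively up\n", dev->name);
		break;
	case NETDEV_DOWN:
		printk(KERN_INFO "netdev-test: %s administratively down\n", dev->name);
		break;
	case NETDEV_CHANGE:
		printk(KERN_INFO "netdev-test: %s carrier %s\n", dev->name,
		       netif_carrier_ok(dev) ? "on" : "off");
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block my_netdev_notifier = {
	.notifier_call = my_netdev_event,
};

static int __init my_init(void)
{
	return register_netdevice_notifier(&my_netdev_notifier);
}

static void __exit my_exit(void)
{
	unregister_netdevice_notifier(&my_netdev_notifier);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");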