From: Mark H. <ma...@os...> - 2004-05-12 18:24:38
Here is what looks to be happening with the spin lock deadlock. I replaced all the spin_lock_bh calls with a wrapper that tries to get the lock for a while then prints out a debug message if it can't get the lock.

As an experiment, I changed the spin_lock_bh in link_wakeup_ports to a trylock and exited if it couldn't get the lock. I am now not able to get the deadlock.

CPU 0:
  release --
  tipc_delete_port (get port lock) --
  port_abort_peer --
  port_send_proto_msg --
  net_route_msg --
  link_send (get node lock) --
  (hung spinning)

CPU 1:
  common_interrupt --
  do_softirq --
  net_rx_action --
  netif_receive_skb --
  recv_msg (tipc eth) --
  tipc_recv_msg (get node lock) --
  link_wakeup_ports (get port lock) --
  (hung spinning)

Stack dumps:

port lock timeout
Call Trace:
 [<f8a837ab>] link_wakeup_ports+0x9b/0x230 [tipc]
 [<f8a87c2e>] tipc_recv_msg+0x7fe/0x8c0 [tipc]
 [<c014949d>] __kmalloc+0x19d/0x250
 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc]
 [<c0375af2>] netif_receive_skb+0x172/0x1b0
 [<c0375bb4>] process_backlog+0x84/0x120
 [<c0375cd0>] net_rx_action+0x80/0x120
 [<c0124bc8>] __do_softirq+0xb8/0xc0
 [<c0124c05>] do_softirq+0x35/0x40
 [<c0107ce5>] do_IRQ+0x175/0x230
 [<c0105ce0>] common_interrupt+0x18/0x20
 [<c0221c91>] copy_from_user+0x1/0x80
 [<f8a866bf>] link_send_sections_long+0x30f/0xb30 [tipc]
 [<c0221694>] __delay+0x14/0x20
 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc]
 [<f8a860f5>] link_send_sections_fast+0x5b5/0x870 [tipc]
 [<c011b12a>] __wake_up_common+0x3a/0x60
 [<f8a97bf2>] tipc_send+0x92/0x9d0 [tipc]
 [<c011d736>] __mmdrop+0x36/0x50
 [<c03f15b7>] schedule+0x467/0x7a0
 [<f8aa33e6>] recv_msg+0x2b6/0x560 [tipc]
 [<f8aa2d90>] send_packet+0x90/0x180 [tipc]
 [<c011b0d0>] default_wake_function+0x0/0x20
 [<c036c83e>] sock_sendmsg+0x8e/0xb0
 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc]
 [<c01435ba>] buffered_rmqueue+0xfa/0x220
 [<c036c61a>] sockfd_lookup+0x1a/0x80
 [<c036dd61>] sys_sendto+0xe1/0x100
 [<c0128f62>] del_timer_sync+0x42/0x140
 [<c036d109>] sock_poll+0x29/0x30
 [<c017884b>] do_pollfd+0x5b/0xa0
 [<c036ddb6>] sys_send+0x36/0x40
 [<c036e60e>] sys_socketcall+0x12e/0x240
 [<c0105373>] syscall_call+0x7/0xb

&node->lock lock timeout
Call Trace:
 [<f8a8549a>] link_send+0xda/0x2a0 [tipc]
 [<f8a92cee>] net_route_msg+0x41e/0x43d [tipc]
 [<f8a949c2>] port_send_proto_msg+0x1a2/0x2a0 [tipc]
 [<f8a95983>] port_abort_peer+0x83/0x90 [tipc]
 [<f8a9458f>] tipc_deleteport+0x19f/0x280 [tipc]
 [<f8aa25b2>] release+0x72/0x130 [tipc]
 [<c036c76b>] sock_release+0x7b/0xc0
 [<c036d176>] sock_close+0x36/0x50
 [<c016315a>] __fput+0x10a/0x120
 [<c0161597>] filp_close+0x57/0x90
 [<c0121dbc>] put_files_struct+0x7c/0xf0
 [<c0122d5a>] do_exit+0x23a/0x5a0
 [<c012aa35>] __dequeue_signal+0xf5/0x1b0
 [<c0123240>] do_group_exit+0xe0/0x150
 [<c012ab1d>] dequeue_signal+0x2d/0x90
 [<c012cbef>] get_signal_to_deliver+0x26f/0x510
 [<c0105136>] do_signal+0xb6/0xf0
 [<c036ddb6>] sys_send+0x36/0x40
 [<c036e60e>] sys_socketcall+0x12e/0x240
 [<c01051cb>] do_notify_resume+0x5b/0x5d
 [<c01053be>] work_notifysig+0x13/0x15

-- 
Mark Haverkamp <ma...@os...>
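
P.S. For anyone who wants to see the lock ordering in isolation, here is a rough user-space sketch of the two paths. It is not the TIPC code: the pthread mutexes and the names node_lock, port_lock, delete_port_path and wakeup_ports_path are hypothetical stand-ins for the kernel spinlocks and the CPU 0 / CPU 1 call chains above. The delete path takes the port lock and then the node lock; the wakeup path takes the node lock and then only tries the port lock, backing off if it is busy, which is what the trylock experiment does.

```c
#include <pthread.h>

/* Hypothetical user-space stand-ins for the TIPC node and port spinlocks. */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * CPU 0 path (release ... link_send): port lock first, then node lock.
 * This is the order that conflicts with the receive path below.
 */
static void delete_port_path(void)
{
    pthread_mutex_lock(&port_lock);
    pthread_mutex_lock(&node_lock);
    /* ... the abort message to the peer would be sent here ... */
    pthread_mutex_unlock(&node_lock);
    pthread_mutex_unlock(&port_lock);
}

/*
 * CPU 1 path (tipc_recv_msg ... link_wakeup_ports): node lock first,
 * then only TRY the port lock. Returns 1 if the wakeup ran, 0 if the
 * port lock was busy and the wakeup was skipped.
 */
static int wakeup_ports_path(void)
{
    int woke = 0;

    pthread_mutex_lock(&node_lock);
    if (pthread_mutex_trylock(&port_lock) == 0) {
        /* ... waiting ports would be woken here ... */
        woke = 1;
        pthread_mutex_unlock(&port_lock);
    }
    /* Node lock is released either way, so the other path can proceed. */
    pthread_mutex_unlock(&node_lock);
    return woke;
}
```

With a blocking lock in wakeup_ports_path, one thread in each function can end up holding one lock and spinning on the other. The trylock version avoids that at the cost of sometimes skipping the wakeup, so in the real code something would presumably have to retry it later.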