From: Mark H. <ma...@os...> - 2004-06-07 17:40:57
On Mon, 2004-06-07 at 09:58, Mark Haverkamp wrote:
> I have code running on 4 nodes using multicast to distribute messages
> between the nodes. After some hours of sending and receiving, one or
> more of my nodes will hang. The last time 3 of 4 machines were hung and
> I was able to get a dump from one of them. This one seems to indicate
> that there may be a spin lock deadlock in buf_safe_discard. It shows up
> twice in this stack dump. It looks like the first buf_safe_discard gets
> interrupted while holding the lock. The second buf_safe_discard seems
> to be called from link_recv_proto_msg (the address pointed to in
> tipc_recv_msg is just after the call to link_recv_proto_msg).

After talking with Daniel, this is probably not a deadlock, since
spin_lock_bh disables softirqs while the lock is held. Anyway, I'll put
some debug code in the buf_safe_discard/buf_discard functions to try to
narrow it down.

Mark.

-- 
Mark Haverkamp <ma...@os...>