From: Daniel M. <da...@os...> - 2004-05-13 16:12:44
On Thu, 2004-05-13 at 08:10, Mark Haverkamp wrote:
> On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote:
> > On Wed, 2004-05-12 at 12:09, Jon Maloy wrote:
> > > Yeah, that's a classical one. I also think your solution is ok; the
> > > pending ports will be awakened at next message reception, so
> > > no harm done.
> > >
> > > Thanks /jon
> >
> >
> I found another one. I got another hang yesterday after the current
> deadlock fix. I re-added my spinlock debug code and found out that
> we're getting a deadlock between the node lock and the tipc_port lock.
> It looks like the port timeout handler is running on one CPU and a
> recv_msg is running on the other. I suppose as a workaround, we could
> make all the spin lock access in wakeup be conditional, but that will
> probably just make the problem just show up somewhere else. There
> should probably be an analysis of code paths and determine how the locks
> interact with each other.

I agree. The locking hierarchy should be documented. Even if this is
just comments in a source file, it needs to be done. It should also
state exactly what each lock is protecting.

> I have noticed that there is at least one
> place where three locks are required. This can cause problems like
> we've seen when different code paths need multiple locks unless there is
> some sort of back off method to insure no deadlocks.
>

We also need to analyze whether the multiple locks are actually giving
us any performance and/or parallelism benefits. If we have to take
multiple locks in a common path, that might be causing worse
performance and more deadlock potential.

Daniel
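
To make the two ideas above concrete (a documented lock order, plus a back-off for the path that cannot follow it), here is a minimal userspace sketch. It is not TIPC code: it uses POSIX mutexes in place of kernel spinlocks, and node_lock, port_lock, recv_path(), timeout_path(), run_demo() and ITERATIONS are all hypothetical names chosen only to mirror the two paths described in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define ITERATIONS 100000

/*
 * Hypothetical stand-ins for the two locks named in the thread.
 * Documented order (an assumption for this sketch):
 *   1. node_lock  - protects per-node state
 *   2. port_lock  - protects per-port state
 */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;   /* updated only with both locks held */

/* Receive-like path: takes the locks in the documented order. */
static void recv_path(void)
{
    pthread_mutex_lock(&node_lock);
    pthread_mutex_lock(&port_lock);
    shared_counter++;
    pthread_mutex_unlock(&port_lock);
    pthread_mutex_unlock(&node_lock);
}

/*
 * Timeout-like path: it already holds port_lock when it discovers it
 * needs node_lock, so it cannot follow the documented order. Instead
 * of blocking (which could deadlock against recv_path), it try-locks
 * and, on failure, drops port_lock and starts over.
 */
static void timeout_path(void)
{
    for (;;) {
        pthread_mutex_lock(&port_lock);
        if (pthread_mutex_trylock(&node_lock) == 0)
            break;                      /* got both locks */
        pthread_mutex_unlock(&port_lock);
        sched_yield();                  /* back off, let the other side run */
    }
    shared_counter++;
    pthread_mutex_unlock(&node_lock);
    pthread_mutex_unlock(&port_lock);
}

static void *recv_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        recv_path();
    return NULL;
}

static void *timeout_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        timeout_path();
    return NULL;
}

/* Runs both paths concurrently and returns the final counter value. */
static long run_demo(void)
{
    pthread_t a, b;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&a, NULL, recv_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, timeout_worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

If timeout_path() blocked on node_lock instead of try-locking, the two workers could each hold one lock and wait forever for the other, which is the pattern reported above. The back-off avoids that, but at the cost of retries under contention, so it is better suited to a rare path than to a common one.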