|
From: David F. <fa...@kd...> - 2012-11-07 23:16:00
|
On Wednesday 07 November 2012 23:00:51 Philippe Waroquiers wrote:
> On Wed, 2012-11-07 at 10:51 +0100, David Faure wrote:
> > > The idea of helgrind is that it detects lock order problems and/or
> > > race condition problems *even* if no deadlock happens and/or if no
> > > race condition really happened.
> > > Maybe it is very unlikely that the trylock fails. Still would be nice
> > > to discover the bug. And if the trylock does not fail, then the race
> > > condition will then not be detected by helgrind.
> >
> > The simpler construct that can lead to this problem is
> > * trylock mutex1
> > * access shared data
>
> "discover the bug" is related to the doubful construct, not
> to a race condition
If there's no race condition and no deadlock, I'm not sure what bug you want
to detect :-)
> Note that currently, laog is producing messages which should
> be considered as "lock order warning", not as
> "for sure there is a deadlock order problem".
Yes, but why does it warn about lock order? Because it could cause deadlocks.
I agree, this is about potential deadlocks, not actual deadlocks. But
trylock 1 + trylock 2 vs trylock2 + trylock1 (the case we're talking about in
this part of the mail) is not even a potential deadlock. It can't ever
deadlock. So there's nothing to warn about.
> The trylock is one case of "warning, not an error" which could/should
> be improved.
> But there are others e.g. laog does not understand the concept of
> "guard locks" which is (IIUC): each thread can acquire a single lock
> in a set of locks. If a thread wants to acquire more than one lock
> (in any order then), it first has to acquire the "guard lock",
> and then can lock in any order any nr of locks in the lock set.
> With this guard lock, not possible to have a deadlock,
> but for sure this is not understood by the current helgrind
> laog algorithm.
Right. That one definitely needs annotations in the source code, I would think.
There's no way for the tool to detect that these mutexes all go together.
> > At the time of the trylock, it is the last one -> no warning at that
> > precise moment. This sounds like a simple enough change in the current
> > algorithm? Basically adding one if() ... if only I knew where ;)
>
> To avoid doing a "lock order warning" when doing the trylock is easy
> I believe:
> in hg_main.c:3697, put a condition
> 'if (!is_a_try_lock)
> before:
> other = laog__do_dfs_from_to(lk, thr->locksetA);
> if (other) {
> ....
> (where is_a_trylock has to be given by the caller).
I gave it a try, but I'm hitting a problem with exactly that, passing
isTryLock to that code. isTryLock is set in HG_PTHREAD_MUTEX_LOCK_PRE and
similar, while the above code is called from HG_PTHREAD_MUTEX_LOCK_POST and
similar. If I understand correctly, adding an argument to the _POST variant
would break source compatibility for the existing userland macros?
> I think it is almost mechanical work to add arguments to *POST event
> handlers and corresponding requests to transfer the "is a try lock"
> from the helgrind interception to the line 3697).
Ah OK, so you don't seem to see a problem with adding an argument :)
No source compatibility for the user request macros is OK?
I'll finish the patch then, but only if you agree with the approach, otherwise
this would be dead code, i.e. a wasted effort. At this point you don't seem
fully convinced :)
> But I suspect that the insertion of a trylock in the graph
> might later on cause a 'wrong' cycle to be detected.
> E.g. (L = lock, T = trylock, L and T followed by lock nr)
> threadA L1 T2
> threadB L2 L3
> threadC L3 L1
> cannot deadlock (I think :) if threadA releases lock 1
> when T2 fails.
Well, if T2 fails then we have no cycle, and if it succeeds we have a real
potential deadlock. The question is whether we want to remember the T2 attempt
(and warn later) even when T2 fails. I would say, if it failed, it's like it
didn't happen.
Do you actually store failed trylocks currently?
> But when L3 L1 will be inserted, a cycle will be found
> via T2 (if the graph has not remembered this is a trylock).
> So, I am still (somewhat intuitively) thinking that we need
> to have nodes and/or edges marked with "this is a trylock"
> and have the graph exploration taking these marking into account
> to not generate such "false warnings".
My idea is rather that a successful trylock is just like a lock (except that
it shouldn't warn at that precise moment, but if any later real-lock happens
out of order with it, then we should warn, this is why successful trylocks
should be recorded), and a failed trylock should NOT be recorded.
So at this point I don't see a need for remembering "this was a trylock". I
don't see how this could matter later on.
--
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5
|