|
From: David F. <fa...@kd...> - 2012-11-07 09:49:24
|
On Tuesday 06 November 2012 22:56:32 Philippe Waroquiers wrote: > On Tue, 2012-11-06 at 13:43 +0100, David Faure wrote: > > On Monday 05 November 2012 23:19:42 Philippe Waroquiers wrote: > > > On Mon, 2012-11-05 at 18:59 +0100, David Faure wrote: > > > > The testcase http://www.davidfaure.fr/2012/qmutex_trylock.cpp > > > > (from https://bugs.kde.org/show_bug.cgi?id=243232) > > > > shows that an optimization inside Qt leads to a helgrind warning about > > > > wrong lock ordering, making the use of that feature impossible for > > > > detecting actual problems elsewhere (i.e. I have to use > > > > --track-lockorders=no all the time). > > > > > > > > Technically if we ignore the distinction between lock and tryLock, > > > > helgrind is right, we did lock the mutexes in reverse order. But > > > > because > > > > it's a tryLock, it couldn't possibly deadlock. > > > > > > > > Should helgrind simply ignore all pthread_mutex_trylock calls, for the > > > > lockorder detection, even if they succeed? I think so, actually (by > > > > definition they couldn't deadlock, which is what track-lockorders is > > > > all > > > > about). > > > > > > A deadlock can appear if there is a mixture of trylock and lock on > > > the same lock. So, trylock cannot just be ignored. > > > E.g. > > > > > > Thread 1: trylock mutex1 > > > > > > lock mutex2 > > > > > > Thread 2: lock mutex2 > > > > > > lock mutex1 > > > > > > might deadlock. > > > > True. > > This means that only "trylock in second place" should be ignored. More on > > this below. > > More generally, I guess that you mean "trylock in last place" should > be ignored (rather than the special case of 2nd place). Right. > This might be difficult to implement as each time a lock is taken, > helgrind checks for order violation. I suspect a later lock operation > might then transform a "trylock in last place" to a "trylock which is > now not anymore in last place". Right. So let's record it, but let's not warn at the precise time the successful trylock happens. If another lock happens later, out of order due to the successful trylock, then yes, let's warn. > But of course, when the trylock operation has just been done, > this trylock is "last place" and so if we would ignore it, then > this would be similar to always ignore the trylock, which is not ok. Right, not "ignore it as if it didn't happen", but "ignore it as in don't warn right now", and still record it. > Currently, helgrind maintains a graph of lock order. > I suspect we might need different graph node types and/or edge types > to cope with trylock. For sure, more investigations needed looking > in depth at the current algorithm. I think it would be good enough to add the successful trylock to the graph, just without a warning at that point. > > > Even without mixture, isn't the below slightly bizarre/dangerous ? > > > > > > Thread 1: trylock mutex1 > > > > > > trylock mutex2 > > > > > > Thread 2: trylock mutex2 > > > > > > trylock mutex1 > > > > No deadlock can ever happen here. > > Yes, no deadlock can occur. However, this is really a really doubful > construction. The question is: should helgrind report a "lock order" > warning for such constructs ? I don't think so, due to the valid use cases I'm showing here. > The idea of helgrind is that it detects lock order problems and/or > race condition problems *even* if no deadlock happens and/or if no > race condition really happened. > Maybe it is very unlikely that the trylock fails. Still would be nice > to discover the bug. And if the trylock does not fail, then the race > condition will then not be detected by helgrind. The simpler construct that can lead to this problem is * trylock mutex1 * access shared data If the trylock fails, then we have a race condition. If it succeeds, then there is no problem. I don't see how you can detect the potential race condition with helgrind when the trylock succeeds, unless you go as far as saying "all trylocks are bad" (which was actually my thinking until reading the code of QOrderedMutexLocker which has a valid use case for trylock -- maybe there are more, of course). But this seems impossible to detect. I mean, the code could say if (trylock succeeds) // do something else // do something else You're working on a runtime analysis tool, not on a static analysis tool, so you can't know anything about the branch that is not being taken in a given run of helgrind. Therefore I see no way of saying "there could have been a race if this trylock had failed, but it actually succeeded in this run". So I don't disagree on "it would be nice to discover the bug", but if it's impossible then let's drop the idea and come back to what is actually possible :) > > Yes this is exactly that QOrderedMutexLocker does. > > > > Thread 1: lock mutex1 > > > > lock mutex2 > > > > Thread 2: lock mutex2 > > > > trylock mutex1 > > if that failed, > > > > unlock mutex2 > > lock mutex1 > > lock mutex2 > > QOrderedMutexLocker really retries locking in an other order > than the operations done by Thread 2 ? > Or is that a typo ? Did you read it wrong? It ends up locking mutex1 and then mutex2, in the "retry" block. This is in the same order as thread 1. QOrderedMutexLocker starts at "trylock mutex1", i.e. we ask that class to ensure that mutex1 and mutex2 are locked, but it could happen that we're calling that class with mutex2 already locked. So it will try to lock mutex1, and if that fails (already locked), then unlock+lock1+lock2. If it succeeds, then no other thread had locked mutex1, so no deadlock. Yes, even though we ended up locking out of order (2 then 1). > For sure, generally, an application can do a big variety of behaviours. > I suspect however that an application might (this is sane at least) try > to obey a certain "design pattern" such as: > in all cases, if a trylock fails, unlock all and retry all. This is too monolithic thinking :) An application that uses a given framework (e.g. Qt) could very well be doing things differently than the framework does internally. As a developer debugging a large application written by other people, using a framework written by other people, how can I guarantee helgrind that all trylock uses follow a single design pattern? > > Deadlock can only happen if both threads use normal locking as their > > second operation. A trylock as the second operation doesn't deadlock. > > Replace second by last. But how do we know that an operation is the last > one ? I think this is a non trivial change in the current algorithm. At the time of the trylock, it is the last one -> no warning at that precise moment. This sounds like a simple enough change in the current algorithm? Basically adding one if() ... if only I knew where ;) -- David Faure, fa...@kd..., http://www.davidfaure.fr Working on KDE, in particular KDE Frameworks 5 |