|
From: Konstantin S. <kon...@gm...> - 2009-01-26 15:46:02
|
Hello Valgrind developers, Did you ever think that two data race detectors (Helgrind and Drd) is too much for the Valgrind project? In fact, I think that two is too few. :) Please welcome ThreadSanitizer, yet another data race detector based on Valgrind. http://code.google.com/p/data-race-test/wiki/ThreadSanitizer I'd appreciate your feedback, Thanks, --kcc |
|
From: Nicholas N. <n.n...@gm...> - 2009-01-26 22:43:29
|
On Tue, Jan 27, 2009 at 2:45 AM, Konstantin Serebryany <kon...@gm...> wrote: > > Did you ever think that two data race detectors (Helgrind and Drd) is > too much for the Valgrind project? > In fact, I think that two is too few. :) > > Please welcome ThreadSanitizer, yet another data race detector based > on Valgrind. > http://code.google.com/p/data-race-test/wiki/ThreadSanitizer > > I'd appreciate your feedback, How similar is it to Helgrind -- could it conceivably be folded into Helgrind? Nick |
|
From: Konstantin S. <kon...@gm...> - 2009-01-27 05:13:36
|
On Tue, Jan 27, 2009 at 12:46 AM, Nicholas Nethercote <n.n...@gm...> wrote: > On Tue, Jan 27, 2009 at 2:45 AM, Konstantin Serebryany > <kon...@gm...> wrote: >> >> Did you ever think that two data race detectors (Helgrind and Drd) is >> too much for the Valgrind project? >> In fact, I think that two is too few. :) >> >> Please welcome ThreadSanitizer, yet another data race detector based >> on Valgrind. >> http://code.google.com/p/data-race-test/wiki/ThreadSanitizer >> >> I'd appreciate your feedback, > > How similar is it to Helgrind http://code.google.com/p/data-race-test/wiki/ThreadSanitizerVsOthers > -- could it conceivably be folded into Helgrind? Yes, and I will be glad if it does! --kcc > > Nick > |
|
From: Bart V. A. <bar...@gm...> - 2009-02-06 19:29:48
|
On Mon, Jan 26, 2009 at 4:45 PM, Konstantin Serebryany <kon...@gm...> wrote: > Did you ever think that two data race detectors (Helgrind and Drd) is > too much for the Valgrind project? > In fact, I think that two is too few. :) > > Please welcome ThreadSanitizer, yet another data race detector based > on Valgrind. > http://code.google.com/p/data-race-test/wiki/ThreadSanitizer > > I'd appreciate your feedback, Hello Konstantin, I had a quick look at the command line options supported by ThreadSanitizer, and found this: --ignore-in-dtor=yes|no, default yes, Ignore reports with a C++ class Destructor in stack trace. Can you explain why this option defaults to yes, and why it even exists ? Any race I have seen in C++ destructors until now indicated a real problem. A very common cause for data races triggered inside a C++ destructor is the following: - A class has been defined that wraps thread creation and destruction, e.g. CThread. - One of the methods in that class is declared virtual and is run inside the created thread, e.g. CThread::Run(). - There exists methods for starting the thread and waiting until the thread stopped, e.g. CThread::StartRunning() and CThread::WaitUntilStopped(). This last method notifies CThread::Run() that it should stop and then waits until that thread stopped, e.g. via pthread_join(). - Calling CThread::WaitUntilStopped() from the CThread destructor will trigger a race condition on the vtbl pointer of class CThread. This is a real bug and a bug that can cause severe trouble, not something that should be hidden. Bart. |
|
From: Konstantin S. <kon...@gm...> - 2009-02-06 20:22:00
|
Hi Bart, On Fri, Feb 6, 2009 at 10:29 PM, Bart Van Assche <bar...@gm...> wrote: > On Mon, Jan 26, 2009 at 4:45 PM, Konstantin Serebryany > <kon...@gm...> wrote: >> Did you ever think that two data race detectors (Helgrind and Drd) is >> too much for the Valgrind project? >> In fact, I think that two is too few. :) >> >> Please welcome ThreadSanitizer, yet another data race detector based >> on Valgrind. >> http://code.google.com/p/data-race-test/wiki/ThreadSanitizer >> >> I'd appreciate your feedback, > > Hello Konstantin, > > I had a quick look at the command line options supported by > ThreadSanitizer, and found this: > > --ignore-in-dtor=yes|no, default yes, Ignore reports with a C++ class > Destructor in stack trace. > > Can you explain why this option defaults to yes, and why it even > exists ? Oh well. This is probably one of the most painful questions: to choose between false positives and false negatives. So, tsan has 3 flags that control noise vs signal: --pure-happens-before (default = no): Even though I really love the default hybrid state machine, I start thinking that the default should change to --pure-happens-before=yes. Many, many developers are not used to race detectors, false positives may frighten them... More on hybrid vs pure happens-before is here: http://code.google.com/p/data-race-test/wiki/ThreadSanitizerAlgorithm#Pure_happens-before_vs_Hybrid --fast-mode (default=yes): makes sense only for the hybrid mode. A close analog of Eraser's initialization mode. Misses roughly half of the races. But also cuts the number of false positives of the hybrid machine by something like 90%. http://code.google.com/p/data-race-test/wiki/ThreadSanitizerAlgorithm#Fast_mode --ignore-in-dtor (default=yes): One of the most frequent cause of false positives is reference counting. Ref counting implemented via atomics is not understood even by pure happens-before detectotrs. And the reports are almost always in DTORs. See test401: http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468 In hybrid mode there are more reasons for this flag. Well, maybe in pure-happens-before mode the default for this should be 'no'. Dunno. Note, that even if you use the most aggressive mode (--pure-happens-before=no --fast-mode=no --ignore-in-dtor=no), it is possible to avoid all false positives by annotating the code with http://code.google.com/p/data-race-test/wiki/DynamicAnnotations But it requires some skill. IMHO, the default values should be set in a way that will not frighten novice users too much. Those who will be brave enough to read the docs and will understand the value of the tool, will likely start using more aggressive flags. Yet I have no clear vision on which settings should be default. >Any race I have seen in C++ destructors until now indicated a > real problem. A very common cause for data races triggered inside a > C++ destructor is the following: > - A class has been defined that wraps thread creation and destruction, > e.g. CThread. > - One of the methods in that class is declared virtual and is run > inside the created thread, e.g. CThread::Run(). > - There exists methods for starting the thread and waiting until the > thread stopped, e.g. CThread::StartRunning() and > CThread::WaitUntilStopped(). This last method notifies CThread::Run() > that it should stop and then waits until that thread stopped, e.g. via > pthread_join(). > - Calling CThread::WaitUntilStopped() from the CThread destructor will > trigger a race condition on the vtbl pointer of class CThread. This is > a real bug and a bug that can cause severe trouble, not something that > should be hidden. Hehe, yea... Something similar: test86 and test87 http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#4060 --kcc > > Bart. > |
|
From: Bart V. A. <bar...@gm...> - 2009-02-07 09:25:49
|
On Fri, Feb 6, 2009 at 9:21 PM, Konstantin Serebryany <kon...@gm...> wrote: > --ignore-in-dtor (default=yes): > One of the most frequent cause of false positives is reference counting. > Ref counting implemented via atomics is not understood even by pure > happens-before detectotrs. > And the reports are almost always in DTORs. > See test401: > http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468 Are you sure that if a pure happens before data race detector complains on reference counting that the complaint is a false positive ? In most shared pointer implementations, including shared_ptr<T> included in Boost, assigning a value to a shared pointer in one thread and calling the destructor of the shared pointer in another thread is a real race, not something that may be suppressed. This is documented behavior of the Boost shared_ptr<T> class. By the way, the threading behavior of tr1::shared_ptr<> is identical to that of Boost. This is not a coincidence: it is difficult to implement lock-free smart pointers that implement shared ownership. A quote from http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2003/n1450.html#Implementation-difficulty: D. Implementation difficulty The Boost developers found a shared-ownership smart pointer exceedingly difficult to implement correctly. Others have made the same observation. For example, Scott Meyers [Meyers01] says: The STL itself contains no reference-counting smart pointer, and writing a good one - one that works correctly all the time - is tricky enough that you don't want to do it unless you have to. I published the code for a reference-counting smart pointer in More Effective C++ in 1996, and despite basing it on established smart pointer implementations and submitting it to extensive pre-publication reviewing by experienced developers, a small parade of valid bug reports has trickled in for years. The number of subtle ways in which reference-counting smart pointers can fail is remarkable. Bart. |
|
From: Konstantin S. <kon...@gm...> - 2009-02-09 14:01:12
|
On Sat, Feb 7, 2009 at 12:25 PM, Bart Van Assche <bar...@gm...> wrote: > On Fri, Feb 6, 2009 at 9:21 PM, Konstantin Serebryany > <kon...@gm...> wrote: >> --ignore-in-dtor (default=yes): >> One of the most frequent cause of false positives is reference counting. >> Ref counting implemented via atomics is not understood even by pure >> happens-before detectotrs. >> And the reports are almost always in DTORs. >> See test401: >> http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468 > > Are you sure that if a pure happens before data race detector > complains on reference counting that the complaint is a false positive > ? Of course, not. Some are false positives, some are real reports. But test401 *is* a false positive. (http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468) --kcc > In most shared pointer implementations, including shared_ptr<T> > included in Boost, assigning a value to a shared pointer in one thread > and calling the destructor of the shared pointer in another thread is > a real race, not something that may be suppressed. This is documented > behavior of the Boost shared_ptr<T> class. By the way, the threading > behavior of tr1::shared_ptr<> is identical to that of Boost. This is > not a coincidence: it is difficult to implement lock-free smart > pointers that implement shared ownership. A quote from > http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2003/n1450.html#Implementation-difficulty: > > D. Implementation difficulty > > The Boost developers found a shared-ownership smart pointer > exceedingly difficult to implement correctly. Others have made the > same observation. For example, Scott Meyers [Meyers01] says: > > The STL itself contains no reference-counting smart pointer, and > writing a good one - one that works correctly all the time - is tricky > enough that you don't want to do it unless you have to. I published > the code for a reference-counting smart pointer in More Effective C++ > in 1996, and despite basing it on established smart pointer > implementations and submitting it to extensive pre-publication > reviewing by experienced developers, a small parade of valid bug > reports has trickled in for years. The number of subtle ways in which > reference-counting smart pointers can fail is remarkable. > > Bart. > |
|
From: Bart V. A. <bar...@gm...> - 2009-02-09 18:48:55
|
On Mon, Feb 9, 2009 at 3:01 PM, Konstantin Serebryany <kon...@gm...> wrote: > But test401 *is* a false positive. > (http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468) Sure, I agree that the race reported on test401 is a false positive. And the easiest way to inform a data-race detector about the inter-thread ordering introduced by reference counting is via client requests. IMHO it would be a significant advantage for Valgrind users if there would be a single set of source code annotations that is understood by all data race detectors built on top of Valgrind (Helgrind, DRD and ThreadSanitizer). Bart. |
|
From: Konstantin S. <kon...@gm...> - 2009-02-10 06:42:32
|
On Mon, Feb 9, 2009 at 9:48 PM, Bart Van Assche <bar...@gm...> wrote: > On Mon, Feb 9, 2009 at 3:01 PM, Konstantin Serebryany > <kon...@gm...> wrote: >> But test401 *is* a false positive. >> (http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc#6468) > > Sure, I agree that the race reported on test401 is a false positive. > And the easiest way to inform a data-race detector about the > inter-thread ordering introduced by reference counting is via client > requests. > > IMHO it would be a significant advantage for Valgrind users if there > would be a single set of source code annotations that is understood by > all data race detectors built on top of Valgrind (Helgrind, DRD and > ThreadSanitizer). Yes. Another question: should these client requests be binary compatible with each other, or only source compatible? (source compatible == macro definitions with different implementations for different tools) If we push for binary compatibility, we may loose the ability to be compatible with non-valgrind-based tools. --kcc > > Bart. > |
|
From: Bart V. A. <bar...@gm...> - 2009-02-11 17:53:17
|
On Tue, Feb 10, 2009 at 7:42 AM, Konstantin Serebryany <kon...@gm...> wrote: > On Mon, Feb 9, 2009 at 9:48 PM, Bart Van Assche > <bar...@gm...> wrote: >> IMHO it would be a significant advantage for Valgrind users if there >> would be a single set of source code annotations that is understood by >> all data race detectors built on top of Valgrind (Helgrind, DRD and >> ThreadSanitizer). > > Yes. > Another question: should these client requests be binary compatible > with each other, or only source compatible? > (source compatible == macro definitions with different > implementations for different tools) > If we push for binary compatibility, we may loose the ability to be > compatible with non-valgrind-based tools. Do you think it is possible to define the annotations such that recompilation would only be needed when switching between Valgrind-tools and non-Valgrind tools ? Bart. |
|
From: Konstantin S. <kon...@gm...> - 2009-02-11 20:00:43
|
On Wed, Feb 11, 2009 at 8:53 PM, Bart Van Assche <bar...@gm...> wrote: > On Tue, Feb 10, 2009 at 7:42 AM, Konstantin Serebryany > <kon...@gm...> wrote: >> On Mon, Feb 9, 2009 at 9:48 PM, Bart Van Assche >> <bar...@gm...> wrote: >>> IMHO it would be a significant advantage for Valgrind users if there >>> would be a single set of source code annotations that is understood by >>> all data race detectors built on top of Valgrind (Helgrind, DRD and >>> ThreadSanitizer). >> >> Yes. >> Another question: should these client requests be binary compatible >> with each other, or only source compatible? >> (source compatible == macro definitions with different >> implementations for different tools) >> If we push for binary compatibility, we may loose the ability to be >> compatible with non-valgrind-based tools. > > Do you think it is possible to define the annotations such that > recompilation would only be needed when switching between > Valgrind-tools and non-Valgrind tools ? Yes, sure. either define the same client requests in all three tools or intercept the same functions (like AnnotateBlahBlahBlah()) --kcc --kcc > > Bart. > |