|
From: <sv...@va...> - 2008-02-17 18:51:05
|
Author: bart Date: 2008-02-17 18:51:06 +0000 (Sun, 17 Feb 2008) New Revision: 7420 Log: Rewrote the README.txt document. Modified: trunk/exp-drd/docs/README.txt Modified: trunk/exp-drd/docs/README.txt =================================================================== --- trunk/exp-drd/docs/README.txt 2008-02-17 18:13:00 UTC (rev 7419) +++ trunk/exp-drd/docs/README.txt 2008-02-17 18:51:06 UTC (rev 7420) @@ -1,51 +1,120 @@ DRD: a Data Race Detector ========================= -Last update: December 3, 2007 by Bart Van Assche. +Last update: February 16, 2008 by Bart Van Assche. -The Difficulty of Multithreading Programming --------------------------------------------- -Multithreading is a concept to model multiple concurrent activities within a -single process. Since the invention of the multithreading concept, there is an -ongoing debate about which way to model concurrent activities is better -- -multithreading or message passing. This debate exists because -multithreaded programming is error prone: multithreaded programs can exhibit -data races and/or deadlocks. Despite these risks multithreaded programming is -popular: for many applications multithreading is a more natural programming -style, and multithreaded code often runs faster than the same application -implemented via message passing. +Introduction +------------ -In the context of DRD, a data race is defined as two concurrent memory -accesses, where at least one of these two memory accesses is a store operation, -and these accesses are not protected by proper locking constructs. Data -races are harmful because these may lead to unpredictable results in -multithreaded programs. There is a general consensus that data races -should be avoided in multithreaded programs. +Multithreading is a concept to model multiple concurrent activities +within a single process. Each such concurrent activity is called a +thread. All threads that are active within a process share the same +set of memory locations. Data is exchanged between threads by writing +to and reading from the shared memory. Since the invention of the +multithreading concept, there is an ongoing debate about which way to +model concurrent activities is better -- shared memory programming or +message passing [Ousterhout 1996]. This debate exists because each +model has significant advantages and disadvantages. While shared +memory programming relieves the programmer from writing code for the +exchange of data between concurrent activities and while shared memory +programming has a performance advantage over message passing, shared +memory programming is error prone. Shared memory programs can exhibit +data races and/or deadlocks. Data races are harmful because these may +lead to unpredictable results and nondeterministic behavior in +multithreaded programs. There are two ways to detect data races and +deadlocks: static analysis and runtime detection by a tool. Since +there do not yet exist any tools that can carry out static analysis of +data races or deadlocks, the only option to statically detect such +anomolies is source reading by a human. It takes a huge effort however +to detect all possible data races or deadlocks via source +reading. This is why tools for detecting data races and deadlocks at +runtime are essential. +Data Races +---------- + +Threads in a multithreaded process exchange information by writing to +and reading from memory locations shared by the threads. Two accesses +to the same memory location by different threads are called +conflicting accesses if at least one of these two accesses modifies +the contents of the memory location. + +A deterministic exchange of data between threads is only possible if +conflicting accesses happen in a well-defined order. It is the role of +synchronization actions to enforce the runtime execution order of +conflicting accesses. Examples of such synchronization actions are +pthread_mutex_lock(), pthread_mutex_unlock(), sem_wait(), sem_post(), +... + +An important concept with regard to the ordering of load and store +operations on shared memory is the happens-before-1 relation or hb1 +[Adve 1991]. The hb1 relation is a partial order defined over all +shared memory operations. The hb1 relation includes both the +intrathread execution order and the interthread ordering imposed by +synchronization operations. All intrathread accesses of a single +thread are totally ordered by hb1. Since hb1 is a partial order for +interthread memory accesses, interthread memory accesses are either +ordered or not ordered by hb1. A data race is defined by Adve et +al. as two conflicting accesses that are not ordered by the +happens-before-1 relation. Or: which accesses are considered as data +races depends on the runtime behavior of a program. + +There is an interesting relationship between runtime behavior and +multithreaded design patterns. The most straightforward way to ensure +that different threads access shared data in an orderly fashion is to +ensure that at most one thread can access the object at any given +time. This can be realized by a programmer to surround all shared data +accesses with calls to proper synchronization functions. Such a source +code strategy for avoiding data races is also called a locking +discipline. An important property of programs that follow this +strategy is that these programs are data-race free. + +There exist two kinds of tools for verifying the runtime behavior of +multithreaded programs. One class of tools verifies a locking +strategy, and another class of tools verifies the absence of data +races. The difference is subtle but important. + +The most well know algorithm for runtime verification of a locking +strategy is the so called Eraser algorithm [Savage 1997]. While this +algorithm allows to catch more programming errors than the conflicting +accesses classified as data races by the definition of Sarita Adve et +al., unfortunately the Eraser algorithm also reports a lot of false +positives. It is tedious to review the output of the Eraser tool +manually and to verify which reported pairs of accesses are false +positives and which pairs are real data races. There is still research +ongoing about how to reduce the number of false positives reported by +the Eraser algorithm -- see e.g. [Müehlenfeld 2007]. The Helgrind +tool is a refinement of the Eraser algorithm. + +A second class of data race detection tools detects all conflicting +accesses that are data races according to the definition of Sarita +Adve et al. While in theory there is no guarantee that these tools +detect all locking discipline violations, these tools do not report +false positives. These tools are the most practical tools to +use. Examples of this class of tools are DIOTA [Ronsse 2004], Intel(R) +Thread Checker [Banerjee 2006a, Banerjee 2006b, Sack 2006] and DRD. + + About DRD --------- -The current version of DRD is able to perform data race detection on small -programs -- DRD quickly runs out of memory for realistically sized programs. -The current version runs well under Linux on x86 CPU's for multithreaded -programs that use the POSIX threading library. Regular POSIX threads, detached -threads, mutexes, condition variables and spinlocks are supported. POSIX -semaphores, barriers and reader-writer locks are not yet supported. -Extensive scientific research has been carried out on the area of data-race -detection. The two most important algorithms are known as the Eraser algorithm -and the algorithm based on the happens-before relationship, first documented by -Netzer. The Eraser algorithm can result in false positives, while the Netzer -algorithm guarantees not to report false positives. The Netzer algorithm -ignores a certain class of data races however. Both algorithms have been -implemented in Valgrind. The helgrind tool implements the Eraser algorithm, -and the DRD tool implements the Netzer algorithm. Although [Savage 1997] -claims that the Netzer algorithm is harder to implement efficiently, as of -version 3.3.0 drd runs significantly faster on several regression tests than -helgrind. +DRD is still under development, that is why the tool is named exp-drd. +The current version of DRD is able to perform data race detection on +small programs -- DRD quickly runs out of memory for realistically +sized programs. The current version runs well under Linux on x86 +CPU's for multithreaded programs that use the POSIX threading +library. Regular POSIX threads, detached threads, mutexes, condition +variables, spinlocks, semaphores and barriers are supported. POSIX +reader-writer locks are not yet supported. +Although [Savage 1997] claims that a happens-before detector is harder +to implement efficiently than the Eraser algorithm, as of Valgrind +version 3.3.0 exp-drd runs significantly faster on several regression +tests than Helgrind. + How to use DRD -------------- To use this tool, specify --tool=drd on the Valgrind command line. @@ -54,30 +123,114 @@ Future DRD Versions ------------------- The following may be expected in future versions of DRD: -* More extensive documentation. * Drastically reduced memory consumption, such that realistic applications can be analyzed with DRD. * Faster operation. -* Support for semaphores, barriers and reader-writer locks. +* More extensive documentation. +* Support for reader-writer locks. * Support for PowerPC CPU's. * A lock dependency analyzer, as a help in deadlock prevention. -* Support for more than 256 mutexes per process. +* Elimination of several artificial limitations. -See also --------- -* Robert H. B. Netzer and Barton P. Miller. What are race - conditions? Some issues and formalizations. ACM Letters 136 - on Programming Languages and Systems, 1(1):74–88, March 1992. - -* John Ousterhout, Why Threads Are A Bad Idea (for most - purposes), Invited Talk at the 1996 USENIX Technical Conference (January - 25, 1996). http://home.pacbell.net/ouster/threads.pdf - -* Stefan Savage, Michael Burrows, Greg Nelson, Patrick - Sobalvarro and Thomas Anderson, Eraser: A Dynamic Data Race Detector for - Multithreaded Programs, ACM Transactions on Computer Systems, - 15(4):391-411, November 1997. +References +---------- +[Lamport 1978] + Leslie Lamport. + Time, clocks, and the ordering of events in a distributed system. + Communications of the ACM archive, Volume 21, Issue 7, 1978. + http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf + http://portal.acm.org/citation.cfm?id=359563 +[Netzer 1992] + Robert H. B. Netzer and Barton P. Miller. + What are race conditions? Some issues and formalizations. + ACM Letters on Programming Languages and Systems, 1(1):74–88, March 1992. + http://www.securitytechnet.com/resource/security/os/race-conditions.pdf + http://portal.acm.org/citation.cfm?id=130623 +[Adve 1991] + Sarita V. Adve, Mark D. Hill, Barton P. Miller, Robert H. B. Netzer. + Detecting data races on weak memory systems. + Proceedings of the 18th annual international symposium on Computer + architecture, Toronto, Ontario, Canada, pp 234-243, 1991. + http://rsim.cs.uiuc.edu/~sadve/Publications/isca91.dataraces.ps + http://portal.acm.org/citation.cfm?doid=115953.115976 + +[Ousterhout 1996] + John Ousterhout. + Why Threads Are A Bad Idea (for most purposes). + Invited Talk at the 1996 USENIX Technical Conference (January 25, 1996). + http://home.pacbell.net/ouster/threads.pdf + +[Savage 1997] + Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro and + Thomas Anderson. + Eraser: A Dynamic Data Race Detector for Multithreaded Programs. + ACM Transactions on Computer Systems, 15(4):391-411, November 1997. + http://www.cs.ucsd.edu/users/savage/papers/Tocs97.pdf + http://portal.acm.org/citation.cfm?id=265927 + +[Ronsse 1999] + Michiel Ronsse, Koen De Bosschere. + RecPlay: a fully integrated practical record/replay system. + ACM Transactions on Computer Systems (TOCS), Volume 17, Issue 2 (May 1999), + pp. 133-152, 1999. + http://portal.acm.org/citation.cfm?id=312214 + +[Ronsse 2004] + Michiel Ronsse, Jonas Maebe, Koen De Bosschere. + Detecting Data Races in Sequential Programs with DIOTA. + Proceedings of the 10th International Euro-Par Conference, Springer-Verlag, + Lecture Notes in Computer Science, pp. 82-89, 2004. + http://escher.elis.ugent.be/publ/Edocs/DOC/P104_076.pdf + +[Banerjee 2006a] + Utpal Banerjee, Brian Bliss, Zhiqiang Ma, Paul Petersen. + Unraveling Data Race Detection in the Intel® Thread Checker. + First Workshop on Software Tools for Multi-core Systems (STMCS), in + conjunction with IEEE/ACM International Symposium on Code Generation and + Optimization (CGO), March 26, 2006, Manhattan, New York, NY. + +[Banerjee 2006b] + Utpal Banerjee, Brian Bliss, Zhiqiang Ma, Paul Petersen. + A theory of data race detection + Proceeding of the 2006 workshop on Parallel and distributed systems: testing + and debugging, Portland, Maine, USA, pp. 69-78, 2006. + http://www.cs.ucsb.edu/~tiwari/papers/threadchecker06 + http://portal.acm.org/citation.cfm?id=1147416 + +[Lu 2006] + Shan Lu, Joseph Tucek, Feng Qin, Yuanyuan Zhou. + AVIO: detecting atomicity violations via access interleaving invariants. + Proceedings of the 12th international conference on Architectural support + for programming languages and operating systems, San Jose, California, USA, + pp. 37-48, 2006. + http://www.cse.ohio-state.edu/~qin/pub-papers/2006andbefore/asplos062-lu.pdf + http://portal.acm.org/citation.cfm?id=1168864 + +[Sack 2006] + Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas + Accurate and efficient filtering for the Intel thread checker race detector. + Proceedings of the 1st workshop on Architectural and system support for + improving software dependability, San Jose, California, pp. 34-41, 2006. + http://iacoma.cs.uiuc.edu/iacoma-papers/asid06.pdf + http://portal.acm.org/citation.cfm?id=1181309.1181315 + +[Müehlenfeld 2007] + Arndt Müehlenfeld, Franz Wotawa. + Fault detection in multi-threaded c++ server applications. + Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of + parallel programming, San Jose, California, USA, poster session, + pp. 142-143, 2007. + http://valgrind.org/docs/muehlenfeld2006.pdf + http://portal.acm.org/citation.cfm?id=1229457 + +[Zhou 2007] + Pin Zhou, Radu Teodorescu, Yuanyuan Zhou. + HARD: Hardware-Assisted Lockset-based Race Detection. + Proceedings of the 2007 IEEE 13th International Symposium on High + Performance Computer Architecture, pp. 121-132, 2007. + http://opera.cs.uiuc.edu/paper/Hard-HPCA07.pdf + http://portal.acm.org/citation.cfm?id=1317533.1318108 |
|
From: Nicholas N. <nj...@cs...> - 2008-02-17 20:57:07
|
On Sun, 17 Feb 2008 sv...@va... wrote: > +to and reading from the shared memory. Since the invention of the > +multithreading concept, there is an ongoing debate about which way to > +model concurrent activities is better -- shared memory programming or > +message passing [Ousterhout 1996]. Isn't what you've called here "multithreading" more typically called "shared memory multithreading" or something like that? Nice write-up, BTW. Nick |
|
From: Bart V. A. <bar...@gm...> - 2008-02-18 18:33:07
|
On Feb 17, 2008 9:57 PM, Nicholas Nethercote <nj...@cs...> wrote: > On Sun, 17 Feb 2008 sv...@va... wrote: > > +to and reading from the shared memory. Since the invention of the > > +multithreading concept, there is an ongoing debate about which way to > > +model concurrent activities is better -- shared memory programming or > > +message passing [Ousterhout 1996]. > > Isn't what you've called here "multithreading" more typically called "shared > memory multithreading" or something like that? The threads that exist within a single process share memory by definition. The historical perspective is as follows: * The first computers did not have an OS and ran one program at a time. * The first operating systems were capable to run multiple processes, but all these processes consisted of a single thread. * The concepts of mutual exclusion and semaphores were already described by Per Brinch Hansen in his paper "A comparison of two synchronizing concepts" (Acta Informatica, 1972). * The monitor concept was introduced by C.A.R. Hoare in his publication titled "Monitors: An Operating System Structuring Concept" (Communications of the ACM, October 1974). * The oldest publication I could find in which the thread concept was mentioned dates from 1986: "Mach: A New Kernel Foundation For UNIX Development" by Accetta et al. (USENIX 1986). > Nice write-up, BTW. Thanks :-) Bart. |
|
From: Julian S. <js...@ac...> - 2008-02-20 22:54:50
|
On Sunday 17 February 2008 21:57, Nicholas Nethercote wrote: > On Sun, 17 Feb 2008 sv...@va... wrote: > > +to and reading from the shared memory. Since the invention of the > > +multithreading concept, there is an ongoing debate about which way to > > +model concurrent activities is better -- shared memory programming or > > +message passing [Ousterhout 1996]. > > Isn't what you've called here "multithreading" more typically called > "shared memory multithreading" or something like that? > > Nice write-up, BTW. I agree. I would add that POSIX pthreads is the de-facto standard a way to do shared memory programming, and MPI is the de-facto standard way to do message passing. I'm sure that message-passing has some failure modes (deadlocks) in common with shared memory programming, and I wouldn't be at all surprised to hear it could suffer from races too. --- One of the things I have come to realise in the past year or so is what a terrible programming model explicit shared-memory parallelism is. It's simply too hard for humans to understand and reason about (in all but the most trivial of applications): even small threaded programs are extremely hard to make sense of. J |
|
From: Bart V. A. <bar...@gm...> - 2008-02-21 11:33:37
|
On Wed, Feb 20, 2008 at 11:51 PM, Julian Seward <js...@ac...> wrote: > I would add that > POSIX pthreads is the de-facto standard a way to do shared > memory programming, and MPI is the de-facto standard way to do > message passing. > > I'm sure that message-passing has some failure modes (deadlocks) > in common with shared memory programming, and I wouldn't be at > all surprised to hear it could suffer from races too. The following failures can occur in message-passing software: * Deadlocks. A deadlock can occur when two or more threads or processes pass messages synchronously and are waiting in a cycle on each other. A deadlock can also occur when synchronously sending a message to a queue that has reached its maximal size, and the messaging implementation is such that it waits in this case. * Race conditions. If two or more threads or processes send a message to the same destination thread or process, and the receiver does not specify which sender it wants to receive from, then this is a race condition. * Message queue size problems. Some message passing implementations use queues with no other bound on the queue size than the amount of available memory. For a programmer it can be very convenient not having to compute the maximal possible size of a message queue. I consider this as a design bug however -- if it is not known at design time how big a message queue can become, how can it be known that the queue will fit in the available memory ? A typical problem that occurs in multithreaded software that uses message passing instead of shared memory for interthread communication is performance degradation due to cache misses. If each thread performs a small task, then frequently work has to be passed between threads and there are many context switches needed. Each context switch has a performance penalty not only because of the time needed for the context switch but more importantly because of the cache misses it causes. > One of the things I have come to realise in the past year or so > is what a terrible programming model explicit shared-memory parallelism > is. It's simply too hard for humans to understand and reason about > (in all but the most trivial of applications): even small threaded > programs are extremely hard to make sense of. It depends. Although understanding concurrent activities is always hard, it is possible to write multithreaded software that is relatively easy to read and to maintain. What I have learned during the past ten years about writing multithreaded software is a.o. the following: * Encapsulate the mutex and thread concepts in an object, this makes multithreaded software more compact and saves a lot of typing. * Use scoped locking instead of explicit lock / unlock calls. * Never use POSIX threads condition variables directly -- use higher level abstractions, e.g. the Mesa-style monitor concept. Using POSIX condition variables directly can easily introduce race conditions. * Encapsulate all thread-shared data in classes. Make sure these classes have a limited number of data members and a limited number of member functions. This makes it possible for a human to verify a locking policy via source reading. * Make sure that the locking policy can be verified by verifying one class at a time. This implies that classes may never return references to their members (which violates data hiding anyway). * With regard to deadlock avoidance, assign a locking order to mutexes and other synchronization objects -- a locking order is the order in which nested locking must be performed. * Make sure the locking order is verified at runtime. This is possible either by using a threading library that supports this or via a tool that verifies this at runtime. Verifying the locking order reduces the complexity of deadlock detection from a multithreaded to a single-threaded problem. This is a huge win for reproducibility. * Make sure the software consists of modules, and that there exists a hierarchy between modules. This is necessary such that function calls can be labeled as either "high-level module calls low-level module" (downcall) or a "callback". * Performing a downcall while a mutex is locked is OK. When performing a callback however, make sure that no mutexes are locked by the thread from which the callback is performed. The above guidelines have allowed me to develop (embedded) software that works very well: e.g. a 70 KLOC (embedded) application with about ten threads never crashed or deadlocked at a customer site. There was only one threading-related bug that was reported by a customer: strange data was sometimes displayed in one specific data field. The cause of this was that in one place shared data was not protected by locking. Bart. |
|
From: Nicholas N. <nj...@cs...> - 2008-02-21 21:16:25
|
On Thu, 21 Feb 2008, Bart Van Assche wrote: >> One of the things I have come to realise in the past year or so >> is what a terrible programming model explicit shared-memory parallelism >> is. It's simply too hard for humans to understand and reason about >> (in all but the most trivial of applications): even small threaded >> programs are extremely hard to make sense of. > > It depends. Although understanding concurrent activities is always > hard, it is possible to write multithreaded software that is > relatively easy to read and to maintain. What I have learned during > the past ten years about writing multithreaded software is a.o. the > following: So basically you need various higher-level abstractions layered over pthreads, plus a strong dose of programmer discipline. So it's doable, but hoping everyone will get it right is optimistic. N |
|
From: Bart V. A. <bar...@gm...> - 2008-02-22 07:24:45
|
On Thu, Feb 21, 2008 at 10:16 PM, Nicholas Nethercote <nj...@cs...> wrote: > On Thu, 21 Feb 2008, Bart Van Assche wrote: > > >> One of the things I have come to realise in the past year or so > >> is what a terrible programming model explicit shared-memory parallelism > >> is. It's simply too hard for humans to understand and reason about > >> (in all but the most trivial of applications): even small threaded > >> programs are extremely hard to make sense of. > > > > It depends. Although understanding concurrent activities is always > > hard, it is possible to write multithreaded software that is > > relatively easy to read and to maintain. What I have learned during > > the past ten years about writing multithreaded software is a.o. the > > following: > > So basically you need various higher-level abstractions layered over > pthreads, plus a strong dose of programmer discipline. So it's doable, but > hoping everyone will get it right is optimistic. My opinion is that software developers should do a serious effort at preventing problems and that they should not just rely on error detection tools. The same holds for memory leaks / dangling pointers: in large applications it is important that there is a clear strategy with regard to ownership of dynamically allocated memory. Sophisticated memory checking and leak checking tools do not provide a solution in case there is no clear ownership policy in the software being checked. Bart. |