|
From: Konstantin S. <kon...@gm...> - 2007-12-05 09:56:31
|
Hi Julian, all,
I get a report about a race from helgrind, while everything looks well
synchronized to me.
Am I wrong?
Thanks,
--kcc
% cat cv.cc
#define _MULTI_THREADED
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static int COND = 0;
static pthread_cond_t CV = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER;
void *run_pm(void*) {
sleep(2); // we want waiter to get there first
pthread_mutex_lock(&MU);
COND++;
fprintf(stderr, "Signal: COND: %d\n", COND);
pthread_cond_signal(&CV);
pthread_mutex_unlock(&MU);
return NULL;
}
int main() {
pthread_t threadid;
pthread_create(&threadid, NULL, run_pm, NULL);
pthread_mutex_lock(&MU);
while (COND != 1) {
fprintf(stderr, "Wait: COND: %d\n", COND);
pthread_cond_wait(&CV, &MU);
}
pthread_mutex_unlock(&MU);
fprintf(stderr, "COND: %d\n", COND);
pthread_join(threadid, NULL);
return 0;
}
% g++ -g cv.cc -lpthread
% ~/race/valgrind32orig/Inst/bin/valgrind --tool=helgrind ./a.out
==31530== Helgrind, a thread error detector.
==31530== Copyright (C) 2007-2007, and GNU GPL'd, by OpenWorks LLP et al.
==31530== Using LibVEX rev 1793, a library for dynamic binary translation.
==31530== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==31530== Using valgrind-3.3.0.RC1, a dynamic binary instrumentation
framework.
==31530== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==31530== For more details, rerun with: -v
==31530==
Wait: COND: 0
Signal: COND: 1
==31530== Thread #1 is the program's root thread
==31530==
==31530== Thread #2 was created
==31530== at 0x4FF1A4D8: clone (in /lib/tls/i686/cmov/libc-2.3.6.so)
==31530== by 0x410B37E2: pthread_create@@GLIBC_2.1 (in
/lib/tls/i686/cmov/libpthread-2.3.6.so)
==31530== by 0x47A749A: pthread_create@* (hg_intercepts.c:213)
==31530== by 0x80486FC: main (cv.cc:23)
==31530==
==31530== Possible data race during read of size 4 at 0x8049AB8
==31530== at 0x8048754: main (cv.cc:32)
==31530== Old state: shared-modified by threads #1, #2
==31530== New state: shared-modified by threads #1, #2
==31530== Reason: this thread, #1, holds no consistent locks
==31530== Last consistently used lock for 0x8049AB8 was first observed
==31530== at 0x47A7916: pthread_mutex_lock (hg_intercepts.c:408)
==31530== by 0x8048708: main (cv.cc:25)
COND: 1
==31530==
==31530== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 16 from 4)
%
|
|
From: Julian S. <js...@ac...> - 2007-12-05 10:41:52
|
> I get a report about a race from helgrind, while everything looks well > synchronized to me. > Am I wrong? > fprintf(stderr, "COND: %d\n", COND); > pthread_join(threadid, NULL); The problem is here. By the time you get here, Helgrind knows that COND has been accessed by both parent and child threads, and MU is used to protect it. But now you are using it without a lock. Helgrind cannot know that the child thread (run_pm) does not access COND again after the pthread_cond_signal (how could it know this without doing static analysis of the child's code?) If you change the order of these two lines, to > pthread_join(threadid, NULL); > fprintf(stderr, "COND: %d\n", COND); then it stops complaining. Because now the pthread_join tells Helgrind that the child can no longer access COND. That means, after the pthread_join, the parent is the only thread able to access COND, so it is OK for it to access COND without a lock. J |
|
From: Konstantin S. <kon...@gm...> - 2007-12-05 10:51:09
|
>> Helgrind cannot know that the child thread (run_pm) does not access COND again after the pthread_cond_signal I thought the logic related to cond_wait() is supposed to catch this. COND is accessed before cond_signal() in child thread and after cond_wait() in parent thread. If COND is accessed outside of critical sections in both threads (but still, before cond_signal in child and after cond_wait in parent), helgrind is quiet. >> If you change the order of these two lines, to Sure. But thread_join is not what is usually done in presence of thread pools... Threads tend to live forever. This program is just a short reproducer. --kcc On Dec 5, 2007 1:41 PM, Julian Seward <js...@ac...> wrote: > > > I get a report about a race from helgrind, while everything looks well > > synchronized to me. > > Am I wrong? > > > fprintf(stderr, "COND: %d\n", COND); > > pthread_join(threadid, NULL); > > The problem is here. By the time you get here, Helgrind knows that > COND has been accessed by both parent and child threads, and MU is > used to protect it. But now you are using it without a lock. > > Helgrind cannot know that the child thread (run_pm) does not access > COND again after the pthread_cond_signal (how could it know this > without doing static analysis of the child's code?) > > If you change the order of these two lines, to > > > pthread_join(threadid, NULL); > > fprintf(stderr, "COND: %d\n", COND); > > then it stops complaining. Because now the pthread_join tells Helgrind > that the child can no longer access COND. That means, after the > pthread_join, > the parent is the only thread able to access COND, so it is OK for it > to access COND without a lock. > > J > |
|
From: Julian S. <js...@ac...> - 2007-12-05 11:25:39
|
> COND is accessed before cond_signal() in child thread
yes
> and after cond_wait() in parent thread.
no:
while (COND != 1) {
fprintf(stderr, "Wait: COND: %d\n", COND);
pthread_cond_wait(&CV, &MU);
}
If the parent only accessed COND after cond_wait, then Helgrind would
not complain, as then it would understand that "exclusive ownership" is
transferred from child to parent by the signal/wait pair. However, both
parent and child access it before the signal/wait pair, COND is marked
as being in shared ownership, and the above doesn't apply.
> >> If you change the order of these two lines, to
>
> Sure. But thread_join is not what is usually done in presence of thread
> pools...
> Threads tend to live forever.
> This program is just a short reproducer.
Yes. I suspect what you are seeing is a problem to do with memory
recycling. See Helgrind manual Sec 7.5 point 2.
Can you send a slightly bigger reproducer, which has the child thread
doing a few units of work that are passed to it from the parent thread?
J
|
|
From: Konstantin S. <kon...@gm...> - 2007-12-05 11:58:28
|
On Dec 5, 2007 2:24 PM, Julian Seward <js...@ac...> wrote:
>
> > COND is accessed before cond_signal() in child thread
>
> yes
>
> > and after cond_wait() in parent thread.
>
> no:
>
> while (COND != 1) {
> fprintf(stderr, "Wait: COND: %d\n", COND);
> pthread_cond_wait(&CV, &MU);
> }
>
> If the parent only accessed COND after cond_wait, then Helgrind would
> not complain, as then it would understand that "exclusive ownership" is
> transferred from child to parent by the signal/wait pair. However, both
> parent and child access it before the signal/wait pair, COND is marked
> as being in shared ownership, and the above doesn't apply.
Yes, COND *is* accessed before cond_wait() by the parent thread -- it the
condition of the cond_wait's while loop :)
Therefore, I agree, it is ShMod by both threads with nonempty lockset.
But once we did signal/wait we only access COND in the parent thread.
why can't we transfer ShMod->Exclusive after cond_signal/cond_wait().
More generally if a thread #x does cond_signal and thread #y does cond_wait
we transfer ShMod(#x, #y, #z) -> ShMod(#y, #z).
Will we loose any true positives with such logic?
>
>
> > >> If you change the order of these two lines, to
> >
> > Sure. But thread_join is not what is usually done in presence of thread
> > pools...
> > Threads tend to live forever.
> > This program is just a short reproducer.
>
> Yes. I suspect what you are seeing is a problem to do with memory
> recycling. See Helgrind manual Sec 7.5 point 2.
I did put VALGRIND_HG_CLEAN_MEMORY everywhere I could find :)
I don't think this is it.
>
>
> Can you send a slightly bigger reproducer, which has the child thread
> doing a few units of work that are passed to it from the parent thread?
>
I should (and will) certainly do this, but it's rather hard...
We have our own mutexes and our own thread pool with callback machinery.
The point is that we usually call thread_join at the very end of the program
and helgrind's logic related to thread_join does not kick in.
As I understand, _VG_USERREQ__HG_PTHREAD_JOIN_POST will work only if the
thread actually did quit.
Do you think it is possible to implement a similar request to indicate the
end of ThreadPool's callback?
Again, I'll try to come back with a reproducer.
>
> J
>
|
|
From: Konstantin S. <kon...@gm...> - 2007-12-05 13:52:22
Attachments:
q.c
|
>> Can you send a slightly bigger reproducer, which has the child thread
>> doing a few units of work that are passed to it from the parent thread?
Here it is (also attached q.c). Comments embedded.
#define _MULTI_THREADED
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
// This test case tries to be a minimal reproducer of thread pool usage.
//
// We have a permanently working thread worker() which takes
// functions (callbacks) from somewhere (add_callback()) and executes them.
// This thread does not cary any state between callbacks.
// So, logically this is the same as if we destroyed the thread and
recreated it
// each time we have a new callback.
//
// Helgrind can hardly understand this by itself,
// but a simple source annotation should help.
// See ANNOTATE_BEGINNING_OF_CALLBACK/ANNOTATE_END_OF_CALLBACK
//
// This test program has 3 global variables, GLOB1, GLOB2, GLOB3
// GLOB1: no races reported
// GLOB2: helgrind reports a race but ANNOTATE_... should help.
// GLOB3: helgrind reports a race and ANNOTATE_... will not help. Can this
be fixed by other means?
int COND = 0; // condition for
cond_wait() loop
pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER; // for CV
pthread_cond_t CV = PTHREAD_COND_INITIALIZER; // works with MU
typedef int (*F)(void);
pthread_mutex_t MU_CB = PTHREAD_MUTEX_INITIALIZER; // for callbacks
static F callback = NULL; // protected by MU
// not protected by locks, synchronized via cond_wait()
static int GLOB1 = 0, GLOB2 = 0, GLOB3 = 0;
// execute untill one of callbacks returns 1
void *worker(void *parm)
{
int stop = 0;
F f;
while(!stop){
pthread_mutex_lock(&MU_CB);
if(callback){
f = callback;
callback = NULL; // we are ready to take next callback
}else{
f = 0;
}
pthread_mutex_unlock(&MU_CB);
if(f){
// ANNOTATE_BEGINNING_OF_CALLBACK (helgrind's client request here)
stop = f();
// ANNOTATE_END_OF_CALLBACK (helgrind's client request here)
}
sleep(1); // don't burn CPU. Real thread pool will block instead of
sleeping.
}
}
void add_callback(F f)
{
pthread_mutex_lock(&MU_CB);
assert(callback == NULL);
callback = f;
pthread_mutex_unlock(&MU_CB);
}
int callback_do_something_usefull()
{
sleep(2); // we want waiter to get there first
GLOB1 = 2; // no race is reported
GLOB2 = 2; // helgrind reports a race
pthread_mutex_lock(&MU);
GLOB3 = 2;
pthread_mutex_unlock(&MU);
pthread_mutex_lock(&MU);
COND = 1;
pthread_cond_signal(&CV);
pthread_mutex_unlock(&MU);
return 0; // continue
}
int callback_finish_thread()
{
return 1; // stop
}
int main(int argc, char **argv)
{
pthread_t threadid;
// accessed before pthread_create()
GLOB1 = 1;
pthread_create(&threadid, NULL, worker, NULL);
// accessed after pthread_create() but before
ANNOTATE_BEGINNING_OF_CALLBACK
GLOB2 = 1;
add_callback(callback_do_something_usefull);
pthread_mutex_lock(&MU);
GLOB3 = 1;
pthread_mutex_unlock(&MU);
// now we wait untill callback_do_something_usefull() signals.
pthread_mutex_lock(&MU);
while (COND != 1) {
pthread_cond_wait(&CV, &MU);
}
pthread_mutex_unlock(&MU);
// at this point callback_do_something_usefull() has signalled
// and will never write to any of these variables again.
// I beleive that worker() can be removed from TSETs of
// all tree variables.
//
// It is still possible that callback_do_something_usefull()
// did not exit yet and we did not execute ANNOTATE_END_OF_CALLBACK.
//
//
// Currently, helgrind has the following info:
// GLOB1: Exclusive(main)
// GLOB2: ShMod(tset={main, worker}, lset={})
// GLOB3: ShMod(tset={main, worker}, lset={MU})
GLOB1 = 2; // helgrind does *not* report a race
GLOB2 = 2; // helgrind already reported a race in
callback_do_something_usefull()
GLOB3 = 2; // helgrind reports race here
// fprintf(stderr, "GLOB1: %d\n", GLOB1);
// fprintf(stderr, "GLOB2: %d\n", GLOB2);
// fprintf(stderr, "GLOB3: %d\n", GLOB3);
// real program will give more usefull callbacks here
add_callback(callback_finish_thread);
pthread_join(threadid, NULL);
return 0;
}
On Dec 5, 2007 2:24 PM, Julian Seward <js...@ac...> wrote:
>
> > COND is accessed before cond_signal() in child thread
>
> yes
>
> > and after cond_wait() in parent thread.
>
> no:
>
> while (COND != 1) {
> fprintf(stderr, "Wait: COND: %d\n", COND);
> pthread_cond_wait(&CV, &MU);
> }
>
> If the parent only accessed COND after cond_wait, then Helgrind would
> not complain, as then it would understand that "exclusive ownership" is
> transferred from child to parent by the signal/wait pair. However, both
> parent and child access it before the signal/wait pair, COND is marked
> as being in shared ownership, and the above doesn't apply.
>
> > >> If you change the order of these two lines, to
> >
> > Sure. But thread_join is not what is usually done in presence of thread
> > pools...
> > Threads tend to live forever.
> > This program is just a short reproducer.
>
> Yes. I suspect what you are seeing is a problem to do with memory
> recycling. See Helgrind manual Sec 7.5 point 2.
>
> Can you send a slightly bigger reproducer, which has the child thread
> doing a few units of work that are passed to it from the parent thread?
>
> J
>
|
|
From: Bart V. A. <bar...@gm...> - 2007-12-05 12:10:47
|
On Dec 5, 2007 10:56 AM, Konstantin Serebryany < kon...@gm...> wrote: > Hi Julian, all, > > I get a report about a race from helgrind, while everything looks well > synchronized to me. > Am I wrong? > I can be wrong, but my understanding is that the Eraser algorithm (helgrind) reports any conflicting accesses to shared variables that are not protected by proper locking, so I expect helgrind to complain on the last COND access. DRD on the other hand looks for possible causes of nondeterminism. There are no such issues in the program cv.cc, hence DRD does not complain on any of the COND accesses in cv.cc. Regards, Bart Van Assche. |
|
From: Konstantin S. <kon...@gm...> - 2007-12-05 12:16:38
|
As far as I understand, helgrind is not more than just Eraser :)
For example, the following program does not report any race, while GLOB is
*not* protected by any lock.
% cat cv2.cc
#define _MULTI_THREADED
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static int COND = 0;
static int GLOB = 0;
static pthread_cond_t CV = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER;
void *run_pm(void*) {
sleep(2); // we want waiter to get there first
GLOB = 1;
pthread_mutex_lock(&MU);
COND++;
fprintf(stderr, "Signal: COND: %d\n", COND);
pthread_cond_signal(&CV);
pthread_mutex_unlock(&MU);
return NULL;
}
int main() {
pthread_t threadid;
GLOB = 2;
pthread_create(&threadid, NULL, run_pm, NULL);
pthread_mutex_lock(&MU);
while (COND != 1) {
pthread_cond_wait(&CV, &MU);
}
pthread_mutex_unlock(&MU);
GLOB = 2;
fprintf(stderr, "GLOB: %d\n", GLOB);
pthread_join(threadid, NULL);
return 0;
}
% g++ -g cv2.cc -lpthread ; ~/race/valgrind32orig/Inst/bin/valgrind -q
--tool=helgrind ./a.out
Signal: COND: 1
GLOB: 2
%
On Dec 5, 2007 3:10 PM, Bart Van Assche <bar...@gm...> wrote:
> On Dec 5, 2007 10:56 AM, Konstantin Serebryany <
> kon...@gm...> wrote:
>
> > Hi Julian, all,
> >
> > I get a report about a race from helgrind, while everything looks well
> > synchronized to me.
> > Am I wrong?
> >
>
> I can be wrong, but my understanding is that the Eraser algorithm
> (helgrind) reports any conflicting accesses to shared variables that are not
> protected by proper locking, so I expect helgrind to complain on the last
> COND access. DRD on the other hand looks for possible causes of
> nondeterminism. There are no such issues in the program cv.cc, hence DRD
> does not complain on any of the COND accesses in cv.cc.
>
> Regards,
>
> Bart Van Assche.
|
|
From: Konstantin S. <kon...@gm...> - 2007-12-05 12:17:23
|
>> As far as I understand, helgrind is not more than just Eraser :)
helgrind is more than just Eraser :)
On Dec 5, 2007 3:16 PM, Konstantin Serebryany <
kon...@gm...> wrote:
> As far as I understand, helgrind is not more than just Eraser :)
>
> For example, the following program does not report any race, while GLOB is
> *not* protected by any lock.
>
> % cat cv2.cc
> #define _MULTI_THREADED
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> static int COND = 0;
> static int GLOB = 0;
> static pthread_cond_t CV = PTHREAD_COND_INITIALIZER;
> static pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER;
>
> void *run_pm(void*) {
> sleep(2); // we want waiter to get there first
> GLOB = 1;
> pthread_mutex_lock(&MU);
> COND++;
> fprintf(stderr, "Signal: COND: %d\n", COND);
> pthread_cond_signal(&CV);
> pthread_mutex_unlock(&MU);
> return NULL;
> }
>
> int main() {
> pthread_t threadid;
> GLOB = 2;
> pthread_create(&threadid, NULL, run_pm, NULL);
>
> pthread_mutex_lock(&MU);
> while (COND != 1) {
> pthread_cond_wait(&CV, &MU);
> }
> pthread_mutex_unlock(&MU);
>
> GLOB = 2;
> fprintf(stderr, "GLOB: %d\n", GLOB);
> pthread_join(threadid, NULL);
> return 0;
> }
> % g++ -g cv2.cc -lpthread ; ~/race/valgrind32orig/Inst/bin/valgrind -q
> --tool=helgrind ./a.out
> Signal: COND: 1
> GLOB: 2
> %
>
>
>
> On Dec 5, 2007 3:10 PM, Bart Van Assche <bar...@gm...> wrote:
>
> > On Dec 5, 2007 10:56 AM, Konstantin Serebryany <
> > kon...@gm...> wrote:
> >
> > > Hi Julian, all,
> > >
> > > I get a report about a race from helgrind, while everything looks well
> > > synchronized to me.
> > > Am I wrong?
> > >
> >
> > I can be wrong, but my understanding is that the Eraser algorithm
> > (helgrind) reports any conflicting accesses to shared variables that are not
> > protected by proper locking, so I expect helgrind to complain on the last
> > COND access. DRD on the other hand looks for possible causes of
> > nondeterminism. There are no such issues in the program cv.cc, hence DRD
> > does not complain on any of the COND accesses in cv.cc.
> >
> > Regards,
> >
> > Bart Van Assche.
>
>
>
|
|
From: Julian S. <js...@ac...> - 2007-12-05 12:19:53
|
> I can be wrong, but my understanding is that the Eraser algorithm > (helgrind) reports any conflicting accesses to shared variables that are > not protected by proper locking, so I expect helgrind to complain on the > last COND access. Yes. > DRD on the other hand looks for possible causes of > nondeterminism. There are no such issues in the program cv.cc, hence DRD > does not complain on any of the COND accesses in cv.cc. Yes. So I have a question: can you clarify what you mean by "possible causes of nondeterminism"? I have the impression that DRD-style algorithms are scheduling-sensitive, whereas Helgrind is not (or at least, less so). J |
|
From: Bart V. A. <bar...@gm...> - 2007-12-05 16:01:51
|
On Dec 5, 2007 1:19 PM, Julian Seward <js...@ac...> wrote: > > > DRD on the other hand looks for possible causes of > > nondeterminism. There are no such issues in the program cv.cc, hence DRD > > does not complain on any of the COND accesses in cv.cc. > > Yes. > > So I have a question: can you clarify what you mean by "possible causes > of nondeterminism"? I have the impression that DRD-style algorithms are > scheduling-sensitive, whereas Helgrind is not (or at least, less so). > > J > If DRD reports a data race, this means that it found conflicting accesses whose order was not defined by synchronization primitives. Such conflicting accesses make the execution of a multithreaded program nondeterministic. And it is right that Helgrind is less sensitive to thread scheduling, and that a pure vector clock based race detector can miss some data races. But the better detection of Helgrind has its price: it runs considerably slower than DRD. It would also be interesting to know how big the chance is that a data race is missed by the vector clock algorithm but that it is detected by the Eraser algorithm. Regards, Bart. |
|
From: Julian S. <js...@ac...> - 2007-12-05 12:30:24
|
On Wednesday 05 December 2007 13:17, Konstantin Serebryany wrote: > >> As far as I understand, helgrind is not more than just Eraser :) > helgrind is more than just Eraser :) True. Helgrind is Eraser + usage of happens-before dependencies resulting from thread creation, joinage, condition-variable signalling (to the extent these dependencies are visible), and through semaphore events, + the ability to transfer memory from shared states to exclusive ownership states at pthread_join points. J |
|
From: Bart V. A. <bar...@gm...> - 2007-12-06 07:18:01
|
On Dec 5, 2007 1:29 PM, Julian Seward <js...@ac...> wrote: > On Wednesday 05 December 2007 13:17, Konstantin Serebryany wrote: > > >> As far as I understand, helgrind is not more than just Eraser :) > > helgrind is more than just Eraser :) > > True. Helgrind is Eraser + usage of happens-before dependencies > resulting from thread creation, joinage, condition-variable signalling > (to the extent these dependencies are visible), and through semaphore > events, + the ability to transfer memory from shared states to exclusive > ownership states at pthread_join points. Although I have a lot of respect for the work you have done on the Helgrind tool, I still do not understand why you started from the Eraser algorithm. This algorithm is easy to mislead, e.g. it doesn't recognize user-implemented synchronization primitives. Many developers implement reader-writer locks on top of mutexes because this makes code easier to port between Linux and Windows. Obtaining a reader-lock is then implemented as (simplified) (lock mutex, increment reader count, unlock mutex). The Eraser algorithm won't recognize such code as a reader lock. Another issue with the Eraser algorithms are HPC algorithms where computations consist of several steps, where the different steps are only separated by barriers. The memory accesses within a step of the different threads do not conflict, but the memory accesses of different steps contain conflicts. A vector-clock based algorithm won't report any false positive for this scenario, but an Eraser-style algorithm will report all conflicting accesses as data races. Regards, Bart Van Assche. |
|
From: Christoph B. <bar...@or...> - 2007-12-05 17:10:17
|
Am Mittwoch, 5. Dezember 2007 schrieb Julian Seward:
> On Wednesday 05 December 2007 16:32, Christoph Bartoschek wrote:
> > The read of COND in the parent thread happens in a segment that is after
> > the accesses that established the shared-modified state in the
> > happens-before graph.
> >
> > Given that COND should not trigger an error.
>
> But why should we assume that ownership of COND should change from
> shared-modified to exclusively-owned-by-parent at the point of the
> signal/wait pair?
Currently I think that one does not change the state of the variable. But when
one detects an error with the current approach, one would check the
happens-before graph to see whether there really is a problem.
> In this example, it would cause Helgrind to shut up, but it's not obvious
> to me that this is a good thing to do in general.
>
> It would be easy enough (although expensive) to modify Helgrind so that,
> when thread X signals at Y, memory shared by X and Y is made exclusive
> to Y. Is that helpful or correct? I don't know. I don't know what the
> consequences would be, just now.
There is one problem when several signals are emitted. Which one should one
use to establish the happens-before relation? This could make the tool
dependent on the scheduler. For example the following code:
Thread 1: Thread 2:
a. pthread_mutex_lock(mutex); 1. pthread_mutex_lock(mutex);
b. while (COND != 1) { 2. COND = 1;
c. pthread_cond_wait(cv); 3. pthread_cond_signal(cv);
d. } 4. pthread_mutex_unlock(mutex);
e. pthread_mutex_unlock(mutex); 5. pthread_mutex_lock(mutex);
f. COND = 3; 6. COND = 2;
7. pthread_cond_signal(cv);
8. pthread_mutex_unlock(mutex);
If thread 1 wakes up on the first signal in line 3, then there is a problem in
lines f and 6. If thread 1 wakes up on the second signal in line 7, one would
not detect an error. Detecting the error in the first case would be the hard
case in my opinion.
...
> Instead of using source annotations, you can also avoid this
> missing-dependency problem with CVs by using semaphores instead.
> Helgrind tracks dependencies through semaphores exactly. A semaphore
> with an initial value of zero is useful for signalling events between
> threads, without having to mess with these boolean conditions on
> CVs.
You are right. But lots of code uses condition variables. I would like to use
helgrind on several code parts that use condition variables, but currently
the false positive count is too high.
|
|
From: Julian S. <js...@ac...> - 2007-12-06 20:02:25
|
> > It would be easy enough (although expensive) to modify Helgrind so that,
> > when thread X signals at Y, memory shared by X and Y is made exclusive
> > to Y. Is that helpful or correct? I don't know. I don't know what the
> > consequences would be, just now.
>
> There is one problem when several signals are emitted. Which one should one
> use to establish the happens-before relation? This could make the tool
> dependent on the scheduler. For example the following code:
>
> Thread 1: Thread 2:
>
> a. pthread_mutex_lock(mutex); 1. pthread_mutex_lock(mutex);
> b. while (COND != 1) { 2. COND = 1;
> c. pthread_cond_wait(cv); 3. pthread_cond_signal(cv);
> d. } 4. pthread_mutex_unlock(mutex);
> e. pthread_mutex_unlock(mutex); 5. pthread_mutex_lock(mutex);
> f. COND = 3; 6. COND = 2;
> 7. pthread_cond_signal(cv);
> 8. pthread_mutex_unlock(mutex);
>
> If thread 1 wakes up on the first signal in line 3, then there is a problem
> in lines f and 6. If thread 1 wakes up on the second signal in line 7, one
> would not detect an error. Detecting the error in the first case would be
> the hard case in my opinion.
You will get scheduler-related results even from a simpler example
(which is equivalent to Konstantin's original program):
initially COND = 0
Thread 1: Thread 2:
a. pthread_mutex_lock(mutex); 1. pthread_mutex_lock(mutex);
b. while (COND != 1) { 2. COND = 1;
c. pthread_cond_wait(cv); 3. pthread_cond_signal(cv);
d. } 4. pthread_mutex_unlock(mutex);
e. pthread_mutex_unlock(mutex);
f. COND = 3; 5. pthread_mutex_lock(mutex);
6. COND = 4;
7. pthread_mutex_unlock(mutex);
(Assuming the wait happens before the signal, so the signal is not lost).
State changes to COND are:
initially COND is New
After (b) COND is Excl(Thread1)
After (2) COND is ShM({Thread1,Thread2},{mutex})
After (3) signals at (c), COND is Excl(Thread1) as per proposed rule
Now if f happens before 5/6/7:
After (f) COND is Excl(Thread1)
After (5/6/7) COND is ShM({Thread1,Thread2},{mutex}), and no error
But if 5/6/7 happens before f:
After (5/6/7) COND is ShM({Thread1,Thread2},{mutex})
After (f) COND is ShM({Thread1,Thread2},{}) -- error
Without the proposed rule, the error is detected regardless of the
scheduling.
Such a rule can easily enough be installed in Helgrind. But you can't
have it both ways:
If the rule is installed, effectively you are promising that after the
signal from Thread 2 to Thread 1, Thread 2 will not access COND again
until Thread 1 first signals back at Thread 2. That makes Helgrind shut
up, but you lose some ability of Helgrind to report errors resulting
from later violations of the promise, depending on the scheduling.
If the rule is not installed, then Helgrind complains, but it can
at least check what happens next, independent of scheduling.
J
|
|
From: Julian S. <js...@ac...> - 2007-12-07 02:04:58
Attachments:
happens-before-cvhack.patch
|
On Wednesday 05 December 2007 18:10, Christoph Bartoschek wrote: > Am Mittwoch, 5. Dezember 2007 schrieb Julian Seward: > > On Wednesday 05 December 2007 16:32, Christoph Bartoschek wrote: > > > The read of COND in the parent thread happens in a segment that is > > > after the accesses that established the shared-modified state in the > > > happens-before graph. > > > > > > Given that COND should not trigger an error. > > > > But why should we assume that ownership of COND should change from > > shared-modified to exclusively-owned-by-parent at the point of the > > signal/wait pair? Ok. Try the attached patch; it implements this change. Once patched, you need the flag --happens-before=cvhack, else Helgrind behaves exactly as before. The transition it does, at the event _VG_USERREQ__HG_PTHREAD_COND_WAIT_POST, is: signalling thread gives up ownership of shared locations, giving them to the waiting thread instead. If a location was previously accessed only by the waiting thread and the signalling thread, then the waiting thread acquires exclusive ownership. It makes H shut up with Konstantin's original test case cv.cc. I did not test anything else. Note this is an insanely inefficient implementation -- this patch is just to find out if the idea is useful. J |
|
From: Julian S. <js...@ac...> - 2007-12-05 21:09:17
|
> On Wednesday 05 December 2007 18:10, Christoph Bartoschek wrote: > [...] > > You are right. But lots of code uses condition variables. I would like to > use helgrind on several code parts that use condition variables, but > currently the false positive count is too high. Thanks for the feedback. I might be able to improve the situation, but that will have to wait until after 3.3.0 is released now. J |
|
From: Konstantin S. <kon...@gm...> - 2007-12-06 08:53:02
Attachments:
q2.cc
|
> > You are right. But lots of code uses condition variables. I would like
to
> > use helgrind on several code parts that use condition variables, but
> > currently the false positive count is too high.
Oh yes!
> Thanks for the feedback. I might be able to improve the situation, but
> that will have to wait until after 3.3.0 is released now.
That''l be great!
I've modified my test (attached, q2.cc), hope it will be helpful :)
It now has N worker threads. If N >= 2 the race is reported even for GLOB1.
GLOB[12] are now read-only in workers.
As I understand, helgrind currently can transfer Exclusive(worker) ->
Exclusive(parent) but can not transfer
ShRO(worker1,worker2)->Exclusive(parent) in presence of cond_wait().
I did not find VALGRIND_HG_POST_WAIT anywhere in valgrind nor in the net.
Is it supposed to be used like this?
#include "helgrind.h"
...
pthread_mutex_lock(&MU);
while (COND != n_tasks) {
pthread_cond_wait(&CV, &MU);
}
VALGRIND_HG_POST_WAIT(&CV)
pthread_mutex_unlock(&MU);
Thanks,
--kcc
On Dec 6, 2007 12:08 AM, Julian Seward <js...@ac...> wrote:
>
> > On Wednesday 05 December 2007 18:10, Christoph Bartoschek wrote:
> > [...]
> >
> > You are right. But lots of code uses condition variables. I would like
> to
> > use helgrind on several code parts that use condition variables, but
> > currently the false positive count is too high.
>
> Thanks for the feedback. I might be able to improve the situation, but
> that will have to wait until after 3.3.0 is released now.
>
> J
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell. From the desktop to the data center, Linux is going
> mainstream. Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>
|
|
From: Julian S. <js...@ac...> - 2007-12-07 11:40:34
|
On Thursday 06 December 2007 09:53, Konstantin Serebryany wrote:
> I've modified my test (attached, q2.cc), hope it will be helpful :)
> It now has N worker threads. If N >= 2 the race is reported even for GLOB1.
Last night's patch (happens-before-cvhack.patch) also makes this, q2.cc,
run without race warnings.
> I did not find VALGRIND_HG_POST_WAIT anywhere in valgrind nor in the net.
> Is it supposed to be used like this?
>
> #include "helgrind.h"
> ...
> pthread_mutex_lock(&MU);
> while (COND != n_tasks) {
> pthread_cond_wait(&CV, &MU);
> }
> VALGRIND_HG_POST_WAIT(&CV)
> pthread_mutex_unlock(&MU);
What is the behaviour of VALGRIND_HG_POST_WAIT supposed to be?
Are you referring to the VALGRIND_HG_POST_WAIT that is introduced
at the bottom of page 31 of Arndt Muehlenfeld's PhD thesis, or some
other thing?
J
|
|
From: Konstantin S. <kon...@gm...> - 2007-12-07 10:30:43
|
>> Last night's patch (happens-before-cvhack.patch) also makes this, q2.cc,
>> run without race warnings.
Amazing! --happens-before=cvhack does help with this patch!
>From the comments it does look 'insanely inefficient', but it's better than
nothing. :)
I'll try other tests with cond vars.
>> Are you referring to the VALGRIND_HG_POST_WAIT that is introduced
>> at the bottom of page 31 of Arndt Muehlenfeld's PhD thesis, or some
>> other thing?
Can you give me a link to Arndt's PhD? I can't find it :(
--kcc
On Dec 7, 2007 12:44 PM, Julian Seward <js...@ac...> wrote:
>
> On Thursday 06 December 2007 09:53, Konstantin Serebryany wrote:
>
> > I've modified my test (attached, q2.cc), hope it will be helpful :)
> > It now has N worker threads. If N >= 2 the race is reported even for
> GLOB1.
>
> Last night's patch (happens-before-cvhack.patch) also makes this, q2.cc,
> run without race warnings.
>
>
> > I did not find VALGRIND_HG_POST_WAIT anywhere in valgrind nor in the
> net.
> > Is it supposed to be used like this?
> >
> > #include "helgrind.h"
> > ...
> > pthread_mutex_lock(&MU);
> > while (COND != n_tasks) {
> > pthread_cond_wait(&CV, &MU);
> > }
> > VALGRIND_HG_POST_WAIT(&CV)
> > pthread_mutex_unlock(&MU);
>
> What is the behaviour of VALGRIND_HG_POST_WAIT supposed to be?
>
> Are you referring to the VALGRIND_HG_POST_WAIT that is introduced
> at the bottom of page 31 of Arndt Muehlenfeld's PhD thesis, or some
> other thing?
>
> J
>
|
|
From: Konstantin S. <kon...@gm...> - 2007-12-13 07:44:27
Attachments:
hg_race_sem_and_mu.cc
|
Hi,
The same false positive appears when mixing mutexes and semaphores. Test
attached.
As with cond vars, we are able to transition Excl(T1)->Excl(T2), but can not
do ShM(T1,T2)->Excl(T2).
As I understand, fixing this properly will require storing SegmentID in all
shadow words, not only in those that are in Exclusive state.
This will hardly squeeze into 32 bits though, need 64... :(
--kcc
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#include <stdio.h>
// This test is race free, but helgrind 3.3.0 reports a race
// main().........................worker()
// 1. start(worker)
// 2. lock(MU)
// 3. write(GLOB)
// 4. unlock(MU)
// a. lock(MU)
// b. write(GLOB)
// c. unlock(MU)
// 5. post(SEM)--------\
// \------> d. wait(SEM)
// e. write(GLOB)
// 6. join(worker)
// 7. read(GLOB)
int GLOB = 0;
sem_t SEM;
pthread_mutex_t MU = PTHREAD_MUTEX_INITIALIZER;
void *worker(void *) {
pthread_mutex_lock(&MU);
GLOB = 22;
pthread_mutex_unlock(&MU);
sem_wait(&SEM);
GLOB = 2; // RACE is reported here, but shouldn't
return NULL;
}
int main() {
sem_init(&SEM, 0, 0);
pthread_t thread;
pthread_create(&thread, NULL, worker, NULL);
pthread_mutex_lock(&MU);
GLOB = 11;
pthread_mutex_unlock(&MU);
sem_post(&SEM);
pthread_join(thread, NULL);
printf("glob=%d\n", GLOB);
}
On Dec 7, 2007 1:30 PM, Konstantin Serebryany <
kon...@gm...> wrote:
> >> Last night's patch (happens-before-cvhack.patch) also makes this, q2.cc
> ,
> >> run without race warnings.
>
> Amazing! --happens-before=cvhack does help with this patch!
> From the comments it does look 'insanely inefficient', but it's better
> than nothing. :)
> I'll try other tests with cond vars.
>
>
> >> Are you referring to the VALGRIND_HG_POST_WAIT that is introduced
> >> at the bottom of page 31 of Arndt Muehlenfeld's PhD thesis, or some
> >> other thing?
> Can you give me a link to Arndt's PhD? I can't find it :(
>
> --kcc
>
>
>
>
> On Dec 7, 2007 12:44 PM, Julian Seward < js...@ac...> wrote:
>
> >
> > On Thursday 06 December 2007 09:53, Konstantin Serebryany wrote:
> >
> > > I've modified my test (attached, q2.cc), hope it will be helpful :)
> > > It now has N worker threads. If N >= 2 the race is reported even for
> > GLOB1.
> >
> > Last night's patch (happens-before-cvhack.patch ) also makes this, q2.cc
> > ,
> > run without race warnings.
> >
> >
> > > I did not find VALGRIND_HG_POST_WAIT anywhere in valgrind nor in the
> > net.
> > > Is it supposed to be used like this?
> > >
> > > #include "helgrind.h"
> > > ...
> > > pthread_mutex_lock(&MU);
> > > while (COND != n_tasks) {
> > > pthread_cond_wait(&CV, &MU);
> > > }
> > > VALGRIND_HG_POST_WAIT(&CV)
> > > pthread_mutex_unlock(&MU);
> >
> > What is the behaviour of VALGRIND_HG_POST_WAIT supposed to be?
> >
> > Are you referring to the VALGRIND_HG_POST_WAIT that is introduced
> > at the bottom of page 31 of Arndt Muehlenfeld's PhD thesis, or some
> > other thing?
> >
> > J
> >
>
>
|
|
From: Julian S. <js...@ac...> - 2007-12-13 11:06:41
|
On Thursday 13 December 2007 08:44, Konstantin Serebryany wrote: > Hi, > > The same false positive appears when mixing mutexes and semaphores. Test > attached. > As with cond vars, we are able to transition Excl(T1)->Excl(T2), but can > not do ShM(T1,T2)->Excl(T2). > > As I understand, fixing this properly will require storing SegmentID in all > shadow words, not only in those that are in Exclusive state. > This will hardly squeeze into 32 bits though, need 64... :( No, I don't think so. We just need to do the same memory ownership transition for semaphores that the "cvhack" patch does for CVs. I'll extend the current patch. J |
|
From: Konstantin S. <kon...@gm...> - 2007-12-13 08:14:28
|
But isn't it expensive? On Dec 13, 2007 11:01 AM, Julian Seward <js...@ac...> wrote: > On Thursday 13 December 2007 08:44, Konstantin Serebryany wrote: > > Hi, > > > > The same false positive appears when mixing mutexes and semaphores. Test > > attached. > > As with cond vars, we are able to transition Excl(T1)->Excl(T2), but can > > not do ShM(T1,T2)->Excl(T2). > > > > As I understand, fixing this properly will require storing SegmentID in > all > > shadow words, not only in those that are in Exclusive state. > > This will hardly squeeze into 32 bits though, need 64... :( > > No, I don't think so. We just need to do the same memory ownership > transition for semaphores that the "cvhack" patch does for CVs. I'll > extend the current patch. > > J > |
|
From: Konstantin S. <kon...@gm...> - 2007-12-13 08:16:37
|
Just speculating, my understand of helgrind's algorithm is still poor ... May be we shall store SegementIDs in TSET's instead of ThreadIDs? On Dec 13, 2007 11:07 AM, Konstantin Serebryany < kon...@gm...> wrote: > But isn't it expensive? > > > On Dec 13, 2007 11:01 AM, Julian Seward <js...@ac...> wrote: > > > On Thursday 13 December 2007 08:44, Konstantin Serebryany wrote: > > > Hi, > > > > > > The same false positive appears when mixing mutexes and semaphores. > > Test > > > attached. > > > As with cond vars, we are able to transition Excl(T1)->Excl(T2), but > > can > > > not do ShM(T1,T2)->Excl(T2). > > > > > > As I understand, fixing this properly will require storing SegmentID > > in all > > > shadow words, not only in those that are in Exclusive state. > > > This will hardly squeeze into 32 bits though, need 64... :( > > > > No, I don't think so. We just need to do the same memory ownership > > transition for semaphores that the "cvhack" patch does for CVs. I'll > > extend the current patch. > > > > J > > > > |
|
From: Julian S. <js...@ac...> - 2007-12-13 11:27:32
|
On Thursday 13 December 2007 09:07, Konstantin Serebryany wrote: > But isn't it expensive? Potentially, yes in the naive implementation. But I can think of at least two other ways to make the transition cheaper. Those require more implementation effort though. J |