From: Dave G. <go...@mc...> - 2010-07-07 22:20:58
I have some multithreaded code that signals object "completion" with a flag variable. Initially the flag value is non-zero (almost always 1) and one thread (A) will do some work to determine the values for various fields in the object. When thread A is finished with the object, it indicates that the object is complete by setting the flag to zero. This is a one-way transition, the object will be read exclusively by the (guaranteed) single consumer thread (B) and eventually destroyed by thread B.
I have inserted memory barriers in both the producer's and consumer's code to provide store-release/load-acquire semantics with the flag variable. Well, in the example code at the bottom, I used full barriers, but I think that acquire/release is really all I need.
These reads and writes are of course flagged by the various valgrind threading tools (helgrind/drd/tsan) as potential races. However, I was surprised to find that I could not successfully annotate this sort of behavior the same way I would an atomic reference-counting scheme. At the bottom of this mail is some example code that illustrates my scenario. The read itself is flagged as racy because it occurs before the HAPPENS_AFTER annotation. In DRD and TSan I can at least use ANNOTATE_BENIGN_RACE as a last resort, but Helgrind doesn't support this annotation.
Changing the unconditional write in the producer thread to an atomic decrement makes the annotations work again, as in refcounting. This may be what I use in the end to solve this problem, but I do still want to understand why I can't annotate it correctly.
Q1: I know it's potentially error prone in terms of usage and implementation, but I'm attempting something that is basically sane, right?
Q2: Is there a way to correctly annotate the store/load pair, ideally across all three tools? Or for helgrind am I stuck with suppressions?
Thanks,
-Dave
------8<------
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <pthread.h>

/* change this to change tool annotations */
#define USE_HELGRIND
/* don't define this to get atomic_fetch_and_decr */
#define USE_PLAIN_STORE

#if defined(USE_HELGRIND)
#include <valgrind/helgrind.h>
#elif defined(USE_DRD)
#include <valgrind/drd.h>
#elif defined(USE_TSAN)
#include <dynamic_annotations.h>
#endif

volatile int flag = 1;
int shared_data;

void *completer(void *context)
{
    sleep(5); /* do some "work" */

    /* this thread exclusively owns shared_data */
    shared_data = 0xdeadbeef;

    /* now indicate the shared_data is complete and owned by the other thread */
    __sync_synchronize(); /* full memory barrier */
    ANNOTATE_HAPPENS_BEFORE(&flag);
#if defined(USE_PLAIN_STORE)
    flag = 0;
#else
    __sync_sub_and_fetch(&flag, 1);
#endif
    return NULL;
}

int main(int argc, char **argv)
{
    int err;
    pthread_t other_thread;

    err = pthread_create(&other_thread, NULL, completer, NULL);
    assert(!err);

    while (1) {
        if (0 == flag) {
            ANNOTATE_HAPPENS_AFTER(&flag);
            __sync_synchronize(); /* full memory barrier */
            break;
        }
    }

    /* this thread now owns the shared data and can access it freely */
    printf("shared_data=%#x\n", shared_data);
    pthread_join(other_thread, NULL);
    return 0;
}
------8<------
And here's the output from Helgrind:
------8<------
% valgrind --tool=helgrind --read-var-info=yes -q ./a.out
==27470== Thread #2 was created
==27470==    at 0x511EB0E: clone (in /lib/libc-2.7.so)
==27470==    by 0x4E31A11: pthread_create@@GLIBC_2.2.5 (in /lib/libpthread-2.7.so)
==27470==    by 0x4C286AA: pthread_create_WRK /sandbox/goodell/drt/third_party/valgrind/helgrind/hg_intercepts.c:257
==27470==    by 0x4C287AA: pthread_create@* /sandbox/goodell/drt/third_party/valgrind/helgrind/hg_intercepts.c:288
==27470==    by 0x40099B: main /sandbox/goodell/scratch/flag-hb/foo.c:45
==27470==
==27470== Thread #1 is the program's root thread
==27470==
==27470== Possible data race during write of size 4 at 0x600f08 by thread #2
==27470==    at 0x400964: completer /sandbox/goodell/scratch/flag-hb/foo.c:34
==27470==    by 0x4C2882D: mythread_wrapper /sandbox/goodell/drt/third_party/valgrind/helgrind/hg_intercepts.c:221
==27470==    by 0x4E313F6: start_thread (in /lib/libpthread-2.7.so)
==27470==    by 0x511EB4C: clone (in /lib/libc-2.7.so)
==27470== This conflicts with a previous read of size 4 by thread #1
==27470==    at 0x4009BE: main /sandbox/goodell/scratch/flag-hb/foo.c:49
==27470== Location 0x600f08 is 0 bytes inside global var "flag"
==27470== declared at foo.c:20
==27470==
==27470== Possible data race during read of size 4 at 0x600f08 by thread #1
==27470==    at 0x4009BE: main /sandbox/goodell/scratch/flag-hb/foo.c:49
==27470== This conflicts with a previous write of size 4 by thread #2
==27470==    at 0x400964: completer /sandbox/goodell/scratch/flag-hb/foo.c:34
==27470==    by 0x4C2882D: mythread_wrapper /sandbox/goodell/drt/third_party/valgrind/helgrind/hg_intercepts.c:221
==27470==    by 0x4E313F6: start_thread (in /lib/libpthread-2.7.so)
==27470==    by 0x511EB4C: clone (in /lib/libc-2.7.so)
==27470== Location 0x600f08 is 0 bytes inside global var "flag"
==27470== declared at foo.c:20
shared_data=0xdeadbeef
------8<------
From: Konstantin S. <kon...@gm...> - 2010-07-08 06:50:35
Hi Dave,
If I run your test code under tsan (with dynamic annotations enabled)
I get the race report on flag and *don't* get a race report on
shared_data.
I.e. the annotations ANNOTATE_HAPPENS_BEFORE/AFTER work as expected.
$ gcc -g -lpthread atomic_notification.c dynamic_annotations.c -I.
-DDYNAMIC_ANNOTATIONS_ENABLED=1
$ tsan ./a.out
==4972== WARNING: Possible data race during write of size 4 at 0x601870: {{{
==4972== T1 (locks held: {}):
==4972== #0 completer /tmp/ttt/atomic_notification.c:34
==4972== #1 ThreadSanitizerStartThread
/tmp/tsan/drt/tsan/ts_valgrind_intercepts.c:600
==4972== Concurrent read(s) happened at (OR AFTER) these points:
==4972== T0 (locks held: {}):
==4972== #0 main /tmp/ttt/atomic_notification.c:49
==4972== Address 0x601870 is 0 bytes inside data symbol "flag"
==4972== Race verifier data: 0x4007B0,0x40080A
==4972== }}}
If you add ANNOTATE_BENIGN_RACE(&flag, "lock-free synchronization
using barriers"); this report will be gone too.
If you are accessing 'flag' via __sync builtins, you don't need to use
ANNOTATE_BENIGN_RACE (at least with tsan) because tsan ignores all
instructions with lock prefix.
On Thu, Jul 8, 2010 at 2:20 AM, Dave Goodell <go...@mc...> wrote:
> I have some multithreaded code that signals object "completion" with a flag variable. Initially the flag value is non-zero (almost always 1) and one thread (A) will do some work to determine the values for various fields in the object. When thread A is finished with the object, it indicates that the object is complete by setting the flag to zero. This is a one-way transition, the object will be read exclusively by the (guaranteed) single consumer thread (B) and eventually destroyed by thread B.
>
> I have inserted memory barriers in both the producer's and consumer's code to provide store-release/load-acquire semantics with the flag variable. Well, in the example code at the bottom, I used full barriers, but I think that acquire/release is really all I need.
>
> These reads and writes are of course flagged by the various valgrind threading tools (helgrind/drd/tsan) as potential races. However I was surprised to find that I could not successfully annotate this sort of behavior in the same way as I would an atomic reference counting scheme. At the bottom of this mail is some example code that illustrates my scenario. The read itself is flagged as racy before the HAPPENS_AFTER annotation. In DRD and TSan, I can at least ANNOTATE_BENIGN_RACE as a last resort, but Helgrind doesn't support this annotation.
>
> Changing the unconditional write in the producer thread to an atomic decrement makes the annotations work again, as in refcounting. This may be what I use in the end to solve this problem, but I do still want to understand why I can't annotate it correctly.
>
> Q1: I know it's potentially error prone in terms of usage and implementation, but I'm attempting something that is basically sane, right?
This looks sane and even correct to me, at least on x86.
I am not a real expert in lock-free synchronization, so I tend to use
someone else's wisdom, e.g.
Release_Store and Acquire_Load from
http://src.chromium.org/viewvc/chrome/trunk/src/base/atomicops.h?revision=21709&view=markup
>
> Q2: Is there a way to correctly annotate the store/load pair, ideally across all three tools? Or for helgrind am I stuck with suppressions?
So, here you have two problems:
1. Explain that Release_Store/Acquire_Load creates a happens-before
arc (you've done it right and it should work for all three)
2. Explain that the race on flag is benign. Yes, I'm afraid with current
helgrind you are stuck with suppressions. Julian?
--kcc
From: Dave G. <go...@mc...> - 2010-07-08 16:20:09
On Jul 8, 2010, at 1:50 AM CDT, Konstantin Serebryany wrote:

> Hi Dave,
>
> If I run your test code under tsan (with dynamic annotations enabled)
> I get the race report on flag and *don't* get a race report on
> shared_data.
> I.e. the annotations ANNOTATE_HAPPENS_BEFORE/AFTER work as expected.

Sure, I wasn't seeing a race on shared_data in any of the tools, IIRC. So in that sense the HB/HA annotations are indeed working.

>> Q2: Is there a way to correctly annotate the store/load pair, ideally across all three tools? Or for helgrind am I stuck with suppressions?
>
> So, here you have two problems:
> 1. Explain that Release_Store/Acquire_Load creates a happens-before
> arc (you've done it right and it should work for all three)

Right, that part was working for me all along. I don't recall receiving a race warning for "shared_data"; my question in my original email was imprecise.

> 2. Explain that the race on flag is benign. Yes, I'm afraid with current
> helgrind you are stuck with suppressions. Julian?

This is what I was more concerned about. For DRD and TSan it's easy to mark these races as benign. However, creating suppressions for a large library where this pattern occurs in many places is much harder. I would be inclined to ditch Helgrind altogether because of this, but its lock-ordering checks are extremely helpful for me right now.

Maybe I'll hack around it for now by changing the plain store to an atomic swap operation when building/running under valgrind. That ought to silence all three tools.

Thanks,
-Dave
From: Bart V. A. <bva...@ac...> - 2010-07-08 10:27:14
On Thu, Jul 8, 2010 at 12:20 AM, Dave Goodell <go...@mc...> wrote:
[...]
> Q1: I know it's potentially error prone in terms of usage and implementation, but I'm attempting something that is basically sane, right?
>
> Q2: Is there a way to correctly annotate the store/load pair, ideally across all three tools? Or for helgrind am I stuck with suppressions?

Please check whether the issue described in the following bug report affects your test program: https://bugs.kde.org/show_bug.cgi?id=243935.

Bart.
From: Dave G. <go...@mc...> - 2010-07-08 15:56:15
On Jul 8, 2010, at 5:27 AM CDT, Bart Van Assche wrote:

> On Thu, Jul 8, 2010 at 12:20 AM, Dave Goodell <go...@mc...> wrote:
[...]
>> Q1: I know it's potentially error prone in terms of usage and implementation, but I'm attempting something that is basically sane, right?
>>
>> Q2: Is there a way to correctly annotate the store/load pair, ideally across all three tools? Or for helgrind am I stuck with suppressions?
>
> Please check whether the issue described in the following bug report
> affects your test program:
> https://bugs.kde.org/show_bug.cgi?id=243935.

I can reproduce #243935 on my machine (@r11206). I did not realize that the DRD and Helgrind HB/HA annotations were supposed to be binary compatible, which seems to be what this ticket is saying. I only assumed source-level compatibility and rebuilt with different headers between different tool runs.

At the bottom of this email is the result of a run of DRD with USE_DRD defined instead of USE_HELGRIND. It also warns about the "flag" variable, so I don't think I'm being hit by #243935.

The HB/HA annotations do work for shared_data, but not for the flag variable. This makes sense to me, because I'm only indicating the HA point after the "racy" read. However, I don't understand the correct way to annotate this without just telling the tool to ignore races on this memory location. I feel like what I need is an "atomic load" that will be treated the same way as a more complicated atomic operation like CAS or fetch_and_add. The various tools don't seem to have a problem with those operations (undef'ing USE_PLAIN_STORE in the test code makes DRD happy again).

-Dave

------8<------
% valgrind --tool=drd --read-var-info=yes -q ./a.out
==32285== Thread 2:
==32285== Conflicting store by thread 2 at 0x00600f78 size 4
==32285==    at 0x40090E: completer /sandbox/goodell/scratch/flag-hb/foo.c:34
==32285==    by 0x4C29CFF: vgDrd_thread_wrapper /sandbox/goodell/drt/third_party/valgrind/drd/drd_pthread_intercepts.c:272
==32285==    by 0x4E3B3F6: start_thread (in /lib/libpthread-2.7.so)
==32285==    by 0x5128B4C: clone (in /lib/libc-2.7.so)
==32285== Location 0x600f78 is 0 bytes inside global var "flag"
==32285== declared at foo.c:20
==32285== Other segment start (thread 1)
==32285==    at 0x4C308E4: pthread_create@* /sandbox/goodell/drt/third_party/valgrind/drd/drd_pthread_intercepts.c:431
==32285==    by 0x4009A6: main /sandbox/goodell/scratch/flag-hb/foo.c:45
==32285== Other segment end (thread 1)
==32285==    at 0x4009C9: main /sandbox/goodell/scratch/flag-hb/foo.c:49
==32285==
==32285== Thread 1:
==32285== Conflicting load by thread 1 at 0x00600f78 size 4
==32285==    at 0x4009C9: main /sandbox/goodell/scratch/flag-hb/foo.c:49
==32285== Location 0x600f78 is 0 bytes inside global var "flag"
==32285== declared at foo.c:20
==32285== Other segment start (thread 2)
==32285==    (thread finished, call stack no longer available)
==32285== Other segment end (thread 2)
==32285==    (thread finished, call stack no longer available)
==32285==
shared_data=0xdeadbeef
------8<------
From: Bart V. A. <bva...@ac...> - 2010-07-08 16:22:44
On Thu, Jul 8, 2010 at 5:56 PM, Dave Goodell <go...@mc...> wrote:
> On Jul 8, 2010, at 5:27 AM CDT, Bart Van Assche wrote:
[...]
> I can reproduce #243935 on my machine (@r11206). I did not realize that the DRD and Helgrind HB/HA annotations were supposed to be binary compatible, which seems to be what this ticket is saying. I only assumed source-level compatibility and rebuilt with different headers between different tool runs.

DRD should understand all Helgrind annotations implemented via client requests, but not the other way around.

> At the bottom of this email is the result of a run of DRD with USE_DRD defined instead of USE_HELGRIND. It also warns about the "flag" variable, so I don't think I'm being hit by #243935.

If you modify your test program such that more than one thread is waiting for the value of the variable "flag" to become zero, I expect that you will be hit by #243935.

> The HB/HA annotations do work for shared_data, but not for the flag variable. This makes sense to me, because I'm only indicating the HA point after the "racy" read. However I don't understand the correct way to annotate this without just telling the tool to ignore races on this memory location. I feel like what I need is an "atomic load" that will be treated the same way as a more complicated atomic operation like CAS or fetch_and_add. The various tools don't seem to have a problem with those operations (undef'ing USE_PLAIN_STORE in the test code makes DRD happy again).

There are two ways to get rid of the race report on the flag variable, both of which were already mentioned by Konstantin:
* Adding an ANNOTATE_BENIGN_RACE() annotation.
* Replacing the assignment "flag = 0" with an atomic instruction that sets "flag" to zero.

Bart.
From: Dave G. <go...@mc...> - 2010-07-08 16:57:13
On Jul 8, 2010, at 11:22 AM CDT, Bart Van Assche wrote:

> On Thu, Jul 8, 2010 at 5:56 PM, Dave Goodell <go...@mc...> wrote:
[...]
> DRD should understand all Helgrind annotations implemented via client
> requests, but not the other way around.

Ahh, thanks for the clarification.

> If you modify your test program such that more than one thread is
> waiting for the value of the variable "flag" to become zero, I expect
> that you will be hit by #243935.

That's possible. In the case of my real code, I never have more than one thread waiting on a given flag, thanks to the MPI_Request semantics imposed by the MPI standard. However, I'll keep it in mind in case I hit any similar case in the future.

> There are two ways to get rid of the race report on the flag variable,
> both of which were already mentioned by Konstantin:
> * Adding an ANNOTATE_BENIGN_RACE() annotation.
> * Replacing the assignment "flag = 0" with an atomic instruction that
> sets "flag" to zero.

OK, it seems that the consensus is just to do this, then. I'll probably go with the atomic instruction, to keep Helgrind happy as well.

It would be nice if Helgrind eventually implemented ANNOTATE_BENIGN_RACE or some roughly equivalent functionality.

Thanks to all for the clarifications and suggestions.

-Dave