|
From: Konstantin S. <kon...@gm...> - 2009-03-30 13:17:36
|
Hi valgrind developers,

Does the valgrind core do something special about TLS (thread-local storage)?

I've recently discovered a false positive in ThreadSanitizer which is
caused by TLS.
Helgrind also emits a race report on the same test case.
Test:
// - Thread1 starts
// - Thread1 touches per_thread_global
// - Thread1 ends
// - Thread2 starts (and there is no happens-before relation between
//   it and Thread1)
// - Thread2 touches per_thread_global
// It may happen that Thread2 will have per_thread_global at the same address
// as Thread1. Since there is no happens-before relation between the threads,
// ThreadSanitizer reports a race.

The runnable code is in
http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc?spec=svn939&r=939#6017

As I understand it, the valgrind core handles stack allocation and
deallocation itself.
All a tool needs to do is to call VG_(track_new_mem_stack) and similar.
Is there anything like this for TLS?

Thanks,
--kcc
|
From: Nicholas N. <n.n...@gm...> - 2009-03-30 13:45:19
|
On Mon, Mar 30, 2009 at 8:17 AM, Konstantin Serebryany <kon...@gm...> wrote:
> Hi valgrind developers,
>
> Does valgrind core do something special about TLS (thread-local storage)?
>
> I've recently discovered a false positive in ThreadSanitizer which is
> caused by TLS.
> Helgrind also emits a race report on the same test case.
> Test:
> // - Thread1 starts
> // - Thread1 touches per_thread_global
> // - Thread1 ends
> // - Thread2 starts (and there is no happens-before relation between
> // it and Thread1)
> // - Thread2 touches per_thread_global
> // It may happen so that Thread2 will have per_thread_global in the same address
> // as Thread1. Since there is no happens-before relation between threads,
> // ThreadSanitizer reports a race.
> The runable code is in
> http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc?spec=svn939&r=939#6017
>
> As I understand, the valgrind core handles stack allocation and
> deallocation itself.
> All that tool needs to do is to call VG_(track_new_mem_stack) and similar.
> Is there anything like this for TLS?

No; it's never come up before...

Nick
|
From: Konstantin S. <kon...@gm...> - 2009-03-31 09:33:56
|
The problem seems to be even worse.
I've got a false positive on a stack object. Both Helgrind and
ThreadSanitizer are affected.
The scenario is the same:

- Thread1 starts
- Thread1 touches the stack
- Thread1 ends
- Thread2 starts (and there is no happens-before relation between it
  and Thread1)
- Thread2 touches the stack

It may happen that Thread2 will have its stack at the same address as
Thread1. Since there is no happens-before relation between the threads,
ThreadSanitizer & Helgrind report a race.

% ~/valgrind/trunk/Inst/bin/valgrind --tool=helgrind ./racecheck_unittest 131
...
==23797== Possible data race during write of size 4 at 0x1282b4fc by thread #11
==23797==    at 0x403093: test131::RealWorker() (racecheck_unittest.cc:6066)
==23797==    by 0x41E9F8: MyThread::ThreadBody(MyThread*) (thread_wrappers_pthread.h:320)
==23797==    by 0xD54070F: mythread_wrapper (hg_intercepts.c:194)
==23797==    by 0xD7490C9: start_thread (in /usr/grte/v1/lib64/libpthread-2.3.6.so)
==23797==    by 0xE1AF811: clone (in /usr/grte/v1/lib64/libc-2.3.6.so)
==23797== This conflicts with a previous write of size 4 by thread #9
==23797==    at 0x403093: test131::RealWorker() (racecheck_unittest.cc:6066)
==23797==    by 0x41E9F8: MyThread::ThreadBody(MyThread*) (thread_wrappers_pthread.h:320)
==23797==    by 0xD54070F: mythread_wrapper (hg_intercepts.c:194)
==23797==    by 0xD7490C9: start_thread (in /usr/grte/v1/lib64/libpthread-2.3.6.so)
==23797==    by 0xE1AF811: clone (in /usr/grte/v1/lib64/libc-2.3.6.so)

It looks like neither Helgrind nor ThreadSanitizer clears the state of the
stack memory (and TLS data) after a thread dies.
Any idea?

--kcc

On Mon, Mar 30, 2009 at 5:34 PM, Nicholas Nethercote <n.n...@gm...> wrote:
> On Mon, Mar 30, 2009 at 8:17 AM, Konstantin Serebryany
> <kon...@gm...> wrote:
>> Hi valgrind developers,
>>
>> Does valgrind core do something special about TLS (thread-local storage)?
>>
>> I've recently discovered a false positive in ThreadSanitizer which is
>> caused by TLS.
>> Helgrind also emits a race report on the same test case.
>> Test:
>> // - Thread1 starts
>> // - Thread1 touches per_thread_global
>> // - Thread1 ends
>> // - Thread2 starts (and there is no happens-before relation between
>> // it and Thread1)
>> // - Thread2 touches per_thread_global
>> // It may happen so that Thread2 will have per_thread_global in the same address
>> // as Thread1. Since there is no happens-before relation between threads,
>> // ThreadSanitizer reports a race.
>> The runable code is in
>> http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc?spec=svn939&r=939#6017
>>
>> As I understand, the valgrind core handles stack allocation and
>> deallocation itself.
>> All that tool needs to do is to call VG_(track_new_mem_stack) and similar.
>> Is there anything like this for TLS?
>
> No; it's never come up before...
>
> Nick
|
From: Julian S. <js...@ac...> - 2009-03-31 09:59:35
|
On Tuesday 31 March 2009, Konstantin Serebryany wrote:
> The problem seems to be even worse.
> I've got a false positive on a stack object. Both Helgrind and
> ThreadSanitizer are affected.
> The scenario is the same:
>
> - Thread1 starts
> - Thread1 touches the stack
> - Thread1 ends
> - Thread2 starts (and there is no happens-before relation between it
>   and Thread1)
> - Thread2 touches the stack
> It may happen that Thread2 will have its stack at the same address
> as Thread1.
> Since there is no happens-before relation between the threads,
> ThreadSanitizer & Helgrind report a race.

I know that libpthread re-uses stacks from finished threads for new
threads, so as to avoid the overhead of munmapping them and then mmapping
them for the new thread. But I don't see how it can reuse a stack for a
new thread without knowing that the old thread is finished. In other words
libpthread must know there is a h-b dependency there, even if TS and H don't
see it.

Do you think this is a correct analysis?

J
|
From: Konstantin S. <kon...@gm...> - 2009-03-31 10:05:43
|
On Tue, Mar 31, 2009 at 2:03 PM, Julian Seward <js...@ac...> wrote:
> On Tuesday 31 March 2009, Konstantin Serebryany wrote:
>> The problem seems to be even worse.
>> I've got a false positive on a stack object. Both Helgrind and
>> ThreadSanitizer are affected.
>> The scenario is the same:
>>
>> - Thread1 starts
>> - Thread1 touches the stack
>> - Thread1 ends
>> - Thread2 starts (and there is no happens-before relation between it
>>   and Thread1)
>> - Thread2 touches the stack
>> It may happen that Thread2 will have its stack at the same address
>> as Thread1.
>> Since there is no happens-before relation between the threads,
>> ThreadSanitizer & Helgrind report a race.
>
> I know that libpthread re-uses stacks from finished threads for new
> threads, so as to avoid the overhead of munmapping them and then mmapping
> them for the new thread. But I don't see how it can reuse a stack for a
> new thread without knowing that the old thread is finished. In other words
> libpthread must know there is a h-b dependency there, even if TS and H don't
> see it.

Oh, sure, there is a h-b dependency somewhere inside the pthread library.
But we don't see it.
Since H is pure h-b, this h-b dependency is not created by pthread_mutex_lock.

--kcc

> Do you think this is a correct analysis?
>
> J
|
From: Konstantin S. <kon...@gm...> - 2009-03-31 10:08:09
|
On Tue, Mar 31, 2009 at 2:05 PM, Konstantin Serebryany <kon...@gm...> wrote:
> On Tue, Mar 31, 2009 at 2:03 PM, Julian Seward <js...@ac...> wrote:
>> On Tuesday 31 March 2009, Konstantin Serebryany wrote:
>>> The problem seems to be even worse.
>>> I've got a false positive on a stack object. Both Helgrind and
>>> ThreadSanitizer are affected.
>>> The scenario is the same:
>>>
>>> - Thread1 starts
>>> - Thread1 touches the stack
>>> - Thread1 ends
>>> - Thread2 starts (and there is no happens-before relation between it
>>>   and Thread1)
>>> - Thread2 touches the stack
>>> It may happen that Thread2 will have its stack at the same address
>>> as Thread1.
>>> Since there is no happens-before relation between the threads,
>>> ThreadSanitizer & Helgrind report a race.
>>
>> I know that libpthread re-uses stacks from finished threads for new
>> threads, so as to avoid the overhead of munmapping them and then mmapping
>> them for the new thread. But I don't see how it can reuse a stack for a
>> new thread without knowing that the old thread is finished. In other words
>> libpthread must know there is a h-b dependency there, even if TS and H don't
>> see it.
>
> Oh, sure, there is a h-b dependency somewhere inside the pthread library.
> But we don't see it.
> Since H is pure h-b, this h-b dependency is not created by pthread_mutex_lock.

There are at least two ways to fix this:
- somehow recognize the h-b dependency introduced by libpthread;
- clear the state of the stack/TLS memory when a thread exits.

The second way is preferable since it will not hide real races...

--kcc

> --kcc
>
>> Do you think this is a correct analysis?
>>
>> J
|
From: Bart V. A. <bar...@gm...> - 2009-03-31 11:16:57
|
On Tue, Mar 31, 2009 at 12:08 PM, Konstantin Serebryany
<kon...@gm...> wrote:
> There are at least two ways to fix this:
> - somehow recognize this h-b dependency introduced by libpthread
> - clear the state of stack/tls memory.
>
> Second way is preferable since it will not hide real races...
When the Valgrind core notifies a tool that a thread exits, the stack
pointer of that thread is still below the top of the stack because of
the area reserved by the NPTL. In the DRD tool the state of the
top-of-stack area is cleared explicitly when a thread exits. The
following code in DRD's pre_thread_ll_exit handler performs this task:
  drd_stop_using_mem(DRD_(thread_get_stack_min)(drd_tid),
                     DRD_(thread_get_stack_max)(drd_tid)
                     - DRD_(thread_get_stack_min)(drd_tid),
                     True);
Bart.
|
|
From: Konstantin S. <kon...@gm...> - 2009-04-01 09:19:41
|
On Tue, Mar 31, 2009 at 3:16 PM, Bart Van Assche <bar...@gm...> wrote:
> On Tue, Mar 31, 2009 at 12:08 PM, Konstantin Serebryany
> <kon...@gm...> wrote:
>> There are at least two ways to fix this:
>> - somehow recognize this h-b dependency introduced by libpthread
>> - clear the state of stack/tls memory.
>>
>> Second way is preferable since it will not hide real races...
>
> When the Valgrind core notifies a tool that a thread exits, the stack
> pointer of that thread is still below the top of the stack because of
> the area reserved by the NPTL. In the DRD tool the state of the
> top-of-stack area is cleared explicitly when a thread exits. The
> following code in DRD's pre_thread_ll_exit handler performs this task:
>
>   drd_stop_using_mem(DRD_(thread_get_stack_min)(drd_tid),
>                      DRD_(thread_get_stack_max)(drd_tid)
>                      - DRD_(thread_get_stack_min)(drd_tid),
>                      True);

I tried the same thing. It helps sometimes, but not always.
I see that the stack passed to the clone() syscall is not the same as the
one returned by VG_(thread_get_stack_max) (grr, hard to get a small
reproducer):

SYSCALL[1799,6]( 56) sys_clone ( 3d0f00, 0xf8cf240, [stack_max=]0xf8d09f0, 0xf8d09f0, [tls=]0xf8d0960 ) --> [pre-success] Success(0x71c)
VG_(thread_get_stack_max)=0xF8D0000
VG_(threads)[tid].arch.vex.guest_FS_ZERO=0xF8D0960 (this is tls)

As you can see, the third parameter of sys_clone() is a bit different from
what VG_(thread_get_stack_max) returns.
Why?
Is there any way for a tool to get the third parameter of sys_clone?
On amd64-linux I can get the tls from
VG_(threads)[tid].arch.vex.guest_FS_ZERO, but that's too hackish.

Thanks,
--kcc

> Bart.
|
From: Bart V. A. <bar...@gm...> - 2009-04-01 14:47:10
|
On Wed, Apr 1, 2009 at 11:19 AM, Konstantin Serebryany
<kon...@gm...> wrote:
> As you can see, the third parameter of sys_clone() is a bit different
> from what VG_(thread_get_stack_max) returns.
> Why?
This source code from coregrind/m_syswrap/syswrap-amd64-linux.c
explains it (VG_(thread_get_stack_max)() returns
client_stack_highest_word):
   /* We don't really know where the client stack is, because its
      allocated by the client.  The best we can do is look at the
      memory mappings and try to derive some useful information.  We
      assume that esp starts near its highest possible value, and can
      only go down to the start of the mmaped segment. */
   seg = VG_(am_find_nsegment)((Addr)rsp);
   if (seg && seg->kind != SkResvn) {
      ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(rsp);
      ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
      VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
Bart.
|
|
From: Konstantin S. <kon...@gm...> - 2009-04-01 14:44:15
|
On Wed, Apr 1, 2009 at 6:15 PM, Bart Van Assche
<bar...@gm...> wrote:
> On Wed, Apr 1, 2009 at 11:19 AM, Konstantin Serebryany
> <kon...@gm...> wrote:
>> As you can see, the third parameter of sys_clone() is a bit different
>> from what VG_(thread_get_stack_max) returns.
>> Why?
>
> This source code from coregrind/m_syswrap/syswrap-amd64-linux.c
> explains it (VG_(thread_get_stack_max)() returns
> client_stack_highest_word):
>
> /* We don't really know where the client stack is, because its
> allocated by the client.
I am not sure this is entirely true, especially for amd64-linux.
We do have the top of the stack as one of the parameters of clone().
No?
--kcc
> The best we can do is look at the
> memory mappings and try to derive some useful information. We
> assume that esp starts near its highest possible value, and can
> only go down to the start of the mmaped segment. */
> seg = VG_(am_find_nsegment)((Addr)rsp);
> if (seg && seg->kind != SkResvn) {
> ctst->client_stack_highest_word = (Addr)VG_PGROUNDUP(rsp);
> ctst->client_stack_szB = ctst->client_stack_highest_word - seg->start;
>
> VG_(register_stack)(seg->start, ctst->client_stack_highest_word);
>
> Bart.
>
|
|
From: Bart V. A. <bar...@gm...> - 2009-04-01 14:47:24
|
On Wed, Apr 1, 2009 at 4:20 PM, Konstantin Serebryany <kon...@gm...> wrote:
> On Wed, Apr 1, 2009 at 6:15 PM, Bart Van Assche
> <bar...@gm...> wrote:
>> On Wed, Apr 1, 2009 at 11:19 AM, Konstantin Serebryany
>> <kon...@gm...> wrote:
>>> As you can see, the 3-rd parameter of sys_clone() is a bit different
>>> from what VG_(thread_get_stack_max) returns.
>>> Why?
>>
>> This source code from coregrind/m_syswrap/syswrap-amd64-linux.c
>> explains it (VG_(thread_get_stack_max)() returns
>> client_stack_highest_word):
>>
>>    /* We don't really know where the client stack is, because its
>>       allocated by the client.
>
> I am not sure this is entirely true, especially for amd64-linux.
> We do have the top of the stack as one of the parameters of clone().
> No?

The comment in coregrind/m_syswrap/syswrap-amd64-linux.c was clearly copied
from syswrap-x86-linux.c, so I don't know how accurate the comment is for
amd64. But there must have been a reason why the comment was added to
syswrap-x86-linux.c originally.

Bart.
|
From: Bart V. A. <bar...@gm...> - 2009-03-30 14:22:46
|
On Mon, Mar 30, 2009 at 3:17 PM, Konstantin Serebryany <kon...@gm...> wrote:
> Does valgrind core do something special about TLS (thread-local storage)?

Are you familiar with the implementation of TLS in the NPTL?

Bart.
|
From: Konstantin S. <kon...@gm...> - 2009-03-30 14:33:54
|
On Mon, Mar 30, 2009 at 6:22 PM, Bart Van Assche <bar...@gm...> wrote:
> On Mon, Mar 30, 2009 at 3:17 PM, Konstantin Serebryany
> <kon...@gm...> wrote:
>> Does valgrind core do something special about TLS (thread-local storage)?
>
> Are you familiar with the implementation of TLS in the NPTL ?

Nope. You?

The logs produced by Helgrind make me think that we need to intercept
something like _dl_allocate_tls_init:

==22589== Possible data race during read of size 1 at 0x1203495c by thread #11
==22589==    at 0x40CBB6: test130::RealWorker() (racecheck_unittest.cc:6038)
==22589==    by 0x41E8FC: MyThread::ThreadBody(MyThread*) (thread_wrappers_pthread.h:320)
==22589==    by 0xD54070F: mythread_wrapper (hg_intercepts.c:194)
==22589==    by 0xD7490C9: start_thread (in /usr/grte/v1/lib64/libpthread-2.3.6.so)
==22589==    by 0xE1AF811: clone (in /usr/grte/v1/lib64/libc-2.3.6.so)
==22589== This conflicts with a previous write of size 1 by thread #7
==22589==    at 0xC930508: memset (in /usr/grte/v1/lib64/ld-2.3.6.so)
==22589==    by 0xC92D643: _dl_allocate_tls_init (in /usr/grte/v1/lib64/ld-2.3.6.so)
==22589==    by 0xD7493F9: pthread_create@@GLIBC_2.2.5 (in /usr/grte/v1/lib64/libpthread-2.3.6.so)
==22589==    by 0xD5405E4: pthread_create@* (hg_intercepts.c:214)
==22589==    by 0x41E84A: MyThread::Start() (thread_wrappers_pthread.h:312)
==22589==    by 0x41EB2E: MyThreadArray::Start() (racecheck_unittest.cc:266)
==22589==    by 0x4099EB: test130::Worker() (racecheck_unittest.cc:6048)
==22589==    by 0x410FE6: test130::Worker2() (racecheck_unittest.cc:6053)
==22589==

--kcc

> Bart.
|
From: Bart V. A. <bar...@gm...> - 2009-03-30 14:42:13
|
On Mon, Mar 30, 2009 at 4:33 PM, Konstantin Serebryany <kon...@gm...> wrote:
> On Mon, Mar 30, 2009 at 6:22 PM, Bart Van Assche
> <bar...@gm...> wrote:
>> On Mon, Mar 30, 2009 at 3:17 PM, Konstantin Serebryany
>> <kon...@gm...> wrote:
>>> Does valgrind core do something special about TLS (thread-local storage)?
>>
>> Are you familiar with the implementation of TLS in the NPTL ?
>
> Nope. You?

Did you check whether DRD reports a false positive on your TLS test case?

If I remember correctly, the NPTL allocates an area at the top of the
stack of each thread for internal use. This area contains, among other
things, a lookup table for pthread_key_t values, and this lookup table is
accessed from more than one thread. The race conditions triggered by
using TLS are in this lookup table. This is why the DRD tool ignores
the top-of-stack area allocated by the NPTL. Suppressing race reports
on _dl_allocate_tls_init() in Helgrind and ThreadSanitizer might help,
but I'm not sure this is a full solution.

Bart.
|
From: Konstantin S. <kon...@gm...> - 2009-03-30 14:53:13
|
On Mon, Mar 30, 2009 at 6:42 PM, Bart Van Assche
<bar...@gm...> wrote:
> On Mon, Mar 30, 2009 at 4:33 PM, Konstantin Serebryany
> <kon...@gm...> wrote:
>> On Mon, Mar 30, 2009 at 6:22 PM, Bart Van Assche
>> <bar...@gm...> wrote:
>>> On Mon, Mar 30, 2009 at 3:17 PM, Konstantin Serebryany
>>> <kon...@gm...> wrote:
>>>> Does valgrind core do something special about TLS (thread-local storage)?
>>>
>>> Are you familiar with the implementation of TLS in the NPTL ?
>>
>> Nope. You?
>
> Did you check whether DRD reports a false positive on your TLS test case ?
DRD says nothing about TLS in this test (it prints lots of reports
about _dl_lookup_symbol_x, though).
The Helgrind report is given in my previous message.
ThreadSanitizer prints this:
==24659== WARNING: Possible data race during read of size 4 at 0x1243595C: {{{
==24659== T16 (locks held: {}):
==24659== #0 test130::RealWorker() racecheck_unittest.cc:6038
==24659== #1 MyThread::ThreadBody(MyThread*) thread_wrappers_pthread.h:320
==24659== #2 ThreadSanitizerStartThread ts_valgrind_intercepts.c:407
==24659== Concurrent write(s) happened at (OR AFTER) these points:
==24659== T10 (locks held: {}):
==24659== #0 test130::RealWorker() racecheck_unittest.cc:6036
==24659== #1 MyThread::ThreadBody(MyThread*) thread_wrappers_pthread.h:320
==24659== #2 ThreadSanitizerStartThread ts_valgrind_intercepts.c:407
==24659== }}}
ThreadSanitizer and Helgrind reports are different because
ThreadSanitizer ignores everything that happens inside pthread_create.
>
> If I remember correctly, the NPTL allocates an area at the top of the
> stack of each thread for internal use. This area contains a.o. a
> lookup table for pthread_key_t values, and this lookup table is
> accessed from more than one thread. The race conditions triggered by
> using TLS are in this lookup table. This is why the Drd tool ignores
> the top-of-stack area allocated by the NPTL. Suppressing race reports
> on _dl_allocate_tls_init() in Helgrind and ThreadSanitizer might help,
No suppressing.
I think we may need to treat _dl_allocate_tls_init (or something else
there) as a function that clears the memory state.
--kcc
> but I'm not sure this is a full solution.
>
> Bart.
>
|
|
From: Bart V. A. <bar...@gm...> - 2009-03-30 18:10:53
|
On Mon, Mar 30, 2009 at 4:53 PM, Konstantin Serebryany <kon...@gm...> wrote:
>> Did you check whether DRD reports a false positive on your TLS test case ?
>
> DRD says nothing about TLS in this test (it prints lots of reports
> about _dl_lookup_symbol_x though)

On which distro did this occur? The following command does not complain
about _dl_lookup_symbol_x on Ubuntu 7.10 nor on openSUSE 11.1:

  ./vg-in-place --tool=drd --check-stack-var=yes drt/unittest/racecheck_unittest 130

Bart.
|
From: Tom H. <to...@co...> - 2009-04-01 15:03:38
|
Konstantin Serebryany wrote:
> On Wed, Apr 1, 2009 at 6:15 PM, Bart Van Assche
> <bar...@gm...> wrote:
>> On Wed, Apr 1, 2009 at 11:19 AM, Konstantin Serebryany
>> <kon...@gm...> wrote:
>>> As you can see, the 3-rd parameter of sys_clone() is a bit different
>>> from what VG_(thread_get_stack_max) returns.
>>> Why?
>>
>> This source code from coregrind/m_syswrap/syswrap-amd64-linux.c
>> explains it (VG_(thread_get_stack_max)() returns
>> client_stack_highest_word):
>>
>>    /* We don't really know where the client stack is, because its
>>       allocated by the client.
>
> I am not sure this is entirely true, especially for amd64-linux.
> We do have the top of the stack as one of the parameters of clone().
> No?

That only tells us where one end of the stack is though - we have to guess
where the other end is.

Our stack handling is all bollocks anyway - there are two different systems
trying to track the stacks and neither of them does a good job at all. It
all needs a rewrite basically.

Tom

--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
From: Julian S. <js...@ac...> - 2009-04-02 07:46:05
|
> Our stack handling is all bollocks anyway - there are two different
> systems trying to track the stacks and neither of them does a good job
> at all. It all needs a rewrite basically.

I agree. What we've got in there right now is a confusing and
badly-specified swamp.

J
|
From: Konstantin S. <kon...@gm...> - 2009-04-06 13:54:37
|
I've submitted https://bugs.kde.org/show_bug.cgi?id=188969 for the record.

--kcc

On Thu, Apr 2, 2009 at 11:49 AM, Julian Seward <js...@ac...> wrote:
>> Our stack handling is all bollocks anyway - there are two different
>> systems trying to track the stacks and neither of them does a good job
>> at all. It all needs a rewrite basically.
>
> I agree. What we've got in there right now is a confusing and
> badly-specified swamp.
>
> J