|
From: Pohl, H. <har...@ad...> - 2014-11-25 09:23:14
|
Hi,
I'm currently playing around with helgrind.
First, I'm trying to understand the annotations, because I want to use them for my atomics implementation.
Second, I'd like to understand how well helgrind handles transitive happens-before/after relationships across threads.
While doing this, I observed a strange report from helgrind: a false positive with a confusing stack trace.
I have two test cases doing similar shared-data accesses; the first is synchronized by an atomic flag, the second is unsynchronized and uses sleep. Yes, I know this can change the execution order, but in all my runs, with and without helgrind, the execution order was correct. I use sleep to provoke data races.
While helgrind correctly reports the data race for the second test case, it fails to recognize the atomic synchronization in the first, although I've annotated the synchronization.
Interesting: if I remove the second test case, the report is correct and no race is found.
I get this report with helgrind 3.10 and 3.9 on RHEL5.
Maybe I don't use the annotations correctly?
Or is this just a false positive I have to live with?
What confuses me is that the stack trace incorrectly reports a data race between the first and the second test thread, which is impossible and makes me think that pthread_join() might be incorrectly handled by helgrind.
Thanks and best regards,
Hartmut Pohl
The Code:
// g++ -g -O3 -pthread pt_thread_test.cpp
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
#include <unistd.h>
#include <valgrind/helgrind.h>
template <class TYPE>
class AtomicAccess
{
public:
    AtomicAccess(TYPE value)
        : mMemoryModel(__ATOMIC_SEQ_CST),
          mValue(value)
    {
        // Keep helgrind from flagging the raw loads/stores on mValue;
        // the happens-before annotations provide the ordering instead.
        VALGRIND_HG_DISABLE_CHECKING(&mValue, sizeof(mValue));
    }
    ~AtomicAccess()
    {
        VALGRIND_HG_ENABLE_CHECKING(&mValue, sizeof(mValue));
    }
    __attribute__ ((noinline)) TYPE syncLoad()
    {
        TYPE ret = __atomic_load_n(&mValue, mMemoryModel);
        return ret;
    }
    __attribute__ ((noinline)) void syncStore(const TYPE& value)
    {
        __atomic_store_n(&mValue, value, mMemoryModel);
    }
    TYPE * getAddress()
    {
        return &mValue;
    }
private:
    // Memory model (initialized to the default memory model)
    const int mMemoryModel;
    // Variable to store the value
    TYPE mValue;
};
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
AtomicAccess<bool> atomicWait(true);
void *timedWaitToGiveMutex(void *arg)
{
    printf("%s\n", __FUNCTION__);
    pthread_mutex_lock(&mutex);
    sleep(3);
    pthread_mutex_unlock(&mutex);
    return 0;
}
void *syncWaitToGiveMutex(void *arg)
{
    printf("%s\n", __FUNCTION__);
    pthread_mutex_lock(&mutex);
    bool wait = true;
    do
    {
        wait = atomicWait.syncLoad();
        ANNOTATE_HAPPENS_AFTER(atomicWait.getAddress());
    }
    while (wait);
    pthread_mutex_unlock(&mutex);
    return 0;
}
void *workWithMutex(void *arg)
{
    printf("%s\n", __FUNCTION__);
    pthread_mutex_lock(&mutex);
    int *iptr = (int*) arg;
    (*iptr)++;
    pthread_mutex_unlock(&mutex);
    return iptr;
}
int main()
{
    int i = 10;
    pthread_t tid, tid2;
    int *ret;
    // 1st test case: Start T1, which takes the mutex and waits on the atomic.
    // Start T2, which blocks on the mutex.
    // Main writes the atomic, releasing the wait.
    // T1 unblocks and gives up the mutex; T2 takes it, reads and writes, unlocks.
    pthread_create(&tid, 0, syncWaitToGiveMutex, 0);
    sleep(1);
    pthread_create(&tid2, 0, workWithMutex, &i);
    sleep(1);
    i = 30;
    ANNOTATE_HAPPENS_BEFORE(atomicWait.getAddress());
    atomicWait.syncStore(false);
    pthread_join(tid2, (void **)&ret);
    assert(i == *ret);
    pthread_join(tid, (void **)&ret);
    // 2nd test case: Same as the 1st, but with a timer (sleep) instead of an
    // atomic operation.
    pthread_create(&tid, 0, timedWaitToGiveMutex, 0);
    sleep(1);
    pthread_create(&tid2, 0, workWithMutex, &i);
    sleep(1);
    i = 40;
    pthread_join(tid2, (void **)&ret);
    assert(i == *ret);
    pthread_join(tid, (void **)&ret);
}
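For reference, I run it under helgrind with something like:
valgrind --tool=helgrind ./a.out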
The report:
==7704== Helgrind, a thread error detector
==7704== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==7704== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==7704== Command: a.out
==7704== Parent PID: 22355
==7704==
==7704== ---Thread-Announcement------------------------------------------
==7704==
==7704== Thread #3 was created
==7704== at 0x38458D496E: clone (in /lib64/libc-2.5.so)
==7704== by 0x3846406DCB: create_thread (createthread.c:76)
==7704== by 0x3846406DCB: pthread_create@@GLIBC_2.2.5 (pthread_create.c:544)
==7704== by 0x4A10890: pthread_create_WRK (hg_intercepts.cpp:457)
==7704== by 0x4006CB: main (helgrind_problem.cpp:101)
==7704==
==7704== ----------------------------------------------------------------
==7704==
==7704== Possible data race during read of size 8 at 0x4C225D0 by thread #3
==7704== Locks held: none
==7704== at 0x4A1402F: pthread_mutex_lock (hg_intercepts.cpp:795)
==7704== by 0x4009F7: workWithMutex(void*) (helgrind_problem.cpp:80)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704== Address 0x4c225d0 is 0 bytes inside data symbol "_ZL21fn_pthread_mutex_lock"
==7704==
==7704== ---Thread-Announcement------------------------------------------
==7704==
==7704== Thread #5 was created
==7704== at 0x38458D496E: clone (in /lib64/libc-2.5.so)
==7704== by 0x3846406DCB: create_thread (createthread.c:76)
==7704== by 0x3846406DCB: pthread_create@@GLIBC_2.2.5 (pthread_create.c:544)
==7704== by 0x4A10890: pthread_create_WRK (hg_intercepts.cpp:457)
==7704== by 0x40079A: main (helgrind_problem.cpp:116)
==7704==
==7704== ----------------------------------------------------------------
==7704==
==7704== Lock at 0x6012E0 was first observed
==7704== at 0x4A0E385: pthread_mutex_lock_WRK (hg_intercepts.cpp:778)
==7704== by 0x4A1404C: pthread_mutex_lock (hg_intercepts.cpp:803)
==7704== by 0x400A40: syncWaitToGiveMutex(void*) (helgrind_problem.cpp:65)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704== Address 0x6012e0 is 0 bytes inside data symbol "mutex"
==7704==
==7704== Possible data race during read of size 4 at 0xFFEFFFE40 by thread #5
==7704== Locks held: 1, at address 0x6012E0
==7704== at 0x4009F8: workWithMutex(void*) (helgrind_problem.cpp:82)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704==
==7704== This conflicts with a previous write of size 4 by thread #3
==7704== Locks held: 1, at address 0x6012E0
==7704== at 0x4009F8: workWithMutex(void*) (helgrind_problem.cpp:82)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704== Address 0xffefffe40 is on thread #1's stack
==7704== in frame #2, created by main (helgrind_problem.cpp:90)
==7704==
==7704== ---Thread-Announcement------------------------------------------
==7704==
==7704== Thread #1 is the program's root thread
==7704==
==7704== ----------------------------------------------------------------
==7704==
==7704== Lock at 0x6012E0 was first observed
==7704== at 0x4A0E385: pthread_mutex_lock_WRK (hg_intercepts.cpp:778)
==7704== by 0x4A1404C: pthread_mutex_lock (hg_intercepts.cpp:803)
==7704== by 0x400A40: syncWaitToGiveMutex(void*) (helgrind_problem.cpp:65)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704== Address 0x6012e0 is 0 bytes inside data symbol "mutex"
==7704==
==7704== Possible data race during write of size 4 at 0xFFEFFFE40 by thread #5
==7704== Locks held: 1, at address 0x6012E0
==7704== at 0x4009F8: workWithMutex(void*) (helgrind_problem.cpp:82)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704==
==7704== This conflicts with a previous read of size 4 by thread #1
==7704== Locks held: none
==7704== at 0x400750: main (helgrind_problem.cpp:109)
==7704== Address 0xffefffe40 is on thread #1's stack
==7704== in frame #2, created by main (helgrind_problem.cpp:90)
==7704==
==7704==
==7704== For counts of detected and suppressed errors, rerun with: -v
==7704== Use --history-level=approx or =none to gain increased speed, at
==7704== the cost of reduced accuracy of conflicting-access information
==7704== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 96 from 43)
|
|
From: Julian S. <js...@ac...> - 2014-11-25 12:40:45
|
On 11/25/2014 10:22 AM, Pohl, Hartmut wrote:
> What confuses me is that the stack trace incorrectly reports a data race
> between the first and the second test thread, which is impossible and
> makes me think that pthread_join() might be incorrectly handled by helgrind.
There are a bunch of stack traces in this message. Exactly which one is
the one you think is wrong?
I don't exactly understand the test program, but this ..
> bool wait = true;
> do
> {
>     wait = atomicWait.syncLoad();
>     ANNOTATE_HAPPENS_AFTER(atomicWait.getAddress());
> }
> while (wait);
.. doesn't feel right to me. You need to put the ANNOTATE_HAPPENS_AFTER
after the end of the while loop, not inside it, because merely reading
the value
> wait = atomicWait.syncLoad();
doesn't mean the thread can go forwards. It has to spin until that
value becomes |false|. So the h-b edge from the main thread "arrives
here" after the loop, not inside it.
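Something like this (untested sketch, with the annotation moved past the
loop exit):
bool wait = true;
do
{
    wait = atomicWait.syncLoad();
}
while (wait);
ANNOTATE_HAPPENS_AFTER(atomicWait.getAddress());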
J
|
|
From: Pohl, H. <har...@ad...> - 2014-11-25 13:04:52
|
Hi,
The stack trace I think is wrong is the following combination:
==7704== Possible data race during read of size 4 at 0xFFEFFFE40 by thread #5
==7704== Locks held: 1, at address 0x6012E0
==7704== at 0x4009F8: workWithMutex(void*) (helgrind_problem.cpp:82)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
==7704==
==7704== This conflicts with a previous write of size 4 by thread #3
==7704== Locks held: 1, at address 0x6012E0
==7704== at 0x4009F8: workWithMutex(void*) (helgrind_problem.cpp:82)
==7704== by 0x4A10A26: mythread_wrapper (hg_intercepts.cpp:421)
==7704== by 0x384640677C: start_thread (pthread_create.c:301)
==7704== by 0x38458D49AC: clone (in /lib64/libc-2.5.so)
As far as I understand, Thread #5 belongs to the second test case and thus cannot have a data race with thread #3 due to the pthread_join()s.
I put the annotation at the end of the while (that's what I started with). Same output.
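Just to illustrate the ordering I mean, here is a minimal sketch (not from
the test program, just my understanding of pthread_join() semantics): the
join should create a happens-before edge, so the two unsynchronized writes
below cannot race:
#include <pthread.h>

static int shared = 0;

static void *writer(void *arg)
{
    shared++;  // unsynchronized write to a global
    return 0;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, 0, writer, 0);
    pthread_join(t1, 0);                // t1 has finished before the next create
    pthread_create(&t2, 0, writer, 0);  // so t2's write cannot race with t1's
    pthread_join(t2, 0);
    return 0;
}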
Best regards,
Hartmut
-----Original Message-----
From: Julian Seward [mailto:js...@ac...]
Sent: Dienstag, 25. November 2014 13:40
To: val...@li...; Pohl, Hartmut
Subject: Re: [Valgrind-users] Problem with helgrind?
On 11/25/2014 10:22 AM, Pohl, Hartmut wrote:
> What confuses me is that the stack trace incorrectly reports a data
> race between the first and the second test thread, which is impossible
> and makes me think that pthread_join() might be incorrectly handled by helgrind.
There are a bunch of stack traces in this message. Exactly which one is the one you think is wrong?
I don't exactly understand the test program, but this ..
> bool wait = true;
> do
> {
>     wait = atomicWait.syncLoad();
>     ANNOTATE_HAPPENS_AFTER(atomicWait.getAddress());
> }
> while (wait);
.. doesn't feel right to me. You need to put the ANNOTATE_HAPPENS_AFTER after the end of the while loop, not inside it, because merely reading the value
> wait = atomicWait.syncLoad();
doesn't mean the thread can go forwards. It has to spin until that value becomes |false|. So the h-b edge from the main thread "arrives here" after the loop, not inside it.
J
|
|
From: Julian S. <js...@ac...> - 2014-11-25 14:25:45
|
On 11/25/2014 02:04 PM, Pohl, Hartmut wrote:
> As far as I understand, Thread #5 belongs to the second test case and
> thus cannot have a data race with thread #3 due to the pthread_join()s.
I agree.
> I put the annotation at the end of the while (that's what I started with).
I looked at this in some detail, but I don't understand what is happening.
I did notice that if you comment out the line causing the real race
    i = 40;
then no races are reported. So at least you have a situation where, if
there is a race, then (multiple) races are reported, but if you have no
race, then no races are reported.
To investigate further I'd need to look at Helgrind's memory state
machine transitions for |i|.
J
|