From: Ivo R. <iv...@iv...> - 2015-02-03 17:53:45
Hello developers,

I am seeking advice on the following problem. I am currently porting Helgrind to Solaris (part of valgrind-solaris [1]). I have described all pthread and non-pthread intercepts, and Helgrind seems to report races as it should. However, it also reports many false positives, which I tracked down to a stack variable being falsely reported as accessed by two (or more) threads.

First, some background about Solaris. Solaris has its own libc, and compared to, for example, GNU libc (used on Linux), threaded programs can exhibit different scheduling behaviour. The following uses a simple test program, drd/tests/atomic_var, which creates two threads that read/write a global variable without any synchronization.

On Linux, the typical scheduling (on a machine with one CPU) is:

  1. Main thread running
  2. Main thread calling pthread_create(t1) [thread t1 created]
  3. Main thread calling pthread_create(t2) [thread t2 created]
  4. Main thread calling pthread_join()
  5. Thread t1 running and exiting
  6. Thread t2 running and exiting
  7. Main thread joining t1 and t2

On Solaris, I observe the following instead:

  1. Main thread running
  2. Main thread calling pthread_create(t1) [thread t1 created]
  3. Thread t1 running and exiting
  4. Main thread calling pthread_create(t2) [thread t2 created]
  5. Thread t2 running and exiting
  6. Main thread calling pthread_join()
  7. Main thread joining t1 and t2

Because threads t1 and t2 (in this example) run serially (but there is no synchronization between them!), they also get the same stack; that is, the stack from thread t1 is reused for thread t2. That is not the case on Linux, where the two threads get different stacks. And Helgrind, for a reason I do not yet understand, falsely reports accesses to all such stack variables as races.
One such false report:

  ==4432== Possible data race during read of size 8 at 0x7FFC5FF70 by thread #3
  ==4432== Locks held: none
  ==4432==    at 0x7FFF5C9BE: mythread_wrapper (hg_intercepts.c:367)
  ==4432==    by 0x7FFECFC5F: _thrp_setup (in /lib/amd64/libc.so.1)
  ==4432==    by 0x7FFECFF3F: ??? (in /lib/amd64/libc.so.1)
  ==4432==
  ==4432== This conflicts with a previous write of size 8 by thread #2
  ==4432== Locks held: none
  ==4432==    at 0x7FFEB6C50: set_cancel_pending_flag (in /lib/amd64/libc.so.1)
  ==4432==  Address 0x7ffc5ff70 is on thread #3's stack
  ==4432==  in frame #0, created by mythread_wrapper (hg_intercepts.c:342)

After enabling tracing in hg_main.c, I can confirm that address 0x7ffc5ff70 first belonged to thread #2 and, when that thread exited, the same stack was assigned to thread #3:

  evh__pre_thread_ll_create(p=1, c=2)     [Thread #2 is created]
  evh__new_mem_stack(0x7FFC5FF70, 8)
  evh__die_mem(0x7FFA62000, 2088960)      [stack killed]
  evh__pre_thread_ll_exit(thr=2)
  evh__pre_thread_ll_create(p=1, c=2)     [this is actually Thread #3]
  evh__new_mem_stack(0x7FFC5FF70, 8)
  evh__die_mem(0x7FFA62000, 2088960)      [stack killed]
  evh__pre_thread_ll_exit(thr=2)

I looked at how Helgrind handles malloc/free, and it seems to me that ultimately the same shadow_mem_make_NoAccess_NoFX() is called as for the thread stack. I also read the "avoid memory recycling" paragraph in [2], but it is unclear to me whether that also applies to thread stacks.

How can I work out why Helgrind thinks there is a race here? What kind of tracing do I need to enable to obtain the necessary information? I am familiar with the code in hg_intercepts.c and hg_main.c, but have not studied libhb...

Kind regards,
Ivo Raisr

[1] https://bitbucket.org/setupji/valgrind-solaris
[2] http://www.valgrind.org/docs/manual/hg-manual.html#hg-manual.effective-use