|
From: Jan V. <jan...@ni...> - 2014-08-18 13:09:49
|
Hello list,

we develop an authoritative DNS server called Knot DNS and we use the
userspace-rcu library [1] for synchronization. The library implements the
synchronization by inserting appropriate sfence/lfence/mfence instructions
and some reference counting around them.

Currently, we are trying to track down an invalid read reported by Valgrind.
The problem appears very rarely and it's quite difficult to reproduce.
Valgrind claims that previously freed memory is being read. However, we are
quite sure that the synchronization is correct and that this problem should
not happen.

Does Valgrind support the mentioned memory barrier instructions?

I'm not ruling out a problem in our code, but I want to make sure that we
are chasing a real bug.

Thanks and regards

Jan

[1] http://lttng.org/urcu
|
From: Alexander P. <gl...@go...> - 2014-08-18 14:01:23
|
Valgrind should correctly translate the memory fence instructions into
platform-specific memory fences that are at least not weaker; otherwise
every synchronization algorithm would've been broken under Valgrind.

On amd64 Valgrind creates an IRStmt_MBE(Imbe_Fence) for
sfence/lfence/mfence (see VEX/priv/guest_amd64_toIR.c), which is later
translated to mfence when executing the code (see
VEX/priv/host_amd64_isel.c).

(On a related note, I wanted to try Knot DNS under a couple of tools,
but failed to find any documentation for running the tests.)

On Mon, Aug 18, 2014 at 5:09 PM, Jan Včelák <jan...@ni...> wrote:
> Hello list,
>
> we develop an authoritative DNS server called Knot DNS and we use the
> userspace-rcu library [1] for synchronization.
<snip>
> ------------------------------------------------------------------------------
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users

-- 
Alexander Potapenko
Software Engineer
Google Moscow
|
From: Eliot M. <mo...@cs...> - 2014-08-18 14:25:50
|
On 8/18/2014 9:09 AM, Jan Včelák wrote:
> we develop an authoritative DNS server called Knot DNS and we use the
> userspace-rcu library [1] for synchronization.
<snip>
> Does Valgrind support the mentioned memory barrier instructions?

If it did not *recognize* the instruction, then valgrind would bomb out
with an error message giving the op code byte sequence and saying it did
not recognize it. A possibility is that it recognizes the instruction
but does not implement it as you expect. Someone else will need to
answer as to that possibility.

> I'm not rejecting that there is a problem in our code, but I want to make sure
> that we are chasing a real bug.

You might want to verify that valgrind and your code have the same
notion of what the malloc/free routines are, etc., i.e., that valgrind
is able to hook into all allocation and freeing calls your application
uses.

Best wishes -- Eliot Moss
|
From: Jan V. <jan...@ni...> - 2014-08-19 12:31:44
|
On Monday 18 August 2014 10:25:42, Eliot Moss wrote:
> On 8/18/2014 9:09 AM, Jan Včelák wrote:
> > I'm not rejecting that there is a problem in our code, but I want to make
> > sure that we are chasing a real bug.
>
> You might want to verify that valgrind and your code have the same
> notion of what the malloc/free routines are, etc., i.e., that valgrind
> is able to hook into all allocation and freeing calls your application
> uses.

We do not do anything special about malloc/free. I believe the problem
originates in some kind of race.

Jan
|
From: Julian S. <js...@ac...> - 2014-08-19 22:15:49
|
On 08/19/2014 02:31 PM, Jan Včelák wrote:
> I believe the problem originates in some kind of race.

Try --fair-sched=yes to see if you can reproduce the problem more often
and/or more reliably.

J
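[A minimal sketch of such an invocation; the binary and config names below are illustrative, not taken from the thread. --fair-sched=yes makes Valgrind schedule threads on a fair lock, which changes thread interleavings and can make rare races easier to reproduce:]

```shell
# Hypothetical invocation of the server under Memcheck with fair scheduling.
# --error-exitcode makes the run fail visibly when Valgrind reports errors.
valgrind --tool=memcheck --fair-sched=yes --error-exitcode=1 \
    ./knotd -c knot.conf
```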
|
From: Jan V. <jan...@ni...> - 2014-08-19 12:23:35
|
OK. Then it is likely that there is a bug in our software.

As for the tests - it is a bit complicated. But if you are willing to
spend some time on it, we would be very happy. :-)

Get the latest Knot DNS from git:

$ git clone https://gitlab.labs.nic.cz/labs/knot.git

Compile the server. We hit the problem with the following configuration:

$ autoreconf -fi
$ export CC=gcc
$ export CFLAGS="-O0 -g -j4"
$ ./configure --enable-recvmmsg=no \
  --enable-lto=no \
  --disable-fastparser \
  --disable-shared \
  --enable-static
$ make
$ make check

Set up the environment for the functional tests. You will need Python >= 3.3,
BIND 9, lsof, and a few Python modules, which can be installed using pip:

$ cd tests-extra
$ pip install -r requirements.txt

To run the occasionally failing test, execute:

$ ./runtests.py ixfr/knot_bind

The test establishes Knot DNS as a master server and BIND as a slave, and
performs a simple zone transfer.

Our testing machine is a 4-core Intel Xeon machine with 64-bit Linux
(Ubuntu 13.10 and a 3.11.0-18-generic kernel).

I ran the test case several times without Valgrind, with address sanitizer,
but I didn't hit the problem.

Jan

On Monday 18 August 2014 18:01:15, Alexander Potapenko wrote:
> Valgrind should correctly translate the memory fence instructions into
> platform-specific memory fences that are at least not weaker;
> otherwise every synchronization algorithm would've been broken under
> Valgrind.
<snip>
|
From: Milian W. <ma...@mi...> - 2014-08-19 14:18:46
|
On Tuesday 19 August 2014 14:23:26 Jan Včelák wrote:
> OK. Then it is likely that there is a bug in our software.

<snip>

> I ran the test case several times without Valgrind, with address sanitizer,
> but I didn't hit the problem.

If you think it's a race, then the address sanitizer won't find it, I guess.
Rather, try the thread sanitizer.

http://clang.llvm.org/docs/ThreadSanitizer.html

Cheers

-- 
Milian Wolff
ma...@mi...
http://milianw.de
|
From: Jan V. <jan...@ni...> - 2014-08-19 14:09:05
|
> If you think it's a race, then the address sanitizer won't find it, I guess.
> Rather, try the thread sanitizer.
>
> http://clang.llvm.org/docs/ThreadSanitizer.html

Unfortunately, thread sanitizer does not support synchronization using
memory barriers. The same goes for Helgrind.
|
From: Philippe W. <phi...@sk...> - 2014-08-19 18:55:56
|
On Tue, 2014-08-19 at 14:23 +0200, Jan Včelák wrote:
> Compile the server. We hit the problem with the following configuration:
>
> $ autoreconf -fi
> $ export CC=gcc
> $ export CFLAGS="-O0 -g -j4"
> $ ./configure --enable-recvmmsg=no \
>   --enable-lto=no \
>   --disable-fastparser \
>   --disable-shared \
>   --enable-static

If you link with a static malloc library, you have to use
  --soname-synonyms=somalloc=NONE
to have the malloc/free interceptions needed for memcheck, helgrind,
drd, ... to work properly.

Note also that these tools have very limited functionality and/or might
not work properly if your application is completely statically linked.
So ldd on your program should much better show at least one shared lib.

Philippe
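[A sketch of the two checks Philippe suggests; the binary path is illustrative, not from the thread:]

```shell
# 1. Verify that the binary is not fully static - at least libc (and thus
#    malloc/free) should appear as a shared dependency:
ldd ./src/knotd

# 2. If malloc *were* statically linked in, Memcheck would need this option
#    to intercept allocation and freeing calls:
valgrind --soname-synonyms=somalloc=NONE ./src/knotd -c knot.conf
```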
|
From: Jan V. <jan...@ni...> - 2014-08-20 12:30:21
|
> > $ ./configure --enable-recvmmsg=no \
> >   --enable-lto=no \
> >   --disable-fastparser \
> >   --disable-shared \
> >   --enable-static
>
> If you link with a static malloc library, you have to use
>   --soname-synonyms=somalloc=NONE
> to have the malloc/free interceptions needed for memcheck, helgrind,
> drd, ... to work properly.

I've verified that libc is linked dynamically. The --disable-shared
applies only to our internal libraries. So this should not be the case.
|
From: Alexander P. <gl...@go...> - 2014-08-19 14:13:36
|
On Tue, Aug 19, 2014 at 5:59 PM, Milian Wolff <ma...@mi...> wrote:
> If you think it's a race, then the address sanitizer won't find it, I guess.
> Rather, try the thread sanitizer.
>
> http://clang.llvm.org/docs/ThreadSanitizer.html

ThreadSanitizer won't comprehend the fence instructions inserted by urcu.
I believe even Helgrind won't, because these instructions do not imply
any happens-before relation.
|
From: Roland M. <rol...@nr...> - 2014-08-19 14:46:12
|
On Tue, Aug 19, 2014 at 4:13 PM, Alexander Potapenko <gl...@go...> wrote:
> ThreadSanitizer won't comprehend the fence instructions inserted by urcu.
> I believe even Helgrind won't, because these instructions do not imply
> any happens-before relation.

Is there any opensource or commercial tool which might help in such
situations (e.g. problems with memory barriers)?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) rol...@nr...
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
|
From: Philippe W. <phi...@sk...> - 2014-08-19 19:00:00
|
On Tue, 2014-08-19 at 16:46 +0200, Roland Mainz wrote:
> > ThreadSanitizer won't comprehend the fence instructions inserted by urcu.
> > I believe even Helgrind won't, because these instructions do not imply
> > any happens-before relation.
>
> Is there any opensource or commercial tool which might help in such
> situations (e.g. problems with memory barriers)?

helgrind or drd or ThreadSanitizer could still be used for race
condition detection, but you would have to annotate either the rcu
library or the calling code to describe the happens-before
relationships.

To my knowledge, there is (some?) (source?) compatibility between the
annotations needed for these 3 tools.

Philippe
|
From: Philippe W. <phi...@sk...> - 2014-08-19 19:56:03
|
On Tue, 2014-08-19 at 21:44 +0200, David Faure wrote:
> On Tuesday 19 August 2014 21:00:58 Philippe Waroquiers wrote:
> > helgrind or drd or ThreadSanitizer could still be used for race
> > condition detection, but you would have to annotate either the rcu
> > library or the calling code to describe the happens-before
> > relationships.
>
> Are such annotations documented somewhere?

http://www.valgrind.org/docs/manual/hg-manual.html#hg-manual.client-requests
gives a list of such annotations, and points to helgrind.h for more
information.

> I'm still trying to find a way to annotate threadsafe-statics so that helgrind
> doesn't complain about them.

What is a threadsafe-static? Is that using __thread in something like:

void fun(void)
{
    static __thread int no_race_on_this_var_is_possible;
    ....
}

If that is the case, I am just finishing a change that should avoid
false positives on __thread variables.

Humph, rather replace 'change' by kludge: the user will have to add the
option
  --sim-hints=no-nptl-pthread-stackcache
and that uses a nasty kludge to disable the nptl pthread stack & tls
cache, as helgrind does not understand that the memory for e.g. tls
__thread variables is "safely" re-usable by another thread once the
thread is finished.

Philippe
|
From: Alexander P. <gl...@go...> - 2014-08-19 20:18:28
|
On Aug 19, 2014 11:58 PM, "Philippe Waroquiers" <phi...@sk...> wrote:
> > I'm still trying to find a way to annotate threadsafe-statics so that
> > helgrind doesn't complain about them.
>
> What is a threadsafe-static?

That's a local static that is guarded by
__cxa_guard_acquire/__cxa_guard_release to ensure it's initialized only
once. Prior to C++11, GCC used to emit those by default, whereas MSVC
didn't support them. C++11 mandates that static initialization must be
thread-safe.

From the data race detector's point of view, the guard object is
essentially a lock.

> Is that using __thread in something like:
<snip>
|
From: David F. <fa...@kd...> - 2014-08-19 20:02:32
|
On Tuesday 19 August 2014 21:00:58 Philippe Waroquiers wrote:
> helgrind or drd or ThreadSanitizer could still be used for race
> condition detection, but you would have to annotate either the rcu
> library or the calling code to describe the happens-before
> relationships.

Are such annotations documented somewhere?

I'm still trying to find a way to annotate threadsafe-statics so that
helgrind doesn't complain about them.

-- 
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE Frameworks 5
|
From: David F. <fa...@kd...> - 2014-08-19 20:11:13
Attachments:
forwarded message
|
On Tuesday 19 August 2014 21:57:02 Philippe Waroquiers wrote:
> > Are such annotations documented somewhere?
>
> http://www.valgrind.org/docs/manual/hg-manual.html#hg-manual.client-requests
> gives a list of such annotations, and points to helgrind.h for more
> information.

Thanks.

> > I'm still trying to find a way to annotate threadsafe-statics so that
> > helgrind doesn't complain about them.
>
> What is a threadsafe-static?

See older mail to this list, attached.

It doesn't use __thread anywhere, but rather lets gcc take care of
ensuring thread-safety on static objects (like C++11 mandates, but it
has been doing so for a long time already).

Is that related to nptl (I'm not sure what that is exactly)?

-- 
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE Frameworks 5
|
From: Philippe W. <phi...@sk...> - 2014-08-19 20:28:24
|
On Tue, 2014-08-19 at 22:11 +0200, David Faure wrote:
> > > I'm still trying to find a way to annotate threadsafe-statics so that
> > > helgrind doesn't complain about them.
> >
> > What is a threadsafe-static?
>
> See older mail to this list, attached.
>
> It doesn't use __thread anywhere, but rather lets gcc take care of ensuring
> thread-safety on static objects (like C++11 mandates, but it has been doing so
> for a long time already).

Quickly re-reading the mail, this is not related.

I see that drd has some interceptions that annotate these like a mutex
lock/unlock (see drd/drd_libstdcxx_intercepts.c) and has a test which
looks like your problem in drd/tests/local_static.cpp.

I think similar code is (trivially) doable for helgrind, inside
helgrind/hg_intercepts.c.

> Is that related to nptl (I'm not sure what that is exactly)?

nptl = new posix thread library (not so new now :). It is just the glibc
pthread library. The kludge I am doing is not related to your problem.

Philippe
|
From: Philippe W. <phi...@sk...> - 2014-08-19 23:07:41
|
On Tue, 2014-08-19 at 22:29 +0200, Philippe Waroquiers wrote:
> I see that drd has some interceptions that annotate these like a mutex
> lock/unlock (see drd/drd_libstdcxx_intercepts.c) and has a test which
> looks like your problem in drd/tests/local_static.cpp.
>
> I think similar code is (trivially) doable for helgrind, inside
> helgrind/hg_intercepts.c.

Just tried to do this trivial code, but I still had (what look like)
false positives. The probable explanation is that these drd intercepts
are not working (yet), as documented in the log for revision 14013:

r14013 | bart | 2014-06-09 11:00:42 +0200 (Mon, 09 Jun 2014) | 1 line
drd/tests/local_static: Disable because g++ does not yet allow proper
interception of initialization of local static variables

I have no idea what problem/difficulty was encountered.

What exactly is the semantic of a threadsafe static? I understand it is
initialised only once, and that such 'init once' is guaranteed thanks to
__cxa_guard_acquire/__cxa_guard_release. However, what is
__cxa_guard_abort used for?

Once the object is initialised, I guess it must be either used read-only
by all threads, or be protected in a classical way (e.g. via a mutex).

Philippe
|
From: Milian W. <ma...@mi...> - 2014-08-20 08:54:51
|
On Wednesday 20 August 2014 01:08:41 Philippe Waroquiers wrote:
> What exactly is the semantic of a threadsafe static?
> I understand it is initialised only once, and that such 'init once'
> is guaranteed thanks to __cxa_guard_acquire/__cxa_guard_release.
> However, what is __cxa_guard_abort used for?
>
> Once the object is initialised, I guess it must be either used
> read-only by all threads, or be protected in a classical way
> (e.g. via a mutex).

Not sure if that helps you, but here's an excerpt from the current
version of the C++11 standard draft, which you can obtain free-of-charge
on isocpp.org [1]. From §6.7.4 (Declaration statement):

  If control enters the declaration concurrently while the variable is
  being initialized, the concurrent execution shall wait for completion
  of the initialization. The implementation must not introduce any
  deadlock around execution of the initializer.

See also [2] and search for "Using a C++11 Static Initializer".

[1]: https://isocpp.org/std/the-standard
[2]: http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

HTH

-- 
Milian Wolff
ma...@mi...
http://milianw.de