|
From: Martin L. <mar...@da...> - 2005-06-30 09:09:49
|
I came across a problem where a multithreaded application running under
Valgrind's control does not correctly handle the pthread cleanup handlers
established with pthread_cleanup_push and pthread_cleanup_pop.
The behaviour is only observed with NPTL. The old LinuxThreads
implementation does not show the problem.
AFAIK the mechanism used in NPTL is stack unwinding, which may be the
problem here.
To illustrate the behaviour I have written the following simple example.
A main thread creates another thread and waits for it to be ready. The
second thread sets up a cleanup handler, signals its readiness and goes to
sleep forever. The main thread then cancels the second thread and waits on
a condition variable until the cleanup handler reports that it has run.
Here is the code:
--- snip ----
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_cond_t threadIsReady = PTHREAD_COND_INITIALIZER;
pthread_cond_t threadIsClean = PTHREAD_COND_INITIALIZER;
pthread_mutex_t myMutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
bool bThreadIsReady = false;
bool bThreadIsClean = false;

// Cleanup handler: tells the main thread that cleanup has run.
void lastwords(void*)
{
    pthread_mutex_lock(&myMutex);
    bThreadIsClean = true;
    pthread_cond_signal(&threadIsClean);
    pthread_mutex_unlock(&myMutex);
}

void* thread_function(void*)
{
    pthread_cleanup_push(lastwords, 0);

    // Signal readiness to the main thread.
    pthread_mutex_lock(&myMutex);
    bThreadIsReady = true;
    pthread_cond_signal(&threadIsReady);
    pthread_mutex_unlock(&myMutex);

    // Sleep forever; sleep() is a cancellation point.
    for(;;)
        sleep(1);

    pthread_cleanup_pop(1);
    return 0;   // not reached
}

int main(int argc, char* argv[])
{
    pthread_t tid;

    printf("main: creating thread ... \n");
    pthread_create(&tid, 0, thread_function, 0);

    printf("main: waiting for thread to be ready ...\n");
    pthread_mutex_lock(&myMutex);
    while (!bThreadIsReady)
        pthread_cond_wait(&threadIsReady, &myMutex);
    pthread_mutex_unlock(&myMutex);
    printf("main: thread is ready\n");

    printf("main: cancelling thread ...\n");
    pthread_cancel(tid);

    // Blocks until lastwords() has run.
    printf("main: waiting for thread to be clean...\n");
    pthread_mutex_lock(&myMutex);
    while (!bThreadIsClean)
        pthread_cond_wait(&threadIsClean, &myMutex);
    pthread_mutex_unlock(&myMutex);

    printf("main: cleaning up \n");
    pthread_join(tid, 0);
    return 0;
}
--- snip ----
When running directly the output is as follows
main: creating thread ...
main: waiting for thread to be ready ...
main: thread is ready
main: cancelling thread ...
main: waiting for thread to be clean...
main: cleaning up
However, when run under Valgrind's control (just using the none tool), the
output is as follows:
==3962== Nulgrind, a binary JIT-compiler for x86-linux.
==3962== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==3962== Using valgrind-2.4.0, a program supervision framework for x86-linux.
==3962== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==3962== For more details, rerun with: -v
==3962==
main: creating thread ...
main: waiting for thread to be ready ...
main: thread is ready
main: cancelling thread ...
main: waiting for thread to be clean...
Further investigation showed that the cleanup handler (lastwords in this
case) never gets called, so the main thread waits forever.
I am using an LFS-based system with a plain, unpatched glibc 2.3.5.
Is this known behaviour, or am I missing something important?
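
Because the reproducer blocks forever when the handler does not run, it is awkward to use in an automated check. The following is a minimal sketch of a bounded variant, not the original poster's code: the names `probe_worker`, `probe_lastwords` and `run_cancel_probe` are hypothetical, and the timeout is an assumption added for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t probe_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_cond = PTHREAD_COND_INITIALIZER;
static int probe_ready = 0;   /* worker has installed its cleanup handler */
static int probe_clean = 0;   /* cleanup handler has run */

/* Hypothetical cleanup handler: records that it was called. */
static void probe_lastwords(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&probe_mutex);
    probe_clean = 1;
    pthread_cond_broadcast(&probe_cond);
    pthread_mutex_unlock(&probe_mutex);
}

static void *probe_worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(probe_lastwords, NULL);
    pthread_mutex_lock(&probe_mutex);
    probe_ready = 1;
    pthread_cond_broadcast(&probe_cond);
    pthread_mutex_unlock(&probe_mutex);
    for (;;)
        sleep(1);               /* cancellation point */
    pthread_cleanup_pop(1);     /* not reached; closes the push block */
    return NULL;
}

/* Returns 1 if the handler ran within timeout_sec seconds,
 * 0 if the wait timed out, -1 if the thread could not be created. */
int run_cancel_probe(int timeout_sec)
{
    pthread_t tid;
    struct timespec deadline;
    int rc = 0;
    int clean;

    probe_ready = 0;
    probe_clean = 0;
    if (pthread_create(&tid, NULL, probe_worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&probe_mutex);
    while (!probe_ready)
        pthread_cond_wait(&probe_cond, &probe_mutex);
    pthread_mutex_unlock(&probe_mutex);

    pthread_cancel(tid);

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&probe_mutex);
    while (!probe_clean && rc == 0)
        rc = pthread_cond_timedwait(&probe_cond, &probe_mutex, &deadline);
    clean = probe_clean;
    pthread_mutex_unlock(&probe_mutex);

    if (clean)
        pthread_join(tid, NULL);
    else
        pthread_detach(tid);    /* do not block on a thread that may never exit */
    return clean;
}
```

Calling `run_cancel_probe(5)` from `main` and printing the result would be expected to give 1 in a native run. Under the conditions described above it would presumably give 0 instead of hanging.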
|
From: Martin L. <mar...@da...> - 2005-06-30 14:31:25
|
I now have more information regarding this problem. Looking through Valgrind's regression tests, I found the simple test pth_cancel1, which checks the pthread_cleanup_push mechanism. This test passes without any problem on my current installation.

Further investigation showed that when compiled with exceptions enabled (-fexceptions), this test fails too. With NPTL, the pthread_cleanup_p* macros expand to different implementations depending on whether exceptions are switched on or off. This indicates that Valgrind does not correctly process the stack unwinding for pthread_cleanup_push when it is used with the exception mechanism.

It is not common to use -fexceptions for C code (although we have to in our project), but for C++ code it is the default compiler setting even if you do not intend to use C++ exceptions. (I am using gcc 3.4.3, btw.)

I have also tested this with the current development version of Valgrind 3.0 and got the same behaviour. Any comments on how to solve this are highly appreciated.
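
To check which of the two macro expansions a given build is getting, a small probe along the following lines may help. It is only a sketch: it assumes that glibc keys the choice on the compiler's `__EXCEPTIONS` macro (my reading of its `pthread.h`, not something stated in this thread), and `cleanup_flavour`, `count_call` and `push_pop_once` are hypothetical names. It exercises only the normal `pthread_cleanup_pop(1)` path, not cancellation.

```c
#include <pthread.h>
#include <stddef.h>

static int handler_calls = 0;

/* Hypothetical handler: counts how often it is invoked. */
static void count_call(void *arg)
{
    (void)arg;
    handler_calls++;
}

/* Reports whether this translation unit was compiled with exceptions
 * enabled. Assumption: this is the condition the headers use to pick
 * the pthread_cleanup_push implementation. */
const char *cleanup_flavour(void)
{
#if defined(__GNUC__) && defined(__EXCEPTIONS)
    return "exceptions enabled";
#else
    return "exceptions disabled";
#endif
}

/* Pushes a handler and pops it with execute=1.
 * Returns the number of times the handler ran. */
int push_pop_once(void)
{
    handler_calls = 0;
    pthread_cleanup_push(count_call, NULL);
    pthread_cleanup_pop(1);
    return handler_calls;
}
```

Building the same file once with and once without `-fexceptions` and printing `cleanup_flavour()` shows which variant is in play before the cancellation test is run.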
|
From: Julian S. <js...@ac...> - 2005-06-30 16:24:51
|
> So this indicates, that valgrind does not correctly process the stack
> unwinding for pthread_cleanup_push
> when used with the exception mechanism.

The really strange thing is, Valgrind does not unwind the stack at all
-- it leaves all such strangeness to the program itself, or NPTL, or
whoever would have unwound the stack anyway. So perhaps there is a
frame on the stack that the c++ unwinder doesn't understand, which V
put on the stack.

J
|
From: Julian S. <js...@ac...> - 2005-06-30 17:33:32
|
> The really strange thing is, Valgrind does not unwind the stack at all
> -- it leaves all such strangeness to the program itself, or NPTL, or
> whoever would have unwound the stack anyway. So perhaps there is a
> frame on the stack that the c++ unwinder doesn't understand, which V
> put on the stack.

Ok, I can reproduce this on fc4 (gcc4 based). A signal, presumably caused by the call to pthread_cancel, is delivered to the thread, and then it goes off into the libgcc unwinder ..

--19140-- Async handler got signal 32 for tid 2 info -6
--19140-- delivering signal 32 (SIGRT0):-6 to thread 2
--19140-- push_signal_frame (thread 2): signal 32
==== BB 1778 sigcancel_handler(0x3A99C340) BBs exec'd 53038 ====
==== BB 1779 sigcancel_handler+13(0x3A99C34D) BBs exec'd 53039 ====
==== BB 1780 sigcancel_handler+22(0x3A99C356) BBs exec'd 53040 ====
==== BB 1781 sigcancel_handler+45(0x3A99C36D) BBs exec'd 53041 ====
==== BB 1782 sigcancel_handler+53(0x3A99C375) BBs exec'd 53042 ====
==== BB 1783 sigcancel_handler+68(0x3A99C384) BBs exec'd 53043 ====
==== BB 1784 sigcancel_handler+84(0x3A99C394) BBs exec'd 53044 ====
==== BB 1785 __pthread_unwind+12(0x3A9A1C82) BBs exec'd 53045 ====
==== BB 1786 _Unwind_ForcedUnwind(0x3A9A3C3C) BBs exec'd 53046 ====
==== BB 1787 _Unwind_ForcedUnwind+12(0x3A9A3C48) BBs exec'd 53047 ====
==== BB 1788 _Unwind_ForcedUnwind+28(0x3A9A3C58) BBs exec'd 53048 ====
==== BB 1789 _Unwind_ForcedUnwind(0x3A9AFFAC) BBs exec'd 53049 ====
==== BB 1790 (0x3A9AFCA0) BBs exec'd 53050 ====
==== BB 1791 (0x3A9AB600) BBs exec'd 53051 ====
==== BB 1792 (0x3A9AB606) BBs exec'd 53052 ====
==== BB 1793 (0x3A9AFCD3) BBs exec'd 53293 ====
==== BB 1794 (0x3A9AEEE0) BBs exec'd 53347 ====
==== BB 1795 (0x3A9AEEFF) BBs exec'd 53348 ====
==== BB 1796 (0x3A9AB4B6) BBs exec'd 53349 ====
==== BB 1797 _Unwind_Find_FDE(0x3A9B1C5C) BBs exec'd 53459 ====
==== BB 1798 _Unwind_Find_FDE+305(0x3A9B1D8D) BBs exec'd 53460 ====
==== BB 1799 (0x3A9AB546) BBs exec'd 53461 ====
==== BB 1800 pthread_mutex_lock+52(0x3A99EAB8) BBs exec'd 53626 ====
==== BB 1801 pthread_mutex_lock+69(0x3A99EAC9) BBs exec'd 53627 ====
==== BB 1802 _Unwind_Find_FDE+320(0x3A9B1D9C) BBs exec'd 53628 ====
==== BB 1803 _Unwind_Find_FDE+296(0x3A9B1D84) BBs exec'd 53629 ====
==== BB 1804 _Unwind_Find_FDE+99(0x3A9B1CBF) BBs exec'd 53630 ====
==== BB 1805 _Unwind_Find_FDE+328(0x3A9B1DA4) BBs exec'd 53631 ====
...

All this Find_FDE stuff is libgcc or whoever looking for unwind info from the executable, which we expect. But the unwinding fails to find the cleanup fns; presumably the unwinder gets confused because the stack looks different from how it would natively.

However -- the only part of the stack that is going to be different compared to running natively is the signal frame itself; Valgrind does not mess with the stack in any other way. We already went to some effort to ensure the signal frame V builds looks sufficiently like the signal frame that would happen natively that gcc's unwinder still works. However, this does not seem to have worked this time. (See coregrind/m_sigframe/sigframe-x86-linux.c in the dev tree.)

If you can look at gcc's unwinding library and find the assumptions it expects to have when unwinding starts, that would be helpful; then we can compare those assumptions against the frame that V makes.

J
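
One rough way to see what the unwinder finds when it starts inside a signal handler is to walk the stack there with `_Unwind_Backtrace` from `<unwind.h>` and compare the frame count natively and under Valgrind. This is a hedged diagnostic sketch, not part of Valgrind or of the test case: `count_frame`, `on_signal` and `frames_from_handler` are hypothetical names, and calling the unwinder from a handler is not async-signal-safe in general (it is tolerated here only because the signal is raised synchronously).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unwind.h>

static volatile int frames_seen = 0;

/* Callback for _Unwind_Backtrace: counts frames with a non-zero IP. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    int *count = arg;
    if (_Unwind_GetIP(ctx) != 0)
        (*count)++;
    return _URC_NO_REASON;      /* keep walking */
}

/* Signal handler: walks the stack, which includes the signal frame. */
static void on_signal(int signo)
{
    int count = 0;
    (void)signo;
    _Unwind_Backtrace(count_frame, &count);
    frames_seen = count;
}

/* Raises SIGUSR1 synchronously and returns the number of frames the
 * unwinder could walk from inside the handler, or -1 on error. */
int frames_from_handler(void)
{
    struct sigaction sa, old;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    frames_seen = 0;
    raise(SIGUSR1);

    sigaction(SIGUSR1, &old, NULL);   /* restore previous disposition */
    return frames_seen;
}
```

If the count is clearly smaller under Valgrind than natively, that would suggest the walk stops at the signal frame, which is the part of the stack Valgrind constructs.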
|
From: Jeremy F. <je...@go...> - 2005-06-30 20:07:26
|
Julian Seward wrote:
>>So this indicates, that valgrind does not correctly process the stack
>>unwinding for pthread_cleanup_push
>>when used with the exception mechanism.
>>
>>
>
>The really strange thing is, Valgrind does not unwind the stack at all
>-- it leaves all such strangeness to the program itself, or NPTL, or
>whoever would have unwound the stack anyway. So perhaps there is a
>frame on the stack that the c++ unwinder doesn't understand, which V
>put on the stack.
>
The only time I can think that Valgrind creates a stack frame is during
signals, and that code is very careful to construct a stack frame the
unwinder is happy with (ie, more or less identical to the kernel's
frame). Maybe something changed in the unwinder which made it more picky...
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-07-01 20:54:35
|
On Thu, 30 Jun 2005, Martin Lubich wrote:
> I came across a problem where it seems that running a multithreaded
> application under valgrinds control does not correctly handle the
> pthread cleanup handlers which you establish by using
> pthread_cleanup_push and pthread_cleanup_pop.

Could you create a Bugzilla bug report for this issue, including all the
information you've given so far? That will ensure this isn't forgotten.

Thanks.

N