From: Scott M. <ss...@us...> - 2006-03-07 21:04:47
Attachments: sigill-test.c, unblock_sigill.diff

I'm attempting to respond (far after the fact) to a mail on this list with the given subject (http://sourceforge.net/mailarchive/message.php?msg_id=14689132). I'm pretty sure that a generic logic bug is causing the given failure.

Attached is a simple test program that emulates the behavior of m_machine.c in the floating point and vmx detection. When the first (have_fp) SIGILL is generated, the OS (or glibc/gcc or other) blocks subsequent SIGILLs while the registered handler function (handler_sigill) is running, with the intent of unblocking the signal on return from the function. In the testcase (and m_machine.c) the handler never returns, but longjmps out. This has the effect of leaving SIGILL blocked. The next signal is generated (have_vmx), but is not delivered because it is blocked. Then, when the default action (saved_act) and mask (saved_set) are restored, the pending signal is delivered and the default signal handler kills the app.

The failure only occurs when the system does not have an FPU *and* does not have vmx. My guess as to why others haven't seen this problem is that most people with ppcnf systems must be running with kernel FPU emulation turned on.

The testcase uses setjmp and longjmp, but their __builtin_ siblings behave the same way. The test case's usage is:

  Usage: sigill-test <enable_workaround> have_fp have_vmx
    each argument takes 1 or 0 and defaults to 0

The only time a SIGILL kills the program is with args 0 0 0 (the defaults). I.e., it only fails when you do not enable the fix and have neither an FPU nor vmx.

My fix was to unblock the signal after the return from longjmp, but you could also set up the handler to not block the signal by using SA_NODEFER.

Scott
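For readers without the attachments, the failure mode and the "unblock after longjmp" workaround described above can be sketched roughly as follows. This is not the attached sigill-test.c or unblock_sigill.diff; the names probe_once, probe_env and sigill_is_blocked are made up for illustration, raise(SIGILL) stands in for executing an unsupported instruction, and sigsetjmp(env, 0)/siglongjmp are used so that the jump does not restore the signal mask on any platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;   /* hypothetical name, not from m_machine.c */

static void handler_sigill(int signo)
{
    (void)signo;
    /* Jump out instead of returning. The signal mask that was in effect
       before the handler ran is never restored, so SIGILL stays blocked. */
    siglongjmp(probe_env, 1);
}

/* Returns 1 if SIGILL is currently blocked in the caller, else 0. */
static int sigill_is_blocked(void)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* NULL set: query only */
    return sigismember(&cur, SIGILL);
}

/* One feature probe. supported == 0 simulates a missing FPU/vmx unit by
   raising SIGILL. Returns 1 if the handler was entered, 0 otherwise. */
static int probe_once(int supported, int enable_workaround)
{
    struct sigaction act, saved_act;
    int faulted;

    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;                     /* SIGILL is blocked in the handler */
    act.sa_handler = handler_sigill;
    sigaction(SIGILL, &act, &saved_act);

    if (sigsetjmp(probe_env, 0) == 0) {   /* 0: do not save the signal mask */
        if (!supported)
            raise(SIGILL);                /* stand-in for the illegal insn */
        faulted = 0;
    } else {
        faulted = 1;
        if (enable_workaround) {
            /* The workaround: explicitly unblock SIGILL after the jump. */
            sigset_t unblock;
            sigemptyset(&unblock);
            sigaddset(&unblock, SIGILL);
            sigprocmask(SIG_UNBLOCK, &unblock, NULL);
        }
    }

    sigaction(SIGILL, &saved_act, NULL);  /* restore the previous action */
    return faulted;
}
```

With enable_workaround set, two failing probes in a row are both caught. Without it, the first failing probe leaves SIGILL blocked, which is the state in which the second probe's signal goes undelivered.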

From: Julian S. <js...@ac...> - 2006-03-07 21:15:12

> When the first (have_fp) SIGILL is generated, the OS (or glibc/gcc or
> other) blocks subsequent SIGILLs while the registered handler
> function (handler_sigill) is running, with the intent of unblocking
> signals on return from the function. In the testcase (and m_machine.c)
> the handler never returns, but longjmps out. This has the effect of
> leaving signals blocked.

Yes. I eventually discovered this too. It is fixed in svn rev 5662 (for the trunk) and 5703 (3.1 branch).

> My fix was to unmask signals after return from longjmp, but you could
> also set up the handler to not block signals using SA_NODEFER.

5662/5703 use the SA_NODEFER solution.

It would be good if you could check out and test the 3.1 branch and/or the trunk (preferably both) to check they work for you. It's easy:

  svn co svn://svn.valgrind.org/valgrind/trunk

(for the trunk) or

  svn co svn://svn.valgrind.org/valgrind/branches/VALGRIND_3_1_BRANCH

then

  cd into the directory you get
  ./autogen.sh

then configure/build in the normal way.

J
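As a rough illustration of the SA_NODEFER approach mentioned above (not the actual code from rev 5662/5703, which is not shown in this thread), a probe could install its handler as below. The names probe_with_nodefer, nodefer_env and sigill_blocked_now are hypothetical, and raise(SIGILL) again stands in for an unsupported instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf nodefer_env;   /* hypothetical name */

static void nodefer_handler(int signo)
{
    (void)signo;
    siglongjmp(nodefer_env, 1);  /* leave the handler without returning */
}

/* Returns 1 if SIGILL is currently blocked in the caller, else 0. */
static int sigill_blocked_now(void)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* NULL set: query only */
    return sigismember(&cur, SIGILL);
}

/* Raises SIGILL under an SA_NODEFER handler.
   Returns 1 if the handler was entered, 0 otherwise. */
static int probe_with_nodefer(void)
{
    struct sigaction act, saved_act;
    int caught;

    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_NODEFER;   /* do not block SIGILL while handling it */
    act.sa_handler = nodefer_handler;
    sigaction(SIGILL, &act, &saved_act);

    if (sigsetjmp(nodefer_env, 0) == 0) { /* 0: signal mask is not saved */
        raise(SIGILL);
        caught = 0;
    } else {
        caught = 1;
    }

    sigaction(SIGILL, &saved_act, NULL);  /* restore the previous action */
    return caught;
}
```

Because the signal is never added to the mask in the first place, jumping out of the handler leaves nothing to undo, and back-to-back probes each reach the handler.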