From: Scott M. <ss...@us...> - 2006-03-07 21:04:47
I'm replying (long after the fact) to a mail on this list with the given subject (http://sourceforge.net/mailarchive/message.php?msg_id=14689132). I'm pretty sure that a generic logic bug is causing the reported failure. Attached is a simple test program that emulates the behavior of m_machine.c in the floating point and vmx detection.

When the first (have_fp) SIGILL is generated, the OS (or glibc/gcc or other) blocks subsequent SIGILLs while the registered handler function (handler_sigill) is running, with the intent of unblocking them on return from the function. In the testcase (and m_machine.c) the handler never returns, but longjmps out. This has the effect of leaving SIGILL blocked. The next SIGILL is generated (have_vmx), but it is not delivered because it is blocked, so it stays pending. Then, when the default action (saved_act) and mask (saved_set) are restored, the pending signal is delivered and the default signal handler kills the app.

The failure only occurs when the system does not have an fpu *and* does not have vmx. My guess as to why others haven't seen this problem is that most people with ppcnf systems must be running with kernel FPU emulation turned on.

The testcase uses setjmp and longjmp, but their __builtin_ siblings behave the same way. The test case's usage is:

    Usage: sigill-test <enable_workaround> have_fp have_vmx
    each argument takes 1 or 0 and defaults to 0

The only time a SIGILL kills the program is with args 0 0 0 (the defaults). I.e., it only fails when you do not enable the fix and do not have an fpu or vmx.

My fix was to unblock the signal after returning from the longjmp, but you could also set up the handler to not block the signal by using SA_NODEFER.

Scott
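For readers without the attachment, here is a minimal sketch of the behaviour described above. It is not the attached test program and not the code in m_machine.c. The assumptions: raise(SIGILL) stands in for executing an unsupported instruction, sigsetjmp(env, 0) is used so that the signal mask is explicitly not saved (whether plain setjmp saves it differs between platforms), and the names probe_raises_sigill and count_delivered_sigills are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf env;

/* Handler that never returns: it jumps out, as described in the post.
 * Because env was saved with savemask == 0, the jump does not restore
 * the signal mask, so SIGILL stays blocked afterwards. */
static void handler_sigill(int signum)
{
    (void)signum;
    siglongjmp(env, 1);
}

/* The workaround: explicitly unblock SIGILL. */
static void unblock_sigill(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGILL);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* One emulated probe (hypothetical helper). Returns 1 if the handler
 * ran, or 0 if raise() returned normally because SIGILL was blocked
 * and the signal was merely left pending. */
static int probe_raises_sigill(int enable_workaround)
{
    if (sigsetjmp(env, 0) == 0) {
        raise(SIGILL);          /* stands in for an unsupported insn */
        return 0;               /* not delivered: blocked, now pending */
    }
    /* We arrive here via siglongjmp from the handler. */
    if (enable_workaround)
        unblock_sigill();
    return 1;
}

/* Runs two probes in a row (think have_fp, then have_vmx) and returns
 * how many of the two SIGILLs actually reached the handler. */
int count_delivered_sigills(int enable_workaround)
{
    struct sigaction act, ignore_act, saved_act;
    int delivered = 0;

    assert(enable_workaround == 0 || enable_workaround == 1);
    unblock_sigill();           /* start from a known mask */

    memset(&act, 0, sizeof act);
    act.sa_handler = handler_sigill;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;           /* no SA_NODEFER: SIGILL blocked in handler */
    sigaction(SIGILL, &act, &saved_act);

    delivered += probe_raises_sigill(enable_workaround);
    delivered += probe_raises_sigill(enable_workaround);

    /* Cleanup for this sketch only: discard any still-pending SIGILL by
     * ignoring it, so that restoring the default action cannot kill us. */
    memset(&ignore_act, 0, sizeof ignore_act);
    ignore_act.sa_handler = SIG_IGN;
    sigemptyset(&ignore_act.sa_mask);
    sigaction(SIGILL, &ignore_act, NULL);
    unblock_sigill();
    sigaction(SIGILL, &saved_act, NULL);

    return delivered;
}
```

Calling count_delivered_sigills(0) shows the second signal never reaching the handler, while count_delivered_sigills(1) shows both being handled. Without the cleanup step at the end, restoring the default action and unblocking with a SIGILL still pending is what would terminate the process. Setting act.sa_flags = SA_NODEFER instead of 0 would be the other fix mentioned above.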