|
From: Jeroen N. W. <jn...@xs...> - 2004-08-25 18:34:59
|
It looks as if I have stumbled across a bug in the way valgrind (CVS)
handles signal SIGFPE. To demonstrate this problem, I have created and
attached file SIGFPEc.c.

The problem: When I run this program stand-alone [output in attached file
SIGFPEc.check.txt], it shows that siginfo->si_addr points somewhere
in my executable. But when I run the very same executable under valgrind
--tool=memcheck [output in attached file SIGFPEc.valgrind.txt],
siginfo->si_addr points somewhere in valgrind's stage2.

Note: I use a simplistic way to generate the SIGFPE. When I compile file
SIGFPEc.c with gcc -O2, the SIGFPE does not happen, at least on my boxen.
Compiling without any -O option does the trick.

Please let me know if this is a real bug, and if you need a bug report.

Jeroen.
|
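The attached SIGFPEc.c is not reproduced in this archive. As a rough idea of what such a test might look like, here is a minimal sketch, assuming Linux and an integer division by zero as the "simplistic" trigger (that choice is a guess, not taken from the attachment). The names provoke_sigfpe, on_sigfpe, last_si_addr and last_si_code are made up for illustration. The operands are volatile so that the compiler cannot fold or drop the division, which is one plausible reason the signal vanished at -O2.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* siginfo_t, SA_SIGINFO, sigsetjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf recover_point;           /* where the handler jumps back to */
static void *volatile last_si_addr;        /* si_addr seen by the handler */
static volatile sig_atomic_t last_si_code; /* si_code seen by the handler */

/* Record what the kernel (or Valgrind) reported, then leave the handler
 * without returning to the faulting instruction. */
static void on_sigfpe(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    last_si_addr = info->si_addr;
    last_si_code = info->si_code;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if SIGFPE was delivered, 0 if the division did not trap
 * (some CPUs do not trap on integer division by zero). */
static int provoke_sigfpe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGFPE, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    if (sigsetjmp(recover_point, 1) != 0) {
        /* Back from the handler: printing here is safe, unlike in the handler. */
        printf("SIGFPE: si_code=%d si_addr=%p\n", (int)last_si_code, last_si_addr);
        return 1;
    }

    volatile int zero = 0;
    volatile int quotient = 1 / zero; /* traps on x86 */
    (void)quotient;
    return 0;
}
```

Calling provoke_sigfpe() from main() natively and again under valgrind --tool=memcheck, then comparing the printed si_addr with the executable's address range, would be one way to check the behaviour described above.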
|
From: Tom H. <th...@cy...> - 2004-08-25 21:35:14
|
In message <207...@we...>
"Jeroen N. Witmond" <jn...@xs...> wrote:
> It looks as if I have stumbled across a bug in the way valgrind (CVS)
> handles signal SIGFPE. To demonstrate this problem, I have created and
> attached file SIGFPEc.c.
>
> The problem: When I run this program stand-alone [output in attached file
> SIGFPEc.check.txt], it shows that siginfo->si_addr points somewhere
> in my executable. But when I run the very same executable under valgrind
> --tool=memcheck [output in attached file SIGFPEc.valgrind.txt],
> siginfo->si_addr points somewhere in valgrind's stage2.
With SIGFPE si_addr is the address of the faulting instruction. When
running under valgrind all instructions in your program are simulated
by valgrind which does a just in time translation of your code.
As a result the faulting instruction will always be in a different
place when running under valgrind. In particular in this case the
fault is probably occurring in one of the helper routines which is
used to handle division operations, hence the reason why the address
is inside stage2 as that is where the helper routines are.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeroen N. W. <jn...@xs...> - 2004-08-26 08:45:42
|
> In message <207...@we...>
> "Jeroen N. Witmond" <jn...@xs...> wrote:
>
>> It looks as if I have stumbled across a bug in the way valgrind (CVS)
>> handles signal SIGFPE. To demonstrate this problem, I have created and
>> attached file SIGFPEc.c.
>>
>> The problem: When I run this program stand-alone [output in attached
>> file SIGFPEc.check.txt], it shows that siginfo->si_addr points somewhere
>> in my executable. But when I run the very same executable under valgrind
>> --tool=memcheck [output in attached file SIGFPEc.valgrind.txt],
>> siginfo->si_addr points somewhere in valgrind's stage2.
>
> With SIGFPE si_addr is the address of the faulting instruction. When
> running under valgrind all instructions in your program are simulated
> by valgrind which does a just in time translation of your code.
>
> As a result the faulting instruction will always be in a different
> place when running under valgrind. In particular in this case the
> fault is probably occurring in one of the helper routines which is
> used to handle division operations, hence the reason why the address
> is inside stage2 as that is where the helper routines are.

Hmm? I would have expected that the simulated environment created by
Valgrind (CPU and parts of Linux/libc) behaves the same as the environment
being simulated. Any deviation may (and hence one day will) make the
program being tested behave differently when run with or without Valgrind,
reducing its effectiveness.

Also, in the case of a SIGSEGV, Valgrind is quite capable to set the
user's EIP in fields:

- ((ucontext_t*)context)->uc_mcontext.gregs[REG_EIP], where 'void*
  context' is the third argument to the sa_sigaction in struct sigaction.
  'ucontext_t' is declared by '#include <ucontext.h>'. Just now I have
  verified that this field is also set correctly by Valgrind in the case of
  a SIGFPE.

- ctx.eip, where 'struct sigcontext ctx' is the undocumented second
  argument to the sa_handler in struct sigaction.

[In the case of a SIGSEGV, siginfo->si_addr contains the address of the
memory that cannot be accessed, not the address of the instruction doing
the accessing. Not verified under Valgrind.]

Should I create a bug report for this problem?

Jeroen.
|
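The comparison described here (si_addr against the program counter saved in the ucontext) can be sketched as follows. This is an illustration only, assuming Linux with glibc on x86. REG_EIP is the 32-bit register index named in the message, and REG_RIP is its x86-64 counterpart, which I have added so the sketch also builds on 64-bit hosts. All function and variable names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes REG_EIP / REG_RIP only with this */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

static sigjmp_buf fpe_resume;
static volatile uintptr_t seen_si_addr;    /* siginfo->si_addr */
static volatile uintptr_t seen_context_pc; /* EIP/RIP from the ucontext */

/* Extract the saved program counter; returns 0 on architectures this
 * sketch does not know about. */
static uintptr_t context_pc(void *context)
{
    ucontext_t *uc = (ucontext_t *)context;
#if defined(__i386__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#else
    (void)uc;
    return 0;
#endif
}

static void record_fpe(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    seen_si_addr = (uintptr_t)info->si_addr;
    seen_context_pc = context_pc(context);
    siglongjmp(fpe_resume, 1);
}

/* Returns 1 if si_addr equals the saved program counter, 0 if they differ
 * (or the architecture is unknown), -1 if no SIGFPE was delivered. */
static int si_addr_matches_context(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_fpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGFPE, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    if (sigsetjmp(fpe_resume, 1) != 0)
        return seen_context_pc != 0 && seen_si_addr == seen_context_pc;

    volatile int zero = 0;
    volatile int quotient = 1 / zero; /* integer divide fault on x86 */
    (void)quotient;
    return -1;
}
```

Run natively on x86, the two values are expected to agree for an integer divide fault. The mismatch reported in this thread would show up as a return value of 0 under the Valgrind version being discussed.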
|
From: Tom H. <th...@cy...> - 2004-08-26 09:00:31
|
In message <254...@we...>
Jeroen N. Witmond <jn...@xs...> wrote:
> Hmm? I would have expected that the simulated environment created by
> Valgrind (CPU and parts of Linux/libc) behaves the same as the environment
> being simulated. Any deviation may (and hence one day will) make the
> program being tested behave differently when run with or without Valgrind,
> reducing its effectiveness.
The problem is that there is not a one-one (or even a one-many) mapping
from instructions in the original program to JITed instructions so it
isn't clear which instruction in your program valgrind should make si_addr
point to as it doesn't record which instruction in your program caused
each instruction in the JITed code to be generated, and it might not
even be a single instruction.
> Also, in the case of a SIGSEGV, Valgrind is quite capable to set the
> user's EIP in fields:
>
> - ((ucontext_t*)context)->uc_mcontext.gregs[REG_EIP], where 'void*
> context' is the third argument to the sa_sigaction in struct sigaction.
> 'ucontext_t' is declared by '#include <ucontext.h>'. Just now I have
> verified that this field is also set correctly by Valgrind in the case of
> a SIGFPE.
>
> - ctx.eip, where 'struct sigcontext ctx' is the undocumented second
> argument to the sa_handler in struct sigaction.
I know all that. Are you saying that valgrind is already doing so? or
just that it should be doing so?
> [In the case of a SIGSEGV, siginfo->si_addr contains the address of the
> memory that cannot be accessed, not the address of the instruction doing
> the accessing. Not verified under Valgrind.]
Indeed it should, and I believe it will do. There is no translation
done on si_addr so you will get the address the kernel reports, but
for a SEGV that is fine as that is the address in your program.
It won't be the same address as when run outside of valgrind of
course, but that's because your data will probably be laid out
differently in memory. It will be the address your program tried
to read from.
> Should I create a bug report for this problem?
You are welcome to do so, but I don't think it's fixable. It's even
harder than filling in the right FP state (which we don't do) or allowing
the state to be modified (which we don't do). There are already bugs
for those things...
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
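For contrast with SIGFPE, the SIGSEGV case discussed above (si_addr is the data address that could not be accessed, not the instruction address) can be checked with a small sketch like the one below. It assumes Linux and maps one PROT_NONE page so that the faulting address is known in advance. The names fault_on_guard_page and on_sigsegv are invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf segv_resume;
static void *volatile segv_si_addr; /* si_addr seen by the handler */

static void on_sigsegv(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    segv_si_addr = info->si_addr;
    siglongjmp(segv_resume, 1);
}

/* Reads one byte from an inaccessible page. Stores the address that was
 * read in *touched and returns the si_addr reported to the handler. */
static void *fault_on_guard_page(void **touched)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsegv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *guard = mmap(NULL, page, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(guard != MAP_FAILED);

    volatile char *target = guard + 16; /* arbitrary offset inside the page */
    *touched = guard + 16;
    segv_si_addr = NULL;

    if (sigsetjmp(segv_resume, 1) == 0) {
        char byte = *target; /* faults: the page has no read permission */
        (void)byte;
    }

    munmap(guard, page);
    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return segv_si_addr;
}
```

Because no address translation is involved, the value returned should equal the address the program tried to read, both natively and (per the explanation above) under Valgrind.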
|
|
From: Tom H. <th...@cy...> - 2004-08-26 09:18:04
|
In message <yek...@au...>
Tom Hughes <th...@cy...> wrote:
> In message <254...@we...>
> Jeroen N. Witmond <jn...@xs...> wrote:
>
>> Also, in the case of a SIGSEGV, Valgrind is quite capable to set the
>> user's EIP in fields:
>>
>> - ((ucontext_t*)context)->uc_mcontext.gregs[REG_EIP], where 'void*
>> context' is the third argument to the sa_sigaction in struct sigaction.
>> 'ucontext_t' is declared by '#include <ucontext.h>'. Just now I have
>> verified that this field is also set correctly by Valgrind in the case of
>> a SIGFPE.
>>
>> - ctx.eip, where 'struct sigcontext ctx' is the undocumented second
>> argument to the sa_handler in struct sigaction.
>
> I know all that. Are you saying that valgrind is already doing so? or
> just that it should be doing so?
Sorry. I misread what you wrote. You are saying that valgrind does
fill in the context. In that case filling in si_addr should be easy
to do - please file a bug and I'll sort it out.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
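The idea of filling in si_addr from the already-correct context can be sketched roughly as below. This is not Valgrind's code: patch_si_addr and its guest_eip parameter are hypothetical, and the sketch only illustrates the rule that si_addr means "faulting instruction" for SIGFPE and SIGILL. It also assumes the Linux convention that si_code <= 0 marks a signal sent by a process, where the same union slot holds si_pid/si_uid and must not be overwritten.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* siginfo_t and the si_code constants */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* For these signals si_addr is defined as the address of the faulting
 * instruction rather than a data address. */
static int si_addr_is_instruction(int signo)
{
    return signo == SIGFPE || signo == SIGILL;
}

/* Overwrite si_addr with the simulated program counter when, and only
 * when, the signal is a kernel-generated instruction fault. */
static void patch_si_addr(siginfo_t *info, uintptr_t guest_eip)
{
    assert(info != NULL);
    if (info->si_code > 0 && si_addr_is_instruction(info->si_signo))
        info->si_addr = (void *)guest_eip;
}
```

With a fix-up of this kind, si_addr and the EIP stored in the ucontext would be taken from the same source and so could not disagree.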
|
|
From: Nicholas N. <nj...@ca...> - 2004-08-26 09:14:14
|
On Thu, 26 Aug 2004, Tom Hughes wrote:

>> Hmm? I would have expected that the simulated environment created by
>> Valgrind (CPU and parts of Linux/libc) behaves the same as the environment
>> being simulated. Any deviation may (and hence one day will) make the
>> program being tested behave differently when run with or without Valgrind,
>> reducing its effectiveness.
>
> The problem is that there is not a one-one (or even a one-many) mapping
> from instructions in the original program to JITed instructions

Hmm, that's not really true; there is a one-many mapping from
original-->JITted instructions. The INCEIP UCode instruction marks the
boundaries.

> so it isn't clear which instruction in your program valgrind should make
> si_addr point to as it doesn't record which instruction in your program
> caused each instruction in the JITed code to be generated, and it might
> not even be a single instruction.

>> Should I create a bug report for this problem?
>
> You are welcome to do so, but I don't think it's fixable. It's even
> harder than filling in the right FP state (which we don't do) or allowing
> the state to be modified (which we don't do). There are already bugs
> for those things...

It's unclear to me what the right thing to do here is. So here's a
question: Jeroen, how did you discover this? Is this behaviour causing
problems for you in a real program? If so, can you explain how/why? And
would changing the behaviour in the way you say fix the problem?

Thanks.

N
|
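To make the one-many mapping concrete: if the start of the generated code for each original instruction were recorded, a fault address inside the JITed code could be mapped back with a binary search. The sketch below is purely illustrative and is not how Valgrind stores anything. BoundaryEntry, its fields and guest_addr_for_host_pc are invented names, and the table is assumed to be sorted by host address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the generated code beginning at host_start was
 * produced for the original instruction at guest_addr. */
typedef struct {
    uintptr_t host_start;
    uintptr_t guest_addr;
} BoundaryEntry;

/* table is sorted by host_start and covers [table[0].host_start, host_end).
 * Returns the original instruction address for host_pc, or 0 if host_pc
 * lies outside the translated block. */
static uintptr_t guest_addr_for_host_pc(const BoundaryEntry *table,
                                        size_t count,
                                        uintptr_t host_end,
                                        uintptr_t host_pc)
{
    assert(table != NULL || count == 0);
    if (count == 0 || host_pc < table[0].host_start || host_pc >= host_end)
        return 0;

    /* Invariant: table[lo].host_start <= host_pc, and hi is either count
     * or an index whose host_start is greater than host_pc. */
    size_t lo = 0, hi = count;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].host_start <= host_pc)
            lo = mid;
        else
            hi = mid;
    }
    return table[lo].guest_addr;
}
```

Note that a lookup like this would not help when the fault is raised inside a shared helper routine, as suspected earlier in the thread for the division case, since the helper's address belongs to no single translated block.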
|
From: Tom H. <th...@cy...> - 2004-08-26 09:19:43
|
In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> On Thu, 26 Aug 2004, Tom Hughes wrote:
>
>>> Hmm? I would have expected that the simulated environment created by
>>> Valgrind (CPU and parts of Linux/libc) behaves the same as the environment
>>> being simulated. Any deviation may (and hence one day will) make the
>>> program being tested behave differently when run with or without Valgrind,
>>> reducing its effectiveness.
>>
>> The problem is that there is not a one-one (or even a one-many) mapping
>> from instructions in the original program to JITed instructions
>
> Hmm, that's not really true; there is a one-many mapping from
> original-->JITted instructions. The INCEIP UCode instruction marks
> the boundaries.
You are quite right of course. I thought there was some mingling
at the boundaries when we translated back from UCode to x86 but
presumably if there is, it doesn't happen across an INCEIP instruction.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Julian S. <js...@ac...> - 2004-08-26 12:27:23
|
> > As a result the faulting instruction will always be in a different
> > place when running under valgrind. In particular in this case the
> > fault is probably occurring in one of the helper routines which is
> > used to handle division operations, hence the reason why the address
> > is inside stage2 as that is where the helper routines are.
>
> Hmm? I would have expected that the simulated environment created by
> Valgrind (CPU and parts of Linux/libc) behaves the same as the environment
> being simulated.

Huh? That was never a design goal, and isn't achievable without
significant performance overheads. What you are asking for is 'precise
exceptions', where, after an instruction takes a fault, in this case a FP
exception, the simulated machine's register state is identical to what a
real machine would have. This isn't a design goal and isn't likely to
become one in future.

Valgrind approximately attempts to supply a POSIX-compliant environment
in which programs can run -- that's really the design goal. I'm sure that
looking at machine registers following an exception isn't POSIX compliant
-- POSIX doesn't even guarantee precise exceptions, AIUI. Let alone have
any notion of machine registers.

> Should I create a bug report for this problem?

No. Future valgrinds may optimise code more aggressively than at
present, which will likely make this problem worse rather than better.
Even at present, Valgrind only guarantees to update the integer/FP/SSE/
register/flag state at each jump, so at an exception you will usually
be seeing machine state which is many instructions out of date.

J
|
|
From: Nicholas N. <nj...@ca...> - 2004-08-26 13:07:27
|
On Thu, 26 Aug 2004, Julian Seward wrote:

>> Should I create a bug report for this problem?
>
> No. Future valgrinds may optimise code more aggressively than at
> present, which will likely make this problem worse rather than better.
> Even at present, Valgrind only guarantees to update the integer/FP/SSE/
> register/flag state at each jump, so at an exception you will usually
> be seeing machine state which is many instructions out of date.

Seems like there's some confusion about this thread. Still, Tom seems to
have found a simple solution that satisfies Jeroen, I think...

N
|
|
From: Tom H. <th...@cy...> - 2004-08-26 13:12:04
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Valgrind approximately attempts to supply a POSIX-compliant environment
> in which programs can run -- that's really the design goal. I'm sure that
> looking at machine registers following an exception isn't POSIX compliant
> -- POSIX doesn't even guarantee precise exceptions, AIUI. Let alone have
> any notion of machine registers.
Actually POSIX (or at least SUS) does say that si_addr will be the
address of the faulting instruction - see the section on siginfo_t in:
http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
I'm not sure how that interacts with platforms like the Alpha where
floating point exceptions are not normally precise even in normal use.
> No. Future valgrinds may optimise code more aggressively than at
> present, which will likely make this problem worse rather than better.
> Even at present, Valgrind only guarantees to update the integer/FP/SSE/
> register/flag state at each jump, so at an exception you will usually
> be seeing machine state which is many instructions out of date.
He wasn't actually talking about the saved registers, and the integer
registers are always up to date aren't they? I thought it was only the
floating point ones that weren't.
The point is that si_addr doesn't match the EIP in the register set
valgrind supplies because we update one and not the other.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|