|
From: Alex B. <ker...@be...> - 2006-07-08 08:02:37
|
Hi,
I've been trying to track down some failures in my program when compiled
with -O0 (it's normally compiled -O3 however I like to be able to run
-O0 for debugging). I have found problems before with some inlined asm
routines before which when not inlined in -O0 don't reserve stack space.
I've got several test cases to test the code so it would be nice if
Valgrind could point me at the functions that break things.
Is this something Valgrind can detect on its own or do I need to
recompile the program with additional instrumentation?
The basic memcheck run didn't seem to come up with anything.
--
Alex, homepage: http://www.bennee.com/~alex/
... I don't know why but, suddenly, I want to discuss declining I.Q.
LEVELS with a blue ribbon SENATE SUB-COMMITTEE!
|
|
From: Nicholas N. <nj...@cs...> - 2006-07-08 08:14:45
|
On Sat, 8 Jul 2006, Alex Bennee wrote:
> I've been trying to track down some failures in my program when compiled
> with -O0 (it's normally compiled -O3 however I like to be able to run
> -O0 for debugging). I have found problems before with some inlined asm
> routines before which when not inlined in -O0 don't reserve stack space.
> I've got several test cases to test the code so it would be nice if
> Valgrind could point me at the functions that break things.
>
> Is this something Valgrind can detect on its own or do I need to
> recompile the program with additional instrumentation?
>
> The basic memcheck run didn't seem to come up with anything.
Maybe you can give some more specific detail? I can't tell from your
description above what it is you want Valgrind to detect.
Nick
|
Nicholas Nethercote wrote:
> On Sat, 8 Jul 2006, Alex Bennee wrote:
>> I have found problems before with some inlined asm
>>routines before which when not inlined in -O0 don't reserve stack space.
> Maybe you can give some more specific detail? I can't tell from your
> description above what it is you want Valgrind to detect.
For instance on x86, there should be an option to detect when
a return address (the word pushed onto the stack by any CALL instruction)
gets written before it is POP'ed. Similarly for clobbering a
frame pointer (the word pushed onto the stack by the sequence
"pushl %ebp; movl %esp,%ebp"). And possibly also for saved registers
(%ebx, %esi, %edi) that are pushed in a stylized way during subroutine
prolog; but this last piece might be problematic because various compilers
do it differently (before or after allocating space for local automatic
variables, for instance.) The mechanism is obvious: mark these locations
on the stack as "read-only"; and if there is no such state, then mark
them as "not written" (or even "not allocated"), with a special-case
in the trap so that there is no complaint upon reading, which will happen
exactly once per frame.
Example:
-----main.c
main()
{
smash();
return 0;
}
-----smash.S
smash: .globl smash
movl $0,(%esp) # clobbers the return address
ret
-----
$ gcc main.c smash.S
$ valgrind ./a.out
. . .
==3989== Using valgrind-3.2.0, a dynamic binary instrumentation framework.
. . .
==3989== Jump to the invalid address stated on the next line
==3989== at 0x0: ???
==3989== by 0x544D7E: (below main) (in /lib/libc-2.3.6.so)
==3989== Address 0x0 is not stack'd, malloc'd or (recently) free'd
-----
In the above example, valgrind-3.2.0 tells you too late, only after the 'ret'
has been performed, and valgrind does not tell you the PC of the 'ret',
which is the most useful piece of information at this point.
Instead, valgrind should tell you that you are clobbering
a "reserved stack location" as soon as you execute the "movl $0,(%esp)",
and the error report should include the PC of the 'movl'.
Then the user has some real information to find the bug.
--
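The "mark these locations as read-only" mechanism proposed above can be modelled in a few lines of C. This is only a toy simulation under stated assumptions, not Valgrind code: the simulated stack, the `reserved` shadow array, and the functions `sim_call`, `sim_store` and `sim_ret` are all hypothetical names invented for illustration, and the sketch suppresses the offending store where a real tool would more likely report it and carry on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_WORDS 64

/* Toy machine state. Every name here is hypothetical, not a Valgrind internal. */
static uint32_t stack_mem[STACK_WORDS]; /* simulated stack, one word per slot */
static uint8_t  reserved[STACK_WORDS];  /* shadow flag: 1 = slot holds a return address */
static int      sp = STACK_WORDS;       /* stack grows downward, as on x86 */
static uint32_t first_bad_pc;           /* PC of the first store that hit a reserved slot */

/* CALL: push the return address and mark its slot as reserved. */
static void sim_call(uint32_t return_addr)
{
    assert(sp > 0);
    sp--;
    stack_mem[sp] = return_addr;
    reserved[sp] = 1;
}

/* Ordinary store issued by the instruction at `pc`.
 * Returns 1 if the store was performed, 0 if it targeted a reserved slot;
 * in the latter case the PC of the store is recorded and reported. */
static int sim_store(uint32_t pc, int slot, uint32_t value)
{
    assert(slot >= 0 && slot < STACK_WORDS);
    if (reserved[slot]) {
        if (first_bad_pc == 0)
            first_bad_pc = pc;
        fprintf(stderr, "store at pc 0x%x clobbers reserved stack slot %d\n",
                (unsigned)pc, slot);
        return 0;
    }
    stack_mem[slot] = value;
    return 1;
}

/* RET: the one legitimate read of the slot. Clear the mark, then pop. */
static uint32_t sim_ret(void)
{
    assert(sp < STACK_WORDS);
    reserved[sp] = 0;
    return stack_mem[sp++];
}
```

In this model the equivalent of "movl $0,(%esp)" is `sim_store(pc, sp, 0)` issued right after `sim_call(...)`: it is flagged immediately, together with the PC of the store, rather than only when the later return jumps to a bad address.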
|
|
From: Tom H. <to...@co...> - 2006-07-08 16:04:38
|
In message <44AFCD99.5040008@BitWagon.com>
John Reiser <jreiser@BitWagon.com> wrote:
> Nicholas Nethercote wrote:
> > On Sat, 8 Jul 2006, Alex Bennee wrote:
>
> >> I have found problems before with some inlined asm
> >>routines before which when not inlined in -O0 don't reserve stack space.
>
> > Maybe you can give some more specific detail? I can't tell from your
> > description above what it is you want Valgrind to detect.
>
> For instance on x86, there should be an option to detect when
> a return address (the word pushed onto the stack by any CALL instruction)
> gets written before it is POP'ed. Similarly for clobbering a
> frame pointer (the word pushed onto the stack by the sequence
> "pushl %ebp; movl %esp,%ebp"). And possibly also for saved registers
> (%ebx, %esi, %edi) that are pushed in a stylized way during subroutine
> prolog; but this last piece might be problematic because various compilers
> do it differently (before or after allocating space for local automatic
> variables, for instance.) The mechanism is obvious: mark these locations
> on the stack as "read-only"; and if there is no such state, then mark
> them as "not written" (or even "not allocated"), with a special-case
> in the trap so that there is no complaint upon reading, which will happen
> exactly once per frame.
The problem, as you seem to have deduced, is that valgrind (or rather
memcheck) has no concept of read-only memory, only of addressable and/or
defined memory.
One option is to mark it as inaccessible (i.e. effectively not allocated) and
then restore that to the defined state before the return. I think that is
the solution my colleague used when he played with doing this in valgrind. The
problem with that is that trapping returns is in general very hard.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
Tom Hughes wrote:
> The problem with that is that trapping returns is in general very hard.
But a "solution" with 99+% theoretical coverage (such as, omitting coverage
for a hand-written return which uses "pop %ecx; ...; jmp *%ecx" instead of
'ret', but covering all cases of actual 'ret') still will be enormously
valuable to users. It's an optional mode where the intermediate code for
'ret' is always "call a subroutine and figure it out." Yes, such a 'ret'
will be 20x as expensive as before, but it will still be thousands of times
faster than the user scratching his head, wondering which piece of code
overwrote the return address.
And, emphasizing an implicit point in my previous post, the error report:
-----
==3989== Jump to the invalid address stated on the next line
==3989== at 0x0: ???
==3989== by 0x544D7E: (below main) (in /lib/libc-2.3.6.so)
==3989== Address 0x0 is not stack'd, malloc'd or (recently) free'd
-----
is almost useless because it omits the PC _before_ the jump/ret, which is
the most important piece of information the user wants to know.
Being very explicit, the report should have read something like:
-----
==3989== Jump to the invalid address stated on the next line
==3989== at 0x0: ???
==3989== by 0x8048377: smash+7
==3989== by 0x544D7E: (below main) (in /lib/libc-2.3.6.so)
==3989== Address 0x0 is not stack'd, malloc'd or (recently) free'd
-----
where the "0x8048377: smash+7" is _essential_. Again, the mechanism
is obvious: keep one variable [per thread] which remembers the PC of any
instruction which could cause a non-sequential transfer of control
(or at least, a transfer to an address that is not statically known.)
Put the current value of that variable into every traceback, or at least
the ones for "Jump to invalid address."
--
|
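The "one variable per thread" idea above might look roughly like the following. This is a sketch under assumptions, not how Valgrind is implemented: `last_transfer_pc`, `note_transfer` and `format_invalid_jump` are hypothetical names, and the report text is only loosely modelled on the one quoted in the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* PC of the most recent instruction that could transfer control to an
 * address not known statically (ret, indirect jmp/call). One copy per
 * thread. Hypothetical name, for illustration only. */
static _Thread_local uintptr_t last_transfer_pc;

/* The instrumentation would call this just before each such instruction. */
static void note_transfer(uintptr_t pc)
{
    last_transfer_pc = pc;
}

/* Format an "invalid jump" report that includes the remembered PC.
 * Returns the number of characters snprintf would have written. */
static int format_invalid_jump(char *buf, size_t len, uintptr_t target)
{
    return snprintf(buf, len,
                    "Jump to the invalid address stated on the next line\n"
                    "   at 0x%lX: ???\n"
                    "   by 0x%lX: (instruction that transferred control)\n",
                    (unsigned long)target,
                    (unsigned long)last_transfer_pc);
}
```

The cost is one extra store per indirect transfer, and the saved value is read only when an error is actually reported.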
|
From: Alex B. <ker...@be...> - 2006-07-10 08:58:29
|
On Sat, 2006-07-08 at 18:14 +1000, Nicholas Nethercote wrote:
> On Sat, 8 Jul 2006, Alex Bennee wrote:
>
> > I've been trying to track down some failures in my program when compiled
> > with -O0 (it's normally compiled -O3 however I like to be able to run
> > -O0 for debugging). I have found problems before with some inlined asm
> > routines before which when not inlined in -O0 don't reserve stack space.
> <snip>
>
> Maybe you can give some more specific detail? I can't tell from your
> description above what it is you want Valgrind to detect.
>
> Nick
Well, here is a segment of example code that failed before I added a
special sub for -O0 builds.
void MyClass::setIfCondition_Zero(uint8_t &condStatus, uint64_t eflags)
{
    __asm__ volatile
    (
#ifdef OPTIMIZE
#if OPTIMIZE==0
        "sub $0x10, %%rsp\n\t"
#endif
#endif
        "pushf \n\t"
        "push %1 \n\t"
        "popf \n\t"
        "setz %0 \n\t"
        "popf \n\t"
        : "=r"(condStatus): "r"(eflags));
}
void MyClass::decodeJumpCC(uint8_t parameter)
{
    //lets read our current eflags register
    uint64_t intFlags=getEFLAGSValue();
    uint8_t doBranch=0;
    switch(parameter)
    {
        // Zero (or not)
        case CONDITION_NZ:
            inverse=true;
        case CONDITION_Z:
            setIfCondition_Zero(doSet,intFlags);
            break;
        ...
        ...
}
In the normal case (-O3) the setIfCondition_Zero function gets inlined into
the decodeJumpCC code and the pushf/popf gets away with it
because ::decodeJumpCC has a normal stack frame reserved by the compiler
with space for this sort of thing. However in the -O0 case the compiler
won't reserve any space in setIfCondition_Zero() as it has no local
variables of its own. In this case (after a lot of head scratching) we
fixed it by adding a stack sub to add space for the push/pop operation.
RSP then gets cleaned up by the normal function epilogue.
I'm sure there are other cases where we use inline assembler which
doesn't take these sort of stack related things into account. These are
the sort of things it would be useful for Valgrind to detect.
--
Alex, homepage: http://www.bennee.com/~alex/
What I tell you three times is true.
|
|
From: Julian S. <js...@ac...> - 2006-07-17 11:11:11
|
> Well, here is a segment of example code that failed before I added a
> special sub for -O0 builds.
>
> void MyClass::setIfCondition_Zero(uint8_t &condStatus, uint64_t eflags)
> {
> __asm__ volatile
> (
> #ifdef OPTIMIZE
> #if OPTIMIZE==0
> "sub $0x10, %%rsp\n\t"
> #endif
> #endif
> "pushf \n\t"
> "push %1 \n\t"
> "popf \n\t"
> "setz %0 \n\t"
> "popf \n\t"
>
> : "=r"(condStatus): "r"(eflags));
>
> }
Alex
Are you sure this is a gcc bug? It strikes me that your inline
asm, although valid on x86, is in violation of the amd64-ELF ABI
and so is invalid.
The reason is that the amd64-ELF ABI defines the 128 bytes below %rsp
(viz., -128(%rsp) .. -1(%rsp)) as a scratch area (the "red zone") which
the compiler can use at any time. Your asm will trash it and gcc will
never be able to know that.
In order to fix this you need to step over the scratch area, and so
you need to sub $0x80 from %rsp at the start and add it back on
later. The fact that it works without that at >= -O1 just seems
like luck to me - presumably in those cases gcc does not have
anything live in that region of the stack at the point your
asm is used.
J
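A minimal sketch of the adjustment described above, assuming GCC-style inline asm on x86-64. The free function `zero_flag_set` is a hypothetical stand-in for the member function in the earlier message. It uses `lea` rather than sub/add to move %rsp, because `lea` does not modify the flags and so does not disturb what the final `popf` restored. On any other target the sketch simply tests bit 6 (ZF) of the value, so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if loading `eflags` into the flags register sets ZF, else 0.
 * Hypothetical helper for illustration; not the original poster's code. */
static uint8_t zero_flag_set(uint64_t eflags)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint8_t result;
    __asm__ volatile(
        "lea  -0x80(%%rsp), %%rsp\n\t" /* step over the 128-byte red zone */
        "pushf\n\t"                    /* save the current flags */
        "push %1\n\t"
        "popf\n\t"                     /* load the requested flags */
        "setz %0\n\t"                  /* result = ZF */
        "popf\n\t"                     /* restore the saved flags */
        "lea  0x80(%%rsp), %%rsp\n\t"  /* undo the adjustment */
        : "=r"(result)
        : "r"(eflags)
        : "cc", "memory");
    return result;
#else
    /* Portable fallback: ZF is bit 6 of EFLAGS. */
    return (uint8_t)((eflags >> 6) & 1u);
#endif
}
```

For example, `zero_flag_set(0x242)` yields 1 and `zero_flag_set(0x202)` yields 0. Values with the trap flag (bit 8) or alignment-check flag (bit 18) set should be avoided, since `popf` really does load them. Another option on GCC is to build with -mno-red-zone, which stops the compiler from using the scratch area at all.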
|