|
From: Harris, J. <Je...@ai...> - 2006-09-12 15:22:29
|
Both environments are using dynamic libstdc++:
> powerpc-ai-linux-ldd a.out
libstdc++.so.5 => libstdc++.so.5 (0x0)
libm.so.6 => libm.so.6 (0x0)
libgcc_s.so.1 => libgcc_s.so.1 (0x0)
libc.so.6 => libc.so.6 (0x0)
/lib/ld.so.1 => /lib/ld.so.1 (0x0)
The addresses are zero because I'm using the cross-compiler ldd on my
PC. The x86 version is the same. All of the symbols are missing. The
library on both platforms is stripped, but still has its dynamic
relocation entries. The libc and libm libraries are also stripped, but
still load with Valgrind.
If I run the base memcheck checker on the program, I also do not see it
loading libstdc++.
Jeff
-----Original Message-----
From: val...@li...
[mailto:val...@li...] On Behalf Of Tom
Hughes
Sent: Tuesday, September 12, 2006 9:55 AM
To: val...@li...
Subject: Re: [Valgrind-users] Callgrind results on ppc
In message
<C01...@ai...>
Jeff Harris <Je...@ai...> wrote:
> When compiled with the cross C++ compiler with -g and -O0, the output on
> PowerPC does not pick up the symbols for libstdc++. All I get in
> kcachegrind are addresses. On the x86 platform, I get the symbols.
> Also on the PPC run I get cycles detected in some of the dynamic loader
> function calls, like dl_main. The cycles do not appear on the x86 run.
> I have attached the output from Valgrind for both the ppc and x86 runs.
Have you checked what symbols were left in when you build libstdc++
for that machine? Are all libstdc++ symbols missing? Or just some?
One obvious difference is that the x86 environment is using a
dynamic libstdc++ while the PPC environment appears to be using a
statically linked one.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
_______________________________________________
Valgrind-users mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-users
|
|
From: Harris, J. <Je...@ai...> - 2006-09-12 21:44:14
|
After much investigation into Valgrind, libstdc++, and gcc, I think I've
figured out why that library behaves differently. The issue is that the
normally read-only text and data section in the library is being flagged
as writable by the linker. There is a test in the VG_(di_notify_mmap)
function for whether a particular mmap is for a "code" segment that
Valgrind will want to record. For PPC, it's looking for a segment that is
readable and executable, but not writable. Such a section will not
exist in the libstdc++ library.
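The permission test described above can be sketched in C. This is not Valgrind's actual code: the function names are hypothetical, and the use of the standard PROT_* flags from <sys/mman.h> is an assumption made for illustration. The strict variant mirrors the "readable and executable, but not writable" rule, and the relaxed variant shows what accepting a writable text segment might look like.

```c
#include <stdbool.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical helper: the strict rule described for PPC in this thread.
   A mapping counts as "code" only if it is r-x (readable, executable,
   not writable). */
static bool is_code_mapping_strict(int prot)
{
    return (prot & PROT_READ) && (prot & PROT_EXEC) && !(prot & PROT_WRITE);
}

/* Hypothetical helper: a relaxed rule that also accepts rwx mappings,
   such as a text segment that the linker flagged as writable. */
static bool is_code_mapping_relaxed(int prot)
{
    return (prot & PROT_READ) && (prot & PROT_EXEC);
}
```

Under the strict rule an rwx mapping is rejected, which would match the missing "Reading syms" line for libstdc++ on PPC.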
On x86, the library's read-only sections are indeed read-only. On PPC,
however, there are some symbols in the .rodata section which are marked
as writable, causing the linker to mark all of the read-only sections as
writable. The culprits are C++ type_info RTTI symbols. GCC is
generating them as writable symbols in the .rodata section. There is
PPC specific code for determining the section for a particular symbol.
There appears to be a bug where the symbol should probably be writable
since it's relocatable, but the symbol is placed in a read-only section.
There doesn't appear to be a straightforward patch to fix it, so I'm in
for more work.
Newer versions of gcc do not seem to have this issue, from looking at
the code. The PPC specific hooks have been rewritten to use the common
code. It would be nice if Valgrind could recognize the writable
sections, but I can understand that my situation is certainly not
"normal".
Jeff
-----Original Message-----
From: val...@li...
[mailto:val...@li...] On Behalf Of Tom
Hughes
Sent: Tuesday, September 12, 2006 11:35 AM
To: val...@li...
Subject: Re: [Valgrind-users] Callgrind results on ppc
In message
<C01...@ai...>
Jeff Harris <Je...@ai...> wrote:
> Both environments are using dynamic libstdc++:
>
>> powerpc-ai-linux-ldd a.out
> libstdc++.so.5 => libstdc++.so.5 (0x0)
> libm.so.6 => libm.so.6 (0x0)
> libgcc_s.so.1 => libgcc_s.so.1 (0x0)
> libc.so.6 => libc.so.6 (0x0)
> /lib/ld.so.1 => /lib/ld.so.1 (0x0)
>
> The addresses are zero because I'm using the cross-compiler ldd on my
> PC. The x86 version is the same. All of the symbols are missing. The
> library on both platforms is stripped, but still has its dynamic
> relocation entries. The libc and libm libraries are also stripped, but
> still load with Valgrind.
I was going on the fact that the x86 output contains this:
--11147-- Reading syms from /lib/libstdc++.so.5.0.3 (0x401A000)
while the PPC output does not contain an equivalent line.
That line will appear regardless of whether or not it actually finds
any symbols - all that has to happen for that to appear is that
valgrind has to see an mmap that appears to be mapping in a given
shared library.
If it is dynamically linked then either the dynamic linker has not
loaded it for some reason or VG_(di_notify_mmap) has not recognised
the mapping as being from a library.
You might want to try adding --trace-syscalls=yes and look for
evidence of it opening libstdc++ and mapping from it. If you find
that it is doing so, then I would try adding some extra trace
statements to VG_(di_notify_mmap) in coregrind/m_debuginfo/debuginfo.c
to try and work out why valgrind is not spotting it.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-09-12 21:56:13
|
Wow. You are certainly a determined investigator ..
That sounds like a plausible scenario to me. I remember having a bunch
of trouble with that particular logic (in VG_(di_notify_mmap)) at some
point in the ppc32-linux integration, and I believe I left some comments
in there.
> Newer versions of gcc do not seem to have this issue, from looking at
Wouldn't it be simpler for you to use a newer gcc, then? libstdc++ is
tied to gcc, so upgrading gcc would fix it?
Out of interest, what version of gcc is this? I know that 3.3.3 works
ok on ppc32 and ppc64; one of the dev machines we used for ppc used
gcc 3.3.3.
J
|
|
From: Harris, J. <Je...@ai...> - 2006-09-13 12:59:05
|
Would it be acceptable to have Valgrind keep track of all of the mmap'd
segments? It may be wasteful because there shouldn't ever be code in
the writable data segments, but it should work, right?
We probably will upgrade to a newer gcc at some point. Currently we're
at gcc 3.2.3. Now I have another reason to push for an upgrade because
likely the kernel will not be able to share the code segments between
processes as they are writable. For libstdc++, that's a sizeable amount
of memory we can reclaim. Until then, I can profile a fair amount on our
x86 platform.
Jeff
|
|
From: Tom H. <to...@co...> - 2006-09-13 13:05:31
|
In message <C01...@ai...>
Jeff Harris <Je...@ai...> wrote:
> Would it be acceptable to have Valgrind keep track of all of the mmap'd
> segments? It may be wasteful because there shouldn't ever be code in
> the writable data segments, but it should work, right?
We do keep track of all mmap'd segments, but we only try and load
symbols from ones which we think contain code.
You can easily alter that routine to consider writable segments - it
will still look for an ELF header before doing anything to verify that
it is code anyway.
This was in fact done previously for the wine patches as wine liked
to map code in writable segments at the time.
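The ELF header check mentioned above can be sketched as follows. The helper name is hypothetical and this is not the Valgrind routine itself; the only fact relied on is that every ELF file begins with the four magic bytes 0x7F 'E' 'L' 'F'. A check like this is why relaxing the permission test is fairly safe: a writable mapping that does not start with an ELF header would still be skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the first bytes of a mapping carry the
   ELF magic number 0x7F 'E' 'L' 'F'. */
static bool looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```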
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Harris, J. <Je...@ai...> - 2006-09-13 13:02:14
|
I will try to reproduce the callgraph issues with a smaller example if I
have time between other projects. I don't think the compiler is
producing invalid symbol names. GDB and glibc backtrace functions work
just fine with determining the correct symbols.
Jeff
-----Original Message-----
From: Josef Weidendorfer [mailto:Jos...@gm...]
Sent: Tuesday, September 12, 2006 6:24 PM
To: val...@li...
Cc: Harris, Jeff
Subject: Re: [Valgrind-users] Callgrind results on ppc
Hi,
On Tuesday 12 September 2006 15:41, Harris, Jeff wrote:
> Also on the PPC run I get cycles detected
See below.
> in some of the dynamic loader
> function calls, like dl_main. The cycles do not appear on the x86 run.
> I have attached the output from Valgrind for both the ppc and x86 runs.
>
> In profiling our real application on ppc, the output in kcachegrind
> seems to be getting the wrong symbols in some cases.
Hmm... either there is some bug in Valgrind (wrong interpretation of
symbol information on PPC?), or the compiler does not produce correct
data (e.g. wrong "length" of symbols?).
If some code gets the wrong function name attributed, it easily can be
that KCachegrind detects some bogus cycles.
In general, a warning about callgrind on PPC32/64: obviously, it is
reasonably stable, but I never had time to fully check whether call
graph tracing on PPC produces sensible call graphs in all cases.
At least one problem is known to not be handled correctly:
On PPC, jumps are used for both calls and returns. So theoretically,
there are conditional calls and returns on PPC. This is not
possible on x86, and currently _not_ handled in callgrind. However,
conditional calls/returns seem to be used only in rare cases (?).
Josef
|
|
From: Harris, J. <Je...@ai...> - 2006-09-21 14:49:23
Attachments:
callgrind-test-x86.gz
callgrind-test-ppc.gz
|
I think I have created a simple program that highlights the differences
between the x86 and ppc output of callgrind, as seen in kcachegrind.
The program is pretty basic:
#include <iostream>
#include <string>
using namespace std;
int foo()
{
    return 35;
}
int main()
{
    cout << foo() << endl;
    return 0;
}
The call to foo appears the same in both attached outputs of running
valgrind with callgrind. The differences show when the ppc program
calls the libstdc++ methods for operator<<(). If you look at the output
for x86, kcachegrind shows main calling foo, two libstdc++ methods, and
_dl_runtime_resolve (presumably to find and relocate the libstdc++
methods).
On ppc, the same call to main shows a call to foo and exit, but shows
one call to 0x10000E2C. Address 0x10000E2C is the location of the
dynamic relocation record for the operator<< method. Stepping into
0x10000E2C, I see the call to the operator<< method as in the x86. But,
I have to step further into the operator<< call to see another call to a
dynamic relocation record in order to see the second libstdc++ method as
in x86. In the disassembly of main, both libstdc++ calls occur in main;
there is no recursion.
I'm guessing that valgrind/callgrind is not seeing a "return" from the
dynamic relocation record, causing it to think the function never exits.
Does valgrind/callgrind perhaps not recognize that 0x10000E2C is a
relocation entry which may act differently than a local function call?
Both the ppc and x86 were built with the same version of gcc and
libstdc++. They were compiled with -g and -O0 flags.
Thanks,
Jeff
|
|
From: Josef W. <Jos...@gm...> - 2006-09-21 19:07:39
Attachments:
log.ppc.gz
|
On Thursday 21 September 2006 16:49, Harris, Jeff wrote:
> #include <iostream>
> #include <string>
> using namespace std;
> int foo()
> {
>     return 35;
> }
> int main()
> {
>     cout << foo() << endl;
>     return 0;
> }
>
> The call to foo appears the same in both attached outputs of running
> valgrind with callgrind. The differences show when the ppc program
> calls the libstdc++ methods for operator<<(). If you look at the output
> for x86, kcachegrind shows main calling foo, two libstdc++ methods, and
> _dl_runtime_resolve (presumably to find and relocate the libstdc++
> methods).
Yes.
As these "operator<<()" are 2 different functions,
_dl_runtime_resolve is called 2 times.
On x86, callgrind generates quite pretty call graphs for calls into
shared libraries, as it (1) defaults to ignore the call to the PLT section,
and (2) interprets the jump at end of _dl_runtime_resolve to the
resolved function as "return for _dl_runtime_resolve and call into
resolved function".
> On ppc, the same call to main shows a call to foo and exit
The call to exit() already seems to be wrong.
> , but shows
> one call to 0x10000E2C. Address 0x10000E2C is the location of the
> dynamic relocation record for the operator<< method.
I wonder why this address is not found inside of a PLT section;
if that would be the case, it would have been ignored as in the x86 case.
> Stepping into
> 0x10000E2C, I see the call to the operator<< method as in the x86.
This actually looks sane. There is also a call to
_dl_runtime_resolve from 0x10000E2C (you can ignore the "'2").
> But,
> I have to step further into the operator<< call to see another call to a
> dynamic relocation record in order to see the second libstdc++ method as
> in x86. In the disassembly of main, both libstdc++ calls occur in main,
> there is no recursion.
That is an example of what it looks like when reality and callgrind's
shadow stack are not in sync. Obviously, a PPC jump which should
have been interpreted as a return was interpreted as a call, and
therefore, the second call to operator<< is 2 levels too deep.
To analyse such problems, it is best to look at the order of
function enter/exit events as callgrind observes them.
You can print out the order of function enter events (and exit events
implicitly via indentation) with
valgrind --tool=callgrind --ct-verbose1=main ./testprog
Meaning of "--ct-verbose1=main" here: "Switch to verbose mode 1 when
entering function <main>, and restore verbose mode (actually, to 0 again)
when leaving <main>", and verbosity 1 prints out the dynamic call tree.
> I'm guessing that valgrind/callgrind is not seeing a "return" from the
> dynamic relocation record, causing it to think the function never exits.
> Does valgrind/callgrind perhaps not recognize that 0x10000E2C is a
> relocation entry which may act differently than a local function call?
It is not that easy, as there are a lot of calls in the call tree, even
with this small example. I compiled it, and ran it with callgrind on
our PPC32 machine, with printing out the events as shown above
(see attached file). You see that the call level slowly gets
more to the right (deeper and deeper), and there are 3 places where
it gets around 10 levels up again in one step, when entering
* 0x10010E64 (in line 225)
* exit (in line 390), and
* __libc_csu_fini (in line 485)
These points actually are resynchronisation points, using the stack
pointer. This is needed to make the tool robust, and to
handle e.g. longjumps - also on x86 - correctly. And therefore, you see
a call to exit() from main()...
One has to look at the PPC assembler to detect where these wrong
interpretations happen, and think about good heuristics to recognize
them correctly.
The thing is, I never got around to doing this very carefully,
partly because I did not know PPC assembler before the last time
I looked at this stuff.
x86 with its explicit call/ret instructions is way easier to get right;
on x86, the stack pointer always changes on call/ret. On PPC, this does
not need to happen as the return address is stored in the link register.
So: ideas for good heuristics welcome.
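The stack-pointer resynchronisation described above can be sketched as a small shadow call stack. This is an illustration only, not callgrind's real data structure: the type, the function names and the fixed depth limit are all hypothetical. It assumes a stack that grows downwards, so any frame recorded with a lower SP than the current one must already have returned, even if no "return" jump was recognised.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_MAX 64   /* arbitrary limit for this sketch */

/* Hypothetical shadow call stack; not callgrind's real data structure. */
typedef struct {
    uintptr_t fn[SHADOW_MAX];  /* entry address of each active function */
    uintptr_t sp[SHADOW_MAX];  /* stack pointer observed at entry */
    size_t depth;
} ShadowStack;

/* Drop frames whose recorded SP lies strictly below the current SP.
   Equal SP values are left alone: on PPC a call need not move the
   stack pointer, so equality is ambiguous. Returns the number of
   frames dropped. */
static size_t shadow_resync(ShadowStack *s, uintptr_t cur_sp)
{
    size_t dropped = 0;
    while (s->depth > 0 && s->sp[s->depth - 1] < cur_sp) {
        s->depth--;
        dropped++;
    }
    return dropped;
}

/* Record a function entry, resynchronising first. */
static void shadow_enter(ShadowStack *s, uintptr_t fn, uintptr_t cur_sp)
{
    shadow_resync(s, cur_sp);
    if (s->depth < SHADOW_MAX) {
        s->fn[s->depth] = fn;
        s->sp[s->depth] = cur_sp;
        s->depth++;
    }
}

/* Example scenario with made-up addresses: two returns are missed and
   the next function entry pulls the shadow stack back in sync. */
static size_t shadow_demo(void)
{
    ShadowStack s = { .depth = 0 };
    shadow_enter(&s, 0x100, 1000);  /* outermost function */
    shadow_enter(&s, 0x200, 900);   /* callee */
    shadow_enter(&s, 0x300, 800);   /* nested callee; returns are missed */
    shadow_enter(&s, 0x400, 950);   /* SP is higher: drops 0x200 and 0x300 */
    return s.depth;                 /* 0x100 and 0x400 remain */
}
```

A jump of several levels "up" in the trace, as at the three places listed above, would correspond to shadow_resync dropping several frames in one step.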
Josef
|
|
From: Julian S. <js...@ac...> - 2006-09-21 21:34:35
|
Thanks Jeff for finding a small test case, and Josef for chasing it.
I have not much constructive to add except ...
> x86 with its explicit call/ret instructions is way easier to get right;
> on x86, the stack pointer always changes on call/ret. On PPC, this does
> not need to happen as the return address is stored in the link register.
> So: ideas for good heuristics welcome.
One difficulty on ppc is that the RA is not always in the link register,
not even for the innermost frame. Suppose f is a leaf function.
Normally RA would remain in lr and that would be OK; however suppose
the compiler wants to use lr for some other purpose - not calling a
function, maybe for an indirect jump. Then it will have to store LR
somewhere else inside f.
I am not claiming to understand this fully. I think studying the
ppc32-ELF ABI would help. What I do know is that there is a nasty hack
in m_stacktrace.c, the part for unwinding the stack -- see
VG_(get_StackTrace2) and specifically the stuff for setting/using
lr_is_first_RA. This was from one of the IBM linux guys, unfortunately
moved on elsewhere now.
If you do come up with a good story on unwinding the ppc-linux stack
I would like to see it. It may be that the logic for ppc in
VG_(get_StackTrace2) is too complex or wrong or something, or maybe
it's exactly correct, I don't know.
J
|
|
From: Josef W. <Jos...@gm...> - 2006-09-22 08:37:12
|
On Thursday 21 September 2006 23:34, Julian Seward wrote:
>
> Thanks Jeff for finding a small test case, and Josef for chasing it.
> I have not much constructive to add except ...
Still, thanks for this info and the pointer.
I simply have to read not only the ppc32-ELF ABI, but about branches
on ppc in general. For example, I have no idea if "bctrl" is supposed
to be an indirect jump or indirect call (which is the current
interpretation). Perhaps it even depends on the position of the target
(same or different function).
Currently, in callgrind I make a lot of decisions depending only on
the instruction stream (e.g. call/return instructions). Perhaps, to
get PPC right, I need to look more at info from the compiler like
function address ranges and other debug info.
However, the nice thing with callgrind currently is that you get a
useful call graph on x86 even with stripped binaries. I really do not
like to make callgrind's decisions architecture/platform dependent...
Anyway. I should mention in the documentation of callgrind that
call graphs for ppc are not reliable at the moment, despite its
robustness.
Josef
|
|
From: Julian S. <js...@ac...> - 2006-09-28 19:17:16
|
> What I do know is that there is a nasty hack in m_stacktrace.c, the part
> for unwinding the stack -- see VG_(get_StackTrace2) and specifically the
> stuff for setting/using lr_is_first_RA. This was from one of the IBM
> linux guys, unfortunately moved on elsewhere now.
I later documented the trick as shown below, so at least we can see what
it is doing.
J
/* We have to determine whether or not LR currently holds this fn
   (call it F)'s return address. It might not if F has previously
   called some other function, hence overwriting LR with a pointer
   to some part of F. Hence if LR and IP point to the same
   function then we conclude LR does not hold this function's
   return address; instead the LR at entry must have been saved in
   the stack by F's prologue and so we must get it from there
   instead. Note all this guff only applies to the innermost
   frame. */
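A minimal sketch of the test that this comment describes, under stated assumptions: the FnRange table and both function names are hypothetical stand-ins for whatever symbol lookup the real unwinder uses, and treating an unknown address as "LR holds the return address" is a choice made here for illustration, not something the comment specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: a function covering
   [start, start + size). */
typedef struct {
    uintptr_t start;
    uintptr_t size;
} FnRange;

/* Return the index of the function containing addr, or -1 if none. */
static int fn_index_of(const FnRange *fns, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= fns[i].start && addr - fns[i].start < fns[i].size)
            return (int)i;
    }
    return -1;
}

/* Heuristic from the comment, for the innermost frame only: LR is taken
   to hold the return address unless LR and IP lie in the same known
   function. */
static bool lr_holds_return_address(const FnRange *fns, size_t n,
                                    uintptr_t ip, uintptr_t lr)
{
    int f_ip = fn_index_of(fns, n, ip);
    int f_lr = fn_index_of(fns, n, lr);
    if (f_ip >= 0 && f_ip == f_lr)
        return false;   /* LR points back into F: fetch RA from the stack */
    return true;        /* assumption: otherwise trust LR */
}
```

The sketch makes the heuristic's blind spot easy to see: it depends only on which function the two addresses fall in, not on whether F has actually saved LR yet.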
|
|
From: Josef W. <Jos...@gm...> - 2006-09-28 20:03:23
|
On Thursday 28 September 2006 21:16, Julian Seward wrote:
> I later documented the trick as shown below, so at least we can see what
> it is doing.
>
> J
>
> /* We have to determine whether or not LR currently holds this fn
> (call it F)'s return address. It might not if F has previously
> called some other function, hence overwriting LR with a pointer
> to some part of F. Hence if LR and IP point to the same
> function then we conclude LR does not hold this function's
> return address; instead the LR at entry must have been saved in
> the stack by F's prologue and so we must get it from there
> instead. Note all this guff only applies to the innermost
> frame. */
Wow. This implies the assumption that LR never can hold any other value
than either the return address of the current function or the return address of
the function we called (which resides in the current function).
Is this always correct? I think that "blr" could be reused as
indirect jump by loading a code pointer into LR. Hmmm... such a
jump still _should_ be in the current function (?).
For an indirect call ("brlr"?), the assumption is also true *after*
the call happened (but *not* directly before the call...).
Now the question is whether we really have to check this whenever we
branch to LR (blr), ie. whether a blr should map to a "return" or a
"boring jump".
Josef
|
Julian Seward wrote:
>> What I do know is that there is a nasty hack in m_stacktrace.c, the part
>> for unwinding the stack -- see VG_(get_StackTrace2) and specifically the
>> stuff for setting/using lr_is_first_RA. This was from one of the IBM
>> linux guys, unfortunately moved on elsewhere now.
>
> I later documented the trick as shown below, so at least we can see what
> it is doing.
>
> J
>
> /* We have to determine whether or not LR currently holds this fn
>    (call it F)'s return address. It might not if F has previously
>    called some other function, hence overwriting LR with a pointer
>    to some part of F. Hence if LR and IP point to the same
>    function then we conclude LR does not hold this function's
>    return address; instead the LR at entry must have been saved in
>    the stack by F's prologue and so we must get it from there
>    instead. Note all this guff only applies to the innermost
>    frame. */
By itself, the reasoning of that paragraph is not always correct.
The prolog of a recursive function that calls itself directly (so that
immediately after the recursive 'bl', IP and LR do point to the same
function) might save the return address into the stack only when forced
to by preparation for a yet-deeper call. The deepest call on any call
chain can avoid saving the return address into the stack, as long as
returns from a deepest call also know this. Implementations of
Ackermann's function often behave in this manner. [Indeed, Ackermann's
function is a useful testcase for Callgrind.] Some hand-written
subroutine nests have a priori bounds on the nesting level (frequently
1, 2, or 3), and dedicate a general register (instead of the stack) to
hold the return address for each level.
If the first call after the outermost entry into function F is a
[recursive] call to F, and if the prolog determines that the [recursive]
entry is a leaf entry, and if therefore F decides not to save the return
address into the stack (and perhaps avoids constructing a stack frame at
all), then LR and IP will point to the same function, and LR will be the
current return address, but the logic of the paragraph quoted above will
say that the stack holds the current return address. This will be an
error, either because the stack slot for this level is logically
undefined (never was written), or because leaf entry uses no frame at
all (and thus the return address that is in "the" stack frame actually
designates the _grandparent_ of the current activation.)
If the compiler is "nice", then an instruction is a CALL if and only if
it is a branch instruction with the LK bit set (the least significant
bit.) Any indirect jump through the Link or Count register, when the LK
bit of the instruction is 0, must be a RETURN, a tail-recursive
continuation CALL (which must use the Count register, because the return
address must always be in the Link register [unless the tail-recursive
CALL is known to be a leaf call, or otherwise skips part of the
prolog]), or a 'switch' case.
Some compilers are "naughty": they set the LK bit willy-nilly. After
all, the value in LR immediately after a RETURN is a "do not care."
The logic in the quoted paragraph also does not handle true co-routines:
two or more functions which resume each other by turns at the point of
previous "exit." Runtime-generated code for formatted I/O often uses
co-routines, and so do various simulation engines. Of course,
co-routines blur the meaning of CALL and RETURN, but Callgrind must cope
somehow.
--
|
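The "nice compiler" rule from the message above (a branch is a CALL exactly when its LK bit is set) can be sketched as a small classifier over 32-bit PowerPC instruction words. The enum and function names are hypothetical, and this is not how VEX or callgrind decides jump kinds. It relies only on the standard encodings: b has primary opcode 18, bc has 16, and bclr and bcctr share primary opcode 19 with extended opcodes 16 and 528.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification for this sketch. */
typedef enum {
    JG_NOT_BRANCH,        /* not one of the four branch forms */
    JG_CALL,              /* LK bit set */
    JG_DIRECT_JUMP,       /* b / bc with LK clear */
    JG_INDIRECT_NOLINK    /* bclr / bcctr with LK clear: a return, a tail
                             call, or a switch case; the instruction word
                             alone cannot say which */
} JumpGuess;

static JumpGuess ppc_guess_jump(uint32_t insn)
{
    uint32_t primary = insn >> 26;          /* top 6 bits */
    uint32_t xo = (insn >> 1) & 0x3FFu;     /* 10-bit extended opcode */
    bool direct = (primary == 18 || primary == 16);
    bool indirect = (primary == 19 && (xo == 16 || xo == 528));

    if (!direct && !indirect)
        return JG_NOT_BRANCH;
    if (insn & 1u)                          /* LK: least significant bit */
        return JG_CALL;
    return indirect ? JG_INDIRECT_NOLINK : JG_DIRECT_JUMP;
}
```

For example, 0x4E800020 (blr) and 0x4E800420 (bctr) both land in JG_INDIRECT_NOLINK, while 0x4E800421 (bctrl) is classified as a call. With a "naughty" compiler that sets LK arbitrarily, this classification would be wrong, as the message points out.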
|
From: Harris, J. <Je...@ai...> - 2006-09-28 19:40:27
|
I've made a little more progress on this issue of getting the function
call/return tracked correctly. I modified the toIR.c file for the
guest-ppc in the VEX library. For some of the PPC branch instructions,
the code was setting the call type to either call or boring. I tried
modifying the code to return a type of return for the bctr instructions
similar to the bclr instructions. That change seemed to make the call
trace align a lot better, especially the _dl_runtime_resolve function
which is called a lot. I still have more work, I believe, to catch some
other branch instructions. I don't know how "ugly" of a fix it will be
since my knowledge of PPC assembly and ABI is quite limited.
Also, I found some issues in how some segments of the ELF file are
managed. The .plt section has some peculiar behavior on PPC. In the
application ELF object, the section should not be relocated, or it gets
the wrong starting address. In shared libraries, however, the section
should be relocated.
I'll try and spend more time when I can getting my changes into a
cohesive patch, once I've ironed out the bugs.
Jeff
|
|
From: Josef W. <Jos...@gm...> - 2006-09-28 19:50:06
|
On Thursday 28 September 2006 21:40, Harris, Jeff wrote:
> I've made a little more progress on this issue of getting the function
> call/return tracked correctly. I modified the toIR.c file for the
> guest-ppc in the VEX library. For some of the PPC branch instructions,
> the code was setting the call type to either call or boring. I tried
> modifying the code to return a type of return for the bctr instructions
> similar to the bclr instructions. That change seemed to make the call
> trace align a lot better, especially the _dl_runtime_resolve function
> which is called a lot.
Ah, that's good to know. I remember that I requested a similar change
a long time ago... AFAIK the only VG tool that is interpreting the kind
of the jump is Callgrind, so such a change should be easy to integrate
without any side effects.
> I still have more work, I believe, to catch some
> other branch instructions. I don't know how "ugly" of a fix it will be
> since my knowledge of PPC assembly and ABI is quite limited.
It really would be ugly if the jump kind depends on the context
(e.g. whether LR holds the return address or not).
> Also, I found some issues in how some segments of the ELF file are
> managed. The .plt section has some peculiar behavior on PPC. In the
> application ELF object, the section should not be relocated, or it gets
> the wrong starting address. In shared libraries, however, the section
> should be relocated.
So that would explain why skipping of calls into PLT sometimes is
working and sometimes not?
> I'll try and spend more time when I can getting my changes into a
> cohesive patch, once I've ironed out the bugs.
Thanks for this work,
Josef
|
|
From: Harris, J. <Je...@ai...> - 2006-09-28 19:59:36
|
There is a comment in the bclr instruction which indicated that it used
to choose either call or return as the type, but the code now just uses
return. So, bctr is at least consistent.

The fix for the .plt does explain why it sometimes worked and other
times did not. I believe it's always working now for application and
shared libraries.

There are still loops with _dl_runtime_resolve which need to be
resolved. There are still times when the stack does a recovery and jumps
up a number of calls. I believe it's related to changing the offset of
an obj_node in obj_of_address in bb.c based on what segment contains an
address. It causes a symbol to show twice in the callgrind output as
foo() and foo'2(). I haven't started tracking down what causes the
problem yet.

Jeff
|
|
From: Josef W. <Jos...@gm...> - 2006-09-28 20:25:14
|
On Thursday 28 September 2006 21:59, Harris, Jeff wrote:
> There are still times when the stack does a recovery and jumps up a
> number of calls.

This shows that some branches which should be "return" are interpreted
as "calls".

> I believe it's related to changing the offset of an
> obj_node in obj_of_address in bb.c based on what segment contains an
> address.

Are you talking about the code after the comment
  /* Update symbol offset in object if remapped */
?

> It causes a symbol to show twice in the callgrind output as
> foo() and foo'2().

With "--separate-recs=2" in place (which is the default), "foo'2" decodes
as "2nd and any further recursion level of function foo", i.e. whenever
function "foo" appears more than once on the shadow stack.
You can get rid of this encoding of the recursion level into the symbol
name with "--separate-recs=1".
This encoding is meant as a feature, not a bug ;-)

How can the above code trigger wrong recursions? It simply checks for a
changed mapping of a shared ELF object (e.g. triggered by a
dlopen/dlclose and dlopen of the same object, mapped at another
address).

BTW, this code has a design flaw, as we really should get rid of the
metadata of discarded code (just imagine a JIT compiler): this is a kind
of memory leak. Instead, the cost counters of the discarded code should
immediately be dumped out and the counter/BB structures freed. This
would allow callgrind to go back to the simple code address instead of
the (obj_node/offset) tuple. The latter was introduced only to correctly
handle remappings of ELF objects...

Josef
|
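The recursion-level naming described above can be modelled roughly as follows. This is a toy sketch, not Callgrind's implementation: the shadow stack is reduced to an array of function names, and count_on_stack and recursion_name are hypothetical helpers. The only behaviour taken from the message is that, with a limit of 2, the second and every further level share the name "foo'2", and that a limit of 1 removes the suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count how many times fn already appears on the (simplified) shadow stack. */
int count_on_stack(const char *const *stack, int depth, const char *fn)
{
    int count = 0;
    for (int i = 0; i < depth; i++) {
        if (strcmp(stack[i], fn) == 0)
            count++;
    }
    return count;
}

/* Write into out the name under which a new call to fn would be reported.
   separate_recs is the maximum recursion level that gets its own name;
   all deeper levels are folded into the last one. */
void recursion_name(const char *const *stack, int depth, const char *fn,
                    int separate_recs, char *out, size_t out_size)
{
    assert(separate_recs >= 1);
    int level = count_on_stack(stack, depth, fn) + 1;
    if (level > separate_recs)
        level = separate_recs;          /* fold deeper levels together */
    if (level == 1)
        snprintf(out, out_size, "%s", fn);
    else
        snprintf(out, out_size, "%s'%d", fn, level);
}
```

In this model, a spurious "foo'2" means foo was still on the shadow stack when it was entered again. That matches the point above: a return misread as a call leaves a stale entry behind, so the next genuine call looks recursive.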
|
From: Tom H. <to...@co...> - 2006-09-12 15:35:40
|
In message <C01...@ai...>
Jeff Harris <Je...@ai...> wrote:

> Both environments are using dynamic libstdc++:
>
>> powerpc-ai-linux-ldd a.out
> libstdc++.so.5 => libstdc++.so.5 (0x0)
> libm.so.6 => libm.so.6 (0x0)
> libgcc_s.so.1 => libgcc_s.so.1 (0x0)
> libc.so.6 => libc.so.6 (0x0)
> /lib/ld.so.1 => /lib/ld.so.1 (0x0)
>
> The addresses are zero because I'm using the cross-compiler ldd on my
> PC. The x86 version is the same. All of the symbols are missing. The
> library on both platforms is stripped, but still has its dynamic
> relocation entries. The libc and libm libraries are also stripped, but
> still load with Valgrind.

I was going on the fact that the x86 output contains this:

--11147-- Reading syms from /lib/libstdc++.so.5.0.3 (0x401A000)

while the PPC output does not contain an equivalent line.

That line will appear regardless of whether or not it actually finds
any symbols - all that has to happen for that to appear is that
valgrind has to see an mmap that appears to be mapping in a given
shared library.

If it is dynamically linked then either the dynamic linker has not
loaded it for some reason or VG_(di_notify_mmap) has not recognised
the mapping as being from a library.

You might want to try adding --trace-syscalls=yes and look for
evidence of it opening libstdc++ and mapping from it. If you find
that it is doing so then I would try adding some extra trace
statements to VG_(di_notify_mmap) in coregrind/m_debuginfo/debuginfo.c
to try and work out why valgrind is not spotting it.

Tom

--
Tom Hughes (to...@co...)
http://www.compton.nu/
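When comparing the x86 and PPC logs, the check described above (is there a "Reading syms from" line for the library?) can be automated with a small filter. The sketch below only matches the line format quoted in this message; line_reads_syms_for is a hypothetical helper, and matching the library by substring is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a log line reports that Valgrind started reading symbols
   for an object whose path contains lib_name, e.g.
     --11147-- Reading syms from /lib/libstdc++.so.5.0.3 (0x401A000)
   The library is matched by plain substring, which is an assumption. */
bool line_reads_syms_for(const char *line, const char *lib_name)
{
    assert(line != NULL && lib_name != NULL);
    const char *marker = "Reading syms from ";
    const char *hit = strstr(line, marker);
    if (hit == NULL)
        return false;                       /* not a symbol-loading line */
    return strstr(hit + strlen(marker), lib_name) != NULL;
}
```

Running this over every line of each log and comparing the results shows quickly whether the mapping was noticed at all on PPC, before digging into the syscall trace.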
|
|
From: Julian S. <js...@ac...> - 2006-09-12 15:44:50
|
> You might want to try adding --trace-syscalls=yes and look for
> evidence of it opening libstdc++ and mapping from it. If you find
> that it is doing so then I would try adding some extra trace
> statements to VG_(di_notify_mmap) in coregrind/m_debuginfo/debuginfo.c
> to try and work out why valgrind is not spotting it.

Also you could try running with --trace-symtab=yes to see gigabytes
of gruesome crud showing what the debuginfo reader is doing, in
great detail.

J
|