|
From: Emilio C. <er...@gm...> - 2011-08-24 19:12:42
Attachments:
function.c
|
Hi I am trying to write a new tool. For my purpose i need to track function entry/exit, I read tracking-fn-entry-exit.txt and see some code of callgrind (it's not easy to isolate the code that i need), but I am not so good and so I have some problem. I write some code that basically insert a call to an helper function at the beginning of each BB (guest_chase_thresh = 0). Then I simulate a stack during the execution of the program based on SP changes: - if VG_(get_fnname_if_entry) is true, I simulate a call - if the stack pointer provided by Valgrind is bigger than the SP on my stack, I simulate one or more return (callgrind, I think, call this operation unwind_call_stack) This does not work correctly (I compare my result with callgrind), something is missing. The code fo the simulated stack is attached to this email. Do you have any suggestion? How can I correcly trace function entry/exit without analyse all jumps? Sorry for my bad english. Emilio. |
|
From: Josef W. <Jos...@gm...> - 2011-08-24 20:03:47
|
On Wednesday 24 August 2011, Emilio Coppa wrote:
> Hi
>
> I am trying to write a new tool. For my purpose i need to track function
> entry/exit, I read tracking-fn-entry-exit.txt and see some code of callgrind
> (it's not easy to isolate the code that i need), but I am not so good and so
> I have some problem.
> I write some code that basically insert a call to an helper function at the
> beginning of each BB (guest_chase_thresh = 0). Then I simulate a stack
> during the execution of the program based on SP changes:
> - if VG_(get_fnname_if_entry) is true, I simulate a call
> - if the stack pointer provided by Valgrind is bigger than the SP on my
> stack, I simulate one or more return (callgrind, I think, call this
> operation unwind_call_stack)
> This does not work correctly (I compare my result with callgrind), something
> is missing. The code fo the simulated stack is attached to this email.
Can you show us an example of the differences you get?
Callgrind not only assumes a call if VG_(get_fnname_if_entry) is true, but also,
if VEX returns jump kind "Call" for the last BB.
You should be aware that you get problems with
- tail recursions (jumping to the beginning of a function)
- multithreaded code (you need a shadow stack for every thread)
- signal handlers (you will see arbitrary function calls when a signal handler
calls a function)
I found that the shadow stack with synchronization of SP (yes, using unwind_call_stack)
is quite robust, especially for exception handling e.g. in C++ or longjumps.
However, this does not work nicely with RISC architectures (PPC/ARM), as a call or
a return does not need to change the SP there. Instead, the return address is stored
and restored from a specific register ("link register") using regular branch instructions.
Callgrind does not do this right at the moment, too.
> Do you have any suggestion? How can I correcly trace function entry/exit
> without analyse all jumps?
>
> Sorry for my bad english.
>
> Emilio.
>
|
|
From: Emilio C. <er...@gm...> - 2011-08-24 21:52:45
|
Hi Josef, thanks for your reply :) Can you show us an example of the differences you get? > My output <http://pastebin.com/6k0g2i9f> and callgrind output.<http://pastebin.com/XXbzP56Q>See the difference it's a little bit caotic, sorry. For example (line numbers on the left), callgrind says: > . > strlen(0x40183b6, 0xbe918faa, ...) [ld-2.13.so / 0x16c20] > > . > calloc(0x2e9, 0x1, ...) [ld-2.13.so / 0x14dd0] > > . > 0x000174ab(0xbe918194, 0x401cff4, ...) [ld-2.13.so / 0x174ab] > > . .> malloc(0x2e9, 0x401cff4, ...) [ld-2.13.so / 0x14da0] > > . . > 0x000174ab(0x401cff4, 0xbe918158, ...) [ld-2.13.so / 0x174ab] > > . . > __libc_memalign(0x8, 0x2e9, ...) [ld-2.13.so / 0x14c90] > > . . > 0x000174ab(0x0, 0x0, ...) [ld-2.13.so / 0x174ab] > > . > memcpy(0x401dc00, 0x40183b6, ...) [ld-2.13.so / 0x17330] My code says (number inside [] is my simulated stack depth ): > [5] Entered strlen() > [5] Exit strlen() > [4] Inside _dl_new_object > Valgrind does not know where we are: 0x40007E4 > [5] Entered calloc() > Valgrind does not know where we are: 0x40174AB > Valgrind does not know where we are: 0x40007D4 > [6] Entered malloc() > Valgrind does not know where we are: 0x40174AB > Valgrind does not know where we are: 0x40007C4 > [7] Entered __libc_memalign() > Valgrind does not know where we are: 0x40174AB > [7] Exit __libc_memalign() > [6] Inside malloc > [6] Exit malloc() > [5] Inside calloc > [5] Exit calloc() > [4] Inside _dl_new_object > [5] Entered memcpy() > [5] Exit memcpy() As you can see with my code strlen() return immediately, with callgrind not. There are a lot of other difference if you see the two files. Callgrind not only assumes a call if VG_(get_fnname_if_entry) is true, but > also, > if VEX returns jump kind "Call" for the last BB. Why this? I mean if the last BB do a jump type Ijk_Call, why does it not jump to first instruction of a BB (of another function)? I expect this if it's a "call". Sorry if it's a stupid question. Can I do this: during instrumentation, if jmpkind of the BB is Ijk_Call invoke an helper function that simulate the call ? - tail recursions (jumping to the beginning of a function) > Ok, I am seeing your code ( setup_bbcc() ), you do a lot of things: - if last BB has a jmpkind == IjkRet and SP is smaller or equal (if is a return we expect a bigger SP) to the one on top of the stack, you set ret_without_call = True - you simulate a call if: ret_without_call is true OR current BB is the first BB of a function OR (last_bb->sect_kind != bb->sect_kind) || (last_bb->obj->number != bb->obj->number) Am I correct? > - multithreaded code (you need a shadow stack for every thread) > Yes of course. > - signal handlers (you will see arbitrary function calls when a signal > handler > calls a function) > How callgrind manage this? > However, this does not work nicely with RISC architectures (PPC/ARM), as a > call or > a return does not need to change the SP there. Instead, the return address > is stored > and restored from a specific register ("link register") using regular > branch instructions. > Callgrind does not do this right at the moment, too. Ok. Emilio. |
|
From: Josef W. <Jos...@gm...> - 2011-08-25 12:59:36
|
On Wednesday 24 August 2011, Emilio Coppa wrote:
> > . > strlen(0x40183b6, 0xbe918faa, ...) [ld-2.13.so / 0x16c20]
> > . > calloc(0x2e9, 0x1, ...) [ld-2.13.so / 0x14dd0]
> > . > 0x000174ab(0xbe918194, 0x401cff4, ...) [ld-2.13.so / 0x174ab]
> > . .> malloc(0x2e9, 0x401cff4, ...) [ld-2.13.so / 0x14da0]
> > . . > 0x000174ab(0x401cff4, 0xbe918158, ...) [ld-2.13.so / 0x174ab]
> > . . > __libc_memalign(0x8, 0x2e9, ...) [ld-2.13.so / 0x14c90]
> > . . > 0x000174ab(0x0, 0x0, ...) [ld-2.13.so / 0x174ab]
> > . > memcpy(0x401dc00, 0x40183b6, ...) [ld-2.13.so / 0x17330]
Hmm... strlen shouldn't need to call calloc, so your output seems to be sane.
What distribution is this?
Can you locate the relevant output of "valgrind --tool=callgrind --ct-verbose=2 ..."?
And can you show the relevant disassembly of ld-2.13.so at offset 0x16c20 ?
The calls to 0x000174ab should be fine. They most probably are about getting the
GOT pointer for the library.
> My code says (number inside [] is my simulated stack depth ):
>
>
> > [5] Entered strlen()
> > [5] Exit strlen()
> > [4] Inside _dl_new_object
> > Valgrind does not know where we are: 0x40007E4
When do you print out this?
> > [5] Entered calloc()
> > Valgrind does not know where we are: 0x40174AB
> > Valgrind does not know where we are: 0x40007D4
> > [6] Entered malloc()
> > Valgrind does not know where we are: 0x40174AB
> > Valgrind does not know where we are: 0x40007C4
> > [7] Entered __libc_memalign()
> > Valgrind does not know where we are: 0x40174AB
> > [7] Exit __libc_memalign()
> > [6] Inside malloc
> > [6] Exit malloc()
> > [5] Inside calloc
> > [5] Exit calloc()
> > [4] Inside _dl_new_object
> > [5] Entered memcpy()
> > [5] Exit memcpy()
>
>
> As you can see with my code strlen() return immediately, with callgrind
> not.
> There are a lot of other difference if you see the two files.
>
> Callgrind not only assumes a call if VG_(get_fnname_if_entry) is true, but
> > also,
> > if VEX returns jump kind "Call" for the last BB.
>
>
> Why this? I mean if the last BB do a jump type Ijk_Call, why does it not
> jump to first instruction of a BB (of another function)?
Of course it does. But it could be that there is not debug info available,
and thus, Valgrind can not tell whether this is the beginning of a function
or not.
> I expect this if
> it's a "call". Sorry if it's a stupid question.
>
> Can I do this: during instrumentation, if jmpkind of the BB is Ijk_Call
> invoke an helper function that simulate the call ?
I think that should work, yes.
> - tail recursions (jumping to the beginning of a function)
> >
>
> Ok, I am seeing your code ( setup_bbcc() ), you do a lot of things:
>
> - if last BB has a jmpkind == IjkRet and SP is smaller or equal (if is a
> return we expect a bigger SP) to the one on top of the stack, you
> set ret_without_call = True
I do not remember any more, but I suppose I added this when I found
a case with "PUSH addr; RET" or similar.
> - you simulate a call if:
>
> ret_without_call is true OR
>
> current BB is the first BB of a function OR
> (last_bb->sect_kind != bb->sect_kind) || (last_bb->obj->number !=
> bb->obj->number)
>
>
> Am I correct?
Yes, if it is not a return.
The result is that sometimes jumps (e.g. among libraries) will be
converted into calls.
This really is spaghetti code. It should be rewritten more clearly.
> > - signal handlers (you will see arbitrary function calls when a signal
> > handler
> > calls a function)
> >
>
> How callgrind manage this?
Valgrind core can catch enter/leave of signal handlers. I put a marker
into the shadow stack if I see a signal handler being called. This way,
even long jumps out of signal handlers are detectede. Valgrind can not
catch such returns from signal handlers, which are invisible to the kernel.
Josef
> > However, this does not work nicely with RISC architectures (PPC/ARM), as a
> > call or
> > a return does not need to change the SP there. Instead, the return address
> > is stored
> > and restored from a specific register ("link register") using regular
> > branch instructions.
> > Callgrind does not do this right at the moment, too.
>
>
> Ok.
>
> Emilio.
>
|
|
From: Emilio C. <er...@gm...> - 2011-08-25 17:48:05
|
Hi Josef, > Hmm... strlen shouldn't need to call calloc, so your output seems to be > sane. > Ok. In the next days I try to trace function entry/exit with a tool based on PIN on the same program, I hope I can understand more about differences between my code and callgrind. It's not easy to catch errors if you don't know the function flow :( > What distribution is this? > Ubuntu 10.04 32bit. > Can you locate the relevant output of "valgrind --tool=callgrind > --ct-verbose=2 ..."? > Here: http://www.ercoppa.org/include/files/log-call2tar.bz2 The original (stupid) program is this: http://pastebin.com/M1AaTMWN > And can you show the relevant disassembly of ld-2.13.so at offset 0x16c20 > ? > Obtained with objdump -d /lib/i386-linux-gnu/ld-2.13.so--start-address=0x16c20 Here: http://pastebin.com/ugf3HzRJ > > > Valgrind does not know where we are: 0x40007E4 > > When do you print out this? It's printed by the helper function (called at the beginning of a BB) when Valgrind doesn't give me debug info [VG_(get_fnname)], > Of course it does. But it could be that there is not debug info available, > and thus, Valgrind can not tell whether this is the beginning of a function > or not. Ok, thanks. In the next days I modify my code based on your observations. We'll see what I get :) Emilio. |
|
From: Josef W. <Jos...@gm...> - 2011-09-01 16:40:31
|
On Thursday 25 August 2011, Emilio Coppa wrote: > > Hmm... strlen shouldn't need to call calloc, so your output seems to be > > sane. ... > > Can you locate the relevant output of "valgrind --tool=callgrind > > --ct-verbose=2 ..."? Thanks! There are a lot of strlen called here, but all end with returning from strlen. So that is sane. And that makes sense, as... > Obtained with objdump -d /lib/i386-linux-gnu/ld-2.13.so--start-address=0x16c20 > Here: http://pastebin.com/ugf3HzRJ ... the disassembly also does not contain any calls. So there must be something wrong with the --ct-verbose=1 output you got. So, am I right that if you select strlen in kcachegrind, you do not see any callees there, too? There is a section where strlen returns to _dl_new_object, and according to your log (--ct-verbose=2), it then calls 040007e4/P (from PLT section), which also gets the symbol "_dl_new_object", and from there it is jumping to calloc. I have no idea why the call into the PLT section gets the same symbol. Can you check if objdump sees some symbol at offset 74e in ld.so.13 ? Thanks, Josef > > > > > > Valgrind does not know where we are: 0x40007E4 > > > > When do you print out this? > > > It's printed by the helper function (called at the beginning of a BB) when > Valgrind doesn't give me debug info [VG_(get_fnname)], > > > > Of course it does. But it could be that there is not debug info available, > > and thus, Valgrind can not tell whether this is the beginning of a function > > or not. > > > Ok, thanks. > > In the next days I modify my code based on your observations. We'll see what > I get :) > > Emilio. > |
|
From: Emilio C. <er...@gm...> - 2011-09-09 22:26:16
|
Hi Josef, sorry for my delay in answering you :( So there must be something wrong with the --ct-verbose=1 output you got. Yes, probably because you do not show PLT, so if you have this flow 1) start function a() 2) return function a() and go in a PLT 3) start function b() You should print this: > a() > PLT() > > b() But if you print: > a () > > b() So you can think (erroneously) that b() is called from a(). So, am I right that if you select strlen in kcachegrind, you do not see > any callees there, too? Yes, you are right. Can you check if objdump sees some symbol at offset 74e in ld.so.13 ? Output of: objdump -d /lib/i386-linux-gnu/ld-2.13.so --start-address=0x074e Here: http://www.ercoppa.org/include/files/dump.log I have improved my code based on your suggestions and callgrind code. I create a new valgrind tool called TF, when it is ready I integrate its code on my valgrind tool (aprof). I do this because maybe this tool can be used by other users interested understanding function tracing (I see other requests in the past about callgrind logic). The code is here: http://code.google.com/p/aprof/source/browse/#svn%2Fbranches%2Ffunction%2Fvalgrind%2Ftf Or: svn co http://aprof.googlecode.com/svn/branches/function/valgrind/tf I try to comment all the code, sorry for the error, I do my best, I hope I can improve them with time. I have two issues for now: 1) TF does not recognize correctly CALL as RET: i use your logic (i hope) but I can't undestand why TF fail doing this compared with callgrind, for example see this two log: TF: http://pastebin.com/jybKDd8D Callgrind: http://pastebin.com/8B1ynfmg If you see callgrind log, almost at the end, there are different prints like "No match found BB(XXX); RET converted in CALL XXX". I insert this prints both on callgrind code and on TF code. But TF does not print the same :( I really do not understand why, maybe it's a little mistake in my code or I miss something in your logic :( 2) For now TF does not manage signal handlers: I am trying to understand your code (I have not spent much time on this part, because these days I have to prepare an exam :( ). You say: I put a marker > into the shadow stack if I see a signal handler being called. This way, > even long jumps out of signal handlers are detectede. But if a signal handler do a long jump, I expect that synchronization of SP fix this problem. Why not? I mean code in setup_bbcc() line 745:749 ( http://code.google.com/p/aprof/source/browse/trunk/valgrind/callgrind/bbcc.c ). Of course with a marker I can assure that all push made inside the signal handler, are correctly pop out. Is this the reason? Sorry for my bad english. Emilio. |
|
From: Josef W. <Jos...@gm...> - 2011-09-10 00:18:32
|
On Saturday 10 September 2011, Emilio Coppa wrote: > > a() > > PLT() > > > b() > > But if you print: > > a () > > > b() > > So you can think (erroneously) that b() is called from a(). You are completely right. The calls into PLT are printed if "--skip-plt=no" is specified. Then, the results should look a lot better. > I have two issues for now: > 1) TF does not recognize correctly CALL as RET: i use your logic (i hope) > but I can't undestand why TF fail doing this compared with callgrind, for > example see this two log: > > TF: http://pastebin.com/jybKDd8D > Callgrind: http://pastebin.com/8B1ynfmg > > If you see callgrind log, almost at the end, there are different prints like > "No match found BB(XXX); > RET converted in CALL XXX". I insert this prints both on callgrind code and > on TF code. But TF does not print the same :( > I really do not understand why, maybe it's a little mistake in my code or I > miss something in your logic :( Looks like the places where dl_runtime_resolve() "returns" into the resolved functions. Not sure what you are doing wrong here, but I remember that I had to special-case dl_runtime_resolve. I even have a signature check for the machine code for the case that the runtime linker is stripped... > 2) For now TF does not manage signal handlers: I am trying to understand > your code (I have not spent much time on this part, because these days I > have to prepare an exam :( ). You say: > > I put a marker > > into the shadow stack if I see a signal handler being called. This way, > > even long jumps out of signal handlers are detectede. > > > But if a signal handler do a long jump, I expect that synchronization of SP > fix this problem. Yes, of course. While unwinding, it will see the marker, and thus it knows that the signal handler was left. I.e. instead of adding calls to the (separate) call graph of the signal handler, it returns to the regular call graph of the thread. BTW, signal handlers are probably not worth to be handled specially. I already had quite a large number of bug reports because of that. Josef > Why not? I mean code in setup_bbcc() line 745:749 ( > http://code.google.com/p/aprof/source/browse/trunk/valgrind/callgrind/bbcc.c > ). > Of course with a marker I can assure that all push made inside the signal > handler, are correctly pop out. Is this the reason? > > Sorry for my bad english. > > Emilio. > |
|
From: Emilio C. <er...@gm...> - 2011-09-10 19:01:15
|
Hi Josef, You are completely right. The calls into PLT are printed if > "--skip-plt=no" is specified. Then, the results should look > a lot better. Perfect :) > Looks like the places where dl_runtime_resolve() "returns" into the > resolved > functions. Not sure what you are doing wrong here, but I remember that I > had > to special-case dl_runtime_resolve. I even have a signature check for the > machine code for the case that the runtime linker is stripped... You are right! Thanks :) I see your code, I try to do the same as callgrind. > Yes, of course. While unwinding, it will see the marker, and thus it knows > that the signal handler was left. I.e. instead of adding calls to the > (separate) call graph of the signal handler, it returns to the regular call > graph of the thread. Ok, I think I understand. Josef thanks for your help, i appreciate your patience with me. Emilio. |