From: David G. <dav...@gm...> - 2010-10-28 17:55:15
Hi! I'm currently trying to develop a simple profiling tool on top of
valgrind/callgrind. To do that, I've inserted code in "CLG_(instrument)"
(callgrind/main.c) that creates a block (it includes the current BB plus
some other information; I call it an execution block), so that when
addEvent_Dr or addEvent_Dw is called I can register the respective
memory access (read or write). Then, at "finish", I try to calculate the
conflicts between blocks from the information on their memory accesses, and
finally I output those results. When running this version of callgrind with
a simple program:
#include "callgrind.h"
volatile int i;
volatile int k;
...
int main() {
    CALLGRIND_START_INSTRUMENTATION;
    volatile int j;
    k = 0;
    i = 10;
    for (j = 0; j < i; j++) {
        k = k + 1;
    }
    i = 15;
    func1();
    func2();
    i = 2;
    CALLGRIND_STOP_INSTRUMENTATION;
    return 0;
}
(func1 and func2 don't matter for this problem)
and observing the output, I see that there is no information about the blocks of
the for-loop. I identify each execution block (in "CLG_(instrument)") by
"new_eb->id = CLG_(stat).bb_executions;", and in the output I see a jump from
block #2 to block #12, leading me to the conclusion that the 10 blocks for
the 10 iterations are missing. It's important that I can output this
information. I know that VEX does some optimizations on the client code
before it is instrumented by the tool (callgrind). Namely, it does loop
unrolling, which I believe is the source of my problem. So I ask: is this
really the cause of my problem? And is there an
obvious/simple/straightforward way to prevent VEX from doing this?
Thank you very much for your help!
From: Julian S. <js...@ac...> - 2010-10-30 11:47:37
You can disable unrolling by setting the field .iropt_unroll_thresh to
zero in the VexControl structure handed to VEX. Don't ask me where
that is though. But I think Callgrind does this already.
A more immediate question I have is
> k = 0;
> i = 10;
> for(j = 0; j<i; j++){
> k = k + 1;
> }
It's not beyond the realms of possibility that a clever loop optimiser
(in gcc) would nuke the loop and replace it by "k += 10". Or even fold
the whole thing out at compile time. Did you check, with objdump -d,
that the assembly code you're getting from gcc is in any way similar to
what you expect to get? When writing small test programs like this it's
important to avoid writing stuff which gcc can transform into something
you don't want.
J