From: Josef W. <Jos...@gm...> - 2007-02-15 19:34:43
Hi,
we recently talked about VEX possibly inverting conditional jumps,
which can confuse tools' instrumentation.
I just came across a simple example of this using lackey:
==============================================
int main()
{
    int i, sum=0;
    for(i=0;i<1000000;i++)
        sum += i;
    return sum;
}
==============================================
Relevant assembler:
==============================================
...
80484ac: 03 d0 add %eax,%edx
80484ae: 83 c0 01 add $0x1,%eax
80484b1: 3d 40 42 0f 00 cmp $0xf4240,%eax
80484b6: 7c f4 jl 80484ac <main+0x18>
...
==============================================
Obviously, the conditional branch is taken about 1000000 times.
Yet, "valgrind --tool=lackey ./loop" gives
==============================================
...
==24183== Jccs:
==24183== total: 1,036,628
==24183== taken: 14,348 ( 1%)
...
==============================================
Which is way off.
Using --trace-flags=11100000, one can see the inversion in the VEX code
(only relevant parts):
==============================================
==== BB 1435 main+24(0x80484AC) BBs exec'd 39078 ====
------------------------ Front end ------------------------
...
0x80484B6: jl-8 0x80484AC
------ IMark(0x80484B6, 2) ------
PUT(60) = 0x80484B6:I32
if (32to1(x86g_calculate_condition[mcx=0x13]{0x380a4ef0}(0xC:I32,GET:I32(32),GET:I32(36),GET:I32(40),GET:I32(44)):I32)) goto {Boring} 0x80484AC:I32
goto {Boring} 0x80484B8:I32
------------------------ After pre-instr IR optimisation ------------------------
...
------ IMark(0x80484B6, 2) ------
PUT(60) = 0x80484B6:I32
t70 = CmpLT32S(t57,0xF4240:I32)
t81 = 1Uto32(t70)
t82 = 32to1(t81)
t83 = Not1(t82)
if (t83) goto {Boring} 0x80484B8:I32
goto {Boring} 0x80484AC:I32
}
...
==============================================
The "pre-instr IR optimisation" phase inverts the condition and swaps
the two jump targets, so that the original "taken" case becomes the
fall-through case, which lackey counts as "not taken".
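
To illustrate why the inverted exit misleads a tool, here is a minimal C
sketch. It is a toy model, not VEX: the IrBranch struct and both helper
functions are hypothetical names made up for this example. It shows that
negating the guard and swapping the targets leaves the control flow
unchanged, while flipping what a tool sees if it equates "side exit
fired" with "branch taken".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a block ending in a conditional exit.
 * Hypothetical type for illustration only, not a VEX data structure. */
typedef struct {
    bool     negate_guard;  /* true if the optimiser inserted a Not1 */
    uint32_t exit_target;   /* destination of "if (guard) goto ..." */
    uint32_t fall_through;  /* destination of the final "goto ..." */
} IrBranch;

/* Does the IR-level side exit fire for a given guest condition? */
static bool side_exit_fires(const IrBranch *b, bool guest_cond)
{
    assert(b != NULL);
    return b->negate_guard ? !guest_cond : guest_cond;
}

/* Guest address executed next; this is what must stay the same
 * before and after the optimisation. */
static uint32_t next_guest_addr(const IrBranch *b, bool guest_cond)
{
    return side_exit_fires(b, guest_cond) ? b->exit_target
                                          : b->fall_through;
}
```

With front_end = { false, 0x80484AC, 0x80484B8 } and
optimised = { true, 0x80484B8, 0x80484AC }, both yield 0x80484AC as the
next address when the jl condition holds, but side_exit_fires() is true
only for the front-end form.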
The solution is not to rely on the VEX code to decide the "taken" case,
but to check whether the guest instruction addresses are non-sequential.
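
A rough C sketch of that check follows. It is not lackey code and uses
no Valgrind API: JccStats, jcc_was_taken() and jcc_record() are
hypothetical names, and how a tool learns the next executed guest
address is left open here. The only idea shown is that a conditional
jump counts as taken when the next guest address differs from the
address right after the jump instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, mirroring the "total"/"taken" lines lackey prints. */
typedef struct {
    uint64_t total;
    uint64_t taken;
} JccStats;

/* A conditional jump at insn_addr with length insn_len was taken if the
 * next executed guest instruction is not the sequential successor.
 * Caveat: a jump whose target equals its own fall-through address
 * would be reported as not taken. */
static bool jcc_was_taken(uint32_t insn_addr, uint32_t insn_len,
                          uint32_t next_addr)
{
    assert(insn_len > 0);
    return next_addr != insn_addr + insn_len;
}

/* Update the counters for one executed conditional jump. */
static void jcc_record(JccStats *s, uint32_t insn_addr, uint32_t insn_len,
                       uint32_t next_addr)
{
    assert(s != NULL);
    s->total++;
    if (jcc_was_taken(insn_addr, insn_len, next_addr))
        s->taken++;
}
```

For the jl at 0x80484B6 (2 bytes), a next address of 0x80484AC counts as
taken and 0x80484B8 as not taken, no matter which of the two the IR
reaches through its side exit.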
However, lackey is meant as an example of how to write tools, so it
is really bad for it to give wrong examples. So we could
(1) Get rid of JCC counting in lackey
(2) Correct it and show how to do it the right way
Solution (2) complicates lackey (a little?), but if it is also
a tutorial about branch counting, it should be correct.
So perhaps a third solution would be:
(3) Not invert the meaning of conditions in the "pre-instr IR optimisation"
phase
Comments?
Josef