|
From: Trevor A. H. <th...@cs...> - 2009-02-28 13:58:03
|
Hi,
I'm making a Valgrind tool that reconstructs a binary's control flow graph.
I'd like there to be no duplicate instructions in the CFG. VEX disassembles
until it hits the next control transfer instruction. So there are duplicates
caused by code like this:
if ( guard )
{
b1
}
b2
If the guard evaluates to true, then there will be a super block that's the
concatenation of b1 & b2, if guard is false, then another superblock will be
created just for b2.
I tried the obvious step of finding the first instruction of b2 in b1, and
replacing it with a jump. But that stuffs up the register allocation. For
example, if I replace the instructions at 0x4005aa50 in the second block with a
call to the first block below. Then instead of doing a STLe to GET:I32(96) +
0x460, it does it to GET:I32(0) + 0x460. Eeek, that fails.
------ IMark(0x400aa50, 3) ------
t2 = GET:I32(16)
t0 = Add32(t2,0x1:I32)
IR-NoOp
t24 = _32Uto64(t0)
t12 = t24
PUT(16) = t12
------ IMark(0x400aa53, 7) ------
PUT(168) = 0x400aa53:I64
t14 = GET:I64(0)
t13 = Add64(t14,0x460:I64)
t15 = GET:I64(96)
STle(t13) = t15
------ IMark(0x400aa4a, 3) ------
t13 = GET:I64(96)
------ IMark(0x400aa4d, 2) ------
PUT(16) = 0x0:I64
IR-NoOp
PUT(16) = 0x0:I64
------ IMark(0x400aa4f, 1) ------
------ IMark(0x400aa50, 3) ------
t5 = GET:I32(16)
t3 = Add32(t5,0x1:I32)
IR-NoOp
t31 = _32Uto64(t3)
t19 = t31
PUT(16) = t19
------ IMark(0x400aa53, 7) ------
PUT(168) = 0x400aa53:I64
t20 = Add64(t13,0x460:I64)
STle(t20) = t13
Can anyone think of a solution?
thanks,
Trevor
|
|
From: Josef W. <Jos...@gm...> - 2009-02-28 21:09:15
|
On Saturday 28 February 2009, Trevor Alexander HANSEN wrote:
> Hi,
>
> I'm making a Valgrind tool that reconstructs a binary's control flow graph.
>
> I'd like there to be no duplicate instructions in the CFG.
> ...
Valgrind works on dynamic basic/superblocks, not static ones.
There can be blocks overlapping each other, resulting in the same
instruction existing in multiple instrumented blocks.
I think you have to cope with this.
Your approach never would work with something like this:
b1
label: b2
if (guard) { jump label; }
b1 & b2 always will show up combined in Valgrind. Lets say that
guard at the beginning always is false, so it does not matter.
But at one point, when guard happens to be true, do you want to split
up the already executed block b1&b2 into two blocks?
Josef
|
|
From: Trevor A. H. <th...@cs...> - 2009-03-01 03:08:34
|
> Your approach never would work with something like this:
>
> b1
> label: b2
> if (guard) { jump label; }
>
> b1 & b2 always will show up combined in Valgrind. Lets say that
> guard at the beginning always is false, so it does not matter.
> But at one point, when guard happens to be true, do you want to split
> up the already executed block b1&b2 into two blocks?
Nice example Josef. Yes if the guard is never true then I don't want the
blocks split. However, if the guard is true, then I'd like 'em to be split. So
after b2 if disassembled, I'd like b1 to be updated to unconditionally
jump to b2.
My analysis works best if there aren't duplicate instructions.
My problem seems to be that where I wish to place a new unconditional
jump instruction to b2 from b1, the guest register state hasn't been
saved.
So when duplicate instructions are disassembled it seems like I need to
call some method, that will disassemble not up until the first
control transfer instruction, but up until a particular address.
So I could make a call like disassemble(&b1, &b2). That is disassemble
from b1 up until the first instruction of b2, returning the basic block.
Then I'd add an unconditional jump to it, and replace the original,
concatenated b1;b2 block with the new block.
Does anyone know if such a function, that returns the block between two
addresses exists?
Thanks,
Trevor
|
|
From: Julian S. <js...@ac...> - 2009-03-01 11:36:46
|
> Does anyone know if such a function, that returns the block between two > addresses exists? I think what you want to do isn't doable, at least it isn't the way you've been looking at it so far. There are two "times" that are important here: when the block is translated (translation-time) and when it is run (run-time). In order to do a minimal partitioning at translation time, you need to know all the branch targets in the program, and that's just not doable. Not only from a practical perspective but also because it's undecidable. IMO you should forget about trying to avoid the tail duplication at translation time; just let the translator do its thing. So that means you're restricted to trying to deduce the block boundaries from the run-time behaviour. Suppose, at run time, you observe a jump/call/return (any control flow transfer which isn't simply "fall through to the next instruction") from instruction A to instruction B. Then you can deduce that that A is the end of a basic block, and B is the start of one. As the program runs more and more, and you see more A->B transitions (for different A and/or B) you gradually partition the initially unknown code space into smaller and smaller blocks. Of course that only gives you the blocks observed for the specific paths that that run of the program takes. In order to find all the bbs statically (at translation time) you would need to know all the branch targets statically. But that's not something that any static analysis can guarantee to do, because of decidability limitations. So that means you're restricted to finding the block boundaries that are evident at run-time. Does that make any sense? J |
Julian Seward wrote: > In order to do a minimal > partitioning at translation time, you need to know all the branch targets > in the program, and that's just not doable. Not only from a practical > perspective but also because it's undecidable. For nearly all practical programs, almost anything that is executed today on commodity hardware (compiled from C, C++, Fortran, Algol, Ada, Java, ...), it is easy and quick to compute all the branch targets. Beginning with the ElfXX_Ehdr, PT_LOAD, and .e_entry: disassemble and take the transitive closure of the possible succeeding instructions. The only difficult part is the idioms that are used for multiway branching (such as C 'switch' with large dense ranges of 'case'). In practice, appropriate software can process several megabytes of instructions in a couple seconds. In theory this may be undecidable, but in practice the goal can be accomplished quickly nearly all the time. -- |