|
From: Jeremy F. <je...@go...> - 2002-11-20 16:46:31
|
On Wed, 2002-11-20 at 04:23, Nicholas Nethercote wrote:
> Does this mean that the fall-through JMP back to the dispatcher is
> replaced by the following basic block? ie. you're allowing multiple exits
> from a basic block?
>
> ie.
>
> [code]
> Jcc X
> JMP 0xY
>
> becomes
>
> [code]
> Jcc ...
> [code for block at 0xY]
>
> ?
>
> Interesting that this doesn't make a definite improvement.
That in itself should be a definite improvement. The problem is that
its going to increase the rate of redundant translations. If you have
the extended basic block:
1: ...
jnz 3
2: ...
jeq 2
3: ...
jz 5
4: ...
5: ...
jmp 4
Then you're going to end up translating some instructions 5 times.
> Another way to extend basic blocks is to inline direct calls in the same
> way, eg:
>
> [code]
> JMP-c 0xX
>
> becomes
>
> [code]
> [code for block at 0xX]
Right. So you'd want to do some branch-weight measurements first to see
which way around it should go.
> Problem with that is that it screws up function-entry detection for skins
> -- currently to detect function entry they can call
> VG_(get_fnname_if_entry)() at the start of a basic block to determine if
> it's the first basic block in a function. With this optimisation they'd
> have to do it for every x86 instruction, urgh.
To do this, you'd need a SETEIP UInstr to keep the EIP up to date, even
if there isn't a jump. Even if you weren't keeping EIP up to date, you
could have a UInstr for representing the start of a basic block.
> Similarly direct JMPs seem pretty common, perhaps they too could be
> inlined:
>
> [code]
> JMP 0xY
>
> becomes
>
> [code]
> [code for block at 0xY]
>
> ----
>
> This sort of inlining seems nicer than chaining. Would it be hard?
This is what I've been referring to as trace caching. The problem is
that its hard to know when to stop, as a tradeoff between runtime
efficiency and repeated compilation. For example, this is the schematic
of a switch in an infinite loop:
1: cmp ...
jeq 5
cmp ...
jle 4
cmp ...
jgt 3
cmp ...
jne 2
...
jmp 6
2: ...
jmp 6
3: ...
jmp 6
4: ...
jmp 6
5: ...
6: ...
jmp 1
It seems to me that you're going to have to work very hard to amortise
the cost of doing all that compilation.
Dynamo does this, but only after running the code for a while to see
that it's worth the cost of extra compilation. We don't have the
machinery to do "regenerate after instrumenting it for a while", and
that's where the complexity in this idea is. The actual following-jmps
is easy.
> Is it any different in principle to what Jeremy's already implemented?
Chaining is almost as efficient as this, but it has the advantage of not
incurring as much compile-time overhead and only incremental runtime
overhead.
J
|