From: Nicholas N. <nj...@ca...> - 2002-11-21 10:26:01
On Thu, 21 Nov 2002, Julian Seward wrote:
> Variant 1 (less ambitious)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> We create three new functions, all of them predicates on UInstrs:
>
> 1 Bool uinstr_maybe_trashes_realEFLAGS ( UInstr* );
> 2 Bool uinstr_maybe_trashes_realFPU ( UInstr* );
> 3 Bool uinstr_maybe_reads_simdEIP ( UInstr* );
>
> Fn (1) would dually be used to support lazy %EFLAG save restore in the
> same manner as the FPU.
Can I just confirm my understanding of the lazy %EFLAGS optimisation?
Currently, AIUI, a lot of basic blocks (pretty much all those that end in
a conditional jump) end like this:

    37: ANDL %eax, %ebx (-wOSZACP)
        pushl 32(%ebp) ; popfl
        andl %eax, %ebx
        pushfl ; popl 32(%ebp)      ***
    38: INCEIPo $2
        addl $2, 36(%ebp)
    39: Jzo $0x402047AC (-rOSZACP)
        pushl 32(%ebp) ; popfl      ***
        jnz-8 %eip+6
        movl $0x402047AC, %eax
        ret

If and only if the INCEIP addl is removed, the four instructions on the
two lines marked *** can be trivially removed (the addl itself trashes the
real %eflags, which is why they are saved and restored around it).
Looking through --skin=none code, the only place where this seems
applicable is for conditional jumps; arithmetic ops in the middle of
basic blocks don't look like they can have their %EFLAGS fiddling avoided.
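
To make the condition concrete, here is a minimal sketch of the check I have
in mind. Everything in it is hypothetical (the ToyUInstr struct, its fields
and can_elide_flag_sync are made up for illustration and are not Valgrind
code); it only models "nothing between the flag writer and the flag reader
touches the real %eflags":

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one UInstr. These fields are hypothetical, not Valgrind's. */
typedef struct {
    const char *name;
    bool writes_flags;        /* UInstr is annotated as writing flags (e.g. ANDL) */
    bool reads_flags;         /* UInstr is annotated as reading flags (e.g. Jzo) */
    bool trashes_real_eflags; /* emitted host code clobbers the real %eflags */
} ToyUInstr;

/* Returns true if the flag save emitted after code[writer] and the flag
   restore emitted before code[reader] are both redundant, i.e. no UInstr
   strictly between them writes or trashes the real %eflags. */
bool can_elide_flag_sync(const ToyUInstr *code, size_t writer, size_t reader)
{
    if (writer >= reader)
        return false;
    if (!code[writer].writes_flags || !code[reader].reads_flags)
        return false;
    for (size_t i = writer + 1; i < reader; i++) {
        if (code[i].writes_flags || code[i].trashes_real_eflags)
            return false;
    }
    return true;
}
```

With the INCEIP addl in between, the check fails; with the INCEIP gone, the
ANDL and the Jzo are adjacent and the save/restore pair can go.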
One downer: I'm not sure if this will work for MemCheck. Consider the
instrumented code in the same situation:

    30: ANDL %edx, %ebx (-wOSZACP)
        pushl 32(%ebp) ; popfl
        andl %edx, %ebx
        pushfl ; popl 32(%ebp)
    31: INCEIPo $2
        addl $2, 104(%ebp)
    32: GETVFo %eax
        movl 0x44(%ebp), %eax
    33: TESTVo %eax
        testb $0x1, %al
        jz-8 %eip+3
        call * 76(%ebp)
    34: Jzo $0x4000D2C4 (-rOSZACP)
        pushl 32(%ebp) ; popfl
        jnz-8 %eip+6
        movl $0x4000D2C4, %eax
        ret

Ignoring the INCEIP, the arithmetic op and the conditional jump are no
longer adjacent.
Here's the equivalent code for Cachegrind:

     9: ANDL %edi, %edx (-wOSZACP)
        pushl 32(%ebp) ; popfl
        andl %edi, %edx
        pushfl ; popl 32(%ebp)
    10: MOVL $0x40269DC0, %eax
        movl $0x40269DC0, %eax
    11: CCALLo 0x4001C170(%eax)
        call * 36(%ebp)
    12: INCEIPo $2
        addl $2, 60(%ebp)
    13: MOVL $0x40269DE0, %eax
        movl $0x40269DE0, %eax
    14: CCALLo 0x4001C170(%eax)
        call * 36(%ebp)
    15: Jnzo $0x4000D190 (-rOSZACP)
        pushl 32(%ebp) ; popfl
        jz-8 %eip+6
        movl $0x4000D190, %eax
        ret

Again the optimisation is foiled.
It would work for AddrCheck and Helgrind though.
It seems that we would be getting into a situation where the
instrumentation you add could affect what optimisations can occur; in one
way this is reasonable but in another way it kind of sucks.
Or have I misunderstood something?
> Variant 2 (more ambitious)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Difference here is we do table-based EIP reconstruction as needed
> and completely nuke INCEIP (yay!)
> [...]
> If we could do variant 2 rather than variant 1 that would be cool,
> considering it would give a bigger speedup overall.
I support this. The complexity will be localised pretty well, which is
good.
BTW, am I right in thinking that the INCEIP instruction will stay in
UCode and will just be removed during the codegen phase? I ask because
skins (e.g. Cachegrind) use it as a way of determining the boundaries of
the original x86 instructions.
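
To illustrate what I mean, a rough sketch follows. The ToyOpcode enum and both
functions are hypothetical stand-ins, not Valgrind's real UCode types; it also
assumes, as in the listings above, that the block-ending jump has no trailing
INCEIP. A skin-side pass still sees INCEIP and uses it as an instruction
boundary, while a codegen-side pass simply emits nothing for it:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; not Valgrind's real UCode enum. */
typedef enum { OP_ANDL, OP_MOVL, OP_CCALL, OP_INCEIP, OP_JCC } ToyOpcode;

/* Skin side: count the original x86 instructions in a basic block, treating
   each INCEIP as the marker that closes one instruction. Any UInstrs left
   after the last INCEIP (the final jump) count as one more instruction. */
size_t count_x86_instrs(const ToyOpcode *ucode, size_t n)
{
    size_t count = 0;
    bool open = false; /* true while inside an instruction not yet closed */
    for (size_t i = 0; i < n; i++) {
        open = true;
        if (ucode[i] == OP_INCEIP) {
            count++;
            open = false;
        }
    }
    return count + (open ? 1 : 0);
}

/* Codegen side: count the UInstrs that would still produce host code if
   INCEIP stays in UCode but is skipped when generating code. */
size_t count_emitting_uinstrs(const ToyOpcode *ucode, size_t n)
{
    size_t emitted = 0;
    for (size_t i = 0; i < n; i++) {
        if (ucode[i] != OP_INCEIP)
            emitted++;
    }
    return emitted;
}
```

For the Cachegrind block above (ANDL, MOVL, CCALL, INCEIP, MOVL, CCALL, Jnzo),
the skin-side pass still finds two x86 instructions even though the INCEIP
would generate no code.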
N