From: Jeremy F. <je...@go...> - 2002-10-04 15:44:12
On Fri, 2002-10-04 at 03:51, Julian Seward wrote:
> How much faster is "significantly faster"?
I haven't measured it in detail, but the frame time dropped from about
1100ms/frame to 800-900ms/frame. I'll do some more scientific
measurements soon.
> So, my main point. I think this patch is unsafe and will lead to hard
> to find problems down the line. The difficulty is that it allows the
> simulated FPU state to hang around in the real FPU for long periods,
> up to a whole basic block's worth of execution (if I understand it
> right).
> We only need a skin to call out to a helper function which modifies
> the real FPU state on some obscure path, and we're hosed. Since we don't
> have any control over what skins people might plug in, this seems like
> an unsafe modification to the core.
> The modification I had in mind for a while was a lot more conservative,
> and more along the lines of a peephole optimisation. Essentially
> if we see a FPU-no-mem op followed by another FPU-no-mem op we can
> skip the save at the end of the first and the restore at the start of
> the second.
What I'm doing is not conceptually different from caching an ArchReg in
a RealReg for the lifetime of a basic block. The general idea is that
the FP state is pulled in just before the first FPU/FPU_[RW]
instruction, and saved again just before:
- JMP
- CCALL
- any skin UInstr
I can't see how a skin can introduce any instrumentation which would be
able to catch the FP state unsaved (is there any way for a skin to do
instrumentation or call a C function without using either CCALL or its
own UInstr?).
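
To make the three save points concrete, here is a rough sketch in C of
the bookkeeping. It is not the patch itself: the UKind enum, FpuTraffic
and lazy_fpu_traffic() are hypothetical names invented for illustration,
and the real UInstr representation is more involved. It only counts how
often the FP state would be restored into and saved out of the real FPU
for one basic block.

```c
#include <stddef.h>

/* Hypothetical, simplified UInstr kinds (not the real core enum). */
typedef enum { U_FPU, U_FPU_R, U_FPU_W, U_JMP, U_CCALL, U_SKIN, U_OTHER } UKind;

typedef struct {
    int restores;  /* simulated FP state loaded into the real FPU */
    int saves;     /* simulated FP state written back from the real FPU */
} FpuTraffic;

/* FPU, FPU_R and FPU_W all need the FP state in the real FPU. */
static int is_fpu(UKind k)
{
    return k == U_FPU || k == U_FPU_R || k == U_FPU_W;
}

/* The three save points: JMP, CCALL, and any skin UInstr. */
static int forces_save(UKind k)
{
    return k == U_JMP || k == U_CCALL || k == U_SKIN;
}

FpuTraffic lazy_fpu_traffic(const UKind *block, size_t n)
{
    FpuTraffic t = {0, 0};
    int live = 0;  /* nonzero while the FP state sits in the real FPU */

    for (size_t i = 0; i < n; i++) {
        if (is_fpu(block[i])) {
            if (!live) {          /* pull the state in just before use */
                t.restores++;
                live = 1;
            }
        } else if (forces_save(block[i]) && live) {
            t.saves++;            /* write it back before control can escape */
            live = 0;
        }
    }
    if (live)                     /* assumption: never leave state live at block end */
        t.saves++;
    return t;
}
```

For a block of FPU, OTHER, FPU_R, CCALL, FPU_W, JMP this counts two
restores and two saves: the state stays live across the OTHER
instruction and is only written back at the CCALL and the JMP.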
Your idea is basically the same, except we add a fourth saving
condition:
- any non-FPU instruction
This would only be necessary if you imagine a non-FPU instruction which
can inspect the architectural state of the FPU (in other words, one that
is a memory access at an offset into the baseBlock: something which skins
can't generate directly).
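
As a rough comparison of the two schemes, the sketch below counts
restore/save traffic for one block under either rule. Everything in it
(UKind, SavePolicy, fpu_traffic()) is a hypothetical model made up for
illustration, not code from the patch or from the core.

```c
#include <stddef.h>

/* Hypothetical, simplified UInstr kinds (not the real core enum). */
typedef enum { U_FPU, U_FPU_R, U_FPU_W, U_JMP, U_CCALL, U_SKIN, U_OTHER } UKind;

typedef enum {
    SAVE_LAZY,      /* save before JMP, CCALL, skin UInstr only */
    SAVE_PEEPHOLE   /* additionally save before any non-FPU instruction */
} SavePolicy;

typedef struct { int restores; int saves; } FpuTraffic;

static int is_fpu(UKind k)
{
    return k == U_FPU || k == U_FPU_R || k == U_FPU_W;
}

static int forces_save(UKind k, SavePolicy p)
{
    if (p == SAVE_PEEPHOLE)
        return !is_fpu(k);  /* the fourth condition covers the other three */
    return k == U_JMP || k == U_CCALL || k == U_SKIN;
}

FpuTraffic fpu_traffic(const UKind *block, size_t n, SavePolicy p)
{
    FpuTraffic t = {0, 0};
    int live = 0;  /* nonzero while the FP state sits in the real FPU */

    for (size_t i = 0; i < n; i++) {
        if (is_fpu(block[i])) {
            if (!live) { t.restores++; live = 1; }
        } else if (forces_save(block[i], p) && live) {
            t.saves++;
            live = 0;
        }
    }
    if (live)  /* assumption: never leave state live at block end */
        t.saves++;
    return t;
}
```

On a block of FPU, OTHER, FPU_R, CCALL, FPU_W, JMP the lazy rule gives
two restore/save pairs and the peephole rule gives three; on a run of
back-to-back FPU ops ending in a JMP both give one. The two rules only
differ where ordinary non-FPU instructions sit between FPU ops.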
In summary, I think this is actually pretty conservative, simple and
safe.
J