From: Jeremy F. <je...@go...> - 2002-10-04 15:44:12
On Fri, 2002-10-04 at 03:51, Julian Seward wrote:
> How much faster is "significantly faster"?

I haven't measured it in detail, but the frame time dropped from about 1100ms/frame to 800-900ms/frame. I'll do some more scientific measurements soon.

> So, my main point. I think this patch is unsafe and will lead to
> hard to find problems down the line. The difficulty is that it
> allows the simulated FPU state to hang around in the real FPU for
> long periods, up to a whole basic block's worth of execution (if I
> understand it right). We only need a skin to call out to a helper
> function which modifies the real FPU state on some obscure path, and
> we're hosed. Since we don't have any control over what skins people
> might plug in, this seems like an unsafe modification to the core.
>
> The modification I had in mind for a while was a lot more
> conservative, and more along the lines of a peephole optimisation.
> Essentially if we see a FPU-no-mem op followed by another FPU-no-mem
> op we can skip the save at the end of the first and the restore at
> the start of the second.

What I'm doing is not conceptually different from caching an ArchReg in a RealReg for the lifetime of a basic block. The general idea is that the FP state is pulled in just before the first FPU/FPU_[RW] instruction, and saved again just before:

- JMP
- CCALL
- any skin UInstr

I can't see how a skin can introduce any instrumentation which would be able to catch the FP state unsaved (is there any way for a skin to do instrumentation or call a C function without using either CCALL or its own UInstr?).

Your idea is basically the same, except we add a fourth saving condition:

- any non-FPU instruction

This would only be necessary if you imagine a non-FPU instruction which can inspect the architectural state of the FPU (in other words, one which is a memory access at an offset into the baseBlock, something which skins can't generate directly).

In summary, I think this is actually pretty conservative, simple and safe.

J
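To make the two save policies above concrete, here is a rough sketch in Python. It is only a model of the idea, not the patch itself: the UInstr kinds FPU, FPU_R, FPU_W, JMP and CCALL come from the discussion, while SKIN, OTHER, RESTORE_FPU, SAVE_FPU and place_fpu_transfers are hypothetical names made up for illustration, and a basic block is reduced to a plain list of instruction kinds.

```python
from enum import Enum, auto


class Kind(Enum):
    """Simplified UInstr kinds; SKIN and OTHER are illustrative stand-ins."""
    FPU = auto()    # FPU op with no memory operand
    FPU_R = auto()  # FPU op that reads memory
    FPU_W = auto()  # FPU op that writes memory
    JMP = auto()
    CCALL = auto()
    SKIN = auto()   # any skin-defined UInstr
    OTHER = auto()  # any other core, non-FPU UInstr


FPU_KINDS = {Kind.FPU, Kind.FPU_R, Kind.FPU_W}
ALWAYS_SAVE_BEFORE = {Kind.JMP, Kind.CCALL, Kind.SKIN}


def place_fpu_transfers(block, save_before_any_non_fpu=False):
    """Return the block with hypothetical RESTORE_FPU/SAVE_FPU markers added.

    The simulated FP state is restored into the real FPU just before the
    first FPU instruction that needs it, and saved back just before a JMP,
    CCALL or skin UInstr. With save_before_any_non_fpu=True the fourth
    condition (any non-FPU instruction) is applied as well.
    """
    out = []
    live = False  # True while the simulated FP state sits in the real FPU
    for kind in block:
        if kind in FPU_KINDS:
            if not live:
                out.append("RESTORE_FPU")
                live = True
        elif live and (kind in ALWAYS_SAVE_BEFORE or save_before_any_non_fpu):
            out.append("SAVE_FPU")
            live = False
        out.append(kind.name)
    if live:
        # Defensive only: this model does not assume the block ends in a JMP.
        out.append("SAVE_FPU")
    return out


if __name__ == "__main__":
    block = [Kind.FPU, Kind.OTHER, Kind.FPU_R, Kind.CCALL, Kind.FPU, Kind.JMP]
    print(place_fpu_transfers(block))
    print(place_fpu_transfers(block, save_before_any_non_fpu=True))
```

For the sample block in the sketch, the first policy keeps the FP state live across the OTHER instruction and emits two restore/save pairs; adding the fourth condition emits three. In both cases the state is saved before the CCALL and the JMP, which is the property the safety argument rests on.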