From: Julian S. <js...@ac...> - 2002-11-21 09:01:36
It's clear we need to do something to fix this.  This is a proposal
with two variants: a less ambitious variant (first) and a more
ambitious one (second).


Variant 1 (less ambitious)
~~~~~~~~~~~~~~~~~~~~~~~~~~
We create three new functions, all of them predicates on UInstrs:

   1  Bool uinstr_maybe_trashes_realEFLAGS ( UInstr* );
   2  Bool uinstr_maybe_trashes_realFPU    ( UInstr* );
   3  Bool uinstr_maybe_reads_simdEIP      ( UInstr* );

which tell you whether or not a uinstr (more correctly, the code
generated for that uinstr) **might**

   (1) alter the real machine's %eflags,
   (2) alter the real machine's FPU state, and
   (3) need to see the %EIP of the simulated machine.

Getting a safe-if-conservative result for (1) and (2) is critical
for translation correctness.  For (3) it just affects the accuracy
of stack traces.  So the safe thing to return is True for all
functions, yet we want to return False as often as we safely can.
Classical compiler-analysis conservative-estimate stuff.

Further, for all skins which extend ucode, we add suitable functions
to the core/skin iface so that the same questions can be asked of
the extended ucodes.

With fn (3), the redundant INCEIP removal phase can be reinstated
for all skins, and should work correctly.  The reason it's commented
out is precisely because at present it can't reliably ask this
question.

Fn (2) would be used to formally and cleanly support Jeremy's lazy
FPU state save/restore, which I think is rolled into the code
generation loop.  I would like to be able to ship this optimisation
with confidence that I understand the ramifications.

Fn (1) would dually be used to support lazy %EFLAGS save/restore in
the same manner as the FPU.

None of this would be hard to implement, and the three functions
would support a useful bunch of optimisations.

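To make the conservative-estimate idea concrete, here is a minimal
sketch of the three predicates.  The opcode set and the UInstr layout
below are cut-down stand-ins invented for illustration, not the real
ucode definitions, and which opcodes are classified as "safe" is an
assumption made up for the example, not a statement about what the
code generator actually emits.  The point is only the shape: list the
cases known to be harmless, and let everything else (including any
skin-extended opcode) fall through to True.

```c
#include <assert.h>
#include <stddef.h>

typedef int Bool;
#define True  1
#define False 0

/* Hypothetical, cut-down opcode set; NOT the real ucode opcode list. */
typedef enum {
   OP_NOP, OP_MOV, OP_ADD, OP_LOAD, OP_FPU, OP_CCALL,
   OP_SKIN_EXT   /* stands for any opcode added by a skin */
} Opcode;

/* Hypothetical stand-in for the real UInstr. */
typedef struct { Opcode opcode; } UInstr;

/* (1) Might the generated code alter the real %eflags? */
Bool uinstr_maybe_trashes_realEFLAGS ( UInstr* u )
{
   assert(u != NULL);
   switch (u->opcode) {
      /* Assumed, for illustration only, to leave real %eflags alone. */
      case OP_NOP: case OP_MOV: case OP_LOAD:
         return False;
      /* Anything not explicitly known to be safe gets the safe answer. */
      default:
         return True;
   }
}

/* (2) Might the generated code alter the real FPU state? */
Bool uinstr_maybe_trashes_realFPU ( UInstr* u )
{
   assert(u != NULL);
   switch (u->opcode) {
      /* Assumed, for illustration only, not to touch the real FPU. */
      case OP_NOP: case OP_MOV: case OP_ADD: case OP_LOAD:
         return False;
      default:
         return True;
   }
}

/* (3) Might the generated code need an up-to-date simulated %EIP? */
Bool uinstr_maybe_reads_simdEIP ( UInstr* u )
{
   assert(u != NULL);
   switch (u->opcode) {
      /* Assumed, for illustration only, never to need %EIP. */
      case OP_NOP: case OP_MOV: case OP_ADD:
         return False;
      default:
         return True;
   }
}
```

Writing the functions as a whitelist with a True default means that
forgetting to classify a new opcode costs some performance but never
correctness.
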

Variant 2 (more ambitious)
~~~~~~~~~~~~~~~~~~~~~~~~~~
We create two new functions, both of them predicates on UInstrs:

   1  Bool uinstr_maybe_trashes_realEFLAGS ( UInstr* );
   2  Bool uinstr_maybe_trashes_realFPU    ( UInstr* );

We do the lazy FPU and EFLAGS save/restore optimisation exactly as
above.  The difference here is that we do table-based %EIP
reconstruction as needed and completely nuke INCEIP (yay!)

So, precisely how to reconstruct %EIP from %eip?  Like this.

Firstly, %EIP must be made up-to-date at the start of each bb, but it
is anyway, so that's free.

Now suppose we are in the middle of a bb and want to know the current
%EIP.  We actually know the block-start %EIP and the current %eip.
Using the block-start %EIP we can look in the translation table (TT)
to find info about this block: the start address of its translation,
and presumably whereabouts its %eip->%EIP mapping table is.

Knowing where the translation starts (%eip for the start of the
translated block) and knowing the current %eip, we can figure out how
far inside the translation we are.  That offset can then be used to
index (somehow) the mapping table for this block, to give the
corresponding %EIP offset, which, when added to the block-start %EIP,
gives the current %EIP.

Makes sense?  So the only questions are how to compactly encode the
mapping tables, and perhaps where to store them, but that's not a big
deal.

Comments?  If we could do variant 2 rather than variant 1 that would
be cool, considering it would give a bigger speedup overall.

J
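
For what it's worth, the %eip->%EIP lookup described for variant 2
could be sketched roughly as below.  Everything here is an assumption
made for the sake of the example: the TTEntrySketch and EipMapEntry
types, their field names, and the choice of a sorted array searched
by binary search are all hypothetical, and the encoding is not
compact at all.  The sketch only shows the arithmetic: offset into
the translation, map that to an offset into the original code, add
the block-start %EIP.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Addr;   /* assumed 32-bit address type */

/* Hypothetical map entry: one per original instruction in the bb. */
typedef struct {
   unsigned short trans_off; /* where this insn's code starts in the translation */
   unsigned short orig_off;  /* offset of this insn from the block-start %EIP */
} EipMapEntry;

/* Hypothetical view of what the TT would hold for one block. */
typedef struct {
   Addr               orig_addr;   /* block-start %EIP */
   Addr               trans_addr;  /* %eip of the start of the translation */
   unsigned int       trans_size;  /* size of the translation in bytes */
   const EipMapEntry* map;         /* sorted by trans_off; map[0].trans_off == 0 */
   size_t             n_map;
} TTEntrySketch;

/* Given the block's TT info and the current real %eip (which must lie
   inside the translation), return the simulated %EIP of the original
   instruction whose generated code contains that %eip. */
Addr reconstruct_simd_EIP ( const TTEntrySketch* tte, Addr real_eip )
{
   assert(tte != NULL && tte->n_map > 0);
   assert(tte->map[0].trans_off == 0);
   assert(real_eip >= tte->trans_addr);
   assert(real_eip - tte->trans_addr < tte->trans_size);

   /* How far inside the translation are we? */
   unsigned int off = real_eip - tte->trans_addr;

   /* Binary search for the last entry with trans_off <= off.
      Invariant: map[lo].trans_off <= off, and the answer is in [lo, hi). */
   size_t lo = 0, hi = tte->n_map;
   while (hi - lo > 1) {
      size_t mid = lo + (hi - lo) / 2;
      if (tte->map[mid].trans_off <= off)
         lo = mid;
      else
         hi = mid;
   }

   /* Block-start %EIP plus the %EIP offset gives the current %EIP. */
   return tte->orig_addr + tte->map[lo].orig_off;
}
```

Since the lookup would only run when a %EIP is actually wanted (for
a stack trace, say), its cost matters much less than the size of the
table, which is why the encoding question is the interesting one.
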