From: Nicholas N. <nj...@ca...> - 2002-11-20 09:23:52
On Wed, 20 Nov 2002, Julian Seward wrote:
> Another obvious lemon is the INCEIP nonsense (of my own devising :)
> For every insn executed, except for the last in each bb, there is a
> corresponding INCEIP, which becomes something like
> addl $insn_size, 36(%ebp). That's probably expensive; it's 3 microops
> for modern CPUs (load-op-store). All because we need an up-to-date %EIP
> in some very rare circumstances: when taking a snapshot of the stack that
> might conceivably get passed to the user, and when delivering signals.
> These are relatively rare. I wonder if we could dispense with the eip
> and instead associate with each bb a small table of offsets which indicate
> how to calculate the simulated %EIP from the value it was set at at the
> start of the block and the current distance that the real %eip is inside
> the translation now. If you see what I mean.

I was fiddling with the redundant-INCEIP-removal code yesterday (which is
currently commented out). But this table-of-EIP-offsets sounds like a much
better idea. (And because of my interface changes, it should be possible to
add an extra field to UCodeBlock without breaking binary compatibility :)

> Easy to test the net effect; disable %EIP generation altogether
> (one-liner in vg_to_ucode.c). I bet it gives another 10% or so.
> I would try it now but I have to rush off and duke it out with ARM
> code all day :-)

I did this yesterday. For gzip with --skin=none, the time difference was
smallish (16.95s --> 16.13s, about 5%) but the code size difference was big
(4.6x --> 3.6x, about 22%). For MemCheck and Cachegrind, INCEIP instructions
account for about 10% and 12% of code size respectively. I think the space
savings are representative for a wide range of programs.

So, to summarise: INCEIP removal via a table of offsets shouldn't be too
hard (I think?) and would definitely be worth it even if only for the space
savings.

N
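
For illustration only, here is a rough sketch of how the table-of-offsets
lookup discussed above might work. All the names (EipMapEntry, EipMap,
eip_from_host_offset) are hypothetical and are not taken from the Valgrind
sources. The sketch assumes one entry per guest instruction, sorted by the
offset of that instruction's translated code, and it assumes 16-bit offsets
are wide enough for a single bb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a sketch, not Valgrind's actual data structures. */

/* One entry per guest instruction in a basic block. */
typedef struct {
    uint16_t host_off;   /* where this insn's translated code starts,
                            relative to the start of the translation */
    uint16_t guest_off;  /* where the guest insn starts, relative to
                            the bb's guest start address */
} EipMapEntry;

typedef struct {
    uint32_t           guest_start; /* simulated %EIP at bb entry */
    size_t             n_entries;
    const EipMapEntry *entries;     /* sorted by host_off, ascending;
                                       entries[0].host_off assumed 0 */
} EipMap;

/* Recover the simulated %EIP from how far the real %eip currently is
   inside the translation. Binary-searches for the last entry whose
   host_off is <= cur_host_off. */
uint32_t eip_from_host_offset(const EipMap *map, uint32_t cur_host_off)
{
    size_t lo, hi;

    if (map->n_entries == 0)
        return map->guest_start;

    lo = 0;
    hi = map->n_entries;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (map->entries[mid].host_off <= cur_host_off)
            lo = mid;   /* entry mid starts at or before the target */
        else
            hi = mid;   /* entry mid starts after the target */
    }
    return map->guest_start + map->entries[lo].guest_off;
}
```

The lookup would only run in the rare cases mentioned above (stack
snapshots, signal delivery), so its cost matters much less than the
per-instruction cost of the INCEIPs it would replace.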