From: Jeremy F. <je...@go...> - 2002-11-22 02:41:14
On Thu, 2002-11-21 at 16:08, Julian Seward wrote:
> 1. INCEIPs
> ~~~~~~~~~~~
> The best way to handle the INCEIP expense is to replace it all with a
> table lookup. As Jeremy suggests, it's probably simpler to have one
> big table than one table per bb, and I'm probably inclined in favour
> of that.
Eh? No, I think one table per BB is easiest to handle, particularly if
we have EIP at least up to date to the current basic block.
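To make the per-BB table concrete, here is a rough sketch in C, not code from any patch. It assumes each translated basic block carries a small array sorted by offset into the translated code, with one entry per original instruction; the names EipEntry, EipTable and lookup_guest_eip are made up for illustration. The lookup only runs when somebody actually asks for %EIP, so the common path pays nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-basic-block table entry: maps the offset of the first
   translated (host) byte of an original instruction to that instruction's
   original %eip. */
typedef struct {
    uint32_t host_off;   /* offset into the translated code for this BB */
    uint32_t guest_eip;  /* address of the original instruction */
} EipEntry;

/* Hypothetical per-BB table. Assumptions: n >= 1, entries sorted by
   ascending host_off, and entries[0].host_off == 0. */
typedef struct {
    uint32_t n;
    const EipEntry *entries;
} EipTable;

/* Binary search for the last entry whose host_off is <= the given offset,
   i.e. the original instruction whose translation covers that offset. */
static uint32_t lookup_guest_eip(const EipTable *t, uint32_t host_off)
{
    size_t lo = 0, hi = t->n;   /* the answer index always lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->entries[mid].host_off <= host_off)
            lo = mid;
        else
            hi = mid;
    }
    return t->entries[lo].guest_eip;
}
```

If EIP is already up to date to the start of the current basic block, the block itself tells you which table to search, so there is no need for one big global table.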
> The problem appears to be how to get hold of %eip (the real one) when
> a skin calls a helper function, in a way which is not skin-specific.
> I thought about this this evening for some considerable time, but I
> can't think of a clean solution which doesn't involve some amount of
> performance loss. The least-worst thing I thought of was to associate
> with CCALL uinstrs a boolean flag, which when set automagically causes
> the current %eip to get passed to the called function as an extra
> parameter. This at least makes it skin-independent.
But we have a VG_(get_EIP)() function. Why can't skins just use that,
and we hide all the complexity in that function? Why does a skin need
to know about the hardware %eip?
> Unfortunately, it adds extra expense for almost all of the calls made
> by cachegrind, memcheck and addrcheck, since most of those can potentially
> report an error and so need %eip. This is not good. So I don't like
> this idea.
Memcheck only needs %EIP when actually reporting an error though.
> 2. eflags save/restores
> ~~~~~~~~~~~~~~~~~~~~~~~~
> The second thing Nick has unearthed is that the lazy eflag optimisation
> will often not apply, due to intervening flag-mangling operations.
> [Nick: yes, your understanding of how this is supposed to work appears
> to be correct]. This is bad; but I'd like to do better, because we're
> really getting clobbered by these pushf/popf insns.
I had an idea about restoring eflags without pushf/popf: just rerun the
setting instruction. Since (almost?) all the instructions which
generate eflags are otherwise side-effect free and don't depend on
memory, why not just rerun them on the same values just before the
jump? (Or perhaps move it altogether, but that might be trickier than
just copying the instruction.)
I haven't looked at this in detail, but it's an amusing idea...
Hm, I guess making sure the flags are correct for the end of the BB
still needs them to be explicitly saved. Which I suppose we could do in
the form of saving a piece of code which has the right effect, but
that's probably going a bit far...
> If we can't cleanly solve problem (1) [getting hold of %eip], perhaps we
> should back off from the %EIP-by-tables story; in effect fall back to
> Variant (1) in this morning's proposal. That means retaining INCEIP, at
> least when we can't get rid of redundant updates.
I still don't think it's hard to efficiently implement an EIP table
lookup.
> How about if we had instead PUTEIP, which simply sets the value of %EIP
> to some literal value? So, we just emit PUTEIPs with absolute addresses
> where currently we create INCEIPs with deltas. PUTEIP then becomes
> something like[...]
I do this in the t-chaining patch. Just before each jump, it generates:
    movl $target-addr, %eax
    movl %eax, 36(%ebp)
It means that if the chaining patch fails, it can still just ret into
the dispatch loop, and %eax is set up properly, as well as having %EIP
up to date in case there's a context switch.
I currently generate this inline, but I was thinking of adding a SETEIP
with these semantics. This would also be useful to distinguish basic
blocks in an extended basic block when doing trace caching.
> [S2: suggestion of smarter code for Jz et al]
>
> Another thing which would surely help is to generate smarter code for
> the Jz / Jnz UINSTRS. Observation to be made here is that at least for
> the Zero flag, we might as well directly test the bit in %EFLAGS in
> memory [...]
> Some complicated tests (not-below, etc) would still have to be done the
> old way, but we could cover the tests associated with single flags
> (zero/nonzero, sign/nonsign, carry/nocarry) and I think that would
> catch most _uses_ of the condition codes.
That's nice and simple to implement. It's a good start. Has anyone
looked to see what the proportions of the different comparisons are? I
would expect Z and NZ would be way up there...
Hm, looking at the definitions of the tests in the manual, it seems that
it would be pretty easy to generate open-coded equivalents for most of
them in much less than the 12 cycles to do popf...
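As a rough illustration of what the open-coded tests have to compute, here is a C model of the condition predicates evaluated directly on a saved eflags word. This is only a sketch of the logic, not generated code; the enum and function names are invented for the example. The bit positions (CF bit 0, ZF bit 6, SF bit 7, OF bit 11) and the predicate definitions are the standard x86 ones.

```c
#include <stdint.h>

/* Standard x86 EFLAGS bit positions for the flags used below. */
enum {
    FLAG_C = 1u << 0,   /* carry */
    FLAG_Z = 1u << 6,   /* zero */
    FLAG_S = 1u << 7,   /* sign */
    FLAG_O = 1u << 11   /* overflow */
};

/* Hypothetical condition names, for illustration only. */
typedef enum {
    COND_Z, COND_NZ, COND_S, COND_NS, COND_B, COND_NB,
    COND_BE, COND_NBE, COND_L, COND_NL, COND_LE, COND_NLE
} Cond;

/* Evaluate a condition against a saved eflags word without popf.
   The first six cases test a single bit; the rest combine two or three. */
static int cond_holds(Cond c, uint32_t eflags)
{
    int cf = (eflags & FLAG_C) != 0;
    int zf = (eflags & FLAG_Z) != 0;
    int sf = (eflags & FLAG_S) != 0;
    int of = (eflags & FLAG_O) != 0;

    switch (c) {
    case COND_Z:   return zf;
    case COND_NZ:  return !zf;
    case COND_S:   return sf;
    case COND_NS:  return !sf;
    case COND_B:   return cf;
    case COND_NB:  return !cf;
    case COND_BE:  return cf || zf;
    case COND_NBE: return !cf && !zf;
    case COND_L:   return sf != of;
    case COND_NL:  return sf == of;
    case COND_LE:  return zf || (sf != of);
    case COND_NLE: return !zf && (sf == of);
    }
    return 0;
}
```

The single-flag cases come down to a test of one bit in the saved word, and even the signed comparisons only need a couple of shifts and an xor to compare SF with OF.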
> Even with the PUTEIP hack, I can see that, as Nick says, the instrumentation
> created by memcheck and cachegrind will often get in the way of the lazy
> eflags stuff. Now I'm wondering if it is possible for the skins (memcheck
> specifically) to generate instrumentation a bit more cleverly so as to try
> and keep condition code generators and users together more often. Dunno if
> this will help.
This is along the lines of my suggestion above of duplicating the
flags-setting instruction to just before the test. It's a little bit
like Shade's use of carefully chosen instructions to set the flags to
the desired value.
J