From: Julian S. <js...@ac...> - 2002-11-22 08:24:08
Nick,
I have thought of a way to get hold of %eip in helper fns, skin independently
and without slowing down the normal I-don't-want-to-know-%eip case, but it has
a high coefficient of horribleness [although it is easy to implement].
Basic idea is that we can probably find the return address for the call to the
helper, lying round on the stack, without outside assistance.
Suppose (as is indeed the case) that %EIP is brought up to date at the
start of each bb. Now the bb calls a helper fn and you want to know
what %eip is at the point of the call.
So we're looking for a word on the stack (the RA) that points back into
the translation. Since we have %EIP for the bb, we can look up in TT
the location and size of the translation. The word we're looking for
must have a value in the range: location .. location + size - 1.
Furthermore, 6 (?) bytes before the word must also be a valid location in the
translation -- the call insn -- and it must contain the opcode for "call".
So you're heavily constrained as to the value of the word being searched
for.
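A minimal sketch of the two constraints above, in C. The names plausible_ra, CALL_INSN_LEN and CALL_OPCODE are hypothetical, and the values 6 and 0xFF are assumptions: 6 is the guess made above, and 0xFF is the first byte of an x86 indirect call (FF /2), which may not be the form the code generator actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, not confirmed: length of the emitted call insn and its
   first opcode byte (0xFF = x86 indirect call, FF /2). */
#define CALL_INSN_LEN 6
#define CALL_OPCODE   0xFF

/* Return 1 if `word` could be the return address of a helper call made
   from the translation occupying trans[0 .. size-1], else 0. */
static int plausible_ra(uintptr_t word, const uint8_t *trans, size_t size)
{
    uintptr_t lo = (uintptr_t)trans;

    assert(trans != NULL && size > 0);

    /* Value constraint: location .. location + size - 1. */
    if (word < lo || word - lo >= size)
        return 0;

    /* The call insn must itself start inside the translation. */
    if (word - lo < CALL_INSN_LEN)
        return 0;

    /* The byte where the call insn would start must be the call opcode. */
    return trans[word - lo - CALL_INSN_LEN] == CALL_OPCODE;
}
```

A word that passes both tests is only a candidate; an unrelated stack value can still happen to satisfy them.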
Second part of the trick is that we can probably constrain ourselves to
scanning about a dozen or so stack locations; the RA must be in that area.
Consider what the stack looks like immediately after doing the call
insn which leads to the helper (stack grows down):
    RA for VG_(run_innerloop)
    saved ebx, ecx, edx, esi, edi, ebp pushed at start of VG_(run_innerloop)
    RA for where VG_(run_innerloop) calls translation
    /* stuff below is created by translation and helpers it calls */
    RA for call to helper ("MEMEME") (since all args are passed in regs)
    /* stuff pushed by stack of helper functions */
    /* top-of-stack; %esp points here (or one word above) */
Imagine now we grab the value of %esp at entry to VG_(run_innerloop) and
park it in some global variable. If my picture of the stack is right, we
can find the relevant RA ("MEMEME") by scanning downwards from that value
for 8 or 9 words, looking for a word satisfying the constraints mentioned
above. If we have to go even a few words further, we've probably missed it.
The combination of value constraint and location constraint should make
this fairly good at finding the correct RA. The main danger AFAICS is
that the 6 regs saved at the start of VG_(run_innerloop) might hold
a spuriously matching value. Even that can be avoided by snapshotting
%esp after those are pushed, in which case we're even more strongly
location-constrained: there can be only about 3 or 4 words to search
before declaring that we've missed it somehow.
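The bounded scan could look roughly like the C sketch below. The function name scan_for_ra and its parameters are hypothetical. It assumes the snapshot points at the last word pushed before the region of interest, so the scan starts one word below it, and it applies only the range test; a real version would also check for the call opcode before the candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan at most max_words stack slots, starting one word below the
   %esp snapshot and moving towards lower addresses (the stack grows
   down).  Return the first slot whose value lies in the translation
   [lo, lo + size), or NULL if none is found within the limit. */
static const uintptr_t *scan_for_ra(const uintptr_t *snapshot,
                                    size_t max_words,
                                    uintptr_t lo, size_t size)
{
    assert(snapshot != NULL);

    for (size_t i = 1; i <= max_words; i++) {
        const uintptr_t *slot = snapshot - i;
        if (*slot >= lo && *slot - lo < size)
            return slot;   /* candidate RA found */
    }
    return NULL;           /* gone too far: probably missed it */
}
```

Keeping max_words small is the point of the location constraint: with the snapshot taken after the six register pushes, a limit of 3 or 4 would match the estimate above.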
How does that sound? Does it make sense? Is it utterly horrible?
I'm not sure if something this fragile is really a good idea, but still
it might be worth trying ...
Actually I'd guess it's pretty robust, if the above analysis is correct.
And even if it calculates a wrong %EIP once in 10000 calls, do we care?
[perhaps cachegrind does? I don't know]. The calculated %EIP can be
sanity checked against the %EIP-saved-at-bb-start and the bb's known
original length as extracted from TT. If the %EIP calculated via this
method points outside that range, something's clearly wrong and so we
can ignore it and fall back to the %EIP-saved-at-bb-start.
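The sanity check and fallback might be sketched in C as follows. The name checked_eip and its parameters are hypothetical; how the calculated %EIP is derived from the RA is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Accept the calculated %EIP only if it lies inside the original bb,
   i.e. in [bb_start, bb_start + bb_len).  Otherwise fall back to the
   %EIP saved at the start of the bb. */
static uintptr_t checked_eip(uintptr_t calc_eip,
                             uintptr_t bb_start, uintptr_t bb_len)
{
    assert(bb_len > 0);

    if (calc_eip >= bb_start && calc_eip - bb_start < bb_len)
        return calc_eip;
    return bb_start;
}
```

This catches only results that land outside the bb; a wrong value that still falls inside it would pass unnoticed.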
J