|
From: Julian S. <js...@ac...> - 2002-11-22 08:24:08
|
Nick,
I have thought of a way to get hold of %eip in helper fns, skin independently
and without slowing down the normal I-don't-want-to-know-%eip case, but it has
a high coefficient of horribleness [although is easy to implement].
Basic idea is that we can probably find the return address for the call to the
helper, lying round on the stack, without outside assistance.
Suppose (as is indeed the case) that %EIP is brought up to date at the
start of each bb. Now the bb calls a helper fn and you want to know
what %eip is at the point of the call.
So we're looking for a word on the stack (the RA) that points back into
the translation. Since we have %EIP for the bb, we can look up in TT
the location and size of the translation. The word we're looking for
must have a value in the range: location .. location + size - 1.
Furthermore, 6 (?) bytes before the word must also be a valid location in the
translation -- the call insn -- and it must contain the opcode for "call".
So you're heavily constrained as to the value of the word being searched
for.
Second part of the trick is that we can probably constrain ourselves to
scanning about a dozen or so stack locations; the RA must be in that area.
Consider what the stack looks like immediately after the doing the call
insn which leads to the helper (stack grows down):
RA for VG_(run_innerloop)
saved ebx, ecx, edx, esi, edi, ebp pushed at start of VG_(run_innerloop)
RA for where VG_(run_innerloop) calls translation
/* stuff below is created by translation and helpers it calls */
RA for call to helper ("MEMEME") (since all args are passed in regs)
/* stuff pushed by stack of helper functions */
/* top-of-stack; %esp points here (or one word above) */
Imagine now we grab the value of %esp at entry to VG_(run_innerloop) and
park it in some global variable. If my picture of the stack is right, we
can find the relevant RA ("MEMEME") by scanning downwards from that value
for 8 or 9 words, looking for a word satisfying the constraints mentioned
above. If we have to go even a few words further, we've probably missed it.
The combination of value constraint and location constraint should make
this fairly good at finding the correct RA. The main danger AFAICS is
that the 6 regs saved at the start of VG_(run_innerloop) might hold
an spuriously matching value. Even that can be avoided by snapshotting
%eip after those are pushed, in which case we're even more strongly
location-constrained: there can be only about 3 or 4 words to search
before declaring that we've missed it somehow.
How does that sound? Does it make sense? Is it utterly horrible?
I'm not sure if something this fragile is really a good idea, but still
it might be worth trying ...
Actually I'd guess it's pretty robust, if the above analysis is correct.
And even if it calculates a wrong %EIP once in 10000 calls, do we care?
[perhaps cachegrind does? I don't know]. The calculated %EIP can be
sanity checked against the %EIP-saved-at-bb-start and the bb's known
original length as extracted from TT. If the %EIP calculated via this
method points outside that range, something's clearly wrong and so we
can ignore it and fall back to the %EIP-saved-at-bb-start.
J
|
|
From: Nicholas N. <nj...@ca...> - 2002-11-22 11:33:21
|
On Fri, 22 Nov 2002, Julian Seward wrote:
> I have thought of a way to get hold of %eip in helper fns, skin independently
> and without slowing down the normal I-don't-want-to-know-%eip case, but it has
> a high coefficient of horribleness [although is easy to implement].
>
> Basic idea is that we can probably find the return address for the call to the
> helper, lying round on the stack, without outside assistance.
>
> Suppose (as is indeed the case) that %EIP is brought up to date at the
> start of each bb. Now the bb calls a helper fn and you want to know
> what %eip is at the point of the call.
>
> So we're looking for a word on the stack (the RA) that points back into
> the translation. Since we have %EIP for the bb, we can look up in TT
> the location and size of the translation. The word we're looking for
> must have a value in the range: location .. location + size - 1.
> Furthermore, 6 (?) bytes before the word must also be a valid location in the
> translation -- the call insn -- and it must contain the opcode for "call".
> [...]
> Actually I'd guess it's pretty robust, if the above analysis is correct.
Wow. I think it's the sort of idea that if I'd thought of it I would
think it was really cool and worth doing, but because someone else thought
of it I take two steps back and think hmm...
----
Ideally, we want these characteristics:
1a. simple idea
1b. simple effect on existing code (eg. affects only one place)
2. entirely skin-independent
3. maximally efficient, ie. only updates EIP when EIP is actually
recorded (I'm assuming we update EIP between BBs, though)
4. no extra space required
Your suggestion, assuming it works effectively, meets 2 and 3. It fails
1a; as for 1b, the finding-%eip complexity would all be in the one spot,
although it still needs the eip-to-EIP table.
----
Before you mentioned this finding-a-needle-in-a-call-stack idea, I was
leaning back towards the update-EIP-only-when-necessary approach.
%EIP only needs to be up-to-date for a small number of UInstrs:
a. CCALL/CALLM, if the called code records EIP
b. PUT %ESP, if stack change/post_mem_write events are tracked and the
called function records EIP
c. Any extended UInstr that records EIP
There are only two ways skin code can record EIP:
1. Call VG_(get_ExeContext)()
2. Call VG_(get_EIP)()
VG_(get_ExeContext)() could be changed so that it calls VG_(get_EIP)(),
which would essentially reduce the problem to a single point of code.
Using the latest ideas:
a. add a 1-bit field `records_eip' to CCALL and CALLM instructions that
a skin must turn on if the called code can record EIP
b. add a bool argument to the functions VG_(track_new_mem_stack)() etc,
which the skin makes `True' if the called code can record EIP
c. add another function that must be implemented by skins using extended
UCode, SK_(UInstr_record_EIP)()
a,b,c in this list correspond directly to a,b,c the list above.
(Actually, c could be handled by a's mechanism too.)
Any UInstr which could cause an EIP-record would have to be preceded with
a PUTEIP instruction (I agree that sounds better than INCEIP). Actually,
we could keep INCEIP instructions (necessary for determining x86
instruction boundaries), generate no code for them, and the codegen phase
could effectively add the PUTEIP code for UInstrs when the relevant
boolean is set without actually needing an explicit PUTEIP instruction.
This satisfies 1a and 4, isn't too bad for 1b, but fails the others. It
also makes the VG_(track_*) functions non-uniform, which is unfortunate.
----
So, I've criticised the flaws in the current proposals, but haven't come
up with anything better. Harumph.
----
A couple of other random thing that occurred to me:
i. If we do use an eip-to-EIP table, it doesn't need to record entries
for every instruction, just those that can cause an EIP-recording
(CCALL/CALLM/PUT ESP/extended UInstrs).
ii. None of this affects Cachegrind, since it records %eip for every x86
instruction in its cost-centre anyway. At one point I took out all
INCEIPs for Cachegrind, but then put them back in because of the
weird-signals-sometimes issue.
N
|
|
From: Nicholas N. <nj...@ca...> - 2002-11-22 12:03:05
|
On Fri, 22 Nov 2002, Julian Seward wrote:
> Second part of the trick is that we can probably constrain ourselves to
> scanning about a dozen or so stack locations; the RA must be in that area.
> Consider what the stack looks like immediately after the doing the call
> insn which leads to the helper (stack grows down):
>
> RA for VG_(run_innerloop)
> saved ebx, ecx, edx, esi, edi, ebp pushed at start of VG_(run_innerloop)
> RA for where VG_(run_innerloop) calls translation
> /* stuff below is created by translation and helpers it calls */
> RA for call to helper ("MEMEME") (since all args are passed in regs)
> /* stuff pushed by stack of helper functions */
> /* top-of-stack; %esp points here (or one word above) */
>
> Imagine now we grab the value of %esp at entry to VG_(run_innerloop) and
> park it in some global variable. If my picture of the stack is right, we
> can find the relevant RA ("MEMEME") by scanning downwards from that value
> for 8 or 9 words, looking for a word satisfying the constraints mentioned
> above. If we have to go even a few words further, we've probably missed it.
>
> The combination of value constraint and location constraint should make
> this fairly good at finding the correct RA. The main danger AFAICS is
> that the 6 regs saved at the start of VG_(run_innerloop) might hold
> an spuriously matching value. Even that can be avoided by snapshotting
> %eip after those are pushed, in which case we're even more strongly
> location-constrained: there can be only about 3 or 4 words to search
> before declaring that we've missed it somehow.
Do you mean "%esp after those are pushed"?
Actually, won't we know exactly which stack location it should be in, ie.
immediately after the RA of VG_(run_innerloop)? Or can extra stuff get
put between the two RAs?
> How does that sound? Does it make sense? Is it utterly horrible?
> I'm not sure if something this fragile is really a good idea, but still
> it might be worth trying ...
Another thing is that we have to subtract 6 (or whatever) from the RA to
get the right %eip before doing the eip-to-EIP lookup, but that's fine.
I'm starting to come around to this idea now...
N
|
|
From: Jeremy F. <je...@go...> - 2002-11-22 17:07:48
|
On Fri, 2002-11-22 at 00:28, Julian Seward wrote:
> I have thought of a way to get hold of %eip in helper fns, skin independently
> and without slowing down the normal I-don't-want-to-know-%eip case, but it has
> a high coefficient of horribleness [although is easy to implement].
>
> Basic idea is that we can probably find the return address for the call to the
> helper, lying round on the stack, without outside assistance.
Um, isn't the return address into the generated code exactly and
precisely at (%esp)? If the helper/skin function wants it, wouldn't
using __builtin_return_address(0) near the top of the helper be a good
enough solution? Is it that you don't want to require helpers/skin
callbacks to do this? What am I missing?
> How does that sound? Does it make sense? Is it utterly horrible?
> I'm not sure if something this fragile is really a good idea, but still
> it might be worth trying ...
It seems a bit over-complex to me. If the problem is that we want
things which need %EIP can get to the up to date value, then I think we
can probably do it very simply while still avoiding the expense of
INCEIP:
1. Have a bit in each UInstr which means "update EIP" (as Nick
suggested)
2. When generating code for a UInstr with that bit set, emit "movl
$EIP, XX(%ebp)" before generating the rest of the code
Done. No rummaging, no tables.
It seems to me that the overhead of INCEIP is most noticeable with
simple skins which don't do much in the way of per-instruction work. As
soon as the skin starts doing reasonably complex things, then the
overhead of INCEIP (or its replacement) becomes less of an issue.
Therefore, I don't think its all that important to try and absolutely
squash any EIP manipulation in generated code, because it doesn't matter
that much.
The trouble with a search-around-for-EIP scheme is that it works OK for
the cases where there are very few calls out to the skin, but if you
don't have many skin calls, then you don't need to update the EIP that
much anyway. If you have lots of skin calls which need EIP, then doing
lots of searches is expensive, whereas the generated update is still
cheap.
I suppose the common case for something like memcheck is that there are
lots of skin helper calls, but only those which are actually reporting
errors care about EIP. My suspicion is that the work that memcheck does
anyway would overwhelm the effect of an extra EIP store per call.
So, I guess I'm veering away from mapping tables and complexity again.
BTW, even if we kill INCEIP, we still need to have a UInstr which marks
the boundaries between x86 instructions, even if it is functionally a
NOP.
J
|
|
From: Nicholas N. <nj...@ca...> - 2002-11-22 17:37:23
|
On 22 Nov 2002, Jeremy Fitzhardinge wrote: > If the helper/skin function wants it, wouldn't using > __builtin_return_address(0) near the top of the helper be a good enough > solution? Is it that you don't want to require helpers/skin callbacks > to do this? What am I missing? That should work so long as you do it in the skin function called immediately from generated code. Currently, for example, MemCheck calls an assembly routine which calls a C function which causes a chain of calls before VG_(get_ExeContext)() is called. If it was done with __builtin_return_address(0), it would have to change so a C function was called (not so hard I think) and then the %eip would have to be passed through the calls, which is a bit of a pain... I think that if we go down the eip-to-EIP route, the needle-in-a-call-stack approach is worth it so that it's all completely transparent to skins. If the skin has to remember to call __builtin_return_address before recording EIP that's a bit nasty. > So, I guess I'm veering away from mapping tables and complexity again. Hmm... yeah, maybe. I'm completely unsure now. N |
|
From: Jeremy F. <je...@go...> - 2002-11-22 18:05:22
|
On Fri, 2002-11-22 at 09:37, Nicholas Nethercote wrote:
> That should work so long as you do it in the skin function called
> immediately from generated code. Currently, for example, MemCheck calls
> an assembly routine which calls a C function which causes a chain of calls
> before VG_(get_ExeContext)() is called.
Well, if there's a helper assembly routine, it would be trivial for it
to:
movl (%esp), %reg
movl %reg, NN(%ebp)
so it can be easily found by whoever needs it (and if we ran all helpers
through a wrapper like this, there wouldn't be a problem).
> I think that if we go down the eip-to-EIP route, the
> needle-in-a-call-stack approach is worth it so that it's all completely
> transparent to skins. If the skin has to remember to call
> __builtin_return_address before recording EIP that's a bit nasty.
Well, it isn't too bad, since there'll be this argument you need to pass
a value to, and it has to come from somewhere. But if you consider how
much work it is to avoid a single "movl $EIP, VGOFF_(m_eip)(%ebp)", it
doesn't seem like such a good idea.
J
|