|
From: Jeremy F. <je...@go...> - 2005-01-19 21:33:34
|
I've been thinking about how to restore the pthreads functionality which
was lost as a result of the recent threading changes.
It seems to me that the only feasible approach is to wrap the standard
libpthread functions to generate a stream of events, and use that to
maintain an abstract model of the state of the threads, locks, etc.
The downside of this parallel model-keeping is that if it gets out of
sync with the real state of the threads library, it will start reporting
bogus errors (or miss real errors). I think, however, that it is a
vast improvement over the outright functional bugs (and maintenance
problems) which vg_libpthread had. And certainly better than not
reporting anything as we do now.
General function wrapping would be useful in other places too. For
example, we could wrap libc malloc rather than implementing our own. Or
we could provide a facility for clients to install their own wrappers.
So, how to wrap functions? Function wrapping basically requires
intercepting a pair of edges in the program's control flow graph, and
breaking each of them in two:
        Normal:                 Wrapped:

        ------ R------          ------ R-----
             | ^         ==>         | ^
             V |                     V |
             S---------              B- A-
                                      v ^
                                      S----------

        Key:    S - subroutine
                R - return address
                B - before wrapper
                A - after wrapper
I think the basic requirements are:
      * the "before" function has access to all of the subroutine's
        arguments
      * the "after" function has access to the return value
      * some state is passed between "before" and "after" so that
        matching operations can be performed
      * the mechanism can cope with wrapping any function with
        call/return semantics and a single entrypoint
      * it can cope with varargs
      * it can cope with unknown numbers of parameters
      * it can cope with recursion
      * it can cope with multithreading
      * wrapped functions can call other wrapped functions
Another wart is that functions can finish without returning to their
caller if they use longjmp/exceptions.
(Note that the existing interception mechanism is much simpler than
this, since it just redirects one edge of the CFG, and doesn't have to
worry about returns at all. There isn't much overlap in functionality.)
So, how to implement this?
An obvious way is how you'd do it in C:
        int wrap_foo(int a, int b, struct bar c)
        {
                int ret;
                void *cookie;

                cookie = before_foo(a, b, c);
                ret = foo(a, b, c);
                after_foo(cookie, ret);

                return ret;
        }
The trouble with this is that it requires knowing in advance how many
arguments the function has, and then copying them for the calls to
before_foo() and foo(). It doesn't work for varargs functions unless
you can work out how many args there are (by parsing the printf format
string, for example).
So that's out.
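To make the varargs problem concrete, here is a minimal sketch of a hand-written wrapper around a printf-style function. It only works because a va_list variant (vsnprintf) happens to exist; C gives no portable way to forward "..." to an arbitrary varargs function. The names wrap_snprintf and calls_seen are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the before/after bookkeeping (hypothetical). */
static int calls_seen;

/* Hypothetical wrapper for an snprintf-like function.  The arguments
 * can only be passed on because the target has a va_list variant. */
int wrap_snprintf(char *buf, size_t len, const char *fmt, ...)
{
        va_list ap;
        int ret;

        assert(buf != NULL && fmt != NULL);

        calls_seen++;                           /* "before" hook */
        va_start(ap, fmt);
        ret = vsnprintf(buf, len, fmt, ap);     /* forward via the v-variant */
        va_end(ap);
        calls_seen++;                           /* "after" hook */

        return ret;
}
```

For a varargs function with no v-variant, the wrapper would have to work out the argument count itself, which is exactly the case the C approach cannot handle in general.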
[ From here on, I'm handwaving and thinking out loud. ]
We could take advantage of the codegen. If we're generating code for
the first basic block of a wrapped function, we could generate in the
preamble:
        call wrap_before_func
wrap_before_func would then be able to inspect %ESP and get both the
args and the return address. The value of TID+ESP+RETADDR will give us
a unique cookie key to match the call to the return.
Using this, the wrap_before_func can install a hook at the beginning of
the basic block at RETADDR (point 'R' in the diagram above), which does:
        call wrap_after_func
wrap_after_func gets to see the return value in %EAX, and can use TID
+ESP+EIP to generate the key to find the cookie value generated by
wrap_before_func; once used, the cookie is deleted so that the "after"
wrapper is only called once (consider the case where the return BB
address is also the head of a loop).
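The cookie matching described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a fixed-size array with linear search rather than any real data structure from the core, all names (CookieEntry, cookie_put, cookie_take) are hypothetical, and the "after" side is assumed to have already adjusted ESP back to the value seen at function entry (on x86 the ret pops the return address, so the raw values differ).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not Valgrind's real API. */
typedef struct {
        int       in_use;
        int       tid;          /* thread id */
        uintptr_t esp;          /* stack pointer at function entry */
        uintptr_t retaddr;      /* return address found on the stack */
        void     *cookie;       /* state passed from "before" to "after" */
} CookieEntry;

#define MAX_COOKIES 64
static CookieEntry cookie_table[MAX_COOKIES];

/* "before" side: remember the cookie under TID+ESP+RETADDR.
 * Returns 1 on success, 0 if the table is full. */
int cookie_put(int tid, uintptr_t esp, uintptr_t retaddr, void *cookie)
{
        assert(cookie != NULL);         /* NULL is reserved for "not found" */

        for (size_t i = 0; i < MAX_COOKIES; i++) {
                if (!cookie_table[i].in_use) {
                        cookie_table[i] =
                                (CookieEntry){ 1, tid, esp, retaddr, cookie };
                        return 1;
                }
        }
        return 0;
}

/* "after" side: look up by TID+ESP+EIP and delete the entry, so a
 * second arrival at the same address (a loop head, say) finds nothing. */
void *cookie_take(int tid, uintptr_t esp, uintptr_t eip)
{
        for (size_t i = 0; i < MAX_COOKIES; i++) {
                CookieEntry *e = &cookie_table[i];

                if (e->in_use && e->tid == tid &&
                    e->esp == esp && e->retaddr == eip) {
                        e->in_use = 0;
                        return e->cookie;
                }
        }
        return NULL;
}
```

Deleting on lookup is what makes the "after" wrapper fire once per call: a second cookie_take() with the same key returns NULL.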
Inserting the call to wrap_after_func at R is very easy; it doesn't even
require regenerating the BB. Currently, the first 16 bytes of each BB
is a preamble which is solely concerned with decrementing and testing
VG_(dispatch_ctr); we can easily do this in wrap_after_func, so we can
just patch over the preamble with the call to wrap_after_func (and nop
out the rest).
Another subtle point is what if a particular basic block is both the
start of a wrapped function and the target of a wrapped function return.
It shouldn't happen in normal code, but it could happen. This is easily
dealt with; the resulting preamble would look like:
        call wrap_after_func
        call wrap_before_func
        rest of BB...
OK, so that's normal call-return: how to deal with longjmp/exceptions?
Well, we could just ignore it. If you call a wrapped function, and it
longjmps back, it means that the "before" function is called but not the
"after", and the cookie store fills up with junk. That's not optimal.
One thing to note is that everything below %ESP is, by definition,
undefined, and so if %ESP for a particular TID moves above the TID+ESP
encoded in a cookie, that cookie becomes invalid (or, effectively,
returned). We can call a wrap_after_func variant to indicate that a
function returned with longjmp/exception rather than normally. This
runs into the old problem posed by user-space threading libraries, since
we would need to be able to distinguish between switching stacks and a
normal return/longjmp.
If we don't explicitly track every ESP change, we can still periodically
sweep through the cookie list and mop up anything which has become
stale.
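A periodic sweep of that kind might look like the sketch below. The assumptions: the stack grows downwards (as on x86), so a pending cookie is stale once the thread's current ESP is above the ESP recorded at entry; the sweep runs only after any normal return has already consumed its own cookie; and the names Pending, AbnormalExitFn and sweep_stale are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending-call record, keyed here by TID+ESP only,
 * i.e. the partial key needed for the longjmp/exception case. */
typedef struct {
        int       in_use;
        int       tid;
        uintptr_t esp;          /* stack pointer at function entry */
        void     *cookie;
} Pending;

/* Called for each call that finished without a normal return. */
typedef void (*AbnormalExitFn)(void *cookie);

/* Drop every pending entry of thread `tid` whose frame now lies below
 * the current stack pointer.  Returns the number of entries reaped. */
size_t sweep_stale(Pending *tab, size_t n, int tid, uintptr_t cur_esp,
                   AbnormalExitFn notify)
{
        size_t reaped = 0;

        assert(tab != NULL);

        for (size_t i = 0; i < n; i++) {
                Pending *p = &tab[i];

                if (p->in_use && p->tid == tid && p->esp < cur_esp) {
                        if (notify)
                                notify(p->cookie);  /* "after" variant */
                        p->in_use = 0;
                        reaped++;
                }
        }
        return reaped;
}
```

Note that the sketch makes no attempt at the stack-switching problem: a thread that legitimately moves to a different stack would have its cookies reaped by mistake.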
You know, that all looks pretty sound to me. Somewhat complex, but not
deeply intrusive. It would need:
     1. Machinery for registering wrappers - it can probably make use of
        the existing intercept machinery.
     2. Generate the call to wrap_before_func. vg_from_ucode would do
        this as part of generating the BB preamble; it will know that a
        function needs to be wrapped at codegen time (obviously you need
        to declare that a function is to be wrapped before it's first
        called, though you could invalidate the TC).
     3. Generate the call to wrap_after_func, just by overwriting the
        standard preamble.
     4. Implement a cookie list: just a skiplist. To implement
        longjmp/exceptions, it needs to be searchable with a partial
        key.
     5. Implement wrap_before/after_func - they'll be called from
        generated code, and will have a non-standard calling convention,
        so they'll probably be in assembler. But they would call C code
        to do all the real work.
     6. Hook into ESP tracking to detect longjmps (this is potentially
        very expensive, so maybe it should be an option).
     7. Housekeeping to mop up stale cookies, either because we're not
        doing ESP tracking or because a thread exits.
Comments? What have I forgotten?
J
|
|
From: Julian S. <js...@ac...> - 2005-01-20 00:00:19
|
> I think, however, that it is a
> vast improvement over the outright functional bugs (and maintenance
> problems) which vg_libpthread had. And certainly better than not
> reporting anything as we do now.

I agree. We should make this work if we can.

> We could take advantage of the codegen. If we're generating code for
> the first basic block of a wrapped function, we could generate in the
> preamble:
> call wrap_before_func
> wrap_before_func would then be able to inspect %ESP and get both the
> args and the return address. The value of TID+ESP+RETADDR will give us
> a unique cookie key to match the call to the return.

Who writes wrap_before_func? That has to understand the baseblock layout
and also the calling conventions to extract esp and retaddr, and so is going
to be machine specific.

> Inserting the call to wrap_after_func at R is very easy; it doesn't even
> require regenerating the BB. Currently, the first 16 bytes of each BB
> is a preamble which is solely concerned with decrementing and testing
> VG_(dispatch_ctr); we can easily do this in wrap_after_func, so we can
> just patch over the preamble with the call to wrap_after_func (and nop
> out the rest).

That will change drastically .. the new JIT (1) translates multiple BBs
at a time, and (2) actually doesn't do translation chaining as I could
not think of a clean way to do this portably.

-----------

The proposal leaves me with a nasty feeling that it will introduce all
sorts of complex inter-component dependencies and generally be a
maintenance and portability problem later.

-----------

I would prefer a solution which didn't involve so much magic in the JIT.

Why do we need general function wrapping? Currently all we care about
is intercepting libpthread calls. I would prefer to write, in C, a
libpthread stub library, and use the existing intercept mechanism to
route all calls there. The stub library emits events -- using the
client request mechanism -- to those who want to know, and calls onwards
to the real pthread functions (my hands wave here). No need to mess with
calling conventions, guest state layout or magic run-time code modification.

------------

The cookie idea seems like the kernel of something useful -- that is, a
clean statement of the semantics of function wrapping in the presence
of recursion, threads, and functions which don't necessarily return.

J
|
From: Tom H. <th...@cy...> - 2005-01-20 00:10:28
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Why do we need general function wrapping? Currently all we care about
> is intercepting libpthread calls. I would prefer to write, in C, a
> libpthread stub library, and use the existing intercept mechanism to
> route all calls there. The stub library emits events -- using the
> client request mechanism -- to those who want to know, and calls onwards
> to the real pthread functions (my hands wave here). No need to mess with
> calling conventions, guest state layout or magic run-time code modification.
That's more like how I had envisaged function wrapping working. Use
the existing intercept machinery to redirect the original function
call, somehow passing the original function address as we do so.
The wrapper would then call the real function, ensuring that this
time the address didn't get redirected during translation. It would
then get control again when the real function returned. The only
problem then is the longjmp/exception case.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Julian S. <js...@ac...> - 2005-01-20 00:44:55
|
> That's more like how I had envisaged function wrapping working. Use
> the existing intercept machinery to redirect the original function
> call, somehow passing the original function address as we do so.
>
> The wrapper would then call the real function, ensuring that this
> time the address didn't get redirected during translation. It would
> then get control again when the real function returned.

Exactly. This is the point I arrived at. The only problem -- and one
I cannot immediately see a clean solution for -- is how to know what
the real (non-redirected) function address is.

> The only problem then is the longjmp/exception case.

Do we even need to handle this case, for libpthread? For that matter,
can we also ignore recursion?

J
|
From: Jeremy F. <je...@go...> - 2005-01-20 01:10:13
|
On Thu, 2005-01-20 at 00:44 +0000, Julian Seward wrote:
> > The only problem then is the longjmp/exception case.
>
> Do we even need to handle this case, for libpthread? For that matter,
> can we also ignore recursion?

We need to deal with taking a signal while blocked in a pthread
function; if the signal handler longjmps, it's as if the pthread
function did. Hm, and pthread_cancel ends up invoking gcc's exception
unwinding machinery, so it effectively looks like a C++ exception.

Don't know about recursion, but I think we've been burned enough to not
rule it out. Or pthreads functions calling each other.

J
|
From: Jeremy F. <je...@go...> - 2005-01-20 01:04:17
|
On Thu, 2005-01-20 at 00:00 +0000, Julian Seward wrote:
> Who writes wrap_before_func? That has to understand the baseblock layout
> and also the calling conventions to extract esp and retaddr, and so is going
> to be machine specific.

It's part of the core. It isn't a per-wrapper piece of code, it's a
helper (i.e., called something like VG_(wrapper_before_helper), and would
be a pretty small piece of assembler). The actual wrapper functions are
ordinary-looking pieces of C.

> That will change drastically .. the new JIT (1) translates multiple BBs
> at a time,

Well, the BBs we're talking about here are 1) the first BB of a function
and 2) the BB at the return address. Under normal circumstances, they're
not going to get coalesced with other BBs anyway, I would have thought.
But we're going to need a mechanism to inhibit BBs from being coalesced
anyway, I think (for debugger support).

> and (2) actually doesn't do translation chaining as I could
> not think of a clean way to do this portably.

You're hoping that coalesced BBs will make up the performance
difference? What's the difficulty? Isn't it something which could be
implemented per-target?

> The proposal leaves me with a nasty feeling that it will introduce all
> sorts of complex inter-component dependencies and generally be a
> maintenance and portability problem later.

I actually think it's pretty clean that way. By keeping it all on the
real CPU rather than in virtual space, we avoid falling into a bunch of
ratholes we just escaped from.

> I would prefer a solution which didn't involve so much magic in the JIT.

Well, there's a little bit of magic in calling a helper for the before
wrapper, which doesn't really count. The patching-in of the exit wrapper
is a bit tricky, but in the worst case we can always generate a space to
patch into. Or regenerate the BBs.

> Why do we need general function wrapping? Currently all we care about
> is intercepting libpthread calls.

Well, that's the immediate concern. But I think there's a lot of other
things we could do with wrappers. For example, I think we should
consider wrapping client mallocs rather than replacing them outright.
We already have the problem that ld.so and glibc each have their own
copies of malloc() and friends, and assume that they can operate on each
other's pointers. I think we're OK in that case, but it's just one
instance where being functionally correct requires 100% coverage; with
wrapping, we could miss a few cases, and it wouldn't be the end of the
world. And tools like massif don't need a special malloc at all; they
only care about observing the mallocs a program does, with no further
checks.

> I would prefer to write, in C, a
> libpthread stub library, and use the existing intercept mechanism to
> route all calls there. The stub library emits events -- using the
> client request mechanism -- to those who want to know, and calls onwards
> to the real pthread functions (my hands wave here).

Right. I thought about that a lot, and it basically comes down to being
able to distinguish between an "outside" call to a wrapped function,
which needs to be directed to the wrapper, and an "inside" call, which
is from the wrapper to the real function, and making that work if the
wrapped function is recursive. I can think of a bunch of hacks (look at
the callsite, and see if it's within the wrapper), but it just seems
cleaner to me to keep all this out of the virtual space.

And I'm feeling a bit allergic to stub libraries and so on. We're still
depending on LD_PRELOAD/LD_LIBRARY_PATH tricks to get that code into the
client space, and I'd like to minimize, or even eliminate, that.

J
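The outside/inside distinction, and why recursion makes it awkward, can be sketched with a toy model. Everything here is hypothetical: dispatch_countdown stands in for the redirection done by the intercept machinery, and a per-thread flag is one of the simple hacks for telling the two kinds of call apart, not anything the core actually does.

```c
#include <assert.h>

/* Per-thread flag: set while the wrapper is calling the real function. */
static _Thread_local int in_wrapper;

/* Counts how many times the "before" hook ran (stand-in for an event). */
static int before_calls;

int dispatch_countdown(int n);

/* The "real" function.  It is recursive, and its recursive call is
 * redirected like any other call made by the client. */
int real_countdown(int n)
{
        assert(n >= 0);
        return n == 0 ? 0 : 1 + dispatch_countdown(n - 1);
}

/* The wrapper: "before" hook, then an "inside" call to the real code. */
int wrap_countdown(int n)
{
        int ret;

        before_calls++;
        in_wrapper = 1;
        ret = real_countdown(n);
        in_wrapper = 0;
        return ret;
}

/* Redirection point: "outside" calls go to the wrapper, "inside"
 * calls go straight to the real function. */
int dispatch_countdown(int n)
{
        return in_wrapper ? real_countdown(n) : wrap_countdown(n);
}
```

With this flag-based hack, dispatch_countdown(3) makes four calls to the real function but runs the "before" hook only once: the recursive calls are mistaken for inside calls. That is the recursion problem in miniature.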
|
From: Josef W. <Jos...@gm...> - 2005-01-21 09:28:28
|
On Wednesday 19 January 2005 22:29, Jeremy Fitzhardinge wrote:
> [...]
> OK, so that's normal call-return: how to deal with longjmp/exceptions?
> Well, we could just ignore it. If you call a wrapped function, and it
> longjmps back, it means that the "before" function is called but not the
> "after", and the cookie store fills up with junk. That's not optimal.

With longjmp/exceptions, 2 cookies could be created with the same
TID+ESP+RETADDR (same call chain after an exception). Is this a problem?

> 3. Generate the call to wrap_after_func, just by overwriting the
> standard preamble.

What happens if the TC is flushed before the function returns? A
retranslation would have to know about changing the preamble.

> Comments? What have I forgotten?

The calling convention is platform specific. Would it be detectable via
configure tests?

General function wrapping is expensive. In my tool, I manage my own call
stack for this, and insert a callback at the start of every BB which
syncs/unwinds the call stack according to %ESP. This doesn't need any
temporary modification of instrumented code, and works for
longjmp/exceptions in a natural way. As this is slow, it's not useful
for general usage. Besides, I don't look at function arguments.

But it would be really nice for tools to be able to request such a
general wrapping mechanism; and the call stack clone should be able to
always provide a good back trace.

Josef
|
From: Jeremy F. <je...@go...> - 2005-01-22 17:17:18
|
On Fri, 2005-01-21 at 02:00 +0100, Josef Weidendorfer wrote:
> On Wednesday 19 January 2005 22:29, Jeremy Fitzhardinge wrote:
> > [...]
> > OK, so that's normal call-return: how to deal with longjmp/exceptions?
> > Well, we could just ignore it. If you call a wrapped function, and it
> > longjmps back, it means that the "before" function is called but not the
> > "after", and the cookie store fills up with junk. That's not optimal.
>
> With longjmp/exceptions, 2 cookies could be created with the same
> TID+ESP+RETADDR (same call chain after a exception). Is this a problem?

Well, after the longjmp, the first instance has been invalidated after
ESP changes, so should "disappear". If there is a second occurrence, the
first should just be replaced.

> > 3. Generate the call to wrap_after_func, just by overwriting the
> > standard preamble.
>
> What happens if the TC is flushed before the function returns? A retranslation
> would have to know about changing the preamble.

That would be the common case - the first time a BB after a call is
generally run is when that call returns. So there would need to be a
structure to let the codegen know to generate the call to
wrap_after_func in this case.

> The calling convention is platform specific. Would it be detectable via
> configure tests?

Well, yes, it is platform-specific, like all the other CPU-specific
stuff. So I don't think this is a special case.

> General function wrapping is expensive. In my tool, I manage my own call stack
> for this, and insert a callback at start of every BB which syncs/unwinds the
> call stack according to %ESP. This doesn't need any temporary modification of
> instrumented code, and works for longjump/execptions in a natural way.

Wrapping every function is going to be expensive, yes. My design goal
was for a mechanism which is efficient when applied to a relatively
small number of functions.

J