|
From: Julian S. <js...@ac...> - 2006-01-17 13:46:23
|
As of a couple of days ago, the svn trunk contains support for
wrapping arbitrary functions. This works on x86, amd64 and
ppc32; it does not yet work reliably on ppc64.
This opens the door to reinstating functionality that depends on
being able to intercept the pthread_ functions - that is, Helgrind,
and the simple pthread checks that Valgrind used to do prior to
version 2.4. If anybody wants to look into these, particularly
into bringing Helgrind back to life, or just play around with
function wrapping, now is a good time to start.
At the bottom of coregrind/vg_preloaded.c are a few #if 0'd
wrappers for pthread functions. Enable them (change the #if 0 to
#if 1) and rebuild, and you'll see entries/exits to those functions
as they happen.
There are also some test cases in memcheck/tests/wrap[1-6]*.c
worth looking at.
The rest of this message is a bit of technical background.
Making function wrapping work reliably and portably in the presence
of dynamic linking has proven remarkably difficult. It sounds like
a simple problem, but it isn't, and it seems to contain some
irreducible amount of ugliness, which moves from place to place
depending on the scheme used but cannot be avoided entirely.
There are two parts to the problem:
(1) specifying which functions should be wrapped by which other functions
(2) getting the control flow stuff to work right
-----
(1) Specifying wraps (and redirects)
The current implementation extends the previous redirection scheme.
A wrapper function has a specially encoded name, which specifies an
arbitrary set of functions to be wrapped. Those functions are
characterised by their name and the soname of the containing ELF
shared object, and both of those may contain '*' as a wildcard.
So for example you can write a single wrapper for any function
whose name matches "pthread_create@*" in any shared object whose
soname matches "libpthread.so.*".
This generality is necessary in order to write pthread replacements
reliably, since there are many minor variants in names (eg,
pthread_create@GLIBC_2.0 vs pthread_create@@GLIBC_2.1) and we want to
write just one wrapper which covers all such variants.
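
To make the matching rule concrete, here is a minimal sketch of '*'
wildcard matching applied to a (soname, function name) pair. It is not
Valgrind's own matcher; wild_match and spec_applies are hypothetical
names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Valgrind's matcher: returns true if `str`
   matches `pat`, where '*' in `pat` matches any run of characters
   (including none) and every other character matches itself. */
static bool wild_match(const char *pat, const char *str)
{
    assert(pat != NULL && str != NULL);
    const char *star = NULL;   /* position of the last '*' seen in pat */
    const char *retry = NULL;  /* where in str to resume after backtracking */

    while (*str != '\0') {
        if (*pat == '*') {
            star = pat++;          /* remember the '*', try matching nothing */
            retry = str;
        } else if (*pat == *str) {
            pat++;
            str++;
        } else if (star != NULL) {
            pat = star + 1;        /* let the last '*' absorb one more char */
            str = ++retry;
        } else {
            return false;
        }
    }
    while (*pat == '*')            /* trailing '*'s match the empty string */
        pat++;
    return *pat == '\0';
}

/* A wrap specification applies when both the soname and the function
   name match their respective patterns. */
static bool spec_applies(const char *so_pat, const char *fn_pat,
                         const char *soname, const char *fnname)
{
    return wild_match(so_pat, soname) && wild_match(fn_pat, fnname);
}
```

With this, spec_applies("libpthread.so.*", "pthread_create@*", ...)
accepts both pthread_create@GLIBC_2.0 and pthread_create@@GLIBC_2.1 in
libpthread.so.0, which is the "one wrapper for all variants" case.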
The symbol table reading machinery notices when wrapper functions are
loaded, and causes wraps (and replacements) to become active as soon as
possible. When code is unloaded, any wraps/replacements which then
are impossible are cancelled. You can see what's going on with
--trace-redir=yes. The wrapping manager makes explicit a concept
which had previously been implicit: the distinction between redirect/wrap
specifications and the currently active wraps/redirects.
For example, a wrapper claiming to be the wrapper for function "foo" in
"libobscure.so" might be loaded. That remains as a specification until
such time as libobscure.so is also loaded, at which point the binding
becomes active. If libobscure.so is later unmapped, the active binding
is cancelled, but the specification remains present, just in case
libobscure.so is re-mapped later. Also, if the .so containing
the wrapper is unmapped, then all active wrappings involving it
are cancelled. In short, all code mmaps/munmaps are monitored, and the
mechanism binds wrappers to wrappees whenever it can.
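
The specification/active distinction can be sketched as a tiny state
model. This is only an illustration under simplifying assumptions (one
spec names exactly one wrapper object and one target object, no
wildcards); Spec, note_mapping and is_active are hypothetical names,
not Valgrind internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One specification: "wrapper_so contains a wrapper for code in target_so".
   The spec itself persists; only its mapped flags change over time. */
typedef struct {
    const char *wrapper_so;   /* object containing the wrapper */
    const char *target_so;    /* object containing the wrappee */
    bool wrapper_mapped;
    bool target_mapped;
} Spec;

/* A binding is active only while both ends are mapped. */
static bool is_active(const Spec *s)
{
    assert(s != NULL);
    return s->wrapper_mapped && s->target_mapped;
}

/* Called for every code mmap (mapped = true) or munmap (mapped = false). */
static void note_mapping(Spec *specs, int n, const char *soname, bool mapped)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(specs[i].wrapper_so, soname) == 0)
            specs[i].wrapper_mapped = mapped;
        if (strcmp(specs[i].target_so, soname) == 0)
            specs[i].target_mapped = mapped;
    }
}
```

Mapping the wrapper object alone leaves the spec inactive; mapping
libobscure.so activates it; unmapping either end cancels the binding
while the spec stays around for a later re-map.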
-----
(2) Getting the control flow stuff to work right
The tricky part of function wrapping is what to do when you want to call
the "real" function. For example, a wrapper for pthread_create really
needs to call the real pthread_create at some point. The problem is that
this call, unlike all others to pthread_create, must not merely be diverted
to the wrapper; instead we want to go to the real thing.
The solution is as follows. Valgrind pretends that the CPU it is simulating
has an extra register, the non-redirected-address register ("%NRADDR").
Whenever a wrapper is entered, Valgrind writes the address of the
corresponding original function (the REAL pthread_create) to this
register.
valgrind.h supplies a C macro to get hold of this address, and that's the
first thing the wrapper should do. Then it can do whatever it likes, but
at some point it has to make a call to that address. The problem now is
that simply doing a normal call would wind up back in the wrapper (that's
how we got there in the first place).
This is where the unavoidable ugliness appears. valgrind.h therefore
supplies a bunch of ultra-magical macros which make it possible to do
function calls which disable redirection, and these must be used to call
the original. Hence a simple wrapper for pthread_mutex_lock will look
like this:
   // return type
   int
   // name of wrapper, specifying what this is a wrapper for
   I_WRAP_SONAME_FNNAME_ZU(libpthreadZdsoZd0,pthread_mutex_lock)
   // formals, same as the original, of course
   ( pthread_mutex_t *mutex )
   {
      // result of the original call
      int ret;
      // read %NRADDR before it gets trashed
      void* orig;
      VALGRIND_GET_ORIG_FN(orig);
      // do stuff before
      fprintf(stderr, "<< pthread_mxlock %p", mutex);
      // call the original
      // "ret = orig(mutex), but skip the redirect this time"
      CALL_FN_W_W(ret, orig, mutex);
      // do stuff after
      fprintf(stderr, " -> %d >>\n", ret);
      return ret;
   }
All the magic macros (I_WRAP_SONAME_FNNAME_ZU, VALGRIND_GET_ORIG_FN
and CALL_FN_W_W) are supplied in valgrind.h.
The CALL_FN_W_W macro and its friends construct simple function calls.
This is the really nasty bit. Really we'd like to just write an ordinary
function call there, but then gcc would emit a normal call instruction
and it would be redirected, leading to an infinite loop (+ stack
overflow). The CALL_ macros therefore create calls using a special
call-no-redirect instruction which the real CPU doesn't understand
but Valgrind does.
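
The control flow can be modelled in plain C, without any of the real
macros. This is a toy simulation only: call_normal stands in for an
ordinary (redirectable) call, call_noredir for the call-no-redirect
instruction, and the nraddr variable for the %NRADDR pseudo-register.
All of these names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*Fn)(int);

/* Toy stand-ins for the simulated CPU's state. */
static Fn redirect_from = NULL;  /* address of the wrapped function */
static Fn redirect_to = NULL;    /* address of its wrapper */
static Fn nraddr = NULL;         /* models the %NRADDR pseudo-register */

/* An ordinary call: diverted to the wrapper if a wrap is active, and
   the un-redirected address is recorded at the moment of diversion. */
static int call_normal(Fn f, int arg)
{
    if (f == redirect_from && redirect_to != NULL) {
        nraddr = f;
        return redirect_to(arg);
    }
    return f(arg);
}

/* The call-no-redirect path: goes straight to the given address. */
static int call_noredir(Fn f, int arg)
{
    return f(arg);
}

static int real_fn(int x) { return x + 1; }

static int wrapper_calls = 0;

static int wrapper_fn(int x)
{
    Fn orig = nraddr;          /* read it first, before it gets trashed */
    assert(orig != NULL);
    wrapper_calls++;           /* "do stuff before" */
    /* Using call_normal(orig, x) here would divert back into
       wrapper_fn and recurse until the stack overflowed. */
    return call_noredir(orig, x);
}

static void install_wrap(void)
{
    redirect_from = real_fn;
    redirect_to = wrapper_fn;
}
```

After install_wrap(), call_normal(real_fn, 41) runs wrapper_fn once and
still returns the real function's result.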
J
|
|
From: Tom H. <to...@co...> - 2006-01-17 14:08:15
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> This opens the door to reinstating functionality that depends on
> being able to intercept the pthread_ functions - that is, Helgrind,
> and the simple pthread checks that Valgrind used to do prior to
> version 2.4. If anybody wants to look into these, particularly
> into bringing Helgrind back to life, or just play around with
> function wrapping, now is a good time to start.
One issue here is that the new function wrapping system runs the
wrapper routine on the virtual CPU so the thread wrappers will have
to use client requests to keep the thread model updated.
So the existing m_pthreadmodel.c will need quite a bit of work as
it currently has wrappers that expect to run on the real CPU and
have direct access to valgrind internals.
> This generality is necessary in order to write pthread replacements
> reliably, since there are many minor variants in names (eg,
> pthread_create@GLIBC_2.0 vs pthread_create@@GLIBC_2.1) and we want to
> write just one wrapper which covers all such variants.
You do need to be careful when doing that though, as the whole point
of the versioning of the symbols is to cope with API/ABI changes so
if the version has changed the arguments may have changed in some way
and you may need separate wrappers in order to handle that.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-01-17 14:48:47
|
> One issue here is that the new function wrapping system runs the
> wrapper routine on the virtual CPU so the thread wrappers will have
> to use client requests to keep the thread model updated.
Yes. Looks like the VALGRIND_NON_SIMD_CALL* macros might come
in handy for this kind of thing.
J
|
|
From: Tom H. <to...@co...> - 2006-01-17 16:01:11
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
>> One issue here is that the new function wrapping system runs the
>> wrapper routine on the virtual CPU so the thread wrappers will have
>> to use client requests to keep the thread model updated.
>
> Yes. Looks like the VALGRIND_NON_SIMD_CALL* macros might come
> in handy for this kind of thing.
Maybe, but how does the code in the preload library get the address
of the function in valgrind to call?
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-01-17 16:11:59
|
On Tuesday 17 January 2006 16:00, Tom Hughes wrote:
> In message <200...@ac...>
>
> Julian Seward <js...@ac...> wrote:
> >> One issue here is that the new function wrapping system runs the
> >> wrapper routine on the virtual CPU so the thread wrappers will have
> >> to use client requests to keep the thread model updated.
> >
> > Yes. Looks like the VALGRIND_NON_SIMD_CALL* macros might come
> > in handy for this kind of thing.
>
> Maybe, but how does the code in the preload library get the address
> of the function in valgrind to call?
Ah. Yes. So .. coregrind/m_replacemalloc has exactly the same
problem. The solution there is (see the bottom of vg_replace_malloc.c)
that the first time through, do a special client request which gets
a whole bundle of function pointers (GET_MALLOCFUNCS). After that,
you can just pick the right pointer out of the bundle and use that.
How does that sound?
J
|
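
The "fetch a bundle of function pointers once, then reuse it" pattern
suggested above might look roughly like this. It is a sketch only:
ToolFuncs, fetch_tool_funcs and the note_lock/note_unlock members are
hypothetical, and fetch_tool_funcs merely stands in for the special
client request; the real bundle is whatever GET_MALLOCFUNCS returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of tool-side entry points. */
typedef struct {
    void (*note_lock)(void *mutex);
    void (*note_unlock)(void *mutex);
} ToolFuncs;

static int held_locks = 0;    /* toy "thread model" state */
static int fetch_count = 0;   /* how many times the bundle was requested */

static void tool_note_lock(void *m)   { (void)m; held_locks++; }
static void tool_note_unlock(void *m) { (void)m; held_locks--; }

/* Stand-in for the special client request that fills in the bundle. */
static void fetch_tool_funcs(ToolFuncs *out)
{
    fetch_count++;
    out->note_lock = tool_note_lock;
    out->note_unlock = tool_note_unlock;
}

static ToolFuncs funcs;
static bool funcs_ready = false;

/* First call fetches the bundle; later calls reuse it.
   Not thread-safe as written. */
static const ToolFuncs *get_funcs(void)
{
    if (!funcs_ready) {
        fetch_tool_funcs(&funcs);
        funcs_ready = true;
    }
    assert(funcs.note_lock != NULL && funcs.note_unlock != NULL);
    return &funcs;
}
```

In a real preload library the pointers would presumably be invoked via
something like the VALGRIND_NON_SIMD_CALL* macros rather than called
directly, since the wrapper itself runs on the virtual CPU.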
|
From: Oswald B. <os...@kd...> - 2006-01-17 19:41:54
|
On Tue, Jan 17, 2006 at 01:46:02PM +0000, Julian Seward wrote:
> As of a couple of days ago, the svn trunk contains support for
> wrapping arbitrary functions.
>
i have a question. i know it was discussed, so maybe i just missed the
answer.
the wrapped function is actually called from within the wrapper - that
means it will see an additional stack frame. i imagine this might pose a
problem for some conceivable functions. additionally it might turn out
to be hard or at least work intensive to replicate the args of varargs
functions.
iirc, it was discussed to instrument the prologues/epilogues of wrapped
functions. what has become of this idea?
if it was implemented that way, this extra-elaborate wrapper loading and
naming scheme could go away; wraps would be just specified in the tool
that needs them.
--
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.
|
|
From: Julian S. <js...@ac...> - 2006-01-18 05:24:41
|
> the wrapped function is actually called from within the wrapper - that
> means it will see an additional stack frame. i imagine this might pose a
> problem for some conceivable functions.
Well, yes, and that extra frame is visible in stack traces.
Not sure what problems this will cause though.
> additionally it might turn out
> to be hard or at least work intensive to replicate the args of varargs
> functions.
This is true. The wrapping is not as general or clean as would be
ideal - that is part of the tradeoff needed for a relatively simple
and robust implementation. For wrapping pthreads we don't need
varargs, and I prefer to avoid complexity until it's demonstrated
to be necessary.
> iirc, it was discussed to instrument the prologues/epilogues of wrapped
> functions. what has become of this idea?
The just-implemented scheme is similar in some ways, but it's simpler
and more portable at the expense of being less flexible. The main
difference is that here you have to write by hand a wrapper function.
That naturally deals with all issues of passing state from the pre-actions
to the post-actions since both are in the same function. Recursion and
I think longjmping still work (perhaps not on ppc64).
> if it was implemented that way, this extra-elaborate wrapper loading and
> naming scheme could go away;
No .. the wrapper naming scheme (involving wildcards for both fn names
and so names), and the specification vs active concepts, are independent of
how the low level control flow stuff is done.
J
|
|
From: Oswald B. <os...@kd...> - 2006-01-18 08:45:27
|
On Wed, Jan 18, 2006 at 05:24:34AM +0000, Julian Seward wrote:
> > the wrapped function is actually called from within the wrapper -
> > that means it will see an additional stack frame. i imagine this
> > might pose a problem for some conceivable functions.
>
> Well, yes, and that extra frame is visible in stack traces.
> Not sure what problems this will cause though.
>
i've seen such a function only once so far, and this was function 0x26
of int 0x21 in DOS. i know this, because turbo debugger's syscall
wrapping screwed it up. i'm having a hard time coming up with a function
where it really might be a problem, particularly which might be a
candidate for wrapping. so it's just a vague idea, as i tried to express.
> > if it was implemented that way, this extra-elaborate wrapper loading and
> > naming scheme could go away;
>
> No .. the wrapper naming scheme (involving wildcards for both fn names
> and so names), and the specification vs active concepts, are independent of
> how the low level control flow stuff is done.
>
i don't understand. having to track specifications is a direct
consequence of having them loadable in the client code. if they were
static parts of the tools, it would be just a static table per tool.
of course, the weird naming scheme is not really imposed by the loading
mechanism, as it could simply look for a specification table with a
magic symbol. it just happens to be an "interesting" way of keeping
related things close to each other.
--
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.
|
|
From: Josef W. <Jos...@gm...> - 2006-01-18 15:19:58
|
On Wednesday 18 January 2006 06:24, Julian Seward wrote:
> to the post-actions since both are in the same function. Recursion and
> I think longjmping still work (perhaps not on ppc64).
Longjumping will skip the frame of the wrapper function, too.
So post-actions are not called.
I am not sure this is a problem in practice.
Josef
|
|
From: Julian S. <js...@ac...> - 2006-01-21 02:47:18
|
On Wednesday 18 January 2006 15:19, Josef Weidendorfer wrote:
> On Wednesday 18 January 2006 06:24, Julian Seward wrote:
> > to the post-actions since both are in the same function. Recursion and
> > I think longjmping still work (perhaps not on ppc64).
>
> Longjumping will skip the frame of the wrapper function, too.
> So post-actions are not called.
>
> I am not sure this is a problem in practice.
For x86/amd64/ppc32, function wrapping is simple, conceptually:
when a wrapped/redirected function is called, V runs instead the
replacement function, and at that instant also it writes into a
pseudo-register (guest_NRADDR) the address of the un-redirected
function ("NRADDR" == Non-Redirected Address). The wrapper
function can get hold of this address and call it to get to the
original.
So function wrapping only requires magic at entry to the wrapper,
not at exit. This means longjumping and recursion work right.
On ppc64-linux it is not so simple. For each function, %r2 must
point to a constant pool ("table of contents") which is specific to
that function, or at least to that shared object.
Therefore, when diverting to the wrapper, valgrind must reload
r2 with a new value which is correct for the wrapper*. When the
wrapper returns it must restore r2 to what it was before. This
means that V has to keep a shadow stack of (pc,r2) pairs, one
for each nested wrapper which is active, and carefully save and
restore r2 values. This means if there is a longjmp in a wrapper
it will go wrong, because this stack will be out of sync.
J
* even the problem of figuring out the correct r2 for a given
function is not simple; it requires some rather fragile extensions
to the symbol table reader
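
A toy model of the ppc64 shadow stack shows one way a longjmp could
desynchronise it. This is an illustration under assumptions, not the
actual implementation: enter_wrapper, leave_wrapper and the fixed-size
array are hypothetical, and r2 is just a variable standing in for the
guest register.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_MAX 16

/* One saved (pc, r2) pair per active nested wrapper. */
typedef struct {
    uintptr_t pc;
    uintptr_t r2;
} Saved;

static Saved shadow[SHADOW_MAX];
static int shadow_depth = 0;
static uintptr_t r2 = 0;   /* models the guest's TOC pointer */

/* On diversion into a wrapper: save the caller's return pc and r2,
   then load the TOC pointer that is correct for the wrapper. */
static void enter_wrapper(uintptr_t return_pc, uintptr_t wrapper_toc)
{
    assert(shadow_depth < SHADOW_MAX);
    shadow[shadow_depth].pc = return_pc;
    shadow[shadow_depth].r2 = r2;
    shadow_depth++;
    r2 = wrapper_toc;
}

/* On normal return from a wrapper: restore r2 and yield the saved pc. */
static uintptr_t leave_wrapper(void)
{
    assert(shadow_depth > 0);
    shadow_depth--;
    r2 = shadow[shadow_depth].r2;
    return shadow[shadow_depth].pc;
}
```

If wrapper W1 is entered, then a nested wrapper W2 is entered and
longjmps back into W1, nothing pops W2's entry. When W1 later returns,
leave_wrapper() hands back W2's saved pc and r2 instead of W1's.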
|