|
From: H. J. L. <hj...@lu...> - 2002-10-18 23:58:17
|
On Fri, Oct 18, 2002 at 04:32:53PM -0700, Jeremy Fitzhardinge wrote:
> Hi HJ,
>
> Oops. I guess lucon.com is dead to you. Let's see if this works.

Is lucon.com available? I may switch to that. In the meantime, please
email me at hj...@lu... :-).

Jeremy, drop me a line if you want to make autofs 4.0.

Content-Description: Forwarded message - Overriding glibc functions
> Subject: Overriding glibc functions
> From: Jeremy Fitzhardinge <je...@go...>
> To: hj...@lu...
> X-Mailer: Ximian Evolution 1.0.8
> Date: 18 Oct 2002 16:12:56 -0700
>
> On Fri, 2002-10-18 at 13:53, Julian Seward wrote:
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgsnd
> > 000e76c8 T msgsnd
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgrcv
> > 000e771c T msgrcv
> >
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgsnd
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgrcv
> >
> > IOW, msgsnd/msgrcv are exported as strong text symbols from libc, and not
> > at all from libpthread. Your patch adds them to valgrind's libpthread.so.
> > So any threaded program running on V will have two definitions of
> > them to choose from, and it is unclear which it will get. If the libc
> > version was defined as a weak symbol, then V's implementation would
> > definitely override it (and many libc things are indeed exported weakly)
> > but that is not the case.
> >
> > Does that make sense? Is there some other behaviour of the dynamic
> > linker which might save the day and avoid this issue? I don't know much
> > about ld.so; perhaps you understand more about all this?
>
> Hm, no, but I know who does.
>
> Hey, HJ:
>
> We have the problem that Valgrind's implementation of libpthread.so
> needs to intercept various libc functions (in this instance
> msgsnd/msgrcv) so it can make sure they only block the thread rather
> than the whole process (since Valgrind completely re-implements the
> thread library without using kernel threads). The question is which
> definition will be picked up when glibc's function definition is not a
> weak reference? How can we make sure that our libpthread version is
> preferred over the glibc version?
>
> One thing occurs to me: the main Valgrind shared object, valgrind.so, is
> always LD_PRELOADED, and is linked with -Wl,-z -Wl,initfirst. I wonder
> if the solution is to put the intercept stub functions in valgrind.so,
> and have them call either glibc or libpthread, depending on the context.
>

There are 2 things you should know:

1. There are supposed to be no differences between weak and strong
symbols in DSOs. I submitted a patch to glibc:

http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html

2. Glibc will make sure libpthread.so will override libc.so, weak
or strong. Please file a bug if it doesn't do so. But please make
sure your libpthread has:

# readelf -d /lib/libpthread.so.0
...
0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
...

by passing "-z nodelete -z initfirst" to ld.

H.J.
|
|
From: H. J. L. <hj...@lu...> - 2002-10-19 00:52:39
|
On Fri, Oct 18, 2002 at 05:26:21PM -0700, Jeremy Fitzhardinge wrote:
> On Fri, 2002-10-18 at 16:58, H. J. Lu wrote:
> > 1. There are supposed to be no differences between weak and strong
> > symbols in DSOs. I submitted a patch to glibc:
> >
> > http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html
>
> It looks from that thread that the patch wasn't applied to 2.2. Does
> that mean it still needs to be applied, or has it been applied since? I
> really don't understand the issues here; can you explain, or is there a

I am pushing for it again.

> reference? In particular, what's DT_FILTER? Is it a mechanism for
> interposing symbols, or is it something else?
>

See

http://docs.sun.com/db/doc/806-0641

As I understand it (and I could be wrong), there may still be some
problems with DT_FILTER and DT_AUXILIARY in glibc.

> > 2. Glibc will make sure libpthread.so will override libc.so, weak
> > or strong. Please file a bug if it doesn't do so. But please make
> > sure your libpthread has:
> >
> > # readelf -d /lib/libpthread.so.0
> > ...
> > 0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
> > ...
> >
> > by passing "-z nodelete -z initfirst" to ld.
>
> OK, but valgrind.so is already being linked with "-z initfirst"; what
> happens if there are two .so files with initfirst? (It does seem to
> work).

Which ever comes first wins

H.J.
|
|
From: Jeremy F. <je...@go...> - 2002-10-19 01:50:01
|
On Fri, 2002-10-18 at 17:52, H. J. Lu wrote:
> > OK, but valgrind.so is already being linked with "-z initfirst"; what
> > happens if there are two .so files with initfirst? (It does seem to
> > work).
>
> Which ever comes first wins

So if the order of events is:

1. run executable A, with LD_PRELOAD=valgrind.so (which has initfirst
set)
2. A uses function foo() which is defined in libc.so
3. A doesn't use libpthread, but after a while it dlopens libB.so, which
does
4. libB.so pulls in Valgrind's libpthread.so.
5. Valgrind's libpthread.so wants to override foo(), now that this has
become a multithreaded program. If A uses foo() again, which definition
will it get? libc's original one (which it was using before), or the
new definition in libpthread.so?

I wonder if the solution is to always pull in Valgrind's libpthread.so,
but only make it do special stuff once there's more than one thread...

Thanks,
J
|
|
From: H. J. L. <hj...@lu...> - 2002-10-19 06:02:04
|
On Fri, Oct 18, 2002 at 06:50:02PM -0700, Jeremy Fitzhardinge wrote:
> On Fri, 2002-10-18 at 17:52, H. J. Lu wrote:
> > > OK, but valgrind.so is already being linked with "-z initfirst"; what
> > > happens if there are two .so files with initfirst? (It does seem to
> > > work).
> >
> > Which ever comes first wins
>
> So if the order of events is:
>
> 1. run executable A, with LD_PRELOAD=valgrind.so (which has initfirst
> set)
> 2. A uses function foo() which is defined in libc.so
> 3. A doesn't use libpthread, but after a while it dlopens libB.so, which
> does
> 4. libB.so pulls in Valgrind's libpthread.so.
> 5. Valgrind's libpthread.so wants to override foo(), now that this has
> become a multithreaded program. If A uses foo() again, which definition
> will it get? libc's original one (which it was using before), or the
> new definition in libpthread.so?
>

I think A will always use the original foo. BTW, if libB.so is
dlopened, it becomes very tricky. See

http://sources.redhat.com/ml/libc-alpha/2002-05/msg00214.html

H.J.
|
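The five-step scenario in the message above can be modelled with a toy resolver. This is only a sketch and not how glibc's ld.so is implemented: `ToyLinker`, its methods, and the `vg_init` symbol are hypothetical names made up for illustration. It assumes just two rules mentioned in the thread: lookup walks the loaded objects in load order, and a reference that is already bound stays bound when a later dlopen brings in another definition.

```python
class ToyLinker:
    """Toy model of symbol binding; NOT how glibc's ld.so is implemented."""

    def __init__(self):
        self.scope = []   # loaded objects in lookup order: (name, set of symbols)
        self.bound = {}   # (caller, symbol) -> name of the object it is bound to

    def load(self, name, symbols):
        """Append an object to the end of the lookup scope (startup or dlopen)."""
        self.scope.append((name, set(symbols)))

    def call(self, caller, symbol):
        """Return the object providing `symbol` for `caller`, binding on first use."""
        key = (caller, symbol)
        if key not in self.bound:
            for name, symbols in self.scope:
                if symbol in symbols:
                    self.bound[key] = name
                    break
            else:
                raise LookupError(f"undefined symbol: {symbol}")
        return self.bound[key]


linker = ToyLinker()
linker.load("valgrind.so", {"vg_init"})   # step 1: preloaded (vg_init is made up)
linker.load("libc.so", {"foo"})
first = linker.call("A", "foo")           # step 2: A binds foo to libc's definition
linker.load("libpthread.so", {"foo"})     # steps 3-4: pulled in via dlopen of libB.so
second = linker.call("A", "foo")          # step 5: A calls foo again
print(first, second)
```

In this model A keeps the libc definition both times, which matches H.J.'s expectation that A will always use the original foo.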
|
From: Julian S. <js...@ac...> - 2002-10-20 10:54:12
|
Jeremy
Not entirely sure what the outcome of this is -- can you summarise? I get
the impression that it _should_ be OK -- libpthread.so should override libc
-- but perhaps I'm wrong?
A couple of other points.
1. Your __select and __poll renamings suffer from the same problem.
2. I didn't mention this before, but ... you might want to play with the
coregrind/dosyms script. This compares the exported symbols from
V's libpthread.so vs those from the standard one; and it is how I
navigate this swamp -- to a first approximation I try and make
V's libpthread.so export the same syms as the standard one.
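
A comparison of that kind can be sketched in a few lines. This is not the actual coregrind/dosyms script, whose contents are not shown in this thread; the function names and the sample listings are made up, and the input is assumed to be text in the `nm` format quoted earlier (address, type letter, name).

```python
DEFINED_TYPES = {"T", "D", "B", "R", "W", "V"}  # global text/data/bss/rodata/weak


def exported_symbols(nm_output):
    """Collect the names of defined global symbols from `nm`-style text."""
    exported = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ones ("U name") have only two and are skipped.
        if len(fields) == 3 and fields[1] in DEFINED_TYPES:
            exported.add(fields[2])
    return exported


def compare_exports(ours, standard):
    """Return (missing, extra): names only in the standard set, names only in ours."""
    return sorted(standard - ours), sorted(ours - standard)


# Made-up sample listings, for illustration only.
standard_nm = """\
00001000 T pthread_create
00001100 T pthread_join
00001200 W fork
"""
ours_nm = """\
00002000 T pthread_create
00002100 T msgsnd
         U write
"""
missing, extra = compare_exports(exported_symbols(ours_nm),
                                 exported_symbols(standard_nm))
print("missing:", missing)
print("extra:", extra)
```

Anything listed as "extra" is a symbol that only the replacement library exports, which is the situation msgsnd/msgrcv are in.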
J
On Saturday 19 October 2002 2:50 am, Jeremy Fitzhardinge wrote:
> On Fri, 2002-10-18 at 17:52, H. J. Lu wrote:
> > > OK, but valgrind.so is already being linked with "-z initfirst"; what
> > > happens if there are two .so files with initfirst? (It does seem to
> > > work).
> >
> > Which ever comes first wins
>
> So if the order of events is:
>
> 1. run executable A, with LD_PRELOAD=valgrind.so (which has initfirst
> set)
> 2. A uses function foo() which is defined in libc.so
> 3. A doesn't use libpthread, but after a while it dlopens libB.so, which
> does
> 4. libB.so pulls in Valgrind's libpthread.so.
> 5. Valgrind's libpthread.so wants to override foo(), now that this has
> become a multithreaded program. If A uses foo() again, which definition
> will it get? libc's original one (which it was using before), or the
> new definition in libpthread.so?
>
> I wonder if the solution is to always pull in Valgrind's libpthread.so,
> but only make it do special stuff once there's more than one thread...
>
> Thanks,
> J
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
|
|
From: Jeremy F. <je...@go...> - 2002-10-20 18:35:52
|
On Sun, 2002-10-20 at 04:00, Julian Seward wrote:
> Jeremy
>
> Not entirely sure what the outcome of this is -- can you summarise? I get
> the impression that it _should_ be OK -- libpthread.so should override libc
> -- but perhaps I'm wrong?

Yes, but... The case where a program binds to some symbols, then
(implicitly) loads libpthread and becomes multithreaded is difficult,
because any previously bound references will remain bound. glibc w/
pthreads has this problem as well (libpthread wants to intercept fork(),
but may not get to it in time).

I think the safe and sure way of making this always work is to make
valgrind.so define all the symbols we want to intercept, and then have
it dispatch them out to our libpthread or into the real libc, depending
on whether a second thread has been created. I think this is certain to
catch all the references we want to catch under all circumstances.
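
The dispatch idea can be sketched as follows. This is a model of the control flow only, not Valgrind code: `Interceptor`, `make_stub`, `note_thread_created` and the two stand-in msgsnd functions are hypothetical names. The point it shows is that callers bind once to the stub, and the stub chooses the implementation at call time, so it does not matter how early the binding happened.

```python
class Interceptor:
    """Hypothetical model of the stubs that valgrind.so would export."""

    def __init__(self):
        self.second_thread_created = False

    def note_thread_created(self):
        """Called (in this model) when the program first creates a second thread."""
        self.second_thread_created = True

    def make_stub(self, libc_impl, threaded_impl):
        """Build a stub that picks the implementation on every call."""
        def stub(*args):
            impl = threaded_impl if self.second_thread_created else libc_impl
            return impl(*args)
        return stub


def libc_msgsnd(msg):
    # Stand-in for the real libc function.
    return f"libc msgsnd({msg}): may block the whole process"


def pthread_msgsnd(msg):
    # Stand-in for the thread-aware replacement.
    return f"libpthread msgsnd({msg}): blocks only the calling thread"


interceptor = Interceptor()
msgsnd = interceptor.make_stub(libc_msgsnd, pthread_msgsnd)

before = msgsnd("hello")            # single-threaded: goes to libc
interceptor.note_thread_created()
after = msgsnd("hello")             # same bound stub, now goes to libpthread
print(before)
print(after)
```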

> A couple of other points.
>
> 1. Your __select and __poll renamings suffer from the same problem.

As do all the libc intercepts in libpthread.so (select and __select are
strong aliases and are therefore indistinguishable).

> 2. I didn't mention this before, but ... you might want to play with the
> coregrind/dosyms script. This compares the exported symbols from
> V's libpthread.so vs those from the standard one; and it is how I
> navigate this swamp -- to a first approximation I try and make
> V's libpthread.so export the same syms as the standard one.

OK. It looks to me like the version script mechanism can also be used
to make sure there's no symbol namespace leakage (ie, it can enforce the
rule that only vgPlain_* symbols are visible outside the .so).

J
|
|
From: Julian S. <js...@ac...> - 2002-10-20 20:09:20
|
Hi. Thx for the helgrind fixes. I merged the following patches into
the head this afternoon:

03-poll-select (is also in 1.0.4)
04-lax-ioctls (is also in 1.0.4)
07-seginfo
13-data-syms
14-hg-tid
06-memops
15-hg-datasym
16-ld-nodelete

I think Nick has already merged 08-skin-clientreq.

I just tried a run of mozilla-1.0.1 on the fully patched helgrind.
There are bazillions of errors reported for the _dl_num_relocations
problem which we already know about (even with LD_BIND_NOW=y). If
those were to be got rid of somehow, the remaining ones are sufficiently
few that they might actually tell us something useful.

Which is a step in the right direction.

Any ideas how to get rid of this blasted _dl_num_relocations thing?
Perhaps helgrind can check to see if the variable in question is
indeed _dl_num_relocations, and suppress the error if so?

Another interesting test is tests/unused/pth_threadpool; this gives
a large number of errors, many of which pertain to _IO_2_1_stdout_.

J
|
|
From: Julian S. <js...@ac...> - 2002-10-20 20:27:26
|
> Not entirely sure what the outcome of this is -- can you summarise? I
> get the impression that it _should_ be OK -- libpthread.so should override
> libc -- but perhaps I'm wrong?
>
> Yes, but... The case where a program binds to some symbols, then
> (implicitly) loads libpthread and becomes multithreaded is difficult,
> because any previously bound references will remain bound. glibc w/
> pthreads has this problem as well (libpthread wants to intercept fork(),
> but may not get to it in time).
>
> I think the safe and sure way of making this always work is to make
> valgrind.so define all the symbols we want to intercept, and then have
> it dispatch them out to our libpthread or into the real libc, depending
> on whether a second thread has been created. I think this is certain to
> catch all the references we want to catch under all circumstances.

How about just link in our libpthread.so under all circumstances? Surely
the magic hacks it does for some functions (select, poll, etc) are harmless
for single-threaded programs, just a little inefficient? Is there any
advantage to the added complication of switching behaviours depending on
whether or not a second thread has been created? All else being equal
I prefer to avoid complexity like that.

> A couple of other points.
>
> 1. Your __select and __poll renamings suffer from the same problem.
>
> As do all the libc intercepts in libpthread.so (select and __select are
> strong aliases and are therefore indistinguishable

Blargh.

J
|
|
From: Jeremy F. <je...@go...> - 2002-10-20 20:53:44
|
On Sun, 2002-10-20 at 13:33, Julian Seward wrote:
> How about just link in our libpthread.so under all circumstances? Surely
> the magic hacks it does for some functions (select, poll, etc) are harmless
> for single-threaded programs, just a little inefficient? Is there any
> advantage to the added complication of switching behaviours depending on
> whether or not a second thread has been created? All else being equal
> I prefer to avoid complexity like that.

It leaks the pthread names into programs which didn't ask for
libpthread. But that's pretty likely anyway, given the number of
libraries which pull in libpthread for you.
Also, some of libpthread's substitutions may not be functionally
identical to the real functions, so it would be nice to avoid
introducing that complexity unless it is necessary (I don't think that's
the case at the moment, but it is true for the non-blocking open, which
is impossible to emulate exactly).
So I don't have a strong objection to always bringing in libpthread, but
I do think we should just use the standard libc implementations unless
we actually need the threaded versions.
J
|
|
From: Jeremy F. <je...@go...> - 2002-10-20 20:54:06
|
On Sun, 2002-10-20 at 13:15, Julian Seward wrote:
> Hi. Thx for the helgrind fixes. I merged the following patches into
> the head this afternoon:
>
> 03-poll-select (is also in 1.0.4)
> 04-lax-ioctls (is also in 1.0.4)
> 07-seginfo
> 13-data-syms
> 14-hg-tid
> 06-memops
> 15-hg-datasym
> 16-ld-nodelete
>
> I think Nick has already merged 08-skin-clientreq.

Excellent. I'll merge and regenerate my patchset.

> I just tried a run of mozilla-1.0.1 on the fully patched helgrind.
> There are bazillions of errors reported for the _dl_num_relocations
> problem which we already know about (even with LD_BIND_NOW=y). If
> those were to be got rid of somehow, the remaining ones are sufficiently
> few that they might actually tell us something useful.

Yes, LD_BIND_NOW will only work for simple dynamically linked programs
which do all their linking at startup (ie, while single threaded).
Presumably Mozilla is still dynamically linking things well into its
life, with multiple threads.

> Which is a step in the right direction.
>
> Any ideas how to get rid of this blasted _dl_num_relocations thing?
> Perhaps helgrind can check to see if the variable in question is
> indeed _dl_num_relocations, and suppress the error if so?

Well, obviously Helgrind needs to be taught how to understand
suppressions, and we need to start building a standard suppression
file. I also think the handling of duplicate errors needs to be a bit
different from Memcheck and so on. Memcheck considers the error context
to be the same if there's matching portions of the stack backtrace. I
think for helgrind, the memory location itself is all that's
interesting: it doesn't matter what the call frames are (for the
purposes of suppression; obviously you need them for reporting).
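
The difference between the two ways of grouping duplicate errors can be sketched like this. It is a loose illustration and not Helgrind or Memcheck code: `RaceReport`, the key functions, the addresses and the frame names are all made up, and "matching portions of the stack backtrace" is simplified to "the top two frames are equal".

```python
from collections import namedtuple

# Hypothetical error record: the address involved plus the call stack at the access.
RaceReport = namedtuple("RaceReport", ["address", "stack"])


def stack_key(report, depth=2):
    """Stack-based key: two reports match if their top `depth` frames match."""
    return report.stack[:depth]


def address_key(report):
    """Location-based key: only the memory address matters."""
    return report.address


def unique_reports(reports, key):
    """Keep the first report for each distinct key, preserving order."""
    seen = set()
    kept = []
    for report in reports:
        k = key(report)
        if k not in seen:
            seen.add(k)
            kept.append(report)
    return kept


reports = [
    RaceReport(0x4001000, ("relocate", "open_worker")),
    RaceReport(0x4001000, ("relocate", "map_deps")),
    RaceReport(0x4002000, ("write_char", "main")),
]
by_stack = unique_reports(reports, stack_key)
by_address = unique_reports(reports, address_key)
print(len(by_stack), len(by_address))
```

With the stack-based key the same location is reported once per distinct call path; with the address-based key it is reported once, and the first backtrace is still available for the report itself.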

Also, the increment of _dl_num_relocations is probably an atomic inc
anyway (probably a simple addl $1, _dl_num_relocations), so it doesn't
really count as an error at all. I'm not planning on teaching helgrind
about atomic operations just yet, but it is an obvious path for future
work.

> Another interesting test is tests/unused/pth_threadpool; this gives
> a large number of errors, many of which pertain to _IO_2_1_stdout_.

Yes, I noticed there were a number of reports coming out of stdio, but I
haven't looked into it yet. Quite possibly they're atomic ops as well.
Or just bugs.

J
|
|
From: Jeremy F. <je...@go...> - 2002-10-20 21:40:09
|
On Sun, 2002-10-20 at 13:54, Jeremy Fitzhardinge wrote:
> On Sun, 2002-10-20 at 13:15, Julian Seward wrote:
> > I think Nick has already merged 08-skin-clientreq.
>
> Excellent. I'll merge and regenerate my patchset.

OK, I've updated http://www.goop.org/~jeremy/valgrind - it doesn't look
like 08-skin-clientreq has been merged yet.
J
|
|
From: Julian S. <js...@ac...> - 2002-10-20 22:15:38
|
On Sunday 20 October 2002 10:40 pm, Jeremy Fitzhardinge wrote:
> On Sun, 2002-10-20 at 13:54, Jeremy Fitzhardinge wrote:
> On Sun, 2002-10-20 at 13:15, Julian Seward wrote:
> I think Nick has already merged 08-skin-clientreq.
>
> Excellent. I'll merge and regenerate my patchset.
>
> OK, I've updated http://www.goop.org/~jeremy/valgrind - it doesn't look
> like 08-skin-clientreq has been merged yet.

Hmm. You're right; my memory is wrong.

Nick, do you see any potential problems if I merge in 08-skin-clientreq?
[just a sanity check before I do so ...]

Jeremy, I see 17-hg-generic-mutex. I presume this is infrastructure for
the thread-segments stuff, or something like that.

J
|
|
From: Jeremy F. <je...@go...> - 2002-10-20 22:50:05
|
On Sun, 2002-10-20 at 15:21, Julian Seward wrote:
> Jeremy, I see 17-hg-generic-mutex. I presume this is infrastructure for
> the thread-segments stuff, or something like that.

I've got a number of plans for this:

- keep a graph of mutex dependencies so that incorrect locking order
  can be detected
- keep track of mutex lifetimes to handle mutexes in allocated memory
  which gets freed
- support non-pthreads threading using client requests to indicate
  lock/unlock/context switch
- record execution context when a mutex state changes, so it can be
  reported
- and some other stuff (lots of uses for maintaining per-mutex
  information)

I think the thread segments stuff only needs a redefinition of tid
(which fits in well with the non-pthreads changes as well).

J
|
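The first plan in the list above, a graph of mutex dependencies for catching inconsistent lock order, can be sketched roughly as below. This is an illustration of the general technique under simple assumptions, not the 17-hg-generic-mutex patch: `LockOrderGraph` and its methods are hypothetical names, mutexes and threads are identified by plain values, and an acquisition is flagged when an earlier recorded order already leads from the new mutex to one that is currently held.

```python
class LockOrderGraph:
    """Records 'held -> acquired' edges and flags orders that would close a cycle."""

    def __init__(self):
        self.edges = {}   # mutex -> set of mutexes acquired while it was held
        self.held = {}    # thread id -> mutexes currently held, in acquisition order

    def _reachable(self, start, target):
        """Depth-first search: is `target` reachable from `start` via recorded edges?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return False

    def lock(self, tid, mutex):
        """Record an acquisition; return the held mutexes whose order it contradicts."""
        held = self.held.setdefault(tid, [])
        conflicts = [h for h in held if self._reachable(mutex, h)]
        for h in held:
            self.edges.setdefault(h, set()).add(mutex)
        held.append(mutex)
        return conflicts

    def unlock(self, tid, mutex):
        """Record a release by thread `tid`."""
        self.held[tid].remove(mutex)


graph = LockOrderGraph()
graph.lock(1, "A")             # thread 1 takes A, then B
graph.lock(1, "B")
graph.unlock(1, "B")
graph.unlock(1, "A")
graph.lock(2, "B")             # thread 2 takes B, then A
conflicts = graph.lock(2, "A")
print(conflicts)
```

The second thread's acquisition of A while holding B is flagged because the order A then B was recorded earlier, even though the two threads never actually deadlocked in this run.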
|
From: Nicholas N. <nj...@ca...> - 2002-10-21 08:59:51
|
On Sun, 20 Oct 2002, Julian Seward wrote:
> > OK, I've updated http://www.goop.org/~jeremy/valgrind - it doesn't look
> > like 08-skin-clientreq has been merged yet.
>
> Nick, do you see any potential problems if I merge in 08-skin-clientreq?
> [just a sanity check before I do so ...]

Nope, I haven't done it only because I am slack. I will do it today.

N
|
|
From: Jeremy F. <je...@go...> - 2002-10-19 00:26:20
|
On Fri, 2002-10-18 at 16:58, H. J. Lu wrote:
> 1. There are supposed to be no differences between weak and strong
> symbols in DSOs. I submitted a patch to glibc:
>
> http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html

It looks from that thread that the patch wasn't applied to 2.2. Does
that mean it still needs to be applied, or has it been applied since? I
really don't understand the issues here; can you explain, or is there a
reference? In particular, what's DT_FILTER? Is it a mechanism for
interposing symbols, or is it something else?

> 2. Glibc will make sure libpthread.so will override libc.so, weak
> or strong. Please file a bug if it doesn't do so. But please make
> sure your libpthread has:
>
> # readelf -d /lib/libpthread.so.0
> ...
> 0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
> ...
>
> by passing "-z nodelete -z initfirst" to ld.

OK, but valgrind.so is already being linked with "-z initfirst"; what
happens if there are two .so files with initfirst? (It does seem to
work).

Thanks,
J
|