|
From: Joseph M L. <val...@jo...> - 2006-04-14 13:08:06
|
I have also reproduced this on FC4 with a highly threaded application that uses pthread_cancel() and depends on pthread cleanup handlers. I am still using 2.4.1, and didn't see anything to indicate that it has been fixed in more recent versions. Anyone have any luck with this issue? Thanks, Joe |
|
From: Julian S. <js...@ac...> - 2006-04-18 01:00:17
|
On Friday 14 April 2006 14:07, Joseph M Link wrote: > I have also reproduced this on FC4 with a highly threaded application > that uses pthread_cancel() and depends on pthread cleanup handlers. > > I am still using 2.4.1, and didn't see anything to indicate that it has > been fixed in more recent versions. Anyone have any luck with this issue? It would be nice to fix this, yes. Er .. no .. nobody afaik has chased it any more. It's not an easy one. My belief is that pthread_cancel throws a signal at the target thread, and the signal handler starts unwinding the stack. This is not working because the unwinder is seeing a signal frame which is different from what it expects, so it gives up. At least, that's my theory. My first line of approach would be to figure out if that's really what pthread_cancel does, and if so what it expects the signal frame to look like. J |
|
From: Joseph M L. <val...@jo...> - 2006-04-18 04:05:29
|
So far, I have traced that pthread_cancel() basically hands off to gcc 4.0.2's _Unwind_ForcedUnwind(), passing it its unwind_stop() callback. unwind_stop() is defined in the pthread library, glibc-2.3.6/nptl/unwind.c. It is called, presumably, for each frame. When running natively, it always returns. When running under valgrind, after a certain number of iterations, it doesn't return. I assume it is doing the longjmp() at the end of the call. I've tried to instrument the call, but I am having trouble building the library (something about an undefined GLIBC_PRIVATE). Joe Julian Seward wrote: > On Friday 14 April 2006 14:07, Joseph M Link wrote: >> I have also reproduced this on FC4 with a highly threaded application >> that uses pthread_cancel() and depends on pthread cleanup handlers. >> >> I am still using 2.4.1, and didn't see anything to indicate that it has >> been fixed in more recent versions. Anyone have any luck with this issue? > > It would be nice to fix this, yes. Er .. no .. nobody afaik has chased > it any more. It's not an easy one. My belief is that pthread_cancel > throws a signal at the target thread, and the signal handler starts > unwinding the stack. This is not working because the unwinder is > seeing a signal frame which is different from what it expects, so it > gives up. At least, that's my theory. My first line of approach would > be to figure out if that's really what pthread_cancel does, and if so > what it expects the signal frame to look like. > > J |
|
From: Julian S. <js...@ac...> - 2006-04-18 12:00:21
|
That's a good start. If you can figure out how to compile glibc so as to get more details on what's going on inside unwind_stop(), we might be in with a chance of at least understanding what the problem is. J On Tuesday 18 April 2006 05:05, Joseph M Link wrote: > So far, I have traced that pthread_cancel() basically hands off to gcc > 4.0.2's _Unwind_ForcedUnwind(), passing it its unwind_stop() callback. > > unwind_stop() is defined in the pthread library, > glibc-2.3.6/nptl/unwind.c. It is called, presumably, for each frame. > When running natively, it always returns. When running under valgrind, > after a certain number of iterations, it doesn't return. I assume it is > doing the longjmp() at the end of the call. I've tried to instrument > the call, but I am having trouble building the library (something about > an undefined GLIBC_PRIVATE). > > Joe > > Julian Seward wrote: > > On Friday 14 April 2006 14:07, Joseph M Link wrote: > >> I have also reproduced this on FC4 with a highly threaded application > >> that uses pthread_cancel() and depends on pthread cleanup handlers. > >> > >> I am still using 2.4.1, and didn't see anything to indicate that it has > >> been fixed in more recent versions. Anyone have any luck with this > >> issue? > > > > It would be nice to fix this, yes. Er .. no .. nobody afaik has chased > > it any more. It's not an easy one. My belief is that pthread_cancel > > throws a signal at the target thread, and the signal handler starts > > unwinding the stack. This is not working because the unwinder is > > seeing a signal frame which is different from what it expects, so it > > gives up. At least, that's my theory. My first line of approach would > > be to figure out if that's really what pthread_cancel does, and if so > > what it expects the signal frame to look like. 
> > > > J |
|
From: Joseph M L. <val...@jo...> - 2006-04-18 14:26:55
|
It appears that unwind_stop() is doing the longjmp because it thinks it is at the end of the stack. It thinks this because gcc's _Unwind_ForcedUnwind() tells it that it is at the end of the stack. Moving my investigation to gcc. Joe Julian Seward wrote: > That's a good start. If you can figure out how to compile glibc so > as to get more details on what's going on inside unwind_stop(), we > might be in with a chance of at least understanding what the problem > is. > > J > > On Tuesday 18 April 2006 05:05, Joseph M Link wrote: >> So far, I have traced that pthread_cancel() basically hands off to gcc >> 4.0.2's _Unwind_ForcedUnwind(), passing it its unwind_stop() callback. >> >> unwind_stop() is defined in the pthread library, >> glibc-2.3.6/nptl/unwind.c. It is called, presumably, for each frame. >> When running natively, it always returns. When running under valgrind, >> after a certain number of iterations, it doesn't return. I assume it is >> doing the longjmp() at the end of the call. I've tried to instrument >> the call, but I am having trouble building the library (something about >> an undefined GLIBC_PRIVATE). >> >> Joe >> >> Julian Seward wrote: >>> On Friday 14 April 2006 14:07, Joseph M Link wrote: >>>> I have also reproduced this on FC4 with a highly threaded application >>>> that uses pthread_cancel() and depends on pthread cleanup handlers. >>>> >>>> I am still using 2.4.1, and didn't see anything to indicate that it has >>>> been fixed in more recent versions. Anyone have any luck with this >>>> issue? >>> It would be nice to fix this, yes. Er .. no .. nobody afaik has chased >>> it any more. It's not an easy one. My belief is that pthread_cancel >>> throws a signal at the target thread, and the signal handler starts >>> unwinding the stack. This is not working because the unwinder is >>> seeing a signal frame which is different from what it expects, so it >>> gives up. At least, that's my theory. 
My first line of approach would >>> be to figure out if that's really what pthread_cancel does, and if so >>> what it expects the signal frame to look like. >>> >>> J |
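A minimal sketch of the forced-unwind pattern discussed above, assuming GCC's <unwind.h> on Linux. The names stop_fn and count_frames_by_forced_unwind are made up for illustration; this only mirrors the general shape of what is described for unwind_stop() (called once per frame, longjmp() when the unwinder reports end of stack) and is not glibc's code.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>
#include <unwind.h>

/* File-scope state, so the values survive the longjmp() without volatile. */
static jmp_buf unwind_done;
static int frames_seen;
static struct _Unwind_Exception forced_exc;

/* Stop function (hypothetical name): the unwinder calls it once per frame. */
static _Unwind_Reason_Code
stop_fn(int version, _Unwind_Action actions, _Unwind_Exception_Class exc_class,
        struct _Unwind_Exception *exc, struct _Unwind_Context *ctx, void *arg)
{
    (void)version; (void)exc_class; (void)exc; (void)ctx; (void)arg;

    /* A forced unwind always sets this flag when calling the stop function. */
    assert(actions & _UA_FORCE_UNWIND);

    /* The unwinder found no description for the next frame: it believes
       the stack has ended, so leave the unwinder for good. */
    if (actions & _UA_END_OF_STACK)
        longjmp(unwind_done, 1);

    frames_seen++;
    return _URC_NO_REASON;   /* carry on to the next frame */
}

/* Returns the number of frames visited before end of stack was reported,
   or -1 if the unwinder returned with an error instead. */
int count_frames_by_forced_unwind(void)
{
    frames_seen = 0;
    memset(&forced_exc, 0, sizeof forced_exc);

    if (setjmp(unwind_done) == 0) {
        _Unwind_ForcedUnwind(&forced_exc, stop_fn, NULL);
        return -1;           /* reached only if the unwinder gave up */
    }
    return frames_seen;
}
```

If the unwinder loses track of the stack early, the end-of-stack flag arrives after fewer frames than expected, which is the symptom reported in this thread.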
|
From: Joseph M L. <val...@jo...> - 2006-04-18 15:21:32
|
uw_frame_state_for() (gcc-4.0.2/gcc/unwind-dw2.c) is used by way of _Unwind_ForcedUnwind() to determine end of stack. uw_frame_state_for() uses _Unwind_Find_FDE() which at some point returns NULL. This leads to the end of stack indication in both native and valgrind, it is just prematurely NULL under valgrind. _Unwind_Find_FDE() (gcc-4.0.2/gcc/unwind-dw2-fde-glibc.c) uses dl_iterate_phdr() with its callback, _Unwind_IteratePhdrCallback(). This gets to a point where I don't really know what I am looking at. The gist is that _Unwind_IteratePhdrCallback() doesn't find the fde and leaves data->ret NULL, which is what _Unwind_Find_FDE() returns, signaling the premature end of stack. Can someone help from this point? Joe Joseph M Link wrote: > It appears that unwind_stop() is doing the longjmp because it thinks it > is at the end of the stack. It thinks this because gcc's > _Unwind_ForcedUnwind() tells it that it is at the end of the stack. > > Moving my investigation to gcc. > > Joe > > Julian Seward wrote: >> That's a good start. If you can figure out how to compile glibc so >> as to get more details on what's going on inside unwind_stop(), we >> might be in with a chance of at least understanding what the problem >> is. >> >> J >> >> On Tuesday 18 April 2006 05:05, Joseph M Link wrote: >>> So far, I have traced that pthread_cancel() basically hands off to gcc >>> 4.0.2's _Unwind_ForcedUnwind(), passing it its unwind_stop() callback. >>> >>> unwind_stop() is defined in the pthread library, >>> glibc-2.3.6/nptl/unwind.c. It is called, presumably, for each frame. >>> When running natively, it always returns. When running under valgrind, >>> after a certain number of iterations, it doesn't return. I assume it is >>> doing the longjmp() at the end of the call. I've tried to instrument >>> the call, but I am having trouble building the library (something about >>> an undefined GLIBC_PRIVATE). 
>>> >>> Joe >>> >>> Julian Seward wrote: >>>> On Friday 14 April 2006 14:07, Joseph M Link wrote: >>>>> I have also reproduced this on FC4 with a highly threaded application >>>>> that uses pthread_cancel() and depends on pthread cleanup handlers. >>>>> >>>>> I am still using 2.4.1, and didn't see anything to indicate that it has >>>>> been fixed in more recent versions. Anyone have any luck with this >>>>> issue? >>>> It would be nice to fix this, yes. Er .. no .. nobody afaik has chased >>>> it any more. It's not an easy one. My belief is that pthread_cancel >>>> throws a signal at the target thread, and the signal handler starts >>>> unwinding the stack. This is not working because the unwinder is >>>> seeing a signal frame which is different from what it expects, so it >>>> gives up. At least, that's my theory. My first line of approach would >>>> be to figure out if that's really what pthread_cancel does, and if so >>>> what it expects the signal frame to look like. >>>> >>>> J |
|
From: Julian S. <js...@ac...> - 2006-04-18 15:36:45
|
> This gets to a point where I don't really know what I am looking at. > The gist is that _Unwind_IteratePhdrCallback() doesn't find the fde and > leaves data->ret NULL, which is what _Unwind_Find_FDE() returns, > signaling the premature end of stack. What magic incantations have you used to compile glibc etc? What do we need to do to reproduce what you've done? Can you print out the sequence of pc values presented as the first arg to _Unwind_Find_FDE? I suspect it's not that the relevant FDEs are not found; rather that the pc values for which FDEs are sought are bogus. J |
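One way to collect such a sequence of pc values without patching gcc is sketched below, assuming GCC's <unwind.h>. It walks the current stack with _Unwind_Backtrace() and records what _Unwind_GetIP() reports for each frame. The struct pc_log type and the function names are made up for illustration, and this is not the instrumentation used in this thread, which printed the argument of _Unwind_Find_FDE() directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unwind.h>

#define MAX_PCS 64

/* Hypothetical container for the pc values seen during one stack walk. */
struct pc_log {
    _Unwind_Ptr pcs[MAX_PCS];
    size_t count;
};

/* Called by the unwinder once per frame, innermost frame first. */
static _Unwind_Reason_Code record_pc(struct _Unwind_Context *ctx, void *arg)
{
    struct pc_log *log = arg;

    if (log->count == MAX_PCS)
        return _URC_END_OF_STACK;  /* any code but _URC_NO_REASON stops the walk */

    log->pcs[log->count++] = _Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Fills log with the pc of every frame the unwinder can find. */
size_t collect_pcs(struct pc_log *log)
{
    assert(log != NULL);
    log->count = 0;
    _Unwind_Backtrace(record_pc, log);
    return log->count;
}

/* Prints the recorded values, one per line. */
void print_pcs(const struct pc_log *log)
{
    for (size_t i = 0; i < log->count; i++)
        printf("PC = %#lx\n", (unsigned long)log->pcs[i]);
}
```

Calling collect_pcs() followed by print_pcs() from the same place natively and under valgrind gives two lists that can be compared frame by frame.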
|
From: Joseph M L. <val...@jo...> - 2006-04-18 16:00:19
|
> What magic incantations have you used to compile glibc etc? What do
> we need to do to reproduce what you've done?
I am currently only building gcc-4.0.2 and instrumenting these calls. I compile and use the test program reported with the bug.
> Can you print out the sequence of pc values presented as the first arg
> to _Unwind_Find_FDE? I suspect it's not that the relevant FDEs are not
> found; rather that the pc values for which FDEs are sought are bogus.
Running without valgrind gives the following output:
main: creating thread ...
main: waiting for thread to be ready ...
main: thread is ready
main: cancelling thread ...
main: waiting for thread to be clean...
PC = 0x268816
PC = 0xd96cfd
PC = 0xd94da6
PC = 0x36f214
PC = 0x343951
PC = 0x2f97de
PC = 0x2f9602
PC = 0x8048780
main: cleaning up
PC = 0x2689a6
PC = 0x80487a1
PC = 0xd90bd3
With valgrind:
laptop % valgrind -q --tool=none a.out
main: creating thread ...
main: waiting for thread to be ready ...
main: thread is ready
main: cancelling thread ...
main: waiting for thread to be clean...
PC = 0x3aab7816
PC = 0x3a9a6cfd
PC = 0x3a9a4da6
PC = 0x3a99f3a7
PC = 0x3a9a649f
PC = 0xafeff021
(hangs here)
Joe |
Joseph M Link wrote:
> uw_frame_state_for() (gcc-4.0.2/gcc/unwind-dw2.c) is used by way of
> _Unwind_ForcedUnwind() to determine end of stack.
>
> uw_frame_state_for() uses _Unwind_Find_FDE() which at some point returns
> NULL. This leads to the end of stack indication in both native and
> valgrind, it is just prematurely NULL under valgrind.
>
> _Unwind_Find_FDE() (gcc-4.0.2/gcc/unwind-dw2-fde-glibc.c) uses
> dl_iterate_phdr() with its callback, _Unwind_IteratePhdrCallback().
>
> This gets to a point where I don't really know what I am looking at. The
> gist is that _Unwind_IteratePhdrCallback() doesn't find the fde and
> leaves data->ret NULL, which is what _Unwind_Find_FDE() returns,
> signaling the premature end of stack.
I ran into related problems with _Unwind_ForcedUnwind(), but with the kernel vDSO [virtual dynamic shared object "linux-gate.so.1"]: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180351
If valgrind has any frames on the stack (as distinguished from user frames on the stack) then the unwinder requires the DWARF2 description of those frames, or it will get lost. Using dl_iterate_phdr(), the runtime loader ld-linux.so.2 supplies the description for all the modules (main program, shared libraries, dlopen files) that it knows about; obviously this list [should] excludes anything from valgrind.
At one time I considered noticing that libgcc_s had been loaded, then calling __register_frame* to inform the unwinder about the frame descriptions for my "auditing" code. However, I managed to avoid this, by using a kernel patch that used more general DWARF2 unwind info for linux-gate.so.1. Search the LKML for my patch "i386 rt_sigframe flexibility to virtualize signal delivery" (2006-02-15.)
In yet another project, I found it necessary to produce a libgcc_s with modified _Unwind_GetIP() and _Unwind_SetIP() routines, and make sure that these routines were called instead of the inline "(CONTEXT)->ra" etc.
Hope this helps.
-- |
|
From: Tom H. <to...@co...> - 2006-04-18 16:27:23
|
In message <44450ECE.8000600@BitWagon.com>
John Reiser <jreiser@BitWagon.com> wrote:
> Joseph M Link wrote:
> > uw_frame_state_for() (gcc-4.0.2/gcc/unwind-dw2.c) is used by way of
> > _Unwind_ForcedUnwind() to determine end of stack.
> >
> > uw_frame_state_for() uses _Unwind_Find_FDE() which at some point returns
> > NULL. This leads to the end of stack indication in both native and
> > valgrind, it is just prematurely NULL under valgrind.
> >
> > _Unwind_Find_FDE() (gcc-4.0.2/gcc/unwind-dw2-fde-glibc.c) uses
> > dl_iterate_phdr() with its callback, _Unwind_IteratePhdrCallback().
> >
> > This gets to a point where I don't really know what I am looking at. The
> > gist is that _Unwind_IteratePhdrCallback() doesn't find the fde and
> > leaves data->ret NULL, which is what _Unwind_Find_FDE() returns,
> > signaling the premature end of stack.
>
> I ran into related problems with _Unwind_ForcedUnwind(), but with the
> kernel vDSO [virtual dynamic shared object "linux-gate.so.1"]:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180351
>
> If valgrind has any frames on the stack (as distinguished from user
> frames on the stack) then the unwinder requires the DWARF2 description
> of those frames, or it will get lost. Using dl_iterate_phdr(), the
> runtime loader ld-linux.so.2 supplies the description for all the
> modules (main program, shared libraries, dlopen files) that it knows
> about; obviously this list [should] excludes anything from valgrind.
That rings a bell... I remember looking at this, but I don't seem
to have added my conclusions to the bug for some reason.
I seem to recall that I decided that the problem was a lack
of DWARF debug information for the system call frame when valgrind
replaces the vDSO routines with its own ones. The system ones
deliberately have DWARF unwind information and ours don't.
I thought I had tried to add DWARF unwind information to our
one but I can't find any trace of that now.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-04-18 19:57:10
|
> If valgrind has any frames on the stack (as distinguished from user > frames on the stack) then the unwinder requires the DWARF2 description > of those frames, or it will get lost. Using dl_iterate_phdr(), the > runtime loader ld-linux.so.2 supplies the description for all the > modules (main program, shared libraries, dlopen files) that it knows > about; obviously this list [should] excludes anything from valgrind. Implication is that even if we add unwind info to V-supplied frames, it will not help unless ld-linux.so is aware of it. Which it probably isn't. J |
|
From: Julian S. <js...@ac...> - 2006-04-18 19:54:30
|
> I seem to recall that I decided that the problem was a lack > of DWARF debug information for the system call frame when valgrind > replaces the vDSO routines with it's own ones. The system ones > delberately have DWARF unwind information and ours don't. That would make sense. From Joe's printout it seems like the unwinder manages to unwind 6 frames before going wrong, so it's not the signal delivery frame that's the problem. Uh .. but which of our routines did you mean? I'm unclear. Is it VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) ? J |
|
From: Julian S. <js...@ac...> - 2006-04-18 20:09:56
|
Now I'm even more confused. I can't reproduce this hang using either the trunk, 3.1.1 or 3.0.1, using g++ 4.0.2 on SuSE 10.0, on x86. Can you clarify? What do I need to do to reproduce this with a recent Valgrind? J On Friday 14 April 2006 14:07, Joseph M Link wrote: > I have also reproduced this on FC4 with a highly threaded application > that uses pthread_cancel() and depends on pthread cleanup handlers. > > I am still using 2.4.1, and didn't see anything to indicate that it has > been fixed in more recent versions. Anyone have any luck with this issue? > > Thanks, > Joe |
|
From: Joseph M L. <val...@jo...> - 2006-04-18 20:54:02
Attachments:
test.cc
|
On Fedora Core 4, using the latest gcc and glibc rpms:
% rpm -q gcc glibc
gcc-4.0.2-8.fc4
glibc-2.3.6-3
I compile the attached program with:
% g++ -Wall -g -lpthread test.cc
And am able to reproduce the problem with both 2.4.1 and 3.1.1.
Joe
Julian Seward wrote:
> Now I'm even more confused. I can't reproduce this hang using
> either the trunk, 3.1.1 or 3.0.1, using g++ 4.0.2 on SuSE 10.0,
> on x86.
>
> Can you clarify? What do I need to do to reproduce this with
> a recent Valgrind?
>
> J
>
> On Friday 14 April 2006 14:07, Joseph M Link wrote:
>> I have also reproduced this on FC4 with a highly threaded application
>> that uses pthread_cancel() and depends on pthread cleanup handlers.
>>
>> I am still using 2.4.1, and didn't see anything to indicate that it has
>> been fixed in more recent versions. Anyone have any luck with this issue?
>>
>> Thanks,
>> Joe |
|
From: Tom H. <to...@co...> - 2006-04-18 23:31:41
|
In message <200...@ac...> you wrote: > > > I seem to recall that I decided that the problem was a lack > > of DWARF debug information for the system call frame when valgrind > > replaces the vDSO routines with it's own ones. The system ones > > delberately have DWARF unwind information and ours don't. > > That would make sense. From Joe's printout it seems like the > unwinder manages to unwind 6 frames before going wrong, so it's > not the signal delivery frame that's the problem. > > Uh .. but which of our routines did you mean? I'm unclear. > Is it VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) ? That's the routine I was thinking of, yes. Tom -- Tom Hughes (to...@co...) http://www.compton.nu/ |
|
From: Julian S. <js...@ac...> - 2006-04-28 22:53:48
|
On Wednesday 19 April 2006 00:31, Tom Hughes wrote: > In message <200...@ac...> you wrote: > > > I seem to recall that I decided that the problem was a lack > > > of DWARF debug information for the system call frame when valgrind > > > replaces the vDSO routines with it's own ones. The system ones > > > delberately have DWARF unwind information and ours don't. > > > > That would make sense. From Joe's printout it seems like the > > unwinder manages to unwind 6 frames before going wrong, so it's > > not the signal delivery frame that's the problem. > > > > Uh .. but which of our routines did you mean? I'm unclear. > > Is it VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) ? > > That's the routine I was thinking of, yes. Tom After some poking around with the information Joseph supplied, I see that that is indeed where the glibc unwinder stops - having managed to recreate the problem on FC5. I can't think of an easy way to attach unwind info to that routine (we could add it, but glibc in the client can't see it, so no point). But I can't figure out why we even need this function in the first place. What's the point? I know Jeremy added it for some reason, but I dunno what. I disabled the redirection to it (set up in m_redir.c) and Joseph's test program then works fine. J |
>>>Uh .. but which of our routines did you mean? I'm unclear. >>>Is it VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) ? >> >>That's the routine I was thinking of, yes. > After some poking around with the information Joseph supplied, > I see that that is indeed where the glibc unwinder stops - > having managed to recreate the problem on FC5. > > I can't think of an easy way to attach unwind info to that routine > (we could add it, but glibc in the client can't see it, so no point). It may be possible for Valgrind to recognize libgcc_s.so, search for the appropriate __register_* function using dlsym(), and call the function so that the unwinder knows about VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80). That's very messy, but it might work. -- |
|
From: Julian S. <js...@ac...> - 2006-04-29 03:03:00
Attachments:
tentative_108528.patch
|
It seems to me the cleanest solution is to get rid of VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) completely. This makes Joe's program work properly, since the gcc unwinder is no longer confused by our fake replacement function which lacks unwind info. Unfortunately this breaks our own stack unwind hack, in m_stacktrace, on x86-linux, creating less useful traces for stacks which end in a syscall. (This is the only purpose I could find for the redirect). Fortunately it is easily fixed just by looking for and noting the address of the real _dl_sysinfo_int80, which is easily done whilst reading debug info, since m_redir inspects all incoming symbols. The attached patch (against the svn trunk, r5865) does all that. Joe: can you try it? J On Friday 28 April 2006 22:42, Julian Seward wrote: > On Wednesday 19 April 2006 00:31, Tom Hughes wrote: > > In message <200...@ac...> you wrote: > > > > I seem to recall that I decided that the problem was a lack > > > > of DWARF debug information for the system call frame when valgrind > > > > replaces the vDSO routines with it's own ones. The system ones > > > > delberately have DWARF unwind information and ours don't. > > > > > > That would make sense. From Joe's printout it seems like the > > > unwinder manages to unwind 6 frames before going wrong, so it's > > > not the signal delivery frame that's the problem. > > > > > > Uh .. but which of our routines did you mean? I'm unclear. > > > Is it VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) ? > > > > That's the routine I was thinking of, yes. > > Tom > > After some poking around with the information Joseph supplied, > I see that that is indeed where the glibc unwinder stops - > having managed to recreate the problem on FC5. > > I can't think of an easy way to attach unwind info to that routine > (we could add it, but glibc in the client can't see it, so no point). > > But I can't figure out why we even need this function in the first > place. What's the point? 
I know Jeremy added it for some reason, > but I dunno what. I disabled the redirection to it (set up in > m_redir.c) and Joseph's test program then works fine. > > J |
>>+// Note: generally, putting replacement functions in here is a bad >>+// idea, since any Dwarf frame-unwind info attached to them will not >>+// be seen by the unwinder in gcc's runtime support. This means >>+// unwinding during exception handling by gcc tends to fail if it >>+// encounters one of these replacement functions. A better place to >>+// put them is in one of the .so's preloaded into the client, since >>+// the client's ld.so will know about it and so gcc's unwinder >>+// (somehow) is able to get hold of it. It's no mystery. At PT_INTERP time, ld.so registers one of its own functions into the .eh_frame machinery. (Note Elf32_Phdr.p_type of GNU_EH_FRAME.) The unwinder accesses the .eh_frame machinery and calls the registered function, which then uses dl_iterate_phdr to go over all the loaded modules and supply the corresponding info for the unwinder. -- |
|
From: Tom H. <to...@co...> - 2006-04-29 12:38:38
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> It seems to me the cleanest solution is to get rid of
> VG_(x86_linux_REDIR_FOR__dl_sysinfo_int80) completely.
> This makes Joe's program work properly, since the gcc
> unwinder is no longer confused by our fake replacement function
> which lacks unwind info.
As far as I can see that is the only purpose for it, yes. The
redirect was originally committed by me:
------------------------------------------------------------------------
r297912 | thughes | 2004-03-22 19:46:29 +0000 (Mon, 22 Mar 2004) | 4 lines
Redirect _dl_sysinfo_int80, which is glibc's default system call
routine, to the routine in our trampoline page so that the
special sysinfo unwind hack in vg_execontext.c will kick in.
------------------------------------------------------------------------
Strangely the unwind hack in vg_execontext.c was committed a couple
of months before that by Jeremy as part of him committing my TLS support
work. I'm not sure how the hack was triggered before the redirect went in.
Actually, thinking about it, we may have been redirecting the sysinfo
page which would have done the trick.
> Unfortunately this breaks our own stack unwind hack, in
> m_stacktrace, on x86-linux, creating less useful traces
> for stacks which end in a syscall. (This is the only
> purpose I could find for the redirect). Fortunately it is
> easily fixed just by looking for and noting the address of the
> real _dl_sysinfo_int80, which is easily done whilst reading
> debug info, since m_redir inspects all incoming symbols.
Sounds good to me. I think we always suppress the sysinfo page
on x86 now so we will always go through _dl_sysinfo_int80() to do
a system call.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2006-05-02 01:30:09
|
> > Unfortunately this breaks our own stack unwind hack, in > > m_stacktrace, on x86-linux, creating less useful traces > > for stacks which end in a syscall. (This is the only > > purpose I could find for the redirect). Fortunately it is > > easily fixed just by looking for and noting the address of the > > real _dl_sysinfo_int80, which is easily done whilst reading > > debug info, since m_redir inspects all incoming symbols. > > Sounds good to me. I think we always suppress the sysinfo page > on x86 now so we will always go though _dl_sysinfo_int80() to do > a system call. Committed; thanks for the background info. J |