From: Stephen M.
Current versions of Valgrind work less-than-perfectly with the latest version of glibc in Debian unstable (Debian version 2.3.5-6). Part of the problem seems to be that this build of libc6 manages to include even less symbol information for /lib/ld-linux.so.2 than the 2.3.2 build from sarge did (or it's in a format that Valgrind can't handle). One of the relevant symbols is _dl_relocate_object, so this may just be a result of the gratuitous "sanitizing" that John Reiser was complaining about a few weeks ago (http://sourceforge.net/mailarchive/message.php?msg_id=12915644); the new libc is also compiled with a newer GCC ("Debian 4.0.1-3 / pre-4.0.2 20050725", instead of 3.3.5).

One obvious symptom of this is that suppressions don't work. For instance, there's a longstanding problem whose correct stack trace is:

  Conditional jump or move depends on uninitialised value(s)
     at 0x40086B6: _dl_relocate_object (do-rel.h:65)
     by 0x4002376: dl_main (rtld.c:1916)
     by 0x400EBDD: _dl_sysdep_start (dl-sysdep.c:237)
     by 0x4003675: _dl_start (rtld.c:307)
     by 0x40007C6: (within /usr/lib/debug/ld-2.3.5.so)

With the sarge libc, Valgrind can at least see the name of the function on top of the stack:

  Conditional jump or move depends on uninitialised value(s)
     at 0x4009620: _dl_relocate_object (in /lib/ld-2.3.2.so)
     by 0x400211C: (within /lib/ld-2.3.2.so)
     by 0x400F1F6: (within /lib/ld-2.3.2.so)
     by 0x4000F3A: (within /lib/ld-2.3.2.so)
     by 0x4000C26: (within /lib/ld-2.3.2.so)

With the latest libc version, no symbol information shows up at all:

  Conditional jump or move depends on uninitialised value(s)
     at 0x1B8ECB13: (within /lib/ld-2.3.5.so)
     by 0x1B8E631C: (within /lib/ld-2.3.5.so)
     by 0x1B8F2BDD: (within /lib/ld-2.3.5.so)
     by 0x1B8E7675: (within /lib/ld-2.3.5.so)
     by 0x1B8E47C6: (within /lib/ld-2.3.5.so)

A similar problem seems to affect the {strlen,index}-not-intercepted-early-enough-HACK-* suppressions. These unsuppressed errors are the subject of Debian bug #326823, I believe.

One potential fix for this problem is to use the debugging version of the dynamic linker, which has full static symbols (you can see it's where I got the first stack trace above). Unfortunately, I don't believe there's any standard way (equivalent to LD_LIBRARY_PATH) to tell programs to use a different dynamic linker: "/lib/ld-linux.so.2" is hardcoded in binaries. (This is especially annoying because the dynamic linkers from different versions of glibc are generally neither forward nor backward compatible with other versions of libc.so, so you can't have more than one glibc version installed at once. But that's a different rant.) It's possible to start a program with a different linker by saying something like "/usr/lib/debug/ld-linux.so.2 /bin/cat /proc/self/maps", but that isn't inherited by child processes, and it makes Valgrind report even more errors (I haven't investigated why).

Since Valgrind loads ld-linux.so.2 for its guests, it would be helpful if there was a way it could be told to switch in a different version. I've appended a proof-of-concept patch that hard-codes the standard location of the debugging version. In brief testing, this change allows the suppressions to work correctly for me, and also seems to help Valgrind run better on a large program (XEmacs), perhaps because it makes some redirects work.

By the way, are any of the Debian valgrind maintainers regular readers of this list? If not, I'll forward this information to the BTS.

 -- Stephen

Index: coregrind/m_ume.c
===================================================================
--- coregrind/m_ume.c (revision 4894)
+++ coregrind/m_ume.c (working copy)
@@ -530,19 +530,26 @@
          case PT_INTERP: {
             char *buf = VG_(malloc)(ph->p_filesz+1);
             int j;
-            int intfd;
+            int intfd = -1;
             int baseaddr_set;
 
             vg_assert(buf);
             VG_(pread)(fd, buf, ph->p_filesz, ph->p_offset);
             buf[ph->p_filesz] = '\0';
 
-            sres = VG_(open)(buf, VKI_O_RDONLY, 0);
-            if (sres.isError) {
-               VG_(printf)("valgrind: m_ume.c: can't open interpreter\n");
-               VG_(exit)(1);
+            if (!VG_(strcmp)(buf, "/lib/ld-linux.so.2")) {
+               sres = VG_(open)("/usr/lib/debug/ld-linux.so.2", VKI_O_RDONLY, 0);
+               if (!sres.isError)
+                  intfd = sres.val;
             }
-            intfd = sres.val;
+            if (intfd == -1) {
+               sres = VG_(open)(buf, VKI_O_RDONLY, 0);
+               if (sres.isError) {
+                  VG_(printf)("valgrind: m_ume.c: can't open interpreter\n");
+                  VG_(exit)(1);
+               }
+               intfd = sres.val;
+            }
 
             interp = readelf(intfd, buf);
             if (interp == NULL) {
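
For readers who want to try the fallback idea outside of Valgrind, the selection step in the patch can be restated with plain POSIX calls. This is only a sketch under assumptions: the function name choose_interp is hypothetical, access() stands in for Valgrind's internal VG_(open), and the two paths are the ones hard-coded in the patch.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Paths as hard-coded in the proof-of-concept patch. */
#define DEFAULT_INTERP "/lib/ld-linux.so.2"
#define DEBUG_INTERP   "/usr/lib/debug/ld-linux.so.2"

/* Hypothetical helper, not part of Valgrind: given the interpreter
 * path read from PT_INTERP, return the path a loader should open.
 * The debug linker is preferred only when the request is exactly the
 * default path and the debug copy is readable; any other request is
 * passed through unchanged. */
const char *choose_interp(const char *requested)
{
    assert(requested != NULL);

    if (strcmp(requested, DEFAULT_INTERP) == 0 &&
        access(DEBUG_INTERP, R_OK) == 0)
        return DEBUG_INTERP;

    return requested;
}
```

As in the patch, a missing debug linker is not an error: the caller simply ends up opening the interpreter the binary asked for.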
From: Josef W. <Jos...@gm...> - 2005-10-08 22:22:23
On Saturday 08 October 2005 23:25, Stephen McCamant wrote:
> Current versions of Valgrind work less-than-perfectly with the latest
> version of glibc in Debian unstable (Debian version 2.3.5-6). Part of
> the problem seems to be that this build of libc6 manages to include
> even less symbol information for /lib/ld-linux.so.2 than the 2.3.2
> build from sarge did (or it's in a format that Valgrind can't
> handle). One of the relevant symbols is _dl_relocate_object,
> ...

AFAIK, this only happens if the runtime linker (/lib/ld-2.3.5.so) is stripped. Valgrind reads both the normal symbol table and the dynamic symbol table. The symbols disappeared from the dynamic table, and stripping gets rid of the normal symbol table.

Actually, I do not understand why on earth a distribution ships a stripped runtime linker. Note that this has nothing to do with debug information (!).

I fiddled around with this before the callgrind release, as I need to detect "_dl_runtime_resolve". After a bug report for a callgrind alpha, I suspected Suse 10.0 to be shipped with such a stripped runtime linker, but obviously they decided against it.

> Conditional jump or move depends on uninitialised value(s)
>    at 0x1B8ECB13: (within /lib/ld-2.3.5.so)
>    by 0x1B8E631C: (within /lib/ld-2.3.5.so)
>    by 0x1B8F2BDD: (within /lib/ld-2.3.5.so)
>    by 0x1B8E7675: (within /lib/ld-2.3.5.so)
>    by 0x1B8E47C6: (within /lib/ld-2.3.5.so)

There is a nice tool, the fenris debugger, see
http://www.bindview.com/Services/RAZOR/Utilities/Unix_Linux/fenris_index.cfm
which includes a tool to regenerate the symbol table of a stripped lib or binary (by using fingerprints of function code). Perhaps this way, you are able to "revive" the runtime linker?

> where I got the first stack trace above). Unfortunately, I don't
> believe there's any standard way (equivalent to LD_LIBRARY_PATH) to
> tell programs to use a different dynamic linker: "/lib/ld-linux.so.2"
> is hardcoded in binaries.

There is: try "/lib/ld-linux.so.2 /bin/ls". You can start the runtime linker directly with the executable's name as its argument. This will run the executable with the specified runtime linker.

> Since Valgrind loads ld-linux.so.2 for its guests, it would be helpful
> if there was a way it could be told to switch in a different version.

But only if there is one...

Josef
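
Whether a given runtime linker still carries its normal symbol table can be checked by scanning the ELF section headers for a section of type SHT_SYMTAB. The following is a rough sketch under assumptions, not what Valgrind itself does: the function name has_static_symtab is hypothetical, it only handles ELF files of the host's native class, and it relies on <elf.h> and the ElfW() macro from <link.h> as provided on Linux.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): native-width ELF types */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if the file has a normal (static)
 * symbol table section, 0 if it has none (as after stripping), and
 * -1 if the file cannot be read or is not a native-class ELF file. */
int has_static_symtab(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    ElfW(Ehdr) eh;
    int result = -1;

    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        goto out;
    if (eh.e_ident[EI_CLASS] !=
        (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32))
        goto out;
    if (eh.e_shnum != 0 && eh.e_shentsize != sizeof(ElfW(Shdr)))
        goto out;

    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        ElfW(Shdr) sh;
        long off = (long)(eh.e_shoff + (unsigned long)i * sizeof sh);

        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1) {
            result = -1;
            break;
        }
        if (sh.sh_type == SHT_SYMTAB) {   /* .symtab is present */
            result = 1;
            break;
        }
    }

out:
    fclose(f);
    return result;
}
```

Calling has_static_symtab("/lib/ld-linux.so.2") on an affected system would be one way to confirm the stripping. The dynamic symbol table (SHT_DYNSYM) is a separate section and is deliberately not considered here.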
From: Julian S. <js...@ac...> - 2005-10-09 00:52:17
> With the latest libc version, no symbol information shows up at all:
>
> Conditional jump or move depends on uninitialised value(s)
>    at 0x1B8ECB13: (within /lib/ld-2.3.5.so)
>    by 0x1B8E631C: (within /lib/ld-2.3.5.so)
>    by 0x1B8F2BDD: (within /lib/ld-2.3.5.so)
>    by 0x1B8E7675: (within /lib/ld-2.3.5.so)
>    by 0x1B8E47C6: (within /lib/ld-2.3.5.so)

The same thing (or similar) happened in the release version of SuSE 9.3. Shortly thereafter SuSE put a non-stripped version on their online update and that made the problem go away, and they seem to have stayed with that in SuSE 10.0 as Josef observes. Which is excellent.

I see what the patch is for and it looks plausible. However to me it is fixing the symptom -- the fundamental problem is that the glibc packagers insist on removing ever more debugging info.

> By the way, are any of the Debian valgrind maintainers regular readers
> of this list? If not, I'll forward this information to the BTS.

Please hassle the Debian glibc maintainers. Point out that they can't expect to strip off all vestiges of debug info and still have a debuggable system. At least leave the symbols on. /lib/ld-2.3.5.so is only on the order of 110k even with symbols on, so there's really very little to be gained from stripping it anyway.

J