|
From: t. s. u. <sc...@ap...> - 2005-02-12 23:03:57
|
This is weird: any option argument I give to any program I'm running
under valgrind crashes valgrind - but only if I'm not in my home directory.
gusgus:~[1022:0]> valgrind --tool=memcheck true --ignored
==3066== Memcheck, a memory error detector for x86-linux.
==3066== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==3066== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==3066== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==3066== For more details, rerun with: -v
==3066==
==3066==
==3066== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 1)
==3066== malloc/free: in use at exit: 0 bytes in 0 blocks.
==3066== malloc/free: 30 allocs, 30 frees, 1985 bytes allocated.
==3066== For a detailed leak analysis, rerun with: --leak-check=yes
==3066== For counts of detected errors, rerun with: -v
gusgus:~[1023:0]> mkdir t ; cd t
gusgus:~/t[1024:0]> valgrind --tool=memcheck true --ignored
*** glibc detected *** free(): invalid next size (normal): 0xb7db8840 ***
zsh: abort valgrind --tool=memcheck true --ignored
The tool doesn't appear to matter.
uname -a
Linux gusgus.slack 2.6.10-1.760_FC3smp #1 SMP Wed Feb 2 00:29:03 EST 2005 i686 athlon i386 GNU/Linux
I'm hesitant to file a bug on this one - I'm thinking maybe something
else is wrong with the machine. I'll probably file one later ....
--
t. scott urban <sc...@ap...>
|
|
From: Paul P. <ppl...@gm...> - 2005-02-12 23:32:18
|
> gusgus:~/t[1024:0]> valgrind --tool=memcheck true --ignored
> *** glibc detected *** free(): invalid next size (normal): 0xb7db8840 ***
Glibc detected that VG is free()ing a block, which has invalid next
pointer (or something like that)
> Linux gusgus.slack 2.6.10-1.760_FC3smp #1 SMP Wed Feb 2 00:29:03 EST
Does not reproduce for me using VG-2.2.0 and 2.6.9-1.667smp#1
(original FC3 kernel)
and glibc-2.3.3-74. In fact, the 'next size' string doesn't appear in
my glibc at all.
What version of glibc do you have?
Cheers,
|
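[Editor's illustration] For readers unfamiliar with the message, here is a toy Python model of the kind of consistency check being described. It is not glibc's code: the class `ToyHeap`, the 4-byte `HEADER` layout, and the exact bounds are assumptions made up for this sketch. The idea it shows is that free() inspects the size recorded for the chunk that follows the one being freed, and gives up if that value is implausible, which usually means something wrote past the end of a block.

```python
import struct

HEADER = 4  # assumed layout: one 32-bit size field in front of every chunk


class ToyHeap:
    """Toy bump allocator with a size header per chunk (illustration only)."""

    def __init__(self, capacity=256):
        self.mem = bytearray(capacity)
        self.top = 0  # offset of the first unused byte

    def malloc(self, nbytes):
        size = HEADER + nbytes
        if self.top + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        start = self.top
        struct.pack_into("<I", self.mem, start, size)  # record the chunk size
        self.top += size
        return start + HEADER  # "pointer" to the user data

    def write(self, ptr, data):
        # Deliberately unchecked, like memcpy(): it can run into the next chunk.
        self.mem[ptr:ptr + len(data)] = data

    def free(self, ptr):
        start = ptr - HEADER
        (size,) = struct.unpack_from("<I", self.mem, start)
        nxt = start + size
        if nxt < self.top:  # a chunk follows this one: sanity-check its size
            (next_size,) = struct.unpack_from("<I", self.mem, nxt)
            if next_size <= HEADER or nxt + next_size > self.top:
                raise RuntimeError("free(): invalid next size (toy model)")
        # A real allocator would now put the chunk on a free list.
```

In this model, allocating two 8-byte blocks `a` and `b`, writing 12 bytes to `a`, and then calling `free(a)` raises the error, because the extra 4 bytes overwrote the size field of `b`. The error is reported at the free(), not at the bad write, which is why the glibc message alone does not say where the corruption happened.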
|
From: t. s. u. <sc...@ap...> - 2005-02-12 23:41:39
|
On Sat, 2005-02-12 at 15:32 -0800, Paul Pluzhnikov wrote:
> > gusgus:~/t[1024:0]> valgrind --tool=memcheck true --ignored
> > *** glibc detected *** free(): invalid next size (normal): 0xb7db8840 ***
>
> Glibc detected that VG is free()ing a block, which has invalid next
> pointer (or something like that)
>
> > Linux gusgus.slack 2.6.10-1.760_FC3smp #1 SMP Wed Feb 2 00:29:03 EST
>
> Does not reproduce for me using VG-2.2.0 and 2.6.9-1.667smp#1
> (original FC3 kernel)
> and glibc-2.3.3-74. In fact, the 'next size' string doesn't appear in
> my glibc at all.
>
> What version of glibc do you have?
glibc-2.3.4-2.fc3
Have you not updated your Fedora machine?
The problem happens when *any* arguments are passed to the valgrind
client program.
Where that string is found for me:
gusgus:~[1004:0]> ldd /bin/true
libc.so.6 => /lib/tls/libc.so.6 (0x003b7000)
/lib/ld-linux.so.2 (0x0039e000)
gusgus:~[1005:0]> rpm -qf /lib/tls/libc.so.6
glibc-2.3.4-2.fc3
gusgus:~[1008:0]> strings /lib/tls/libc.so.6 | grep "invalid next size (normal)"
free(): invalid next size (normal)
Perhaps I should be turning off the glibc error checking? Or perhaps the
latest version is broken/incompatible with the "hoops" valgrind is
jumping through?
Thanks
scott
--
t. scott urban <sc...@ap...>
|
|
From: t. s. u. <sc...@ap...> - 2005-02-13 00:06:28
Attachments:
bt.txt.gz
|
On Sat, 2005-02-12 at 18:44 -0500, c.s...@co... wrote:
<snip>
>
> Not sure what OP is using but /lib/tls/libc-2.3.4.so contains "free():
> invalid next size (normal)"
>
> It'd be nice to see the backtrace of this run under gdb.
% gdb --args valgrind --tool=memcheck /bin/true foo
(gdb) r
Starting program: /usr/local/bin/valgrind --tool=memcheck /bin/true foo
*** glibc detected *** free(): invalid next size (normal): 0xb7db8860 ***
Program received signal SIGABRT, Aborted.
0xb10007a2 in ?? ()
(gdb) bt
248 frames of ?? ()
#249 0x0804bd3c in _nl_load_domain ()
3 frames of ?? ()
#253 0x08048ac0 in as_unpad (start=0x0, end=0x0, padfile=0) at ume.c:231
Previous frame inner to this frame (corrupt stack?)
- see attachment for full backtrace
This, again, anywhere but in my home directory.
--
t. scott urban <sc...@ap...>
|
|
From: t. s. u. <sc...@ap...> - 2005-02-13 01:58:49
|
On Sat, 2005-02-12 at 19:22 -0500, c.s...@co... wrote:
> On Sat, Feb 12, 2005 at 04:06:17PM -0800, t. scott urban wrote:
> > On Sat, 2005-02-12 at 18:44 -0500, c.s...@co... wrote:
> > <snip>
> > >
> > > Not sure what OP is using but /lib/tls/libc-2.3.4.so contains "free():
> > > invalid next size (normal)"
> > >
> > > It'd be nice to see the backtrace of this run under gdb.
> >
> > % gdb --args valgrind --tool=memcheck /bin/true foo
> > (gdb) r
> > Starting program: /usr/local/bin/valgrind --tool=memcheck /bin/true foo
> > *** glibc detected *** free(): invalid next size (normal): 0xb7db8860 ***
> >
> > Program received signal SIGABRT, Aborted.
> > 0xb10007a2 in ?? ()
> >
> > (gdb) bt
> > 248 frames of ?? ()
> > #249 0x0804bd3c in _nl_load_domain ()
> > 3 frames of ?? ()
> > #253 0x08048ac0 in as_unpad (start=0x0, end=0x0, padfile=0) at ume.c:231
> > Previous frame inner to this frame (corrupt stack?)
>
> crap. I think this is some bug making gcc unable to find the top of
> the stack. I had to compile gdb 6.2.1 on my FC3 system to get any
> useful interaction between vg and gdb. Although, I was trying to
> attach gdb using --db-attach, so maybe this is something different.
>
> If you want to try a quick work-around you might install an old gdb
> (pre 6.0) and give it a whirl. The one from FC1 should work in a pinch:
I built and tried gdb-6.3
gdb 6.2 produces backtrace output like the previous attachment.
gdb 5.3 gives this:
(gdb) r
Starting program: /usr/local/bin/valgrind --tool=memcheck /bin/true foo
*** glibc detected *** free(): invalid next size (normal): 0xb7db8830 ***
Program received signal SIGABRT, Aborted.
0xb10007a2 in ?? ()
(gdb) bt
#0 0xb10007a2 in ?? ()
#1 0xb7ee3319 in ?? ()
#2 0xb7f14f9a in ?? ()
#3 0xb7f1b528 in ?? ()
#4 0xb7f1bafa in ?? ()
#5 0xb7fe533c in ?? ()
#6 0xb7fe4e3c in ?? ()
#7 0xb005a0f7 in ?? ()
#8 0xb0026f1c in ?? ()
#9 0xb7ecee33 in ?? ()
--
t. scott urban <sc...@ap...>
|
|
From: t. s. u. <sc...@ap...> - 2005-02-13 18:35:25
Attachments:
vg-strace-home.gz
vg-strace-other.gz
|
On Sat, 2005-02-12 at 21:41 -0500, c.s...@co... wrote:
> I'm guessing gdb 6.3 attaches just fine when you're not in your home
> directory?
Nope.
> Maybe try something like:
>
> # cd ~; strace valgrind --tool=memcheck /bin/true foo 2> vg-strace-home
> # cd t; strace valgrind --tool=memcheck /bin/true foo 2> ../vg-strace-other
>
> and see where they really start to differ?
Attached the output - I used the -f option to strace, too. Nothing looks
meaningfully different until the read of ./.valgrindrc - it's found in
the home-directory run because I have one there, not found in the
other-directory run. After that, the home-directory run starts operating
directly on stderr with fcntl, but the other version does an open on
"/dev/tty" - where glibc complains.
So the bit about the .valgrindrc might be interesting. In the
description below
[ABORT] ->
*** glibc detected *** free(): invalid next size (normal): 0xb7db8810***
zsh: abort valgrind --tool=memcheck /bin/true foo
[OK] -> normal execution.
When I remove the file, no problem:
gusgus:~> cd ~
gusgus:~> rm -f .valgrindrc
gusgus:~> valgrind --tool=memcheck /bin/true foo
[OK]
gusgus:~> cd t
gusgus:~/t> valgrind --tool=memcheck /bin/true foo
[OK]
When I put something in the file in the other dir:
gusgus:~/t> echo '--num-callers=40' > .valgrindrc
gusgus:~/t> valgrind --tool=memcheck /bin/true foo
[ABORT]
If I put that file in home, then I get the original behavior: ok in
home, croak elsewhere.
Any option in that file causes it; if the file is empty, it's ok. Even
text that valgrind doesn't understand causes it.
Ok, removing the .valgrindrc files. Trying the environment variable:
gusgus:~> cd ~
gusgus:~> export VALGRIND_OPTS=--num-callers=10
gusgus:~> valgrind --tool=memcheck /bin/true foo
[ABORT]
gusgus:~> cd t
gusgus:~/t> valgrind --tool=memcheck /bin/true foo
[ABORT]
Aborts from both places!
Not just my user or anything - made a dummy account - same thing.
Grasping at straws a bit here.
> Another thing to try is:
>
> gdb> start
> and then step into main and see where it dies.
I tried this before, but it didn't seem very useful - I'll do it again
and post the results in a bit.
Thanks
--
t. scott urban <sc...@ap...>
|
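[Editor's illustration] Comparing two strace logs by eye is tedious because PIDs and addresses differ between runs even when the behaviour is the same. The following is a minimal Python sketch of one way to find the first real divergence between files such as vg-strace-home and vg-strace-other. The function names and the two normalisation rules (dropping the PID prefix that `strace -f` adds, and masking hexadecimal values) are assumptions chosen for this example, not part of strace or valgrind.

```python
import re

# Assumed normalisations: the leading PID column that `strace -f` adds,
# and hexadecimal values, which vary between otherwise identical runs.
_PID = re.compile(r"^(\[pid\s+\d+\]|\d+)\s+")
_HEX = re.compile(r"0x[0-9a-fA-F]+")


def normalise(line):
    """Strip run-specific noise from one line of strace output."""
    line = _PID.sub("", line.rstrip("\n"))
    return _HEX.sub("0xADDR", line)


def first_divergence(lines_a, lines_b):
    """Return (index, a_line, b_line) for the first differing pair, or None.

    If one trace is shorter, the missing side is reported as None.
    """
    a = [normalise(line) for line in lines_a]
    b = [normalise(line) for line in lines_b]
    for i in range(max(len(a), len(b))):
        la = a[i] if i < len(a) else None
        lb = b[i] if i < len(b) else None
        if la != lb:
            return i, la, lb
    return None


def first_divergence_in_files(path_a, path_b):
    """Convenience wrapper that reads both logs from disk."""
    with open(path_a, errors="replace") as fa, open(path_b, errors="replace") as fb:
        return first_divergence(fa.readlines(), fb.readlines())
```

Calling `first_divergence_in_files("vg-strace-home", "vg-strace-other")` would return the index and the two normalised lines where the runs first disagree. A strict line-by-line comparison is used rather than a full diff because the interesting point here is the first place the two runs part ways, not every later difference.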
|
From: t. s. u. <sc...@ap...> - 2005-02-13 21:35:21
|
On Sun, 2005-02-13 at 15:26 -0500, c.s...@co... wrote:
> On Sun, Feb 13, 2005 at 10:35:14AM -0800, t. scott urban wrote:
> > On Sat, 2005-02-12 at 21:41 -0500, c.s...@co... wrote:
> > > I'm guessing gdb 6.3 attaches just fine when you're not in your home
> > > directory?
> >
> > Nope.
> >
> >
> > > Maybe try something like:
> > >
> > > # cd ~; strace valgrind --tool=memcheck /bin/true foo 2> vg-strace-home
> > > # cd t; strace valgrind --tool=memcheck /bin/true foo 2> ../vg-strace-other
> > >
> > > and see where they really start to differ?
> >
> >
> > Attached the output - I used the -f option to strace, too. Nothing looks
> > meaningfully different until the read of ./.valgrindrc - it's found in
> > the home-directory run because I have one there, not found in the other-
> > directory run. After that, the home-directory run starts operating
> > directly on stderr with fcntl, but the other version does an open on
> > "/dev/tty" - where glibc complains.
>
> Actually, I think what's happening is the old_mmap above that line is
> failing, and glibc is opening /dev/tty to report the error.
>
> But, congrats, you've found the error, and I can reproduce this. From the
> strace it appears that vg reads both ~/.valgrindrc and ./.valgrindrc.
> If these are not the same file (i.e. ./ != ~/) and one exists but the
> other doesn't.. BOOM!
>
> IMO it's very likely a vg memory bug in the config file handling,
> probably quite easy to fix. I don't actually have vg source, so some
> vg developer will have to confirm or deny. Maybe it's already fixed
> in CVS? This is pretty subtle so I wouldn't be surprised if you're
> the first person to characterize this behavior so clearly.
>
> Anybody want to file a bug?
> -chris
I guess I'll go ahead and do it.
> > > Another thing to try is:
> > >
> > > gdb> start
> > > and then step into main and see where it dies.
> >
> > I tried this before, but it didn't seem very useful - I'll do it again
> > and post the results in a bit.
>
> I'm curious to see this because my gdb (6.2.1) can run vg in this
> failure case, and step right up to the error. If I had the source it
> would show me more than just line numbers, too:
> 157 in stage1.c
> (gdb)
> hoops () at stage1.c:203
> 203 in stage1.c
> (gdb)
> 209 in stage1.c
> (gdb)
> ume_go (eip=2969569216, esp=3220142560) at x86/ume_go.c:36
> 36 x86/ume_go.c: No such file or directory.
> in x86/ume_go.c
> (gdb)
> Segmentation fault
I've got the source and a debug version of valgrind. I can step just
fine until it gets to a particular assembler directive, then it's toast.
The line numbers are similar to yours, and I get the same results
with gdb-5.3, gdb-6.2.1, and gdb-6.3.
I tried differential debugging of the home and the other run, but didn't
see much difference in gdb. That asm stuff is really throwing me and
the debugger off, seemingly.
Here's the part of the run you were on:
Breakpoint 3, hoops () at stage1.c:209
209 ume_go(info.init_eip, (addr_t)esp);
(gdb) s
ume_go (eip=2969569216, esp=3219549504) at x86/ume_go.c:36
36 asm volatile ("movl %1, %%esp;" /* set esp */
(gdb) l
24 The GNU General Public License is contained in the file COPYING.
25 */
26
27 #include "ume_arch.h"
28
29 /*
30 Jump to a particular EIP with a particular ESP. This is intended
31 to simulate the initial CPU state when the kernel starts an program
32 after exec; it therefore also clears all the other registers.
33 */
33 */
34 void ume_go(addr_t eip, addr_t esp)
35 {
36 asm volatile ("movl %1, %%esp;" /* set esp */
37 "pushl %%eax;" /* push esp */
38 "xorl %%eax,%%eax;" /* clear registers */
39 "xorl %%ebx,%%ebx;"
40 "xorl %%ecx,%%ecx;"
41 "xorl %%edx,%%edx;"
42 "xorl %%esi,%%esi;"
43 "xorl %%edi,%%edi;"
44 "xorl %%ebp,%%ebp;"
45
46 "ret" /* return into entry */
47 : : "a" (eip), "r" (esp));
48 /* we should never get here */
--
t. scott urban <sc...@ap...>
|
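[Editor's illustration] The experiments in this thread vary three option sources: ~/.valgrindrc, ./.valgrindrc, and $VALGRIND_OPTS. Below is a minimal Python sketch of a harness that tries the combinations in a throwaway directory, so a real home directory is never touched. The function names are invented for this example, and it assumes that valgrind locates ~/.valgrindrc through the HOME environment variable. The option strings are the ones used earlier in the thread.

```python
import os
import subprocess
import sys
import tempfile


def run_case(cmd, rc_in_home, rc_in_cwd, env_opts=None, cwd_is_home=False,
             rc_text="--num-callers=40\n"):
    """Run cmd with a throwaway HOME and return its exit status.

    On POSIX a negative status means the process was killed by that
    signal number (for example -6 for SIGABRT).
    """
    with tempfile.TemporaryDirectory() as home:
        cwd = home if cwd_is_home else os.path.join(home, "t")
        os.makedirs(cwd, exist_ok=True)
        for wanted, directory in ((rc_in_home, home), (rc_in_cwd, cwd)):
            if wanted:
                with open(os.path.join(directory, ".valgrindrc"), "w") as f:
                    f.write(rc_text)
        env = dict(os.environ, HOME=home)  # assumption: HOME locates ~/.valgrindrc
        env.pop("VALGRIND_OPTS", None)
        if env_opts:
            env["VALGRIND_OPTS"] = env_opts
        return subprocess.run(cmd, cwd=cwd, env=env).returncode


def run_matrix(cmd):
    """Try every combination of the three option sources, outside HOME."""
    results = {}
    for rc_in_home in (False, True):
        for rc_in_cwd in (False, True):
            for env_opts in (None, "--num-callers=10"):
                key = (rc_in_home, rc_in_cwd, env_opts)
                results[key] = run_case(cmd, rc_in_home, rc_in_cwd, env_opts)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to test is taken from the command line.
    for key, status in run_matrix(sys.argv[1:]).items():
        print(key, status)
```

Saved under a hypothetical name such as repro.py, it could be run as `python repro.py valgrind --tool=memcheck /bin/true foo`. The matrix only covers runs from a sub-directory; the "run from the home directory" case can be tried separately with `run_case(..., cwd_is_home=True)`, where the two rc files are one and the same file.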
|
From: t. s. u. <sc...@ap...> - 2005-02-13 23:54:42
|
Apparently this is a known bug that was fixed in valgrind CVS sometime
in September or October.
Thanks Chris for looking at this with me.
--
t. scott urban <sc...@ap...>
|