|
From: Paul S. <ps...@no...> - 2005-03-26 00:40:55
|
Hi all.

We're using valgrind in a UML simulator environment. As such, the tools
we build are kind of cross-compiled, in that even though they are built
on and run on x86 hardware, they don't build against the native
libraries etc.; instead they build against a separate root filesystem
with different libc (+all other libs), ld.so, etc. Even a different
kernel version/kernel headers.

The environment works well, and all the normal tools we build all run
properly, etc. so I'm pretty confident that all the
path/compile/link/runtime/etc. issues are all worked out.

So, I've downloaded valgrind 2.4.0 and built it using the
cross-environment and installed it into the UML root filesystem. I'm
building with --disable-pie (valgrind dumps core immediately if I don't)
and --with-x=no (if that matters).

I can run valgrind itself and it works OK:

  # ./valgrind -v
  valgrind: no program specified
  valgrind: Use --help for more information.

But, when I try to valgrind ANY program, even trivial ones like echo, it
always fails with a segfault in ld.so:

  # /bin/echo hi
  hi

  # ./valgrind -v /bin/echo hi
  ==975== Memcheck, a memory error detector for x86-linux.
  ==975== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
  ==975== Using valgrind-2.4.0, a program supervision framework for x86-linux.
  ==975== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
  ==975== Valgrind library directory: /opt/msp/tools/lib/valgrind
  ==975== Command line
  ==975==    /bin/echo
  ==975==    hi
  ==975== Startup, with flags:
  ==975==    -v
  ==975== Contents of /proc/version:
  ==975==   Linux version 2.4.22-010xxx-1um (foo@bar) (gcc version 3.2 20020903) #1 Fri Sep 10 12:49:10 EDT 2004
  ==975== Reading syms from /bin/echo (0x8048000)
  ==975==    object doesn't have a symbol table
  ==975==    object doesn't have any debug info
  ==975== Reading syms from /lib/ld-2.3.2.so (0x1B8E4000)
  ==975==    object doesn't have any debug info
  ==975== Reading syms from /opt/tools/lib/valgrind/stage2 (0xB0000000)
  ==975== Reading syms from /lib/ld-2.3.2.so (0xB1000000)
  ==975==    object doesn't have any debug info
  ==975== Reading syms from /lib/libdl-2.3.2.so (0xB1014000)
  ==975==    object doesn't have any debug info
  ==975== Reading syms from /lib/libc-2.3.2.so (0xB1018000)
  ==975==    object doesn't have a symbol table
  ==975==    object doesn't have any debug info
  ==975== Reading syms from /opt/tools/lib/valgrind/vgskin_memcheck.so (0xB1256000)
  ==975== Reading suppressions file: /opt/tools/lib/valgrind/default.supp
  ==975==
  <...right here it waits for a long time: about a minute or so...>
  ==975==
  ==975== Process terminating with default action of signal 11 (SIGSEGV): dumping core
  ==975==  Access not within mapped region at address 0xBF7FFC61
  ==975==    at 0x1B8E4A50: (within /lib/ld-2.3.2.so)
  ==975==
  ==975== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
  ==975== malloc/free: in use at exit: 0 bytes in 0 blocks.
  ==975== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
  ==975==
  ==975== No malloc'd blocks -- no leaks are possible.
  --975-- TT/TC: 0 tc sectors discarded.
  --975-- 0 tt_fast misses.
  --975-- translate: new 0 (0 -> 0; ratio 0:10)
  --975-- discard 0 (0 -> 0; ratio 0:10).
  --975-- chainings: 0 chainings, 0 unchainings.
  --975-- dispatch: 0 jumps (bb entries); of them 0 (0%) unchained.
  --975-- 1/1 major/minor sched events.
  --975-- reg-alloc: 0 t-req-spill, 0+0 orig+spill uis,
  --975--            0 total-reg-rank
  --975-- sanity: 2 cheap, 1 expensive checks.
  --975-- No ccalls
  Segmentation fault

Anyone have any ideas, or pointers for me to move forward solving this
problem?

Thanks!

--
-------------------------------------------------------------------------------
 Paul D. Smith <ps...@no...>          HASMAT: HA Software Mthds & Tools
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
-------------------------------------------------------------------------------
   These are my opinions---Nortel Networks takes no responsibility for them.
|
|
From: Jeremy F. <je...@go...> - 2005-03-26 02:18:57
|
Paul Smith wrote:
>The environment works well, and all the normal tools we build all run
>properly, etc. so I'm pretty confident that all the
>path/compile/link/runtime/etc. issues are all worked out.
>
>
I also build Valgrind in a cross environment; that should be OK.
> ==975==
> ==975== Process terminating with default action of signal 11 (SIGSEGV): dumping core
> ==975== Access not within mapped region at address 0xBF7FFC61
> ==975== at 0x1B8E4A50: (within /lib/ld-2.3.2.so)
>
>
This address (0xBF7FFC61) is within Valgrind's address space, so it
looks like a Valgrind-internal error, though for some reason it didn't
trigger the normal internal-error printing code.
>Anyone have any ideas, or pointers for me to move forward solving this
>problem?
>
>
We've never tested running inside UML, so that's one unknown. Does it
happen with other tools (try --tool=none)?
J
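
The address reasoning above can be sketched in a few lines. This is only an illustration: the boundary `VALGRIND_BASE` is an assumption read off the log in this thread (stage2 is reported as mapped at 0xB0000000), and `classify_address` is a hypothetical helper, not part of Valgrind.

```python
# Assumption taken from the log in this thread: Valgrind's own stage2 image
# is mapped at 0xB0000000, so addresses at or above that are treated as
# belonging to Valgrind rather than to the client program.
VALGRIND_BASE = 0xB0000000


def classify_address(addr: int) -> str:
    """Return which side of the assumed client/Valgrind split addr falls on."""
    return "valgrind" if addr >= VALGRIND_BASE else "client"


if __name__ == "__main__":
    # The faulting data address and the faulting instruction from the report.
    for addr in (0xBF7FFC61, 0x1B8E4A50):
        print(f"{addr:#010x} -> {classify_address(addr)}")
```

Under that assumption the faulting data address lands on the Valgrind side while the instruction that touched it lies in the client's copy of ld.so, which matches the description of the crash.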
|
|
From: Paul S. <ps...@no...> - 2005-03-26 13:26:25
|
%% Jeremy Fitzhardinge <je...@go...> writes:

  jf> I also build Valgrind in a cross environment; that should be OK.

Yep, I just wanted to give some background since the error seemed to be
in the runtime environment. We actually had a version of Valgrind 2.0.0
kinda-sorta-semi-working in an environment similar to this one.

  >> ==975==
  >> ==975== Process terminating with default action of signal 11 (SIGSEGV): dumping core
  >> ==975== Access not within mapped region at address 0xBF7FFC61
  >> ==975== at 0x1B8E4A50: (within /lib/ld-2.3.2.so)

  jf> This address (0xBF7FFC61) is within Valgrind's address space, so
  jf> it looks like a Valgrind-internal error, though for some reason it
  jf> didn't trigger the normal internal-error printing code.

Do I need to build valgrind with any flags to get better reporting?

  >> Anyone have any ideas, or pointers for me to move forward solving this
  >> problem?

  jf> We've never tested running inside UML, so that's one unknown.
  jf> Does it happen with other tools (try --tool=none)?

No, all the other tools work, including addrcheck. Except helgrind;
that says:

  # /opt/tools/bin/valgrind --tool=helgrind /bin/echo hi
  Can't open tool "helgrind": /opt/tools/lib/valgrind/vgskin_helgrind.so: undefined symbol: vgPlain_get_current_or_recent_tid
  valgrind: couldn't load tool

but that seems to be something different.
|
|
From: Jeremy F. <je...@go...> - 2005-03-29 20:55:59
|
Paul Smith wrote:
> jf> This address (0xBF7FFC61) is within Valgrind's address space, so
> jf> it looks like a Valgrind-internal error, though for some reason it
> jf> didn't trigger the normal internal-error printing code.
>
>Do I need to build valgrind with any flags to get better reporting?
>
>
No, the internal error reporting is always enabled; it helps a lot with
remote diagnosis when it works.
>No, all the other tools work, including addrcheck. Except helgrind;
>that says:
>
> # /opt/tools/bin/valgrind --tool=helgrind /bin/echo hi
> Can't open tool "helgrind": /opt/tools/lib/valgrind/vgskin_helgrind.so: undefined symbol: vgPlain_get_current_or_recent_tid
> valgrind: couldn't load tool
>
>but that seems to be something different.
>
>
Yes. 2.4.0 doesn't support helgrind, so I suspect it's picking up a 2.2
helgrind.so file.
It would be interesting to see what valgrind --tool=none cat
/proc/self/maps under each environment.
J
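
A minimal sketch of how the output of `cat /proc/self/maps` could be checked against the faulting address. The sample text below is made up for illustration: only the base addresses of /lib/ld-2.3.2.so and stage2 come from the log in this thread, while the region sizes, device and inode fields are placeholders. `parse_maps` and `find_mapping` are hypothetical helper names.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse text in /proc/<pid>/maps format into a list of mappings."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping containing addr, or None if addr is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# Illustrative sample only; sizes, devices and inodes are invented.
SAMPLE_MAPS = """\
08048000-0804c000 r-xp 00000000 03:01 1234 /bin/cat
1b8e4000-1b8f8000 r-xp 00000000 03:01 5678 /lib/ld-2.3.2.so
b0000000-b00a0000 r-xp 00000000 03:01 9012 /opt/tools/lib/valgrind/stage2
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE_MAPS)
    for addr in (0x1B8E4A50, 0xBF7FFC61):
        print(f"{addr:#010x} -> {find_mapping(maps, addr)}")
```

Run against the real output from each environment, a `None` result for an address would be consistent with the "Access not within mapped region" message.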
|
|
From: Paul S. <ps...@no...> - 2005-03-28 23:28:24
|
%% Jeremy Fitzhardinge <je...@go...> writes:

  >> ==975== Process terminating with default action of signal 11 (SIGSEGV): dumping core
  >> ==975== Access not within mapped region at address 0xBF7FFC61
  >> ==975== at 0x1B8E4A50: (within /lib/ld-2.3.2.so)

  jf> This address (0xBF7FFC61) is within Valgrind's address space, so
  jf> it looks like a Valgrind-internal error, though for some reason it
  jf> didn't trigger the normal internal-error printing code.

Hi all;

I did some more poking today. First thing I discovered was that if I
run the UML instance on the same version of system as it was built on
(both compiled the UML kernel and ran it on Red Hat 8.0), then valgrind
worked OK for most apps (but see a followup message for a different
problem).

If I run the UML instance on a different version of system (UML kernel
built on redhat 8.0, but running it on Red Hat Enterprise Linux 3) then
I see this error from valgrind. The bizarre part is that other than
this the system works the same on both. And, as I mentioned in my
followup email, only the memcheck tool shows this problem: the others,
including addrcheck, are fine.

I'm going to check with the UML folks to see what they think about this,
but is there some "deeper magic" in terms of chumminess with the kernel
that memcheck might assume, that the other tools don't, for example?
|
|
From: Jeremy F. <je...@go...> - 2005-03-29 20:41:55
|
Paul Smith wrote:
>I did some more poking today. First thing I discovered was that if I
>run the UML instance on the same version of system as it was built on
>(both compiled the UML kernel and ran it on Red Hat 8.0), then valgrind
>worked OK for most apps (but see a followup message for a different
>problem).
>
>If I run the UML instance on a different version of system (UML kernel
>built on redhat 8.0, but running it on Red Hat Enterprise Linux 3) then
>I see this error from valgrind. The bizarre part is that other than
>this the system works the same on both. And, as I mentioned in my
>followup email, only the memcheck tool shows this problem: the others,
>including addrcheck, are fine.
>
>
>I'm going to check with the UML folks to see what they think about this,
>but is there some "deeper magic" in terms of chumminess with the kernel
>that memcheck might assume, that the other tools don't, for example?
>
>
The only thing memcheck does which the other tools don't is check the
defined-ness of the arguments to syscalls, but that isn't
kernel-dependent in any way. Does UML present a kernel ABI which is
identical to a native Linux kernel (same syscall interface, etc)?
The other possibility is that UML presents a differently shaped address
space under different circumstances, which might confuse things. Are
you building Valgrind with --enable-pie? Probably not, since you seem
to be using fairly old systems.
I can't think of anything immediately helpful.
J
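
One way to test the "differently shaped address space" idea is to compare the highest mapped address seen in each environment. The sketch below assumes two captures of `/proc/self/maps`, one per host; the sample values are purely hypothetical and are not measurements from this thread, and `highest_mapped_end` and `compare_tops` are illustrative names.

```python
def highest_mapped_end(maps_text: str) -> int:
    """Return the highest end address found in /proc/<pid>/maps-style text."""
    top = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue  # skip blank or malformed lines
        end = int(fields[0].split("-")[1], 16)
        top = max(top, end)
    return top


def compare_tops(maps_a: str, maps_b: str) -> int:
    """Difference in bytes between the tops of two address spaces."""
    return highest_mapped_end(maps_a) - highest_mapped_end(maps_b)


# Hypothetical captures: the addresses are invented to show the comparison.
HOST_A = """\
08048000-0804c000 r-xp 00000000 03:01 1234 /bin/cat
bfffe000-c0000000 rwxp 00000000 00:00 0
"""
HOST_B = """\
08048000-0804c000 r-xp 00000000 03:01 1234 /bin/cat
9fffe000-a0000000 rwxp 00000000 00:00 0
"""

if __name__ == "__main__":
    print(f"host A top: {highest_mapped_end(HOST_A):#010x}")
    print(f"host B top: {highest_mapped_end(HOST_B):#010x}")
    print(f"difference: {compare_tops(HOST_A, HOST_B):#x}")
```

A nonzero difference between the two hosts would suggest the layout, rather than the tool, is what changes between the working and failing setups.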
|
|
From: Paul S. <ps...@no...> - 2005-03-30 22:53:18
|
%% Jeremy Fitzhardinge <je...@go...> writes:

  jf> The only thing memcheck does which the other tools don't is check
  jf> the defined-ness of the arguments to syscalls, but that isn't
  jf> kernel-dependent in any way. Does UML present a kernel ABI which
  jf> is identical to a native Linux kernel (same syscall interface,
  jf> etc)?

I believe so. You can take stuff built for native Linux, even glibc,
and run it inside UML without recompiling it, assuming the userspace
libraries are compatible.

  jf> The other possibility is that UML presents a differently shaped
  jf> address space under different circumstances, which might confuse
  jf> things. Are you building Valgrind with --enable-pie? Probably
  jf> not, since you seem to be using fairly old systems.

I had to set --disable-pie, or else valgrind dumped core immediately
upon starting up.

  jf> Yes. 2.4.0 doesn't support helgrind, so I suspect it's picking up
  jf> a 2.2 helgrind.so file.

Yes, that's exactly what's happening. I even know how.

  jf> It would be interesting to see what valgrind --tool=none cat
  jf> /proc/self/maps under each environment.

Ouch. OK, after I've cleaned up my setup and recreated it with a less
chaotic environment, I've discovered that in reality ALL the tools cause
the core dump on the RHEW-hosted system, even --tool=none:

  # /opt/tools/bin/valgrind --tool=none /bin/echo hi
  ==1026== Nulgrind, a binary JIT-compiler for x86-linux.
  ==1026== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
  ==1026== Using valgrind-2.4.0, a program supervision framework for x86-linux.
  ==1026== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
  ==1026== For more details, rerun with: -v
  ==1026==
  ==1026==
  ==1026== Process terminating with default action of signal 11 (SIGSEGV): dumping core
  ==1026==  Access not within mapped region at address 0xBF7FFC61
  ==1026==    at 0x3A965A50: (within /lib/ld-2.3.2.so)
  ==1026==
  Segmentation fault

Hm. I must have been high on something (or, typing into the wrong
window) when I thought that the other tools all worked except for
memcheck.

Bummer... :-/.
|