From: Dave N. <dc...@us...> - 2006-01-26 19:01:18
I am running on an IBM Power5 machine with SUSE Linux Enterprise Server 9
and get the following crash when starting Valgrind (checked out
from SVN on 1-18) on any 64-bit program:
--24861-- Command line
--24861-- /home/dcn/null
--24861-- Startup, with flags:
--24861-- -v
--24861-- Contents of /proc/version:
--24861-- Linux version 2.6.5-7.97-pseries64.nm (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Thu Mar 31 10:55:11 PST 2005
--24861-- Arch and subarch: PPC64, ppc64-int-and-fp
--24861-- Valgrind library directory: /home/dcn/svn/1-18/install_bin/lib/valgrind/
--24861-- Reading syms from /lib64/ld-2.3.3.so (0x4000000)
--24861-- Reading syms from /home/dcn/null (0x10000000)
--24861-- Reading syms from /home/dcn/svn/1-18/install_bin/lib/valgrind/ppc64-linux/memcheck (0x70000000)
--24861-- object doesn't have a dynamic symbol table
--24861-- Reading suppressions file: /home/dcn/svn/1-18/install_bin/lib/valgrind//default.supp
==24861== Jump to the invalid address stated on the next line
==24861== at 0x53E0: ???
==24861== Address 0x53E0 is not stack'd, malloc'd or (recently) free'd
==24861==
==24861== Process terminating with default action of signal 11 (SIGSEGV)
==24861== Bad permissions for mapped region at address 0x53E0
==24861== at 0x53E0: ???
After some debugging on an IBM PPC970, where the same SVN version of
Valgrind works fine on 64-bit programs, I noticed that one difference
was the dynamic loader:
Power5 -> /lib64/ld-2.3.3.so
970    -> /lib64/ld-2.3.4.so
I think the significant difference between these two loaders is the
load address in the program headers (objdump -p):
970: LOAD off 0x0000000000000000 vaddr 0x000000806cb20000 paddr 0x000000806cb20000 align 2**16
P5:  LOAD off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**16
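To check a loader's link-time base without objdump, one can read the ELF program headers directly. The sketch below assumes a 64-bit ELF image already read into memory in host byte order (true when run on the machine that owns the loader) and uses the glibc <elf.h> definitions; it returns the p_vaddr of the first PT_LOAD segment. A result of 0, as for the Power5's ld-2.3.3.so, means the loader has no fixed base, so every address inside it has to be relocated to wherever it ends up mapped. The name first_load_vaddr is hypothetical and not part of Valgrind.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return p_vaddr of the first PT_LOAD program header
 * in an in-memory ELF64 image (host byte order assumed), or UINT64_MAX if
 * the image is truncated, not ELF64, or has no PT_LOAD segment. */
static uint64_t first_load_vaddr(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    assert(img != NULL || len == 0);
    if (len < sizeof eh)
        return UINT64_MAX;
    memcpy(&eh, img, sizeof eh);

    /* Check the "\177ELF" magic and the 64-bit class byte. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return UINT64_MAX;

    /* Walk the program header table looking for a loadable segment. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;

        if (off + sizeof ph > len)
            return UINT64_MAX;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type == PT_LOAD)
            return ph.p_vaddr;
    }
    return UINT64_MAX;
}
```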
While debugging on both machines I noticed that vaddr == 0 makes
load_ELF take a different path, and I'm guessing that you haven't
tried the vaddr == 0 path yet. The comment in load_ELF says
("Otherwise" means dynamically linked):
- Otherwise, we need to use mapelf() a second time to load the
interpreter. The interpreter can go anywhere, but mapelf() wants
to be told a specific address to put it at. So an advisory query
is passed to aspacem, asking where it would put an anonymous
client mapping of size INTERP_SIZE. That address is then used
as the mapping address for the interpreter.
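The arithmetic behind this comes down to one load bias: the address aspacem advises minus the address the interpreter was linked at. The gdb session in this report shows advised = 0x4000000 and interp_addr = 0 for the Power5 loader, so the bias is 0x4000000 and the entry point 0x2d2d0 becomes 0x402d2d0. The minimal sketch below uses hypothetical helper names (load_bias, relocate) to illustrate the calculation; it is not the code in m_ume.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the offset added to every link-time address of the
 * interpreter once it is mapped at 'advised' instead of 'interp_addr'. */
static uint64_t load_bias(uint64_t advised, uint64_t interp_addr)
{
    return advised - interp_addr;
}

/* Hypothetical helper: turn a link-time address (e.g. e_entry) into the
 * corresponding runtime address. */
static uint64_t relocate(uint64_t link_addr, uint64_t bias)
{
    return link_addr + bias;
}
```

Unsigned arithmetic is deliberate here: if the advised address were below the link address, the subtraction wraps, and adding the wrapped bias back still lands on the correct runtime address.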
While debugging on the Power5 I see the code allocating a segment at
0x4000000:
Breakpoint 19, load_ELF (fd=0x6, name=0x1 <Address 0x1 out of bounds>, info=0x70bc1ed0) at m_ume.c:481
481 (void)mapelf(interp, (ESZ(Addr))advised - interp_addr);
(gdb) a
483 VG_(close)(interp->fd);
(gdb) p interp_addr
$103 = 0x0
(gdb) p advised
$104 = 0x4000000
If I step a bit further, to where the entry address is calculated:
(gdb) a
485 entry = (void *)(advised - interp_addr + interp->e.e_entry);
(gdb) a
486 info->interp_base = (ESZ(Addr))advised;
(gdb) p entry
$105 = (void *) 0x402d2d0
(gdb) p interp->e.e_entry
$106 = 0x2d2d0
(gdb) x/4x 0x402d2d0
0x402d2d0: 0x00000000 0x000053e0 0x00000000 0x000363e8
Eventually this entry address, 0x53e0, is propagated out to start the
dynamic linker, and that is where the segfault happens. My guess is
that this value needs to be relocated relative to 0x4000000, but I'm
not that familiar with how this should work.
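The four words printed by x/4x fit the 64-bit PowerPC ELF ABI (ELFv1), in which an ELF entry point refers to a function descriptor rather than to code: the first doubleword is the address of the first instruction and the second is the TOC pointer loaded into r2. Both are link-time values (0x53e0 and 0x363e8), so, assuming the loader is mapped with a bias of 0x4000000, both need that bias added. The struct and function names below are hypothetical; this is a sketch of the idea, not the change made in Valgrind.

```c
#include <assert.h>
#include <stdint.h>

/* ppc64 ELFv1 function descriptor (hypothetical struct name). */
struct func_desc {
    uint64_t code;  /* address of the first instruction (initial PC) */
    uint64_t toc;   /* TOC pointer, loaded into r2 before the call */
    uint64_t env;   /* environment pointer; left unchanged in this sketch */
};

/* Combine two 32-bit words, as gdb's x/4x prints them on a big-endian
 * machine (high word first), into one 64-bit doubleword. */
static uint64_t join_words(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical fix-up: move both the initial PC and the TOC pointer by the
 * same load bias that was applied when the interpreter was mapped. */
static struct func_desc relocate_desc(struct func_desc d, uint64_t bias)
{
    d.code += bias;
    d.toc += bias;
    return d;
}
```

With the words from the x/4x dump and a bias of 0x4000000, the relocated descriptor gives an initial PC of 0x40053e0 and a TOC pointer of 0x40363e8.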
If you need more information to re-create this, please let me know.
Is this the correct forum for this sort of report, or should I be
submitting it to Bugzilla?
What criteria should I use for deciding where to submit future
problem reports?
Bugzilla ==> reports against official Valgrind releases?
From: Julian S. <js...@ac...> - 2006-01-26 19:24:34
Dave

> I am running on an IBM Power5 machine running SuSe Enterprise Server 9
> and am getting the following crash when starting up Valgrind (copied
> from SVN on 1-18) on any 64 bit program:
> ==24861== Process terminating with default action of signal 11 (SIGSEGV)
> ==24861== Bad permissions for mapped region at address 0x53E0
> ==24861== at 0x53E0: ???

I also tried it out on POWER5 / SLES9 a few days ago and fixed exactly
this. It's because the initial PC and toc pointer are not properly
relocated (as you also noticed).

> Eventually this entry address 0x53e0 will be propagated out to start up
> the dynamic linker and get the segfault. My guess is that this value
> needs to be relocated relative to 0x400000 but I'm not that familiar
> at how this should work.

Good detective work. That's exactly the same diagnosis I came to, and
I fixed it in valgrind r5576, last Friday. Can you svn up and try again?

> If you need more information to re-create this please let me know.
>
> Is this the correct forum for this sort of report or should I be
> submitting a Bugzilla?

For code which is currently under active development (you are keeping
an eye on the svn commit messages, right?), this is the right kind of
place. For stable/official releases, or for bugs in code which hasn't
been hacked around recently, bugzilla is better. The line is a bit hazy.

> What is the criteria I should use for deciding where to submit future
> problem reports?

I would only add that if you're reporting bugs against recent dev code,
it's worth svn up-ing and rebuilding everything from scratch before
chasing a bug. The rate of change can sometimes be very high, as it has
been these past couple of weeks for ppc64.

Good to see you folks exercising the ppc32/64 port.

J