From: Ashley P. <as...@qu...> - 2005-08-25 15:19:02
On Thu, 2005-08-25 at 08:29 -0500, Nicholas Nethercote wrote:
> On Thu, 25 Aug 2005, Ashley Pittman wrote:
>
> > $ echo $LD_LIBRARY_PATH
> > /opt/intel/compiler81/lib
> > $ ldd `which mping`
> > libmpi.so.1.0 => /usr/lib/mpi/default/lib/libmpi.so.1.0
> > (0x40025000)
> > libmpio.so.1.0 => /usr/lib/mpi/default/lib/libmpio.so.1.0
> > (0x40083000)
> > libelan.so.1 => /usr/lib/libelan.so.1 (0x400af000)
> > libm.so.6 => /lib/i686/libm.so.6 (0x40102000)
> > libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
> > libimf.so => /opt/intel/compiler81/lib/libimf.so (0x40124000)
> > librmscall.so.1 => /usr/lib/librmscall.so.1 (0x402ca000)
> > libelanctrl.so.2 => /usr/lib/libelanctrl.so.2 (0x402cf000)
> > libelan4.so.1 => /usr/lib/libelan4.so.1 (0x402d6000)
> > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> > [...]
> > $ mkdir tmp/v_test
> > $ cp /usr/lib/qsnet/elan4/lib/libelan.* tmp/v_test
> > $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ashley/tmp/v_test
> > $ prun -N2 ~duncant/valgrind_install/bin/valgrind mping
> > [snip]
> > /usr/lib/libelan.so.1 is a symbolic link which eventually points
> > to /usr/lib/qsnet/elan4/lib/libelan.so.1, setting my LD_LIBRARY_PATH
> > to /usr/lib/qsnet/elan4/lib/ directly I get the first style of output.
>
> So if you put /usr/lib/qsnet/elan4/lib/ in your LD_LIBRARY_PATH, you don't
> get the line number, but if you copy the contents of
> /usr/lib/qsnet/elan4/lib/ into tmp/v_test/ and put tmp/v_test/ in
> LD_LIBRARY_PATH you do get the line number? That sounds strange. Are you
> sure the .so files being picked up are exactly the same in both cases?
That's exactly what I'm seeing.
> You can try the --trace-redir=yes option... it spits out lots of debugging
> output, look for the "Just loaded" lines which tell you which .so files
> have been loaded. So you can check if you are loading the same files in
> both cases. If that doesn't clarify, you could try --trace-symtab=yes
> which does a similar thing but also emits all the symbols read. You could
> again look for differences between the two.
The --trace-redir=yes output looks as I expected, certainly no surprises that I've spotted.
stratumi:v_test> grep libelan.so v.out.0*
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/dbg/:
--2246-- Just loaded /usr/lib/qsnet/elan4/lib/dbg/libelan.so.1 (soname=libelan.so.1),
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/:
--2251-- Just loaded /usr/lib/qsnet/elan4/lib/libelan.so.1 (soname=libelan.so.1),
==2251== at 0x1B9CFB05: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==2251== at 0x1B9CFB3E: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
Now "cp -a /usr/lib/qsnet/elan4/lib ." and with LD_LIBRARY_PATH set to /home/ashley/tmp/v_test/lib/libelan.so.1
--2258-- Just loaded /home/ashley/tmp/v_test/lib/libelan.so.1 (soname=libelan.so.1),
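To compare the "Just loaded" lines across whole logs rather than grepping for one library, something like the following could be used. This is only a sketch and not part of the original session: it assumes the line format shown in the grep output above, and the function names are made up for illustration.

```python
import re

# Matches --trace-redir=yes lines of the form seen above, e.g.
#   --2246-- Just loaded /path/libelan.so.1 (soname=libelan.so.1),
JUST_LOADED = re.compile(r"^--\d+-- Just loaded (\S+) \(soname=([^)]*)\)")


def loaded_objects(log_text):
    """Return the (path, soname) pairs of a log, in load order."""
    found = []
    for line in log_text.splitlines():
        m = JUST_LOADED.match(line.strip())
        if m:
            found.append((m.group(1), m.group(2)))
    return found


def differing_sonames(log_a, log_b):
    """Map soname -> (path in A, path in B) wherever the two runs disagree."""
    a = {soname: path for path, soname in loaded_objects(log_a)}
    b = {soname: path for path, soname in loaded_objects(log_b)}
    return {
        s: (a.get(s), b.get(s))
        for s in sorted(set(a) | set(b))
        if a.get(s) != b.get(s)
    }
```

Calling loaded_objects() on each log and comparing the two lists would also show whether the libraries were loaded in the same order in both runs.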
Turning on symbol reads gave me more information though, here's what I
get when I grep for the function name:
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/:
raw symbol [1552]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [94]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
choosing between 'elan_createSubGroup' and 'elan_createSubGroup'
==2262== at 0x1B9CFB05: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==2262== at 0x1B9CFB3E: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
With LD_LIBRARY_PATH set to /home/ashley/tmp/v_test/lib/libelan.so.1:
raw symbol [1552]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [94]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
choosing between 'elan_createSubGroup' and 'elan_createSubGroup'
==2260== at 0x1B9CFB05: elan_createSubGroup (common/groupSub.c:176)
==2260== at 0x1B9CFB3E: elan_createSubGroup (common/groupSub.c:192)
The "choosing between" lines here are interesting. There are 9186
instances of this line being printed in each case, and as far as I can
tell they are all alike.
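Checking 9186 lines by eye is error-prone, so the comparison could be done mechanically. The sketch below is an illustration only, not something run in the original session: it assumes the "raw symbol" and "choosing between" line formats shown above, and the function names are invented here.

```python
import re
from collections import Counter

# Matches --trace-symtab=yes lines such as:
#   raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
# "name\s+" also tolerates the name having wrapped onto the next line.
RAW_SYMBOL = re.compile(
    r"raw symbol \[(\d+)\]: (\w+) (\w+) : "
    r"value (0x[0-9A-Fa-f]+), size (\d+), name\s+(\S+)"
)


def raw_symbols(log_text):
    """Count (binding, type, value, size, name) records found in a log."""
    return Counter(
        (m.group(2), m.group(3), int(m.group(4), 16), int(m.group(5)), m.group(6))
        for m in RAW_SYMBOL.finditer(log_text)
    )


def compare_logs(log_a, log_b):
    """Summarise how two symbol-table traces differ."""
    a, b = raw_symbols(log_a), raw_symbols(log_b)
    return {
        "only_in_a": a - b,  # records seen more often in A than in B
        "only_in_b": b - a,  # records seen more often in B than in A
        "choosing_a": log_a.count("choosing between"),
        "choosing_b": log_b.count("choosing between"),
    }
```

If the libraries land at different addresses in the two runs, every record for a moved library will show up as a difference, so the values may need to be made relative to each library's load address before the comparison is useful.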
The [1552] instance is where createSubGroup is called from mpi.so, the
[94] instance is where it's called from pmpi.so, the [362] and [41]
appear to be duplicates from the same file?
soname=libelan.so.1
shoff = 1202604, shnum = 27, size = 40, n_vg_oimage = 1232372
.dynsym : 0xB0AD619C .. 0xB0AF988C
.dynstr : 0xB0AD850C .. 0xB0ADA71C
.plt : 0x1B998C3C .. 0x1B99A45B
.eh_frame : 0xB0B24C34 .. 0xB0B24C37
.got : 0x1B9E2D28 .. 0x1B9E341B
.stab : 0xB0B25420 .. 0xB0B9685B
.stabstr : 0xB0B9685C .. 0xB0BCFF71
.symtab : 0xB0BFADE4 .. 0xB0C325D4
.strtab : 0xB0BFE564 .. 0xB0C01DF3
Reading symbol table (888 entries)
<snip>
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
<snip>
raw symbol [885]: GLO FUN : value 0x1B9A17A0, size 1068, name elan_hgsyncNet
raw symbol [886]: WEA NOT : value 0x1B992000, size 0, name __gmon_start__
raw symbol [887]: GLO FUN : value 0x1B992000, size 31, name strcpy@@GLIBC_2.0
ignore -- valu=0: strcpy@@GLIBC_2.0
Reading dynamic symbol table (567 entries)
raw symbol [1]: LOC SEC : value 0x1B992094, size 0, name NONAME
raw symbol [2]: LOC SEC : value 0x1B99319C, size 0, name NONAME
raw symbol [3]: LOC SEC : value 0x1B99550C, size 0, name NONAME
raw symbol [4]: LOC SEC : value 0x1B99771E, size 0, name NONAME
<snip>
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
stratumi:v_test> !nm
nm /usr/lib/mpi/default/lib/libmpi.so.1.0 | grep elan_cr
U elan_createSubGroup
stratumi:v_test> nm lib/libelan.so.1 | grep elan_createSubGroup
0003d784 T elan_createSubGroup
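The nm value and the traced value can be related with a little arithmetic. The sketch below is an illustration only: the numbers are taken from the output above, while the helper names and the 4 KiB page size are assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on 32-bit x86 Linux


def load_bias(runtime_value, file_value):
    """Address at which the library's file offset 0 was mapped."""
    return runtime_value - file_value


def offset_in_symbol(address, symbol_value):
    """How far into a function a stack-trace address lies."""
    return address - symbol_value


def pages_between(bias_a, bias_b):
    """Distance between two load addresses, in whole pages."""
    return (bias_b - bias_a) // PAGE_SIZE


# elan_createSubGroup: 0x1B9CF784 in the trace, 0003d784 according to nm.
bias = load_bias(0x1B9CF784, 0x0003D784)
print(hex(bias))  # 0x1b992000

# The two stack-trace addresses fall inside the 2991-byte function.
print(offset_in_symbol(0x1B9CFB05, 0x1B9CF784))  # 897
print(offset_in_symbol(0x1B9CFB3E, 0x1B9CF784))  # 954
```

The resulting 0x1B992000 is page-aligned and is the same value the trace reports for __gmon_start__ and strcpy@@GLIBC_2.0. Computing the same bias from each log and passing the two results to pages_between() would give the shift of a library between runs.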
Another odd thing: between the two runs a lot of the pointer values
changed by a number of pages, almost as if the libraries were being
loaded in a different order?
I can send you more information, but the log files are 3MB each and there
are four of them. I could probably put them on an FTP site if you need
them, though.
> Also if you can reduce this to a small test case that would be very
> helpful.
Well, I can point you at the RPM, but unless you have a supercomputer to
run this on it won't get you very far. I'm working on a smaller
reproducer but haven't got one yet.
Ashley,