|
From: Ashley P. <as...@qu...> - 2005-08-24 17:41:20
|
Hi,
Is it possible to write suppressions at line-number resolution rather
than just by function name? I've found a couple of false positives
that I need to suppress, and suppressing the whole function seems a bit
of a sledgehammer technique. Currently I've got this:
{
subgroup gather check
Memcheck:Cond
fun:elan_createSubGroup
}
Also, one thing I've spotted: if an error is found in an installed shared
library, the report only shows the filename, but if the library is loaded
via an LD_LIBRARY_PATH setting it gets a full filename:line_number. Is
there a way to turn this behaviour on by default?
Ashley,
|
|
From: Nicholas N. <nj...@cs...> - 2005-08-24 18:08:32
|
On Wed, 24 Aug 2005, Ashley Pittman wrote:
> Is it possible to write suppressions at line number resolution rather
> than just the function names
No. But Joseph M Link wrote a patch just recently that lets you do this
(see "Patch to support suppressions by source file" on valgrind-users).
Perhaps this should go into the repository.
> Also, one thing I've spotted, if an error is found in a installed shared
> library the report only shows the filename but if the library is loaded
> from a LD_LIBRARY_PATH setting it gets a full filename:line_number. Is
> there a way to turn this behaviour on by default?
That isn't intentional, it sounds like a bug. Can you give a more
specific example?
Nick
|
|
From: Rob H. <ti...@ge...> - 2005-08-24 18:28:17
|
On Wed, 2005-08-24 at 13:08 -0500, Nicholas Nethercote wrote:
> > Also, one thing I've spotted, if an error is found in a installed shared
> > library the report only shows the filename but if the library is loaded
> > from a LD_LIBRARY_PATH setting it gets a full filename:line_number. Is
> > there a way to turn this behaviour on by default?
It sounds to me like the normal library was compiled without debugging
symbols and the LD_LIBRARY_PATH one has them. Check what "file" says about
both libraries.
|
|
From: Ashley P. <as...@qu...> - 2005-08-25 07:52:39
|
On Wed, 2005-08-24 at 13:08 -0500, Nicholas Nethercote wrote:
> On Wed, 24 Aug 2005, Ashley Pittman wrote:
>
> > Is it possible to write suppressions at line number resolution rather
> > than just the function names
>
> No. But Joseph M Link wrote a patch just recently that lets you do this
> (see "Patch to support suppressions by source file" on valgrind-users).
> Perhaps this should go into the repository.
I can't see any harm in this functionality and indeed from your own
website: "Each error to be suppressed is described very specifically, to
minimise the possibility that a suppression-directive inadvertantly
suppresses a bunch of similar errors which you did want to see. The
suppression mechanism is designed to allow precise yet flexible
specification of errors to suppress."
> > Also, one thing I've spotted, if an error is found in a installed shared
> > library the report only shows the filename but if the library is loaded
> > from a LD_LIBRARY_PATH setting it gets a full filename:line_number. Is
> > there a way to turn this behaviour on by default?
By "only shows the filename" here I do of course mean the shared library
filename, not the source code filename.
> That isn't intentional, it sounds like a bug. Can you give a more
> specific example?
Ok, here's what I'm doing
$ echo $LD_LIBRARY_PATH
/opt/intel/compiler81/lib
$ ldd `which mping`
libmpi.so.1.0 => /usr/lib/mpi/default/lib/libmpi.so.1.0 (0x40025000)
libmpio.so.1.0 => /usr/lib/mpi/default/lib/libmpio.so.1.0 (0x40083000)
libelan.so.1 => /usr/lib/libelan.so.1 (0x400af000)
libm.so.6 => /lib/i686/libm.so.6 (0x40102000)
libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
libimf.so => /opt/intel/compiler81/lib/libimf.so (0x40124000)
librmscall.so.1 => /usr/lib/librmscall.so.1 (0x402ca000)
libelanctrl.so.2 => /usr/lib/libelanctrl.so.2 (0x402cf000)
libelan4.so.1 => /usr/lib/libelan4.so.1 (0x402d6000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
$ prun -N2 ~duncant/valgrind_install/bin/valgrind mping
<snip>
==27906== Conditional jump or move depends on uninitialised value(s)
==27906== at 0x1B9CFB3E: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==27906== by 0x1B91D48E: MPID_CommInit (adi2init.c:1491)
==27906== by 0x1B93B8E3: MPIR_Init (initutil.c:403)
==27906== by 0x1B93B00B: MPI_Init (init.c:163)
==27906== by 0x8048ECC: main (in /usr/lib/mpi/mpi_intel/bin/mping)
<snip>
$ mkdir tmp/v_test
$ cp /usr/lib/qsnet/elan4/lib/libelan.* tmp/v_test
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ashley/tmp/v_test
$ prun -N2 ~duncant/valgrind_install/bin/valgrind mping
<snip>
==27908== Conditional jump or move depends on uninitialised value(s)
==27908== at 0x1B9CFB3E: elan_createSubGroup (common/groupSub.c:192)
==27908== by 0x1B91D48E: MPID_CommInit (adi2init.c:1491)
==27908== by 0x1B93B8E3: MPIR_Init (initutil.c:403)
==27908== by 0x1B93B00B: MPI_Init (init.c:163)
==27908== by 0x8048ECC: main (in /usr/lib/mpi/mpi_intel/bin/mping)
<snip>
The difference being "(in /usr/lib/qsnet/elan4/lib/libelan.so.1)" in
the first trace and "(common/groupSub.c:192)" in the second.
At first I thought this was a neat trick that valgrind was doing, but
that isn't so: the MPI routines come from an installed library as well
and they don't get the same treatment.
/usr/lib/libelan.so.1 is a symbolic link which eventually points
to /usr/lib/qsnet/elan4/lib/libelan.so.1. If I set my LD_LIBRARY_PATH
to /usr/lib/qsnet/elan4/lib/ directly, I get the first style of output.
Ashley,
|
|
From: Nicholas N. <nj...@cs...> - 2005-08-25 13:29:58
|
On Thu, 25 Aug 2005, Ashley Pittman wrote:
> $ echo $LD_LIBRARY_PATH
> /opt/intel/compiler81/lib
> $ ldd `which mping`
> libmpi.so.1.0 => /usr/lib/mpi/default/lib/libmpi.so.1.0
> (0x40025000)
> libmpio.so.1.0 => /usr/lib/mpi/default/lib/libmpio.so.1.0
> (0x40083000)
> libelan.so.1 => /usr/lib/libelan.so.1 (0x400af000)
> libm.so.6 => /lib/i686/libm.so.6 (0x40102000)
> libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
> libimf.so => /opt/intel/compiler81/lib/libimf.so (0x40124000)
> librmscall.so.1 => /usr/lib/librmscall.so.1 (0x402ca000)
> libelanctrl.so.2 => /usr/lib/libelanctrl.so.2 (0x402cf000)
> libelan4.so.1 => /usr/lib/libelan4.so.1 (0x402d6000)
> /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> [...]
> $ mkdir tmp/v_test
> $ cp /usr/lib/qsnet/elan4/lib/libelan.* tmp/v_test
> $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ashley/tmp/v_test
> $ prun -N2 ~duncant/valgrind_install/bin/valgrind mping
> [snip]
> /usr/lib/libelan.so.1 is a symbolic link which eventually points
> to /usr/lib/qsnet/elan4/lib/libelan.so.1, setting my LD_LIBRARY_PATH
> to /usr/lib/qsnet/elan4/lib/ directly I get the first style of output.
So if you put /usr/lib/qsnet/elan4/lib/ in your LD_LIBRARY_PATH, you don't
get the line number, but if you copy the contents of
/usr/lib/qsnet/elan4/lib/ into tmp/v_test/ and put tmp/v_test/ in
LD_LIBRARY_PATH you do get the line number? That sounds strange. Are you
sure the .so files being picked up are exactly the same in both cases?
You can try the --trace-redir=yes option... it spits out lots of debugging
output, look for the "Just loaded" lines which tell you which .so files
have been loaded. So you can check if you are loading the same files in
both cases. If that doesn't clarify, you could try --trace-symtab=yes
which does a similar thing but also emits all the symbols read. You could
again look for differences between the two.
Also if you can reduce this to a small test case that would be very
helpful.
Nick
|
|
From: Julian S. <js...@ac...> - 2005-08-25 13:44:23
|
> Also if you can reduce this to a small test case that would be very
> helpful.
Ashley -- yes, let me second that.
I wonder if it's something to do with our symbol table reader not
dereferencing symlinks when attempting to read debug info. Or something.
Burble burble.
J
|
|
From: Ashley P. <as...@qu...> - 2005-08-25 15:19:02
|
On Thu, 2005-08-25 at 08:29 -0500, Nicholas Nethercote wrote:
> On Thu, 25 Aug 2005, Ashley Pittman wrote:
>
> > $ echo $LD_LIBRARY_PATH
> > /opt/intel/compiler81/lib
> > $ ldd `which mping`
> > libmpi.so.1.0 => /usr/lib/mpi/default/lib/libmpi.so.1.0
> > (0x40025000)
> > libmpio.so.1.0 => /usr/lib/mpi/default/lib/libmpio.so.1.0
> > (0x40083000)
> > libelan.so.1 => /usr/lib/libelan.so.1 (0x400af000)
> > libm.so.6 => /lib/i686/libm.so.6 (0x40102000)
> > libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
> > libimf.so => /opt/intel/compiler81/lib/libimf.so (0x40124000)
> > librmscall.so.1 => /usr/lib/librmscall.so.1 (0x402ca000)
> > libelanctrl.so.2 => /usr/lib/libelanctrl.so.2 (0x402cf000)
> > libelan4.so.1 => /usr/lib/libelan4.so.1 (0x402d6000)
> > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> > [...]
> > $ mkdir tmp/v_test
> > $ cp /usr/lib/qsnet/elan4/lib/libelan.* tmp/v_test
> > $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ashley/tmp/v_test
> > $ prun -N2 ~duncant/valgrind_install/bin/valgrind mping
> > [snip]
> > /usr/lib/libelan.so.1 is a symbolic link which eventually points
> > to /usr/lib/qsnet/elan4/lib/libelan.so.1, setting my LD_LIBRARY_PATH
> > to /usr/lib/qsnet/elan4/lib/ directly I get the first style of output.
>
> So if you put /usr/lib/qsnet/elan4/lib/ in your LD_LIBRARY_PATH, you don't
> get the line number, but if you copy the contents of
> /usr/lib/qsnet/elan4/lib/ into tmp/v_test/ and put tmp/v_test/ in
> LD_LIBRARY_PATH you do get the line number? That sounds strange. Are you
> sure the .so files being picked up are exactly the same in both cases?
That's exactly what I'm seeing.
> You can try the --trace-redir=yes option... it spits out lots of debugging
> output, look for the "Just loaded" lines which tell you which .so files
> have been loaded. So you can check if you are loading the same files in
> both cases. If that doesn't clarify, you could try --trace-symtab=yes
> which does a similar thing but also emits all the symbols read. You could
> again look for differences between the two.
The --trace-redir=yes output looks as I expected, certainly no surprises that I've spotted.
stratumi:v_test> grep libelan.so v.out.0*
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/dbg/:
--2246-- Just loaded /usr/lib/qsnet/elan4/lib/dbg/libelan.so.1 (soname=libelan.so.1),
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/:
--2251-- Just loaded /usr/lib/qsnet/elan4/lib/libelan.so.1 (soname=libelan.so.1),
==2251== at 0x1B9CFB05: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==2251== at 0x1B9CFB3E: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
Now "cp -a /usr/lib/qsnet/elan4/lib ." and with LD_LIBRARY_PATH set to /home/ashley/tmp/v_test/lib/libelan.so.1
--2258-- Just loaded /home/ashley/tmp/v_test/lib/libelan.so.1 (soname=libelan.so.1),
Turning on symbol reads gave me more information though, here's what I
get when I grep for the function name:
With LD_LIBRARY_PATH set to /usr/lib/qsnet/elan4/lib/:
raw symbol [1552]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [94]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
choosing between 'elan_createSubGroup' and 'elan_createSubGroup'
==2262== at 0x1B9CFB05: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==2262== at 0x1B9CFB3E: elan_createSubGroup (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
with LD_LIBRARY_PATH set to /home/ashley/tmp/v_test/lib/libelan.so.1
raw symbol [1552]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [94]: GLO FUN : value 0x1B909000, size 2004, name elan_createSubGroup
ignore -- valu=0: elan_createSubGroup
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
choosing between 'elan_createSubGroup' and 'elan_createSubGroup'
==2260== at 0x1B9CFB05: elan_createSubGroup (common/groupSub.c:176)
==2260== at 0x1B9CFB3E: elan_createSubGroup (common/groupSub.c:192)
The "choosing between" lines here are interesting, there are 9186
instances of this line being printed in each case and as far as I can
tell they are all alike.
The [1552] instance is where createSubGroup is called from mpi.so, the
[94] instance is where it's called from pmpi.so, the [362] and [41]
appear to be duplicates from the same file?
soname=libelan.so.1
shoff = 1202604, shnum = 27, size = 40, n_vg_oimage = 1232372
.dynsym : 0xB0AD619C .. 0xB0AF988C
.dynstr : 0xB0AD850C .. 0xB0ADA71C
.plt : 0x1B998C3C .. 0x1B99A45B
.eh_frame : 0xB0B24C34 .. 0xB0B24C37
.got : 0x1B9E2D28 .. 0x1B9E341B
.stab : 0xB0B25420 .. 0xB0B9685B
.stabstr : 0xB0B9685C .. 0xB0BCFF71
.symtab : 0xB0BFADE4 .. 0xB0C325D4
.strtab : 0xB0BFE564 .. 0xB0C01DF3
Reading symbol table (888 entries)
<snip>
raw symbol [362]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
<snip>
raw symbol [885]: GLO FUN : value 0x1B9A17A0, size 1068, name elan_hgsyncNet
raw symbol [886]: WEA NOT : value 0x1B992000, size 0, name __gmon_start__
raw symbol [887]: GLO FUN : value 0x1B992000, size 31, name strcpy@@GLIBC_2.0
ignore -- valu=0: strcpy@@GLIBC_2.0
Reading dynamic symbol table (567 entries)
raw symbol [1]: LOC SEC : value 0x1B992094, size 0, name NONAME
raw symbol [2]: LOC SEC : value 0x1B99319C, size 0, name NONAME
raw symbol [3]: LOC SEC : value 0x1B99550C, size 0, name NONAME
raw symbol [4]: LOC SEC : value 0x1B99771E, size 0, name NONAME
<snip>
raw symbol [41]: GLO FUN : value 0x1B9CF784, size 2991, name elan_createSubGroup
stratumi:v_test> !nm
nm /usr/lib/mpi/default/lib/libmpi.so.1.0 | grep elan_cr
U elan_createSubGroup
stratumi:v_test> nm lib/libelan.so.1 | grep elan_createSubGroup
0003d784 T elan_createSubGroup
Another odd thing: between the two runs a lot of the pointer values
changed by a number of pages, almost as if the libraries were being
loaded in a different order?
I can send you more information, but the log files are 3MB each and there
are four of them. I could probably put them on an FTP site if you require
them, though.
> Also if you can reduce this to a small test case that would be very
> helpful.
Well, I can point you at the RPM, but unless you have a supercomputer to
run this on it won't get you very far. I'm working on a smaller
reproducer but haven't got one yet.
Ashley,
|
|
From: Ashley P. <as...@qu...> - 2005-08-25 16:37:49
|
> Another odd thing, between the two runs a lot of the pointers values
> changed by a number of pages, almost as if the library's were being
> loaded in a different order?
This is key: it turns out I've been fooled by the RHEL4
prelinking/address space randomisation code again. Although the
libraries are the "same" to the extent that they come from the same RPM
and the RPM passes a verify test, the installed ones have been noodled in
some way (not simply stripped).
I was running the prun command on the head node (RH 7.3), which then
forked valgrind on a compute node (RHEL4). So when I did the copy and set
LD_LIBRARY_PATH, or set LD_LIBRARY_PATH to a .so that's installed but not
in /usr/lib, I was running a libelan.so which hadn't been touched, but
when I didn't set it I was running with a modified one.
Running "prelink -ua" as root on the compute node causes it to output
properly; "prelink -a" causes it to only print the .so name.
My best guess is that in the prelinked library the "real" static symbols
are being used, whereas in the non-prelinked library the relocation
table is being used, and without having a handle on the relocation table
valgrind isn't able to correlate the address with the source file. This
isn't really my area of expertise though.
Ashley,
|
|
From: Nicholas N. <nj...@cs...> - 2005-08-26 03:23:20
|
On Thu, 25 Aug 2005, Ashley Pittman wrote:
> This is key, it turns out I've been fooled by the RHEL4
> prelinking/address space randomisation code again. Although the
> library's are the "same" to the extent that they come from the same rpm
> and the rpm passes a verify test the installed ones have been noodled in
> some way (not simply stripped).
> [...]
> My best guess is that in the prelinked library the "real" static symbols
> are being used where as in the non prelinked library the relocation
> table is being used and without having a handle on the relocation table
> valgrind isn't able to correlate the address with the source file. This
> isn't really my area of expertise though.
So it sounds like this isn't really a problem with Valgrind, but an issue
with the environment?
Nick
|
|
From: Ashley P. <as...@qu...> - 2005-08-26 09:18:32
|
On Thu, 2005-08-25 at 22:23 -0500, Nicholas Nethercote wrote:
> On Thu, 25 Aug 2005, Ashley Pittman wrote:
>
> > This is key, it turns out I've been fooled by the RHEL4
> > prelinking/address space randomisation code again. Although the
> > library's are the "same" to the extent that they come from the same rpm
> > and the rpm passes a verify test the installed ones have been noodled in
> > some way (not simply stripped).
> > [...]
> > My best guess is that in the prelinked library the "real" static symbols
> > are being used where as in the non prelinked library the relocation
> > table is being used and without having a handle on the relocation table
> > valgrind isn't able to correlate the address with the source file. This
> > isn't really my area of expertise though.
>
> So it sounds like this isn't really a problem with Valgrind, but an issue
> with the environment?
Maybe. I *thought* that prelinking worked by modifying the load base
address of the library, and that when you did prelink -u (undo) it reverted
its changes to the file in place. This would imply that when the library
is in a prelinked state the debugging information is there but valgrind
can't extract it. It is possible, however, that the undo operation just
copies an original back over the top. If the former, it's likely a
valgrind bug; if the latter, it's probably a prelink bug/feature.
Now that I know what's causing it I'm less concerned: I know a workaround,
and I'll try playing around with prelink a bit more to see if I can
understand it properly.
Ashley,
|
|
From: Paul P. <ppl...@gm...> - 2005-08-26 14:25:26
|
On 8/26/05, Ashley Pittman <as...@qu...> wrote:
> I *thought* that prelinking worked my modifying the load base
> address of the library and when you did prelink -u (undo) it reverted
> it's changes to the file in place,
That's correct. The PT_LOAD segment addresses and relocations are updated.
> this would imply that when the
> library is in a prelinked state debugging is there but valgrind can't
> extract it.
That's what appears to be happening.
However, when I tried prelinking some of my libraries, VG decoded
the debug info just fine, so there is more to this apparent VG bug
than just prelinking.
> It is possible however that the undo operation just copys a
> origional back over the top.
It doesn't.
Cheers,
|
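For readers who want to see what "the PT_LOAD segment addresses are updated" means in practice, here is a minimal C sketch. It assumes a Linux system with `<elf.h>` and a 32-bit ELF file whose program headers have already been read into memory; the function name `lowest_load_vaddr` is made up for this illustration. A shared library that has not been prelinked typically has a lowest PT_LOAD address of 0, while a prelinked one carries the base address prelink chose for it, so comparing this value before and after `prelink -u` is one rough way to tell the two states apart.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the lowest virtual address requested by
   any PT_LOAD program header, or UINT32_MAX if there is no PT_LOAD
   entry at all.  phdr points at 'count' 32-bit program headers. */
uint32_t lowest_load_vaddr(const Elf32_Phdr *phdr, size_t count)
{
    uint32_t lowest = UINT32_MAX;

    for (size_t i = 0; i < count; i++) {
        /* Only loadable segments contribute to the library's base. */
        if (phdr[i].p_type == PT_LOAD && phdr[i].p_vaddr < lowest)
            lowest = phdr[i].p_vaddr;
    }
    return lowest;
}
```

A caller would read `e_phnum` entries starting at file offset `e_phoff` from the ELF header and pass them in. On a 64-bit library the same loop would use `Elf64_Phdr` and a 64-bit return type.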
|
From: Ashley P. <as...@qu...> - 2005-08-26 15:24:04
|
On Fri, 2005-08-26 at 07:25 -0700, Paul Pluzhnikov wrote:
> On 8/26/05, Ashley Pittman <as...@qu...> wrote:
>
> > I *thought* that prelinking worked my modifying the load base
> > address of the library and when you did prelink -u (undo) it reverted
> > it's changes to the file in place,
>
> That's correct. The PT_LOAD segment addresses and relocations are updated.
Thanks for the info, I thought that's what it was but I wasn't 100%
sure. It's quite a nice idea when it works.
> > this would imply that when the
> > library is in a prelinked state debugging is there but valgrind can't
> > extract it.
>
> That's what appears to be happening.
> However, when I tried prelinking some of my libraries, VG decoded
> the debug info just fine, so there is more to this apparent VG bug
> then just prelinking.
Ok, I've got a reproducer here. I wrote a simple test case which
provides dud memory to the initialisation function of the library;
valgrind correctly spots the dodgy reads but doesn't show the source
line.
I've removed the use of MPI, any parallel code and the Intel compiler to
keep it simple.
The libraries are open source (LGPL) but you need a password to download
them. If you email me directly or su...@qu... we can send you
one.
[root@stratum5 v]# prelink -a
prelink: /usr/lib/mpi/mpi_intel/bin/mping: Could not find one of the dependencies
prelink: /usr/lib/mpi/mpi_gnu/bin/mping: Could not find one of the dependencies
prelink: /usr/bin/emacs-21.3: COPY relocations don't point into .bss or .sbss section
[root@stratum5 v]# /home/duncant/valgrind_install/bin/valgrind a.out
==7211== Memcheck, a memory error detector.
==7211== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==7211== Using LibVEX rev 1283, a library for dynamic binary translation.
==7211== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==7211== Using valgrind-3.0.0.SVN, a dynamic binary instrumentation framework.
==7211== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==7211== For more details, rerun with: -v
==7211==
==7211== Conditional jump or move depends on uninitialised value(s)
==7211== at 0x1B911837: init_parseEnv (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x1B911CF1: elan_init (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x80484DF: main (in /root/v/a.out)
ELAN_EXCEPTION @ --: 6 (Initialisation error)
elan_init: Can't get capability from environment : 0 : Success
==7211==
==7211== Conditional jump or move depends on uninitialised value(s)
==7211== at 0x1B931D79: elan_exception (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x1B9119FC: init_getCap (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x1B911CF9: elan_init (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x80484DF: main (in /root/v/a.out)
==7211==
==7211== Conditional jump or move depends on uninitialised value(s)
==7211== at 0x1B931DC4: elan_exception (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x1B9119FC: init_getCap (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x1B911CF9: elan_init (in /usr/lib/qsnet/elan4/lib/libelan.so.1)
==7211== by 0x80484DF: main (in /root/v/a.out)
==7211==
==7211== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 28 from 3)
==7211== malloc/free: in use at exit: 2440 bytes in 3 blocks.
==7211== malloc/free: 3 allocs, 0 frees, 2440 bytes allocated.
==7211== For counts of detected errors, rerun with: -v
==7211== searching for pointers to 3 not-freed blocks.
==7211== checked 108936 bytes.
==7211==
==7211== LEAK SUMMARY:
==7211== definitely lost: 0 bytes in 0 blocks.
==7211== possibly lost: 0 bytes in 0 blocks.
==7211== still reachable: 2440 bytes in 3 blocks.
==7211== suppressed: 0 bytes in 0 blocks.
==7211== Reachable blocks (those to which a pointer was found) are not shown.
==7211== To see them, rerun with: --show-reachable=yes
Aborted (core dumped)
[root@stratum5 v]# prelink -u /usr/lib/qsnet/elan4/lib/libelan.so.1
[root@stratum5 v]# /home/duncant/valgrind_install/bin/valgrind a.out
==7215== Memcheck, a memory error detector.
==7215== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==7215== Using LibVEX rev 1283, a library for dynamic binary translation.
==7215== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==7215== Using valgrind-3.0.0.SVN, a dynamic binary instrumentation framework.
==7215== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==7215== For more details, rerun with: -v
==7215==
==7215== Conditional jump or move depends on uninitialised value(s)
==7215== at 0x1B911837: init_parseEnv (elan4/init.c:205)
==7215== by 0x1B911CF1: elan_init (elan4/init.c:384)
==7215== by 0x80484DF: main (in /root/v/a.out)
ELAN_EXCEPTION @ --: 6 (Initialisation error)
elan_init: Can't get capability from environment : 0 : Success
==7215==
==7215== Conditional jump or move depends on uninitialised value(s)
==7215== at 0x1B931D79: elan_exception (common/misc.c:232)
==7215== by 0x1B9119FC: init_getCap (elan4/init.c:261)
==7215== by 0x1B911CF9: elan_init (elan4/init.c:386)
==7215== by 0x80484DF: main (in /root/v/a.out)
==7215==
==7215== Conditional jump or move depends on uninitialised value(s)
==7215== at 0x1B931DC4: elan_exception (common/misc.c:239)
==7215== by 0x1B9119FC: init_getCap (elan4/init.c:261)
==7215== by 0x1B911CF9: elan_init (elan4/init.c:386)
==7215== by 0x80484DF: main (in /root/v/a.out)
==7215==
==7215== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 28 from 3)
==7215== malloc/free: in use at exit: 2440 bytes in 3 blocks.
==7215== malloc/free: 3 allocs, 0 frees, 2440 bytes allocated.
==7215== For counts of detected errors, rerun with: -v
==7215== searching for pointers to 3 not-freed blocks.
==7215== checked 108936 bytes.
==7215==
==7215== LEAK SUMMARY:
==7215== definitely lost: 0 bytes in 0 blocks.
==7215== possibly lost: 0 bytes in 0 blocks.
==7215== still reachable: 2440 bytes in 3 blocks.
==7215== suppressed: 0 bytes in 0 blocks.
==7215== Reachable blocks (those to which a pointer was found) are not shown.
==7215== To see them, rerun with: --show-reachable=yes
Aborted (core dumped)
[root@stratum5 v]# cat test.c
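#include <elan/elan.h>
#include <stdlib.h>   /* malloc */
int main(void) {
    ELAN_FLAGS *i;
    /* Deliberately left uninitialised so that elan_init() reads junk. */
    i = malloc(sizeof(*i));
    elan_init(*i);
    return 0;
}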
[root@stratum5 v]# ldd ./a.out
libelan.so.1 => /usr/lib/libelan.so.1 (0x003fe000)
libc.so.6 => /lib/tls/libc.so.6 (0x0011a000)
librmscall.so.1 => /usr/lib/librmscall.so.1 (0x006b9000)
libelanctrl.so.2 => /usr/lib/libelanctrl.so.2 (0x004e9000)
libelan4.so.1 => /usr/lib/libelan4.so.1 (0x007b0000)
/lib/ld-linux.so.2 (0x00101000)
|
|
From: Josef W. <Jos...@gm...> - 2005-08-26 17:47:09
|
On Friday 26 August 2005 05:23, Nicholas Nethercote wrote:
> On Thu, 25 Aug 2005, Ashley Pittman wrote:
> > This is key, it turns out I've been fooled by the RHEL4
> > prelinking/address space randomisation code again. Although the
> > library's are the "same" to the extent that they come from the same rpm
> > and the rpm passes a verify test the installed ones have been noodled in
> > some way (not simply stripped).
> > [...]
> > My best guess is that in the prelinked library the "real" static symbols
> > are being used where as in the non prelinked library the relocation
> > table is being used and without having a handle on the relocation table
> > valgrind isn't able to correlate the address with the source file. This
> > isn't really my area of expertise though.
>
> So it sounds like this isn't really a problem with Valgrind, but an issue
> with the environment?
I do not think so.
The prelinker of course does not change the debug information.
Valgrind probably should take some additional address offset into account,
which seems to be set to zero in the usual case without prelinking, so VG
is working in the normal case.
Is gdb able to show source lines with your prelinked library?
Josef
|
|
From: Ashley P. <as...@qu...> - 2005-08-30 19:39:34
|
On Fri, 2005-08-26 at 19:45 +0200, Josef Weidendorfer wrote:
> On Friday 26 August 2005 05:23, Nicholas Nethercote wrote:
> > So it sounds like this isn't really a problem with Valgrind, but an issue
> > with the environment?
>
> I do not think so.
> The prelinker of course does not change the debug information.
> Valgrind probably should take some additional address offset into account,
> which seems to be set to zero in the usual case without prelinking, so VG
> is working in the normal case.
I'm still working on creating a standalone reproducer for this. I can
only get it to fail using one of our shared libraries, not if I generate
one by hand.
From what I've found today it seems to me that it may be a prelink issue,
although I'm not at the bottom of it yet. When valgrind is walking the
stabs info (read_debuginfo_stabs in coregrind/m_debuginfo/stabs.c), one
of the st->n_value values for an N_SLINE entry appears to be wrong: in
the non-prelink case it's 0x97 (correct), but when the library is
prelinked it's 0x4E9097.
prelinked case:
1957 type=68 othr=0 desc=281 value=0x4E9097 strx=0
src prev 0x1B90FAEB next 0x1BDF8AF7 start 0x1B90FA60 offset 0x4E9097
src (unknown) gencapabilities.c line 276 0x1B90FAEB-0x1BDF8AF7
normal case:
1957 type=68 othr=0 desc=281 value=0x97 strx=0
src prev 0x1B90FAEB next 0x1B90FAF7 start 0x1B90FA60 offset 0x97
src (unknown) gencapabilities.c line 276 0x1B90FAEB-0x1B90FAF7
In both cases the few entries before this are:
1954 type=68 othr=0 desc=273 value=0x7F strx=0
src prev 0x1B90FAD9 next 0x1B90FADF start 0x1B90FA60 offset 0x7F
src (unknown) gencapabilities.c line 272 0x1B90FAD9-0x1B90FADF
1955 type=68 othr=0 desc=274 value=0x85 strx=0
src prev 0x1B90FADF next 0x1B90FAE5 start 0x1B90FA60 offset 0x85
src (unknown) gencapabilities.c line 273 0x1B90FADF-0x1B90FAE5
1956 type=68 othr=0 desc=276 value=0x8B strx=0
src prev 0x1B90FAE5 next 0x1B90FAEB start 0x1B90FA60 offset 0x8B
src (unknown) gencapabilities.c line 274 0x1B90FAE5-0x1B90FAEB
Ashley,
|
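As an aside on the figures in this log: each entry's "next" address is the function's "start" address plus the N_SLINE "offset". The C sketch below replays that arithmetic for stab 1957. The names `sline_addr` and `check_log_entries` are invented for this illustration and are not taken from Valgrind's source; only the numeric values come from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Valgrind's real code): resolve an N_SLINE
   entry to an absolute address.  n_value is an offset from the start
   of the enclosing function. */
uint32_t sline_addr(uint32_t fn_start, uint32_t n_value)
{
    return fn_start + n_value;
}

/* Replays the two versions of stab 1957 from the log. */
void check_log_entries(void)
{
    const uint32_t fn_start = 0x1B90FA60u;   /* "start" in the log */

    /* Normal case: offset 0x97 gives an address 12 bytes past the
       previous line's address, 0x1B90FAEB. */
    assert(sline_addr(fn_start, 0x97u) == 0x1B90FAF7u);

    /* Prelinked case: the stored offset is 0x4E9000 too large, so the
       computed address lands megabytes beyond the function. */
    assert(sline_addr(fn_start, 0x4E9097u) == 0x1BDF8AF7u);
    assert(0x4E9097u - 0x97u == 0x4E9000u);
}
```

The difference of 0x4E9000 between the two stored values is page-aligned, which is consistent with a load-address adjustment having been added to a value that should have stayed function-relative.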
|
From: Ashley P. <as...@qu...> - 2005-08-31 14:49:49
|
On Tue, 2005-08-30 at 20:39 +0100, Ashley Pittman wrote:
> From what I've found today it seems to me that it may be a prelink issue
> although I'm not at the bottom of it yet, when valgrind is walking the
> stabs info (read_debuginfo_stabs in coregrind/m_debuginfo/stabs.c) one
> of the st->n_value values for a N_SLINE entry appears to be wrong, in
> the non-prelink case it's 0x97 (correct) however when the library is
> prelinked it's 0x4E9097
I've finally got to the bottom of this: it appears to be a prelink bug.
I've got a prelink patch which fixes the problem and allows correct line
number reporting within valgrind.
Ashley,
|
|
From: Nicholas N. <nj...@cs...> - 2005-08-31 15:24:12
|
On Wed, 31 Aug 2005, Ashley Pittman wrote:
> I've finally got to the bottom of this, it appears to be a prelink bug,
> I've got a prelink patch which fixes the problem and allows correct line
> number reporting withing valgrind.
That's good to know. Thanks for hunting this down.
Nick
|
|
From: Julian S. <js...@ac...> - 2005-08-31 16:18:42
|
On Wednesday 31 August 2005 16:23, Nicholas Nethercote wrote:
> On Wed, 31 Aug 2005, Ashley Pittman wrote:
> > I've finally got to the bottom of this, it appears to be a prelink bug,
> > I've got a prelink patch which fixes the problem and allows correct line
> > number reporting withing valgrind.
>
> That's good to know. Thanks for hunting this down.
Yes indeed.
Your patch -- is it for Valgrind, or for GNU binutils/whatever that
does the prelinking? It wasn't clear to me which it applies to.
J
|
|
From: Ashley P. <as...@qu...> - 2005-08-31 16:34:43
|
On Wed, 2005-08-31 at 17:18 +0100, Julian Seward wrote:
> On Wednesday 31 August 2005 16:23, Nicholas Nethercote wrote:
> > On Wed, 31 Aug 2005, Ashley Pittman wrote:
> > > I've finally got to the bottom of this, it appears to be a prelink bug,
> > > I've got a prelink patch which fixes the problem and allows correct line
> > > number reporting withing valgrind.
> >
> > That's good to know. Thanks for hunting this down.
>
> Yes indeed.
>
> Your patch -- is it for Valgrind, or for GNU binutils/whatever that
> does the prelinking? It wasn't clear to me which it applies to.
It's for prelink, which can be downloaded at the following URL. There
appears to be very little online documentation or user forums for it, so
I'll get in touch with the developer directly.
ftp://people.redhat.com/jakub/prelink/
When prelink changes the base load address of the shared library, it
walks the stabs table converting from old to new addresses for the
different components. Function entry and exit points are absolute, so
they need changing, but line number entries are relative to the function
base, so they should not be adjusted.
As to why my libraries were affected but no others, it would appear that
my libraries have many more N_SLINE entries than most, possibly because
of the coding style that we use (although I didn't think our functions
were that long) or maybe just the compiler flags. The first N_SLINE entry
per file seems to work, just not subsequent ones, which is odd.
This patch works for me and doesn't have any side effects that I've
spotted, but it needs looking over by someone who understands
prelinking/stabs a little more than I do.
diff -r -u prelink-0.0.20050610/src/stabs.c prelink-0.0.20050610-edited/src/stabs.c
--- prelink-0.0.20050610/src/stabs.c 2004-09-30 17:05:22.000000000 +0100
+++ prelink-0.0.20050610-edited/src/stabs.c 2005-08-31 17:23:30.000000000 +0100
@@ -143,18 +143,18 @@
     case N_CATCH:
     case N_SO:
     case N_SOL:
+      value = read_32 (data->d_buf + off + 8);
+      sec = addr_to_sec (dso, value);
+      if (sec != -1)
+        {
+          addr_adjust (value, start, adjust);
+          write_32 (data->d_buf + off + 8, value);
+        }
+      break;
     /* I'm not 100% sure about the following 3. */
     case N_SLINE:
     case N_BSLINE:
     case N_DSLINE:
-      value = read_32 (data->d_buf + off + 8);
-      sec = addr_to_sec (dso, value);
-      if (sec != -1)
-        {
-          addr_adjust (value, start, adjust);
-          write_32 (data->d_buf + off + 8, value);
-        }
-      break;
     /* These should be always 0. */
     case N_GSYM:
     case N_BINCL:
Ashley,
|
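The intent of the patch can be sketched as a small standalone C function. This is only an illustration under stated assumptions: `relocate_stab_value` is a hypothetical name, the real prelink code also consults `addr_to_sec` and `addr_adjust` (omitted here), and treating N_FUN as an absolute address follows the description in the message rather than the visible part of the diff. The type codes are the conventional stabs values; N_SLINE is 0x44, i.e. the `type=68` seen in the earlier debug log.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional stabs type codes. */
enum { N_FUN = 0x24, N_SLINE = 0x44, N_SO = 0x64, N_SOL = 0x84 };

/* Hypothetical stand-in for the per-entry decision the patch changes:
   given a stab type, its stored value and the load-address adjustment
   applied by prelinking, return the value that should be written back. */
uint32_t relocate_stab_value(unsigned type, uint32_t value, uint32_t adjust)
{
    switch (type) {
    case N_FUN:   /* assumed absolute, per the description in the message */
    case N_SO:
    case N_SOL:
        /* Absolute addresses move with the library. */
        return value + adjust;
    case N_SLINE:
    default:
        /* Line entries are offsets from the function start, so they
           stay the same wherever the library is loaded. */
        return value;
    }
}
```

With the figures from the log, an N_SLINE value of 0x97 stays 0x97 under an adjustment of 0x4E9000, whereas the unpatched behaviour would have produced 0x4E9097.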
|
From: Ashley P. <as...@qu...> - 2005-09-12 09:48:33
|
On Wed, 2005-08-31 at 17:18 +0100, Julian Seward wrote:
> On Wednesday 31 August 2005 16:23, Nicholas Nethercote wrote:
> > On Wed, 31 Aug 2005, Ashley Pittman wrote:
> > > I've finally got to the bottom of this, it appears to be a prelink bug,
> > > I've got a prelink patch which fixes the problem and allows correct line
> > > number reporting withing valgrind.
> >
> > That's good to know. Thanks for hunting this down.
>
> Yes indeed.
>
> Your patch -- is it for Valgrind, or for GNU binutils/whatever that
> does the prelinking? It wasn't clear to me which it applies to.
Fixed upstream. Thank you for everyone's help.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167628
Ashley,
|