|
From: Brian W. <br...@ls...> - 2008-10-10 18:31:14
|
It has been a while since I've used valgrind much, and have started trying
it again lately, but have a LOT of what I believe are bogus error reports
about uninitialized variables. Several I've traced back to obvious
initialization statements. I think the relevant point is that I'm running
on an x86_64 system, with 64 bit pointers and 32 bit data, but all the
"uninitialized value" errors say "size 8"! But all my data is size 4!
I tried building a small example (the program I actually work on and use
valgrind for is approx 1.5M lines, mostly Fortran and some C), but of
course couldn't duplicate something quite so obviously wrong. I did get
this, which may or may not be helpful. This small program:
#include <stdio.h>

int do_sub(int in);

int main(void)
{
    int i, j;

    for (i = j = 0; i < 5; i++) {
        j = do_sub(j);
    }
    printf("Result = %d\n", j);
    return 0;
}

int do_sub(int in)
{
    int out;
    //int junk=1;  /* No errors */
    int junk;      /* Errors */

    out = in + junk;
    return out;
}
has an obvious error ("junk" is not initialized). When I build this on my
OpenSuSE 11.0 x86_64 system with the gcc 4.3.1 compiler, I either get no
valgrind errors (if junk=1 is used), or I get this (when junk is not
initialized):
==22349== Use of uninitialised value of size 8
==22349== at 0x4E6E423: (within /lib64/libc-2.8.so)
==22349== by 0x4E7166F: vfprintf (in /lib64/libc-2.8.so)
==22349== by 0x4E78FE9: printf (in /lib64/libc-2.8.so)
==22349== by 0x40055E: main (in /home/brian/tmp/tst)
Now, this "error" is in the system libraries, and I don't KNOW that the
error report is wrong. But I don't get ANY errors about the usage of junk,
or anything else. I've tried several arithmetic operations involving junk,
or out, or the results thereof, all with no errors reported.
In any case, in my real application, I get TONS of these "uninitialized
value" errors, and ALL say "size 8" -- even the ones that point to lines in
my code, where all the data is 32 bit data, and all the values HAVE been
initialized.
I suspect it is only looking at 8 byte chunks, and perhaps the 4 bytes I'm
using sit next to 4 bytes that are not initialized, and hence I get errors.
But I can't find any way to tell valgrind to do this check on 4 byte
intervals instead of 8.
Any suggestions?
(I just did this all with valgrind 3.3.1, and I've been seeing these kinds
of errors with 3.3.0, and some earlier versions. But I didn't have
problems before on 32 bit systems with earlier versions of valgrind...)
-- Brian
|
|
From: tom f. <tf...@al...> - 2008-10-10 20:14:19
|
Brian Wainscott <br...@ls...> writes:
> #include <stdio.h>
> main()
> {
> int i,j;
>
> for(i=j=0; i< 5; i++) {
> j=do_sub(j);
> }
>
> printf("Result = %d\n",j);
> }
> int do_sub(int in)
> {
> int out;
> //int junk=1; /* No errors */
> int junk; /* Errors */
>
> out = in+junk;
> return out;
> }
>
> has an obvious error ("junk" is not initialized). When I build this on my
> OpenSuSE 11.0 x86_64 system with the gcc 4.3.1 compiler, I either get no
> valgrind errors (if junk=1 is used), or I get this (when junk is not
> initialized):
>
> ==22349== Use of uninitialised value of size 8
> ==22349== at 0x4E6E423: (within /lib64/libc-2.8.so)
> ==22349== by 0x4E7166F: vfprintf (in /lib64/libc-2.8.so)
> ==22349== by 0x4E78FE9: printf (in /lib64/libc-2.8.so)
> ==22349== by 0x40055E: main (in /home/brian/tmp/tst)
>
> Now, this "error" is in the system libraries, and I don't KNOW that the
> error report is wrong. But I don't get ANY errors about the usage of junk,
> or anything else.
Errors are not reported until they are `visible'. The
uninitialized-ness of junk propagates to do_sub:out, and then main:j,
but it is only upon the observed access in the printf that the
uninitialized nature of main:j is visible.
See:
http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.value
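A minimal sketch of that distinction, not taken from the memcheck sources; the function names propagate and observe are made up for illustration. Copying or doing arithmetic on an undefined value only carries the undefined-ness along. A report is expected once the value decides a branch, is used as an address, or otherwise affects externally visible behaviour.

```c
/* Hypothetical helper: arithmetic on a possibly-undefined value.
 * Memcheck only tracks the undefined-ness here; nothing is reported. */
int propagate(int in, int junk)
{
    return in + junk;
}

/* Hypothetical helper: the value now decides which string is returned.
 * If `value` is undefined, this is the kind of place where memcheck
 * would be expected to complain about a conditional jump or move. */
const char *observe(int value)
{
    return (value % 2 == 0) ? "even" : "odd";
}
```

If a caller passed an uninitialised local as junk and then handed the result to observe, the complaint would point into observe rather than propagate, which matches the printf-only report in the small test program.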
HTH,
-tom
|
|
From: tom f. <tf...@al...> - 2008-10-10 22:51:18
|
I hope you'll forgive me pushing your reply back to the list. Please
try to keep the ML in the loop, so that any solutions get archived.
Brian Wainscott <br...@ls...> writes:
>
> Yeah, I know, which makes tracking these things down a pain. Like I
> said, this example isn't perfect.
*shrug*, yes. I imagine it is a performance tradeoff -- the branch
incurred for checking every access isn't worth it, if one can show
that the invalid code will never cause a problem.
I'm no valgrind developer, but `code speaks louder than words'...
> But I'm still seeing LOTS of these which I didn't see in earlier
> versions of valgrind (circa. 2 years ago, I don't remember the
> version number).
Why is this an issue? This code is \emph{wrong}. It doesn't matter
that valgrind didn't notice until it was passed into printf. If
earlier versions of valgrind did not report this error, then earlier
versions of valgrind are buggy.
Certainly it would be nicer if memcheck could check all the arguments
to printf and give out a warning upon the call instead of upon the
access. That would require valgrind to be a higher level tool than I
get the impression it is.
> And I DON'T understand why they all complain about 8 byte sizes when
> all my data is 4 bytes....
Last I checked (1.5[?] years ago), gcc conformed to an x86_64 ABI which
said arguments are passed in 8-byte locations on the stack. I can
imagine it also reserves 8 bytes for the return value as well (even if
the return value is smaller, that is).
I guess the next step would be examining the assembly to verify. I'll
leave that as an exercise for the reader ;)
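Another possible reading of the "size 8" in the printf trace, offered as a guess and not as a description of glibc's actual code: a printf-style formatter often widens the int to an unsigned long and uses each digit as an index into a character table. On x86_64 Linux unsigned long is 8 bytes, so an undefined 4-byte int could show up as an 8-byte use. The function format_decimal below is hypothetical.

```c
#include <stddef.h>

/* Hypothetical formatter, loosely modelled on how a printf-style routine
 * might handle "%d". It is NOT glibc's code. Returns the number of
 * characters written (excluding the terminator), or 0 if buf is too small. */
size_t format_decimal(int value, char *buf, size_t buflen)
{
    static const char digits[] = "0123456789";
    char tmp[24];               /* enough for 20 digits plus a sign */
    size_t n = 0, out = 0;
    /* Widen the 32-bit int; on x86_64 Linux unsigned long is 8 bytes. */
    unsigned long work = (value < 0) ? 0UL - (unsigned long)value
                                     : (unsigned long)value;

    do {
        tmp[n++] = digits[work % 10];   /* widened value indexes the table */
        work /= 10;
    } while (work != 0);

    if (value < 0)
        tmp[n++] = '-';
    if (n + 1 > buflen)
        return 0;
    while (n > 0)                       /* digits were produced in reverse */
        buf[out++] = tmp[--n];
    buf[out] = '\0';
    return out;
}
```

If value were undefined, the table lookup is where an 8-byte "use of uninitialised value" would plausibly be flagged, even though the caller only ever declared 4-byte data.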
-tom
> tom fogal wrote:
> > Brian Wainscott <br...@ls...> writes:
> >> #include <stdio.h>
> >> main()
> >> {
> >> int i,j;
> >>
> >> for(i=j=0; i< 5; i++) {
> >> j=do_sub(j);
> >> }
> >>
> >> printf("Result = %d\n",j);
> >> }
> >> int do_sub(int in)
> >> {
> >> int out;
> >> //int junk=1; /* No errors */
> >> int junk; /* Errors */
> >>
> >> out = in+junk;
> >> return out;
> >> }
> >>
> >> has an obvious error ("junk" is not initialized). When I build this on my
> >> OpenSuSE 11.0 x86_64 system with the gcc 4.3.1 compiler, I either get no
> >> valgrind errors (if junk=1 is used), or I get this (when junk is not
> >> initialized):
> >>
> >> ==22349== Use of uninitialised value of size 8
> >> ==22349== at 0x4E6E423: (within /lib64/libc-2.8.so)
> >> ==22349== by 0x4E7166F: vfprintf (in /lib64/libc-2.8.so)
> >> ==22349== by 0x4E78FE9: printf (in /lib64/libc-2.8.so)
> >> ==22349== by 0x40055E: main (in /home/brian/tmp/tst)
> >>
> >> Now, this "error" is in the system libraries, and I don't KNOW that the
> >> error report is wrong. But I don't get ANY errors about the usage of junk,
> >> or anything else.
> >
> > Errors are not reported until they are `visible'. The
> > uninitialized-ness of junk propagates to do_sub:out, and then main:j,
> > but it is only upon the observed access in the printf that the
> > uninitialized nature of main:j is visible.
> >
> > See:
> >
> > http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.value
|
|
From: Brian W. <br...@ls...> - 2008-10-10 23:09:37
|
Tom,
tom fogal wrote:
> I hope you'll forgive me pushing your reply back to the list. Please
> try to keep the ML in the loop, so that any solutions get archived.
Not a problem...
>
> Brian Wainscott <br...@ls...> writes:
>> Yeah, I know, which makes tracking these things down a pain. Like I
>> said, this example isn't perfect.
>
> *shrug*, yes. I imagine it is a performance tradeoff -- the branch
> incurred for checking every access isn't worth it, if one can show
> that the invalid code will never cause a problem.
>
> I'm no valgrind developer, but `code speaks louder than words'...
>
>> But I'm still seeing LOTS of these which I didn't see in earlier
>> versions of valgrind (circa. 2 years ago, I don't remember the
>> version number).
>
> Why is this an issue? This code is \emph{wrong}. It doesn't matter
> that valgrind didn't notice until it was passed into printf. If
> earlier versions of valgrind did not report this error, then earlier
> versions of valgrind are buggy.
This code is wrong, of course. But I've got code where I've seen so many
of these reports it is pretty much useless to me. SEVERAL times I have
picked one or two, and tried to track them down, and found them completely
bogus. For example, earlier today, deep inside a program, I got such an
error report. It pointed to a line that read:
ind[ii] = old_ind[ii] = 0;
and ii was the loop variable! There was a nice, clean
"for(ii=0; ii<n; ii++)" up above, and valgrind complained about it.
Now, I'm not claiming the code I'm working on is perfect, by any stretch.
But there are SO many of these that I can't find any that are meaningful.
When I turn off this particular warning.....I get no errors at all, until
it dies in "free()", presumably from heap corruption. So I think there IS
a write to an invalid address somewhere, but valgrind isn't reporting it.
Basically, I wrote to the list to see if anyone else had experienced this
kind of problem, or there was a known issue, or work around, or ????
Valgrind USED to be so helpful to me, quickly tracking down invalid
accesses (and the occasional, but very rare "uninitialized value" error),
but it seems that something happened at some point....
I'd try older versions, but they don't support x86_64. I'm now building my
application on a IA32 machine, in hopes that I can debug it there, maybe
with an old version of valgrind, but I'm not holding out much hope for
that. I'd hate to have to go back and try ElectricFence, but may resort to
that too...
>
> Certainly it would be nicer if memcheck could check all the arguments
> to printf and give out a warning upon the call instead of upon the
> access. That would require valgrind to be a higher level tool than I
> get the impression it is.
>
>> And I DON'T understand why they all complain about 8 byte sizes when
>> all my data is 4 bytes....
>
> Last I checked (1.5[?] years ago), gcc conformed to an x86_64 ABI which
> said arguments are passed in 8-byte locations on the stack. I can
> imagine it also reserves 8 bytes for the return value as well (even if
> the return value is smaller, that is).
Let's not get too caught up in this example! In my real code, I get
complaints on some assignment lines, where as I explained above, I've
tracked the values back and seen no errors. But I believe there IS an
invalid write somewhere, and I'm not seeing it. (At the moment, the only
Invalid write report I get is in "free()" when it would normally segfault,
and my understanding is that this almost certainly means the heap was
corrupted, meaning there was an invalid write somewhere else, but it wasn't
caught by valgrind...)
-- Brian
|
|
From: tom f. <tf...@al...> - 2008-10-10 23:37:58
|
Brian Wainscott <br...@ls...> writes:
>
> tom fogal wrote:
>
> > Brian Wainscott <br...@ls...> writes:
> >
> >> But I'm still seeing LOTS of these which I didn't see in earlier
> >> versions of valgrind (circa. 2 years ago, I don't remember the
> >> version number).
> >
> > Why is this an issue? This code is \emph{wrong}. It doesn't matter
> > that valgrind didn't notice until it was passed into printf. If
> > earlier versions of valgrind did not report this error, then earlier
> > versions of valgrind are buggy.
>
> This code is wrong, of course. But I've got code where I've seen so many
> of these reports it is pretty much useless to me.
Okay. I misunderstood the purpose of your mail. You're concerned that
you've got a program with so many errors, but you're 99% sure that most
of the ones memcheck reports aren't related to the one critical bug
you're currently searching for. A needle in a haystack of bugs.
I suppose the cop-out response is `fix all your bugs!', but I
understand debugging time is scarce at best.
So try a suppression. Quick guess:
{
BW:quietness
Memcheck:Value8
fun:printf
}
Beware, I've not even tested if VG parses that.
More info:
http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
(or try --gen-suppressions=yes).
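A variant of the suppression above, as an untested sketch: as far as I understand, the frame lines are matched starting from the innermost frame of the error's stack, and in the trace from the small test program the innermost frame is an unnamed location inside libc, not printf itself. A frame list that mirrors that trace may therefore be needed. The suppression name is arbitrary, and the file name my.supp mentioned below is an assumption.

```
{
   BW:printf_value8
   Memcheck:Value8
   obj:/lib64/libc-2.8.so
   fun:vfprintf
   fun:printf
}
```

Saved as my.supp, it would be passed to valgrind with --suppressions=my.supp. Output from --gen-suppressions=yes is the safer source for the exact frame lines.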
I'm sure you know this, but I would be remiss if I did not explicitly
note that, of course, if the `larger' error you search for is only
visible in a printf, you won't see it with such a suppression.
Good luck,
-tom
|
|
From: tom f. <tf...@al...> - 2008-10-10 23:43:22
|
Brian Wainscott <br...@ls...> writes:
>
> tom fogal wrote:
[snip]
> When I turn off this particular warning.....I get no errors at all, until
> it dies in "free()", presumably from heap corruption.
Doh. I must have missed this in my earlier email; guess you've already
gone the suppression route. Sorry for the noise.
Guess we'll have to wait for someone more knowledgable than me to speak
up ...
-tom
|
|
From: Dan K. <da...@ke...> - 2008-10-11 00:08:54
|
On Fri, Oct 10, 2008 at 11:11 AM, Brian Wainscott <br...@ls...> wrote:
> It has been a while since I've used valgrind much, and have started trying
> it again lately, but have a LOT of what I believe are bogus error reports
> about uninitialized variables. Several I've traced back to obvious
> initialization statements. I think the relevant point is that I'm running
> on an x86_64 system, with 64 bit pointers and 32 bit data, but all the
> "uninitialized value" errors say "size 8"! But all my data is size 4!
Perhaps the architecture (not valgrind) does loads of size 8 anyway.
> Now, this "error" is in the system libraries, and I don't KNOW that the
> error report is wrong. But I don't get ANY errors about the usage of junk,
> or anything else.
Check out the new valgrind option --track-origins=yes. It's only
in the trunk, not in any release yet, but it's the cat's pajamas.
- Dan
|
|
From: Brian W. <br...@ls...> - 2008-10-11 01:17:24
|
Dan,
Thanks -- I'll check that out. My immediate problem is solved, in that I
was able to use an old copy of valgrind (2.2) on an IA32 machine: it
reported SOME uninitialized values (all in some Fortran libraries which I've
come to expect as normal). It DID report "Invalid write" which valgrind 3.3
did NOT report on the x86_64 platform....and fixing that fixed my problem.
I still think there is something suspicious here, but will look into some
of these "uninitialized values" using the --track-origins option you
mention. But it bothers me that valgrind 3.3 did not detect the invalid
write, which was corrupting my heap, and valgrind 2.2 did....
-- Brian
Dan Kegel wrote:
> On Fri, Oct 10, 2008 at 11:11 AM, Brian Wainscott <br...@ls...> wrote:
>> It has been a while since I've used valgrind much, and have started trying
>> it again lately, but have a LOT of what I believe are bogus error reports
>> about uninitialized variables. Several I've traced back to obvious
>> initialization statements. I think the relevant point is that I'm running
>> on an x86_64 system, with 64 bit pointers and 32 bit data, but all the
>> "uninitialized value" errors say "size 8"! But all my data is size 4!
>
> Perhaps the architecture (not valgrind) does loads of size 8 anyway.
>
>> Now, this "error" is in the system libraries, and I don't KNOW that the
>> error report is wrong. But I don't get ANY errors about the usage of junk,
>> or anything else.
>
> Check out the new valgrind option --track-origins=yes. It's only
> in the trunk, not in any release yet, but it's the cat's pajamas.
> - Dan
|
|
From: Dan K. <da...@ke...> - 2008-10-11 02:58:34
|
On Fri, Oct 10, 2008 at 6:17 PM, Brian Wainscott <br...@ls...> wrote:
> Thanks -- I'll check that out. My immediate problem is solved, in that I
> was able to use an old copy of valgrind (2.2) on an IA32 machine: it
> reported SOME uninitialized values (all in some Fortran libraries which I've
> come to expect as normal). It DID report "Invalid write" which valgrind 3.3
> did NOT report on the x86_64 platform....and fixing that fixed my problem.
> I still think there is something suspicious here, but will look into some
> of these "uninitialized values" using the --track-origins option you
> mention. But it bothers me that valgrind 3.3 did not detect the invalid
> write, which was corrupting my heap, and valgrind 2.2 did....
Could be because the heap is aligned on 16 byte boundaries for 64 bit
machines, so there's more slop allowed, as it were...
- Dan
|
|
From: Brian W. <br...@ls...> - 2008-10-12 15:54:49
|
Dan Kegel wrote:
> On Fri, Oct 10, 2008 at 6:17 PM, Brian Wainscott <br...@ls...> wrote:
>> Thanks -- I'll check that out. My immediate problem is solved, in that I
>> was able to use an old copy of valgrind (2.2) on an IA32 machine: it
>> reported SOME uninitialized values (all in some Fortran libraries which I've
>> come to expect as normal). It DID report "Invalid write" which valgrind 3.3
>> did NOT report on the x86_64 platform....and fixing that fixed my problem.
>> I still think there is something suspicious here, but will look into some
>> of these "uninitialized values" using the --track-origins option you
>> mention. But it bothers me that valgrind 3.3 did not detect the invalid
>> write, which was corrupting my heap, and valgrind 2.2 did....
>
> Could be because the heap is aligned on 16 byte
> boundaries for 64 bit machines, so there's
> more slop allowed, as it were...
> - Dan
Well, I was overwriting enough memory to corrupt the heap (I was writing to
the bytes before my allocation, not after) -- seems to me this should have
been caught. When I did catch it on the 32 bit system, and fixed the code,
it ran fine on the 64 bit system.
I'll try tracking down some of the uninitialized values it now reports,
using the --track-origins=yes option, and see how that goes. In this
particular case it wouldn't have helped, because the array index was
initialized, just holding the wrong value.
-- Brian
|
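A minimal sketch of the failure mode described in the last message, under the assumption that the bad index was something like -1: the index is fully initialized, so no "uninitialized value" check applies, yet the store lands just before the allocation. With a typical malloc, the bytes in front of a block hold allocator bookkeeping, which would explain a crash that only appears later in free(). The helper checked_store is hypothetical and only shows one way such a write could be caught at the source.

```c
#include <stddef.h>

/* Hypothetical guard: store `value` at `idx` only if the index lies inside
 * an array of `n` ints. Returns 1 on success, 0 if the index is rejected. */
int checked_store(int *array, size_t n, long idx, int value)
{
    if (idx < 0 || (size_t)idx >= n)
        return 0;   /* e.g. idx == -1 would land just before the block */
    array[idx] = value;
    return 1;
}
```

Routing suspect stores through a guard like this (or an equivalent assert in a debug build) turns a silent underrun into an immediate, local failure, independent of which valgrind version or platform is in use.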