|
From: Yeshurun, M. <mei...@in...> - 2006-06-04 04:56:41
|
Before the experts answer you, it sounds to me like maybe you're passing
around some un-initialized value which happens to have the right value
(or just a "better" value) when running under Valgrind.

Thanks,
Meir

-----Original Message-----
From: val...@li...
[mailto:val...@li...] On Behalf Of Mike Mueller
Sent: Sunday, June 04, 2006 6:29 AM
To: val...@li...
Cc: Bob Rossi
Subject: [Valgrind-users] Segfault not caught by valgrind

I have a weird situation. I'll try to describe it as completely as
possible:

I have a program that does not crash, and when run through valgrind, 0
errors are reported. Someone else patched it to get it to recompile
on FreeBSD by moving a single include (sys/types.h) from the bottom to
the top of the includes list. Now the program segfaults on my AMD64
computer (but not on any other computers, as far as I can tell,
including x86, ppc).

Now, I debug it in gdb and I find the line where it's crashing. It's
a call to ptsname that's returning garbage instead of a valid pointer
to a string (or NULL, the failure case). Can't figure out why ptsname
is returning garbage unless there's a weird memory corruption or the
system library is buggy on amd64.

However, when I run the program through valgrind (using the memcheck
tool), it does NOT crash and valgrind reports 0 errors.

One last bit, if I call ptsname_r instead of ptsname, there is no
segfault.

Have you ever seen anything like this?

Thanks!
Mike

_______________________________________________
Valgrind-users mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-users
|
|
From: Mike M. <mi...@su...> - 2006-06-04 08:49:12
|
Meir, that's what I thought at first, too. What's interesting though, is
the function ptsname() is supposed to return a pointer to a statically
allocated buffer, or NULL on failure. It's hard for me to imagine a
possible case where that function would return a bad pointer. I even
opened up the code for glibc-2.3.5 to see how it's implemented and it's a
simple one liner (call to ptsname_r).

Thanks,
Mike

On 6/4/06, Yeshurun, Meir <mei...@in...> wrote:
>
> Before the experts answer you, it sounds to me like maybe you're passing
> around some un-initialized value which happens to have the right value
> (or just a "better" value) when running under Valgrind.
>
> Thanks,
> Meir
|
|
From: Bob R. <bob...@co...> - 2006-06-04 17:15:57
|
On Sun, Jun 04, 2006 at 07:56:27AM +0300, Yeshurun, Meir wrote:
>
> Before the experts answer you, it sounds to me like maybe you're passing
> around some un-initialized value which happens to have the right value
> (or just a "better" value) when running under Valgrind.

I thought valgrind would discover if uninitialized values are being used
though. Isn't that true?

Bob Rossi
|
|
From: Nicholas N. <nj...@cs...> - 2006-06-04 22:26:11
|
On Sun, 4 Jun 2006, Bob Rossi wrote:

>> Before the experts answer you, it sounds to me like maybe you're passing
>> around some un-initialized value which happens to have the right value
>> (or just a "better" value) when running under Valgrind.
>
> I thought valgrind would discover if uninitialized values are being used
> though. Isn't that true?

In general, yes. But the execution environment under Valgrind is different
to that natively -- memory is laid out in different ways, etc. Very
occasionally this changes program behaviour; but when it does it's usually
because the program is buggy, eg. it's got a wild memory read/write that may
hit addressable, initialised memory under Valgrind but not natively. I
imagine something like that is happening here. It's unfortunate for the
poor user because Memcheck then can't detect the problem.

I heard of one person who chose to always run a program under Valgrind
because it crashed natively but not under Valgrind! (I think Memcheck must
have issued warnings, but maybe he didn't care and just used --tool=none.)

Nick
|
|
From: Bob R. <bob...@co...> - 2006-06-05 20:24:44
|
On Mon, Jun 05, 2006 at 08:25:54AM +1000, Nicholas Nethercote wrote:
> On Sun, 4 Jun 2006, Bob Rossi wrote:
>
> >> Before the experts answer you, it sounds to me like maybe you're passing
> >> around some un-initialized value which happens to have the right value
> >> (or just a "better" value) when running under Valgrind.
> >
> > I thought valgrind would discover if uninitialized values are being used
> > though. Isn't that true?
>
> In general, yes. But the execution environment under Valgrind is different
> to that natively -- memory is laid out in different ways, etc. Very
> occasionally this changes program behaviour; but when it does it's usually
> because the program is buggy, eg. it's got a wild memory read/write that may
> hit addressable, initialised memory under Valgrind but not natively. I
> imagine something like that is happening here. It's unfortunate for the
> poor user because Memcheck then can't detect the problem.

OK, if we include

    #define _GNU_SOURCE /* ptsname_r() under Linux */
    #include <sys/types.h>

we get the prototype

    extern char *ptsname (int __fd) __attribute__ ((__nothrow__));

in the translation unit.

If we include

    #include <sys/types.h>
    #define _GNU_SOURCE /* ptsname_r() under Linux */

we do not get the prototype in the translation unit.

Then, we do this,

    char *name;
    if (!(name = ptsname(*masterfd)))

In the case where the prototype is declared, it works fine and there
is no crash. In the case where there is no prototype, we get this
warning.

    ../../../../cgdb/various/util/src/pseudo.c:304: warning: assignment
    makes pointer from integer without a cast

That's because the compiler assumes ptsname returns an int, and assigns
it to a char*. This works when sizeof(char*) == sizeof(int). However, on
this AMD64 machine, sizeof(char*) is 8 and sizeof(int) is 4. This gives
us the crash.

Now, I'm wondering why valgrind reports no error in this circumstance.
Would this be an improvement to memcheck? It took us quite some time to
figure out the problem.

Thanks,
Bob Rossi
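
A minimal sketch of the fix implied by the diagnosis above, assuming glibc on Linux: feature-test macros such as _GNU_SOURCE only take effect when they are defined before the first system header is included, so the define has to come first. The helper name slave_name and its buffer interface are hypothetical and for illustration only; they are not taken from the cgdb source.

```c
#define _GNU_SOURCE      /* must precede every system header to take effect */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>      /* declares char *ptsname(int) once the macro is set */

/* Hypothetical helper: copy the slave pty name for masterfd into buf.
 * Returns 0 on success, -1 if ptsname() fails or buf is too small. */
int slave_name(int masterfd, char *buf, size_t len)
{
    /* With the prototype visible, the result is a real char *,
     * not an int that gets widened back into a pointer. */
    char *name = ptsname(masterfd);
    if (name == NULL)
        return -1;

    int n = snprintf(buf, len, "%s", name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

With the define first, the assignment no longer goes through int. Building with a GCC option such as -Werror=implicit-function-declaration should turn the missing-prototype case into a hard error instead of a warning that is easy to overlook.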
|
|
From: Nicholas N. <nj...@cs...> - 2006-06-05 21:57:27
|
On Mon, 5 Jun 2006, Bob Rossi wrote:

> Now, I'm wondering why valgrind reports no error in this circumstance.
> Would this be an improvement to memcheck? It took us quite some time to
> figure out the problem.

Here's my guess as to what happened. The lower 4 bytes of the return
register (%rax, I think) got set to the return value. The upper 4 bytes got
left as whatever they were before; but importantly, those 4 bytes were
defined (ie. initialised). So the whole register is seen by Memcheck as
defined, because it is. You then used that value as a pointer, which was
bogus, but under Memcheck you got lucky/unlucky and the access through that
pointer hit addressable memory. Or possibly(?) Valgrind changes the way the
values go through the registers somewhat, and you luckily/unluckily ended up
with the correct value.

The problem is that Memcheck does all its analysis at the byte-level. But
your problem here (assuming I've diagnosed it correctly) is that you
erroneously combined two 4-byte values into an 8-byte value which you then
used. As for whether Memcheck or another tool could detect this... it seems
like it would be hard, because multi-byte values get constructed from
single-byte or fewer-byte values all the time, and I can't think how to
distinguish the erroneous ones from the legitimate ones.

So it's something that Memcheck can't detect. But the compiler can :)

Nick
|
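
The effect of the missing prototype can be modelled without any pty calls. The sketch below assumes the caller keeps only the low 32 bits of the returned value and then widens that int back to 64 bits with sign extension, which is how GCC documents integer-to-pointer conversion; under that assumption the upper half is derived from bit 31 rather than left untouched. The function names and the sample addresses are hypothetical.

```c
#include <stdint.h>

/* Model a caller that believes the callee returned int:
 * truncate the 64-bit return value to 32 bits, then widen the
 * resulting int back to 64 bits with sign extension.
 * (The unsigned-to-signed narrowing is implementation-defined in C;
 * GCC wraps modulo 2^32, which is what this model relies on.) */
uint64_t through_int(uint64_t returned)
{
    int32_t as_int = (int32_t)(uint32_t)returned; /* keep low 32 bits */
    return (uint64_t)(int64_t)as_int;             /* sign-extend to 64 */
}

/* Returns 1 if the value is unchanged by the round trip, else 0. */
int survives(uint64_t returned)
{
    return through_int(returned) == returned;
}
```

In this model a low address such as 0x601040 survives the round trip, while an address such as 0x2aaaaacf1a20 comes back as 0xffffffffaacf1a20. That is one plausible reason the same binary could behave differently when the buffer behind ptsname() ends up at different addresses natively and under Valgrind.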
|
From: Mike M. <mi...@su...> - 2006-06-05 22:09:38
|
Yes, Nick, you've accurately summarized the problem. Too bad memcheck
can't catch it, but you're right, the compiler did warn us, we just missed
it.

Thanks for the help.

On 6/5/06, Nicholas Nethercote <nj...@cs...> wrote:
> So it's something that Memcheck can't detect. But the compiler can :)
>
> Nick
|