|
From: Christoffer H. <Chr...@nu...> - 2010-08-09 12:26:45
|
Hi,

here's a little problem that has bothered me for a few days. I'm working on isolating it further, but thought someone here might just know whether this is expected behaviour of Valgrind or something that should be reported as a bug.

As the title says, I've managed to build an application that runs correctly in Valgrind, but crashes with a segmentation fault when run normally in a console. It's a normal C89 application, built with GCC on an x86_64 machine. The culprit seems to be the auto-vectorization module (which vectorizes some loops using SSE instructions). The segfault occurs inside a loop that is reported as vectorized (with -ftree-vectorizer-verbose=2), and the application runs as expected both in Valgrind and in the console when built with -fno-tree-vectorize.

Now, I'm not saying that the compiler or code is flawless, but I had expected the application to crash in Valgrind as well :-)

The Valgrind version is reported as 3.6.0.SVN-Debian.
GCC versions 4.4.3 and 4.3.4 display the mentioned behaviour; when building with 4.1.3 the application works both in Valgrind and in the console when vectorization is enabled.

// Christoffer
_____________________________________________
CHRISTOFFER HAGLUND
Software Developer, Decuma
Nuance Communications Sweden
Ole Römers väg 16
SE-223 70 Lund
+46 (0) 46 286 53 43 Office
+46 (0) 709 59 63 06 Mobile
NUANCE.COM
The experience speaks for itself ™
|
|
From: Julian S. <js...@ac...> - 2010-08-09 12:40:28
|
This is a known phenomenon. Valgrind (well, Memcheck, really) is more robust against applications that overwrite the ends/beginnings of heap blocks than the normal C malloc/free routines are. So if your application is doing such overwrites, that could explain the difference.

What really would be a bug in Memcheck, though, is if it doesn't tell you about such overruns.

J
|
|
From: Christoffer H. <Chr...@nu...> - 2010-08-10 07:02:37
|
That's interesting. This is what Memcheck tells me about the application
that crashes:
$ valgrind --leak-check=full --show-reachable=yes --track-origins=yes ./unittest_t9write -r api -o api.xml
==19513== Memcheck, a memory error detector
==19513== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==19513== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==19513== Command: ./unittest_t9write -r api -o api.xml
==19513==
This is t9write unit test runner, compiled with: -Wall -Wextra -ansi -pedantic -O3
No errors.
==19513==
==19513== HEAP SUMMARY:
==19513== in use at exit: 0 bytes in 0 blocks
==19513== total heap usage: 1,050,205 allocs, 1,050,205 frees, 268,688,656 bytes allocated
==19513==
==19513== All heap blocks were freed -- no leaks are possible
==19513==
==19513== For counts of detected and suppressed errors, rerun with: -v
==19513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
So Memcheck does not find any overwrites, but that's likely the problem
anyway; using GDB I can see that the segfault occurs inside a loop
that's basically a memset:
for ( ; n > 0; n-- )
*intPtr++ = value;
Now, I haven't checked exactly what instructions are generated, and
I probably won't unless anyone finds this interesting or thinks it looks
like a Memcheck bug :-)
// Christoffer
|
|
From: Bart V. A. <bva...@ac...> - 2010-08-10 07:14:44
|
On Mon, Aug 9, 2010 at 4:01 PM, Christoffer Haglund <Chr...@nu...> wrote:
> So Memcheck does not find any overwrites, but that's likely the problem
> anyway; using GDB I can see that the segfault occurs inside a loop that's
> basically a memset:
>
> for ( ; n > 0; n-- )
> *intPtr++ = value;

Does intPtr point to an array or to dynamically allocated memory?

Bart.
|
|
From: Christoffer H. <Chr...@nu...> - 2010-08-10 07:33:43
|
Dynamic memory; it points to an array inside a struct allocated with a normal calloc().

// Christoffer
|
|
From: Christoffer H. <Chr...@nu...> - 2010-08-10 07:59:55
|
Oh, sorry, that last reply wasn't very clear, let me try again. The
problem occurs inside some proprietary code, so unfortunately I can't
just copy the exact code that fails, but I'll try to describe it.
There's a struct with the following definition
typedef struct _session {
...
char generic_buffer[255];
} session_t;
Before triggering the segfault, the session is allocated with
session_t * pSession = calloc(1, get_session_size());
The function that segfaults (that contains the loop I was referring to
earlier) is called with
status = do_stuff(input, &pSession->generic_buffer[0]);
The function has the following definition:
status_t do_stuff(input_t * input, char * charPtr)
{
    int n = calculate_n(input);
    int value = calculate_value(input);
    int * intPtr = (int *) charPtr;
    for ( ; n > 0; n-- )
        *intPtr++ = value;
}
...and when debugging I can see that calculate_n() returns 27 and
calculate_value() returns 0x000000 for the test case when the
application fails (which are correct values, compared to when the
application is built with another compiler or with
"-fno-tree-vectorize").
// Christoffer
|
|
From: Christoffer H. <Chr...@nu...> - 2010-08-10 08:57:40
|
Ah, I got around to looking at the disassembly as well.
The instruction that fails is
movdqa %xmm0,(%r9,%rdi,1)
The contents of the input CPU registers rdi and r9 are
r9 = 7026104
rdi = 0
Now, I'm not that good at reading x86 assembler, but I'm pretty good at
Googling. If I've understood this correctly, movdqa expects a 16-byte
aligned memory address as input, and will throw an exception otherwise.
The contents of r9 is the value of intPtr, which is not on a 16-byte
boundary (7026104 % 16 = 8)
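The remainder above is easy to double-check in code. The following is a minimal sketch, not taken from the application; the helper name misalignment() is made up for illustration. It reports how far an address lies past the previous multiple of a power-of-two boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which addr lies past the
 * previous multiple of boundary. A result of 0 means addr is aligned.
 * boundary must be a power of two (16 for movdqa). */
size_t misalignment(uintptr_t addr, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size_t)(addr & (uintptr_t)(boundary - 1));
}
```

Calling misalignment(7026104, 16) gives 8, matching the calculation above, while misalignment(7026104, 4) gives 0, so the same address does sit on a 4-byte boundary.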
I'm not too familiar with the inner workings of Valgrind either, but
from what I can see it seems like the movdqa instruction is not executed
correctly - it should throw an exception but does not. Here's a small
test application that shows the problem:
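#include <stdlib.h>
int main(void)
{
    int * buffer = calloc(1, 255);
    /* buffer + 1 is 4 bytes past the start of the block, so with the
       usual malloc alignment it is not on a 16-byte boundary. The
       address is passed as a plain base register so the test does not
       depend on whatever happens to be in rdi. */
    asm ( "movdqa %%xmm0, (%0)" : : "r"(buffer + 1) : "memory" );
    free(buffer);
    return EXIT_SUCCESS;
}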
On my system, this will run without any reported errors in Valgrind but
segfault in a normal console. I built it with a few versions of GCC and
Clang for good measure, and it's always reproducible. Can anyone else
confirm this before I file a bug report?
(Also, again, I'm not too familiar with x86 assembler, the instruction
might be valid under some constraint I'm not aware of. I'm happy to
learn though. :-)
// Christoffer
|
|
From: Tom H. <to...@co...> - 2010-08-10 08:34:07
|
On 10/08/10 09:00, Christoffer Haglund wrote:
> Oh, sorry, that last reply wasn't very clear, let me try again. The
> problem occurs inside some proprietary code, so unfortunately I can't
> just copy the exact code that fails, but I'll try to describe it.
>
> There's a struct with the following definition
>
> typedef struct _session {
> ...
> char generic_buffer[255];
> } session_t;
Well, the first question I'd be asking here, given that you are later
accessing that buffer through an int pointer, is whether it is aligned on
a 4 byte boundary or not.
> The function has the following definition:
>
> status_t do_stuff(input_t * input, char * charPtr)
> {
> int n = calculate_n(input);
> int value = calculate_value(input);
> int * intPtr = (int *) charPtr;
> for ( ; n > 0; n-- )
> *intPtr++ = value;
> }
>
> ...and when debugging I can see that calculate_n() returns 27 and
> calculate_value() returns 0x000000 for the test case when the
> application fails (which are correct values, compared to when the
> application is built with another compiler or with "-fno-tree-vectorize").
As I said above, my guess is that you have an alignment problem. On x86
the alignment doesn't normally matter (though doing 4 byte accesses
through an unaligned pointer is likely to be inefficient).
Once you start vectorising and using SSE instructions that will change
however as many SSE instructions do care about alignment and will fault
if the data is not correctly aligned.
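One way to sidestep the alignment question, sketched here as an illustration and not as the application's actual code, is to write each int into the char buffer with memcpy instead of storing through a cast int pointer. The function names below are hypothetical. The compiler can then make no assumption about the destination's alignment beyond that of char, and it typically still turns the fixed-size memcpy into ordinary (unaligned-safe) stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store `count` copies of `value` into a byte
 * buffer starting at dst. Every element is written with memcpy, so
 * dst does not need to be aligned for int. */
void fill_ints_unaligned(char *dst, int value, size_t count)
{
    size_t i;
    assert(dst != NULL);
    for (i = 0; i < count; i++)
        memcpy(dst + i * sizeof value, &value, sizeof value);
}

/* Hypothetical helper: read element i back, again via memcpy. */
int read_int_unaligned(const char *src, size_t i)
{
    int v;
    memcpy(&v, src + i * sizeof v, sizeof v);
    return v;
}
```

For the case described in the thread this would be called as fill_ints_unaligned(buffer, value, 27), which writes 27 * sizeof(int) bytes and so fits in the 255-byte buffer when int is 4 bytes wide.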
Most probably the reason it works in valgrind is that in the process of
annotating the code the actual instructions being used are changed and
the ones valgrind has selected don't have the same alignment requirements.
Tom
--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Julian S. <js...@ac...> - 2010-08-10 11:14:48
|
> Once you start vectorising and using SSE instructions that will change
> however as many SSE instructions do care about alignment and will fault
> if the data is not correctly aligned.
>
> Most probably the reason it works in valgrind is that in the process of
> annotating the code the actual instructions being used are changed and
> the ones valgrind has selected don't have the same alignment requirements.

Urr, you're right. The amd64 front end (guest_amd64_toIR.c) mostly does compile in such alignment checks (in other words, the check is expressed in IR along with the rest of the instruction's behaviour), but in the case of movdqa it seems to skip it. Darn.

It's a 1-liner fix (for movdqa, at least); just insert a call "gen_SEGV_if_not_16_aligned( addr )" in the memory case for movdqa, around about line 11189 in the svn trunk.

J
|
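To make the difference concrete, here is a toy model of an emulated 16-byte aligned store, with and without the alignment check. This is not Valgrind's code and does not use its IR; the enum and function names are invented for illustration only.

```c
#include <stdint.h>

/* Hypothetical result codes for the toy emulator. */
enum store_result { STORE_OK = 0, STORE_FAULT = 1 };

/* Toy model of emulating an aligned 16-byte store such as movdqa.
 * With check_alignment set, a misaligned address faults, as the real
 * instruction does on hardware. Without it, the store is accepted
 * silently, which is the behaviour described in this thread. */
enum store_result emulate_aligned_store16(uintptr_t addr, int check_alignment)
{
    if (check_alignment && (addr & 15u) != 0)
        return STORE_FAULT;
    /* ... a real emulator would write the 16 bytes here ... */
    return STORE_OK;
}
```

With the address from the earlier report, emulate_aligned_store16(7026104, 1) returns STORE_FAULT, whereas emulate_aligned_store16(7026104, 0) returns STORE_OK.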
|
From: Christoffer H. <Chr...@nu...> - 2010-08-10 09:08:29
|
Ok, I think I understand. This means the test application I sent in my
previous mail shows the lack of alignment-awareness rather than an
actual bug, right?
// Christoffer
|
|
From: Tom H. <to...@co...> - 2010-08-10 11:10:29
|
On 10/08/10 10:08, Christoffer Haglund wrote:
> Ok, I think I understand. This means the test application I sent in my
> previous mail shows the lack of alignment-awareness rather than an
> actual bug, right?

Well, it's a bug in your application rather than a bug in valgrind.

I'm not quite sure how the compiler's vectoriser knows that it is safe to use an instruction that requires 16 byte alignment to process an array of ints, but it is certainly entitled to assume that an array of ints is 4 byte aligned, and you have tricked it with your cast.

Tom
--
Tom Hughes (to...@co...)
http://compton.nu/
|
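The point about the cast can be illustrated with a small sketch. The struct layouts below are hypothetical stand-ins (the real session_t is not shown in the thread): a char array member may start at any byte offset, whereas wrapping it in a union with an int array gives it int alignment without any cast. Note that this only guarantees alignment suitable for int, not 16-byte alignment.

```c
#include <stddef.h>

/* Hypothetical layout: a char array can start at any byte offset. */
struct session_char {
    char flag;
    char generic_buffer[255];
};

/* The same storage wrapped in a union with an int array. The union,
 * and therefore the buffer, takes on the alignment of int. */
union int_aligned_buffer {
    char bytes[255];
    int  words[255 / sizeof(int)];
};

struct session_union {
    char flag;
    union int_aligned_buffer generic_buffer;
};

/* Offsets of the buffer member within each layout. */
size_t char_buffer_offset(void)
{
    return offsetof(struct session_char, generic_buffer);
}

size_t union_buffer_offset(void)
{
    return offsetof(struct session_union, generic_buffer);
}
```

With the union layout, code that needs ints can use generic_buffer.words directly, so the compiler sees the true type and alignment of the storage.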