|
From: Andrew C. M. <and...@gm...> - 2012-08-27 21:57:01
|
Hi - I recently built valgrind-3.8.0 using clang from XCode 4.5, via macports, on OS X 10.8.1. It builds successfully, but the resulting valgrind binary doesn't work. Any invocation of valgrind (even just 'valgrind --help') fails as follows: --2030:0:main Root stack 0x238E20000 to 0x238F24000, a local 0x238F21FB0 --2030:0:main Valgrind: FATAL: Initial stack switched failed. --2030:0:main Cannot continue. Sorry. Turning on debug logging doesn't reveal anything interesting: --2127:1:debuglog DebugLog system started by Stage 1, level 1 logging requested --2127:1:launcher no tool requested, defaulting to 'memcheck' --2127:1:launcher valgrind_lib = /opt/local/lib/valgrind --2127:1:launcher arch 'x86' IS NOT installed --2127:1:launcher arch 'amd64' IS installed --2127:1:launcher arch 'arm' IS NOT installed --2127:1:launcher arch 'ppc32' IS NOT installed --2127:1:launcher arch 'ppc64' IS NOT installed --2127:1:launcher no client specified, defaulting arch to 'amd64' --2127:1:launcher launcher_name = /opt/local/bin/valgrind --2127:1:launcher launching /opt/local/lib/valgrind/memcheck-amd64-darwin --2127:1:debuglog DebugLog system started by Stage 2 (main), level 1 logging requested --2127:1:main Welcome to Valgrind version 3.8.0 debug logging --2127:1:main Checking current stack is plausible --2127:0:main Root stack 0x238E20000 to 0x238F24000, a local 0x238F21FB0 --2127:0:main Valgrind: FATAL: Initial stack switched failed. --2127:0:main Cannot continue. Sorry. If I rebuild valgrind with llvm-gcc-4.2 as the compiler, all is well. Is this a known issue? I searched the bug database but nothing jumped out. If not, does anyone have any thoughts on what may be causing this or where to start debugging? Obviously I can work around by building with GCC, but AFAIK clang is the future on Apple platforms so it would be good if this worked. Thanks, Andrew |
|
From: Julian S. <js...@ac...> - 2012-08-28 09:06:28
|
On Monday, August 27, 2012, Andrew C. Morrow wrote:
> --2030:0:main Root stack 0x238E20000 to 0x238F24000, a local
> 0x238F21FB0 --2030:0:main Valgrind: FATAL: Initial stack switched
> failed.
> --2030:0:main Cannot continue. Sorry.
>
> If I rebuild valgrind with llvm-gcc-4.2 as the compiler, all is well.
Ah yes. I fell over this too, and documented it in the sources, but
failed to document it anywhere else.
It is to do with this bit of m_main.c, around line 1614 -- see the
comment. I didn't chase it further, because it compiled with gcc and
it had (and still does have) more serious problems on OSX 10.8, so
I spent time investigating those instead.
If you want to try and make sense of what's going on here with
clang, that would be good, since I find it hard to believe that
clang would really have miscompiled an expression as simple
as "aLocal < limLo || aLocal >= limHi".
J
//--------------------------------------------------------------
// Ensure we're on a plausible stack.
// p: logging
//--------------------------------------------------------------
VG_(debugLog)(1, "main", "Checking current stack is plausible\n");
{ HChar* limLo = (HChar*)(&VG_(interim_stack).bytes[0]);
HChar* limHi = limLo + sizeof(VG_(interim_stack));
HChar* aLocal = (HChar*)&limLo; /* any auto local will do */
/* "Apple clang version 4.0 (tags/Apple/clang-421.0.57) (based on
LLVM 3.1svn)" appears to miscompile the following check,
causing run to abort at this point (in 64-bit mode) even
though aLocal is within limLo .. limHi. Try building with
gcc instead. */
if (aLocal < limLo || aLocal >= limHi) {
/* something's wrong. Stop. */
VG_(debugLog)(0, "main", "Root stack %p to %p, a local %p\n",
limLo, limHi, aLocal );
VG_(debugLog)(0, "main", "Valgrind: FATAL: "
"Initial stack switched failed.\n");
VG_(debugLog)(0, "main", " Cannot continue. Sorry.\n");
VG_(exit)(1);
}
}
|
|
From: Andrew C. M. <and...@gm...> - 2012-08-28 15:27:41
|
Hi Julian -
Thanks for the pointer to the relevant code.
I can make the check work again by marking aLocal as volatile:
HChar* volatile aLocal = (HChar*)&limLo; /* any auto local will do */
Complete speculation, but maybe what is happening is that clang is
optimizing away the entire conditional as always taken, reasoning
(correctly or incorrectly) that an auto variable will never fall
within an address range that is reserved for an extern char array?
On Tue, Aug 28, 2012 at 5:05 AM, Julian Seward <js...@ac...> wrote:
>
> On Monday, August 27, 2012, Andrew C. Morrow wrote:
>> --2030:0:main Root stack 0x238E20000 to 0x238F24000, a local
>> 0x238F21FB0 --2030:0:main Valgrind: FATAL: Initial stack switched
>> failed.
>> --2030:0:main Cannot continue. Sorry.
>>
>> If I rebuild valgrind with llvm-gcc-4.2 as the compiler, all is well.
>
> Ah yes. I fell over this too, and documented it in the sources, but
> failed to document it anywhere else.
>
> It is to do with this bit of m_main.c, around line 1614 -- see the
> comment. I didn't chase it further, because it compiled with gcc and
> it had (and still does have) more serious problems on OSX 10.8, so
> I spent time investigating those instead.
>
> If you want to try and make sense of what's going on here with
> clang, that would be good, since I find it hard to believe that
> clang would really have miscompiled an expression as simple
> as "aLocal < limLo || aLocal >= limHi".
>
> J
>
>
> //--------------------------------------------------------------
> // Ensure we're on a plausible stack.
> // p: logging
> //--------------------------------------------------------------
> VG_(debugLog)(1, "main", "Checking current stack is plausible\n");
> { HChar* limLo = (HChar*)(&VG_(interim_stack).bytes[0]);
> HChar* limHi = limLo + sizeof(VG_(interim_stack));
> HChar* aLocal = (HChar*)&limLo; /* any auto local will do */
> /* "Apple clang version 4.0 (tags/Apple/clang-421.0.57) (based on
> LLVM 3.1svn)" appears to miscompile the following check,
> causing run to abort at this point (in 64-bit mode) even
> though aLocal is within limLo .. limHi. Try building with
> gcc instead. */
> if (aLocal < limLo || aLocal >= limHi) {
> /* something's wrong. Stop. */
> VG_(debugLog)(0, "main", "Root stack %p to %p, a local %p\n",
> limLo, limHi, aLocal );
> VG_(debugLog)(0, "main", "Valgrind: FATAL: "
> "Initial stack switched failed.\n");
> VG_(debugLog)(0, "main", " Cannot continue. Sorry.\n");
> VG_(exit)(1);
> }
> }
>
|
|
From: Julian S. <js...@ac...> - 2012-08-28 15:44:10
|
On Tuesday, August 28, 2012, Andrew C. Morrow wrote: > I can make the check work again by marking aLocal as volatile: > > HChar* volatile aLocal = (HChar*)&limLo; /* any auto local will do */ > > Complete speculation, but maybe what is happening is that clang is > optimizing away the entire conditional as always taken, reasoning > (correctly or incorrectly) that an auto variable will never fall > within an address range that is reserved for an extern char array? Hmm, that's a good point. I couldn't figure out why it was failing. If you can verify that the change works with both clang and gcc then I'll be happy to commit it. J |
|
From: Andrew C. M. <and...@gm...> - 2012-08-28 17:03:05
|
> > Hmm, that's a good point. I couldn't figure out why it was failing. > > If you can verify that the change works with both clang and gcc then > I'll be happy to commit it. > I just tried it with both llvm-gcc-4.2 and clang, and the resulting valgrind binary does start up correctly in both cases. However, as subsequent emails from Florian and Patrick point out, this is still undefined behavior. |
|
From: Florian K. <br...@ac...> - 2012-08-28 16:34:33
|
On 08/28/2012 11:27 AM, Andrew C. Morrow wrote:
> Hi Julian -
>
> Thanks for the pointer to the relevant code.
>
> I can make the check work again by marking aLocal as volatile:
>
> HChar* volatile aLocal = (HChar*)&limLo; /* any auto local will do */
>
> Complete speculation, but maybe what is happening is that clang is
> optimizing away the entire conditional as always taken, reasoning
> (correctly or incorrectly) that an auto variable will never fall
> within an address range that is reserved for an extern char array?
The reason is that the comparisons aLocal < limLo and aLocal >= limHi
cause undefined behaviour (according to c99 6.5.8) because they compare
pointers that do not point into the same aggregate.
Compilers are free to do whatever they like in that case.
Another fix would be to transform the pointers to like-sized integers
and compare those. That kind of conversion causes implementation defined
behaviour which compilers are supposed to document. So you could look up
whether it works.. I suppose, using volatile is the easier "fix".
Florian
>> VG_(debugLog)(1, "main", "Checking current stack is plausible\n");
>> { HChar* limLo = (HChar*)(&VG_(interim_stack).bytes[0]);
>> HChar* limHi = limLo + sizeof(VG_(interim_stack));
>> HChar* aLocal = (HChar*)&limLo; /* any auto local will do */
>> /* "Apple clang version 4.0 (tags/Apple/clang-421.0.57) (based on
>> LLVM 3.1svn)" appears to miscompile the following check,
>> causing run to abort at this point (in 64-bit mode) even
>> though aLocal is within limLo .. limHi. Try building with
>> gcc instead. */
>> if (aLocal < limLo || aLocal >= limHi) {
|
|
From: Patrick J. L. <lop...@gm...> - 2012-08-28 16:56:44
|
On Tue, Aug 28, 2012 at 9:34 AM, Florian Krohm <br...@ac...> wrote: > > The reason is that the comparisons aLocal < limLo and aLocal >= limHi > cause undefined behaviour (according to c99 6.5.8) because they compare > pointers that do not point into the same aggregate. > Compilers are free to do whatever they like in that case. > Another fix would be to transform the pointers to like-sized integers > and compare those. This sounds like a better fix to me, if you do not mind using C99. Just cast to uintptr_t (from <stdint.h>). (If not C99, "unsigned long" should work.) Comparing such integers is technically implementation-defined, but (a) that is much better than "undefined" and (b) let's face it, on any conceivable system it will work fine. Assuming this actually fixes the problem, of course. - Pat |
|
From: Florian K. <br...@ac...> - 2012-08-28 17:33:13
|
On 08/28/2012 12:56 PM, Patrick J. LoPresti wrote: > On Tue, Aug 28, 2012 at 9:34 AM, Florian Krohm <br...@ac...> wrote: >> > Comparing such integers is technically implementation-defined, but (a) > that is much better than "undefined" Yes, very true. > and (b) let's face it, on any > conceivable system it will work fine. But so will be the volatile hack by making one of the pointers in the comparison volatile. The compiler would have to prove that there is undefined behavior in order to exploit it. But as a volatile variable can change its value in ways invisible to the compiler, the compiler must make the conservative assumption that it points into the same aggregate as the other pointer its compared against. I.e. the behaviour is possibly defined. Florian |
|
From: Julian S. <js...@ac...> - 2012-08-28 18:57:15
|
On Tuesday, August 28, 2012, Patrick J. LoPresti wrote: > On Tue, Aug 28, 2012 at 9:34 AM, Florian Krohm <br...@ac...> wrote: > > The reason is that the comparisons aLocal < limLo and aLocal >= limHi > > cause undefined behaviour (according to c99 6.5.8) because they compare > > pointers that do not point into the same aggregate. > > Compilers are free to do whatever they like in that case. > > Another fix would be to transform the pointers to like-sized integers > > and compare those. > > This sounds like a better fix to me, if you do not mind using C99. > Just cast to uintptr_t (from <stdint.h>). (If not C99, "unsigned > long" should work.) > > Comparing such integers is technically implementation-defined, but (a) > that is much better than "undefined" and (b) let's face it, on any > conceivable system it will work fine. > > Assuming this actually fixes the problem, of course. I don't remember exactly, but I think I tried casting the 3 values to "HWord" (unsigned int the size of a host word) and it still didn't work. But, Andrew, if you are inclined to check that, pls do :) Since the volatile fix is verified by Andrew as working, I'm inclined to go with it, unless the cast-to-int thing works, in which case let's do that instead. J |
|
From: Julian S. <js...@ac...> - 2012-08-28 18:59:35
|
On Tuesday, August 28, 2012, Florian Krohm wrote: > On 08/28/2012 11:27 AM, Andrew C. Morrow wrote: > > Hi Julian - > > > > Thanks for the pointer to the relevant code. > > > > I can make the check work again by marking aLocal as volatile: > > HChar* volatile aLocal = (HChar*)&limLo; /* any auto local will do > > */ > > > > Complete speculation, but maybe what is happening is that clang is > > optimizing away the entire conditional as always taken, reasoning > > (correctly or incorrectly) that an auto variable will never fall > > within an address range that is reserved for an extern char array? > > The reason is that the comparisons aLocal < limLo and aLocal >= limHi > cause undefined behaviour (according to c99 6.5.8) because they compare > pointers that do not point into the same aggregate. > Compilers are free to do whatever they like in that case. Wow .. I think I would not have guessed that in a long time. So, it seems we've been holed beneath the waterline by LLVM's alias analyser, yes? Interesting. J |
|
From: Andrew C. M. <and...@gm...> - 2012-08-28 19:42:50
|
On Tue, Aug 28, 2012 at 2:58 PM, Julian Seward <js...@ac...> wrote: > On Tuesday, August 28, 2012, Florian Krohm wrote: >> On 08/28/2012 11:27 AM, Andrew C. Morrow wrote: >> > Hi Julian - >> > >> > Thanks for the pointer to the relevant code. >> > >> > I can make the check work again by marking aLocal as volatile: >> > HChar* volatile aLocal = (HChar*)&limLo; /* any auto local will do >> > */ >> > >> > Complete speculation, but maybe what is happening is that clang is >> > optimizing away the entire conditional as always taken, reasoning >> > (correctly or incorrectly) that an auto variable will never fall >> > within an address range that is reserved for an extern char array? >> >> The reason is that the comparisons aLocal < limLo and aLocal >= limHi >> cause undefined behaviour (according to c99 6.5.8) because they compare >> pointers that do not point into the same aggregate. >> Compilers are free to do whatever they like in that case. > > Wow .. I think I would not have guessed that in a long time. > So, it seems we've been holed beneath the waterline by LLVM's alias > analyser, yes? Interesting. > > J I was curious about this, so I asked on stackoverflow: http://stackoverflow.com/questions/12165288/c99-is-it-possible-to-portably-determine-if-two-pointers-point-within-the-same The consensus seems to be that it can be done portably, but not in constant time, by relying on the fact that is is legal to compare any two pointers for equality, even if they are in different aggregates. Which in hindsight is sort of obvious. So the stack switch check could iterate across the VgStack and verify that it eventually reaches the address of aLocal. The check overhead would be linear in the size of a VgStack, but it is only once per valgrind invocation. The VgStack looks to be ~1MB. Would the overhead of iterating across interim_stack once at valgrind startup even be noticeable on modern hardware? |
|
From: Andrew C. M. <and...@gm...> - 2012-08-30 14:25:23
|
There were three possible solutions proposed in this thread: - Using volatile - Converting to an integer representation and comparing. - My proposal to just do a linear scan of interim_stack looking for aLocal I didn't get any feedback on the last idea, so I guess it wasn't too well received. Is there any consensus on what is the best fix? I'm happy to put together a patch if that would help get this resolved (or maybe someone with commit rights can just add 'volatile' if that is the chosen solution) but I'd rather make sure it is the right patch first. Thanks, Andrew |
|
From: Julian S. <js...@ac...> - 2012-08-30 14:46:00
|
On Thursday, August 30, 2012, Andrew C. Morrow wrote: > There were three possible solutions proposed in this thread: > > - Using volatile > - Converting to an integer representation and comparing. > - My proposal to just do a linear scan of interim_stack looking for aLocal > > I didn't get any feedback on the last idea, so I guess it wasn't too > well received. It sounded a bit expensive. > Is there any consensus on what is the best fix? I'm happy to put > together a patch if that would help get this resolved (or > maybe someone with commit rights can just add 'volatile' if that is > the chosen solution) but I'd rather make sure it is the right patch > first. I'd be happy to go with either the volatile thing, or the cast-to-HWord scheme assuming that works. I don't know enough about the details of the C99 language spec to guess which is less likely to get messed up by future compilers. Florian, any opinion here? J |
|
From: Florian K. <br...@ac...> - 2012-08-30 15:19:38
|
On 08/30/2012 10:25 AM, Andrew C. Morrow wrote: > There were three possible solutions proposed in this thread: > > - Using volatile > - Converting to an integer representation and comparing. > - My proposal to just do a linear scan of interim_stack looking for aLocal > > I didn't get any feedback on the last idea, so I guess it wasn't too > well received. Yeah, it seems excessive... > Is there any consensus on what is the best fix? Everybody felt, that relying on implementation behaviour would be better than relying on undefined behaviour be handled in a specific way. But it was not verified yet, that that casting the pointers to HWords and comparing them would actually work. Did you try it? Florian |
|
From: Andrew C. M. <and...@gm...> - 2012-09-10 01:19:45
|
On Thu, Aug 30, 2012 at 11:19 AM, Florian Krohm <br...@ac...> wrote: > On 08/30/2012 10:25 AM, Andrew C. Morrow wrote: >> There were three possible solutions proposed in this thread: >> >> - Using volatile >> - Converting to an integer representation and comparing. >> - My proposal to just do a linear scan of interim_stack looking for aLocal >> >> I didn't get any feedback on the last idea, so I guess it wasn't too >> well received. > > Yeah, it seems excessive... > >> Is there any consensus on what is the best fix? > > Everybody felt, that relying on implementation behaviour would be better > than relying on undefined behaviour be handled in a specific way. > But it was not verified yet, that that casting the pointers to HWords > and comparing them would actually work. Did you try it? > > Florian > Hi Florian - Apologies for the delay: I was away on vacation last week. I have not yet tried the HWord conversion but I will try to do so this week. Thanks, Andrew |
|
From: Florian K. <br...@ac...> - 2012-09-11 02:20:04
|
On 09/09/2012 09:19 PM, Andrew C. Morrow wrote: > On Thu, Aug 30, 2012 at 11:19 AM, Florian Krohm <br...@ac...> wrote: >> >> Everybody felt, that relying on implementation behaviour would be better >> than relying on undefined behaviour be handled in a specific way. >> But it was not verified yet, that that casting the pointers to HWords >> and comparing them would actually work. Did you try it? >> >> Florian >> > > Hi Florian - > > Apologies for the delay: I was away on vacation last week. I have not > yet tried the HWord conversion but I will try to do so this week. > Hi Andrew, before you embark on something do check the source. I think Julian already committed a change using volatile. Florian |