|
From: Carl E. L. <ce...@us...> - 2017-04-25 17:36:29
|
Valgrind developers:
The GCC 7 compiler for power has a new optimization that is tripping up
Valgrind. Basically they are taking the strcmp() function and doing
some inlining of the code. Here is a snipit of the generated code
return strcmp(str1, str2);
100004bc: 28 4c 20 7d ldbrx r9,0,r9
100004c0: 28 54 40 7d ldbrx r10,0,r10
100004c4: 51 48 6a 7c subf. r3,r10,r9
100004c8: 1c 00 82 40 bne 100004e4 <main+0x84>
100004cc: f8 1b 2a 7d cmpb r10,r9,r3
100004d0: 00 00 aa 2f cmpdi cr7,r10,0
100004d4: 38 00 9e 41 beq cr7,1000050c <main+0xac>
The inlined code has two load double word instructions (ldbrx inst) that
are partially uninitialized. Following the two double word loads we do a
subf. instruction to subtract the values and set the condition code.
Then we get to the branch instruction (bne) and valgrind flags the
error:
==23948== Conditional jump or move depends on uninitialised value(s)
==23948== at 0x100004C8: main (bug80497.c:9)
==23948==
==23948== Syscall param exit_group(status) contains uninitialised
byte(s)
==23948== at 0x41BDEA4: _Exit (_exit.c:31)
The code has some cmpb instructions to make sure they don't actually use
the uninitialized bytes but that doesn't really help Valgrind. I was
thinking of trying to create a rule to ignore the error but not sure I
can do this as it is inlined code. It isn't like trying to ignore
errors from a function. I have looked a little at the suppression rules
and from what I know of them it isn't clear how to write one for this
case were inlined code could show up anywhere.
Wondering if anyone has thoughts on how to asddress fixing the issue or
how to suppress the issue?
|
|
From: Patrick J. L. <lop...@gm...> - 2017-04-25 18:19:34
|
This sort of code is supposed to be handled by "--partial-loads-ok=yes". (Which should be made the default, in my opinion.) If that does not work, it is a bug in the partial-loads-ok support. - Pat On Tue, Apr 25, 2017 at 10:36 AM, Carl E. Love <ce...@us...> wrote: > Valgrind developers: > > The GCC 7 compiler for power has a new optimization that is tripping up > Valgrind. Basically they are taking the strcmp() function and doing > some inlining of the code. Here is a snipit of the generated code > > return strcmp(str1, str2); > 100004bc: 28 4c 20 7d ldbrx r9,0,r9 > 100004c0: 28 54 40 7d ldbrx r10,0,r10 > 100004c4: 51 48 6a 7c subf. r3,r10,r9 > 100004c8: 1c 00 82 40 bne 100004e4 <main+0x84> > 100004cc: f8 1b 2a 7d cmpb r10,r9,r3 > 100004d0: 00 00 aa 2f cmpdi cr7,r10,0 > 100004d4: 38 00 9e 41 beq cr7,1000050c <main+0xac> > > The inlined code has two load double word instructions (ldbrx inst) that > are partially uninitialized. Following the two double word loads we do a > subf. instruction to subtract the values and set the condition code. > Then we get to the branch instruction (bne) and valgrind flags the > error: > > ==23948== Conditional jump or move depends on uninitialised value(s) > ==23948== at 0x100004C8: main (bug80497.c:9) > ==23948== > ==23948== Syscall param exit_group(status) contains uninitialised > byte(s) > ==23948== at 0x41BDEA4: _Exit (_exit.c:31) > > The code has some cmpb instructions to make sure they don't actually use > the uninitialized bytes but that doesn't really help Valgrind. I was > thinking of trying to create a rule to ignore the error but not sure I > can do this as it is inlined code. It isn't like trying to ignore > errors from a function. I have looked a little at the suppression rules > and from what I know of them it isn't clear how to write one for this > case were inlined code could show up anywhere. > > Wondering if anyone has thoughts on how to asddress fixing the issue or > how to suppress the issue? > > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > _______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
|
From: Mark W. <ma...@kl...> - 2017-04-25 18:36:27
|
On Tue, 2017-04-25 at 11:19 -0700, Patrick J. LoPresti wrote:
> This sort of code is supposed to be handled by
> "--partial-loads-ok=yes". (Which should be made the default, in my
> opinion.)
It already is, according to NEWS, since:
Release 3.11.0 (22 September 2015)
- The default value for --partial-loads-ok has been changed
from
"no" to "yes", so as to avoid false positive errors
resulting
from some kinds of vectorised loops.
Cheers,
Mark
|
|
From: Julian S. <js...@ac...> - 2017-04-25 18:32:54
|
> The inlined code has two load double word instructions (ldbrx inst) that > are partially uninitialized. Following the two double word loads we do a > subf. instruction to subtract the values and set the condition code. Does it help to run with --expensive-definedness-checks=yes? That enables more accurate but more expensive definedness tracking for subtracts, among other things. J |
|
From: Julian S. <js...@ac...> - 2017-04-25 18:42:54
|
On 25/04/17 20:19, Patrick J. LoPresti wrote: > This sort of code is supposed to be handled by > "--partial-loads-ok=yes". (Which should be made the default, in my > opinion.) I thought it was the default now. Isn't it? It certainly should be. But I think that's not the issue. Memcheck isn't complaining about bad addresses here. It's complaining that the resulting branch is on undefined data -- and that undefined data exists exactly because the partial-loads machinery paints the data acquired from the invalid areas as undefined. So the partial-loads machinery is working perfectly. IIUC, that is. J |
|
From: Patrick J. L. <lop...@gm...> - 2017-04-25 19:07:56
|
On Tue, Apr 25, 2017 at 11:42 AM, Julian Seward <js...@ac...> wrote: > > I thought it was the default now. Isn't it? It certainly should be. > > But I think that's not the issue. Memcheck isn't complaining about > bad addresses here. It's complaining that the resulting branch is > on undefined data -- and that undefined data exists exactly because > the partial-loads machinery paints the data acquired from the > invalid areas as undefined. So the partial-loads machinery is > working perfectly. IIUC, that is. Yes, my mistake. I should have read more carefully. This looks very hard to fix in general, since Valgrind is correct that the branch depends on invalid data. Setting partial-loads-ok=no would not help; it would just move the complaint to the memory access itself. Maybe try compiling the application with "-fno-builtin-strcmp"? - Pat |
|
From: Carl E. L. <ce...@us...> - 2017-04-25 19:22:45
|
> On Tue, 2017-04-25 at 11:19 -0700, Patrick J. LoPresti wrote:
>> This sort of code is supposed to be handled by
>> "--partial-loads-ok=yes". (Which should be made the default, in my
>> opinion.)
>>
>> If that does not work, it is a bug in the partial-loads-ok support.
On Tue, 2017-04-25 at 20:32 +0200, Julian Seward wrote:
> > The inlined code has two load double word instructions (ldbrx inst) that
> > are partially uninitialized. Following the two double word loads we do a
> > subf. instruction to subtract the values and set the condition code.
>
> Does it help to run with --expensive-definedness-checks=yes? That
> enables more accurate but more expensive definedness tracking for
> subtracts, among other things.
>
> J
>
Julian, Patrick:
So I tried the --partial-loads first. It didn't change things.
valgrind --partial-loads-ok=yes ./bug80497-gcc7 --track-origins=yes
==32322== Memcheck, a memory error detector
==32322== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==32322== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==32322== Command: ./bug80497-gcc7 --track-origins=yes
==32322==
==32322== Conditional jump or move depends on uninitialised value(s)
==32322== at 0x100004C8: main (bug80497.c:9)
==32322==
==32322== Syscall param exit_group(status) contains uninitialised byte(s)
==32322== at 0x41BDEA4: _Exit (_exit.c:31)
==32322== by 0x411520B: __run_exit_handlers (exit.c:98)
==32322== by 0x40F29A3: generic_start_main.isra.0 (libc-start.c:323)
==32322== by 0x40F2BB7: (below main) (libc-start.c:102)
I then tried the expensive-definedness-checking
valgrind --partial-loads-ok=yes ./bug80497-gcc7 --track-origins=yes
==32322== Memcheck, a memory error detector
==32322== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==32322== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==32322== Command: ./bug80497-gcc7 --track-origins=yes
==32322==
==32322== Conditional jump or move depends on uninitialised value(s)
==32322== at 0x100004C8: main (bug80497.c:9)
==32322==
==32322== Syscall param exit_group(status) contains uninitialised byte(s)
==32322== at 0x41BDEA4: _Exit (_exit.c:31)
==32322== by 0x411520B: __run_exit_handlers (exit.c:98)
==32322== by 0x40F29A3: generic_start_main.isra.0 (libc-start.c:323)
==32322== by 0x40F2BB7: (below main) (libc-start.c:102)
I did try recompiling the test case with -fno-builtin-strcmp and running without any
additional Valgrind flags and still got the issue. The option does not turn off the
optimization of the strcmp. Even if it did, I am not sure that would be a satisfactory
solution for the user in this case. I will have to check with them for sure.
Other ideas?
Carl Love
|
|
From: Patrick J. L. <lop...@gm...> - 2017-04-25 23:21:25
|
On Tue, Apr 25, 2017 at 12:22 PM, Carl E. Love <ce...@us...> wrote:
>
> I did try recompiling the test case with -fno-builtin-strcmp and running without any
> additional Valgrind flags and still got the issue.
Hm. You are sure the warning is still from the application code and
not the C library?
That is... strange. I thought the whole point of -fno-builtin-XXX was
to force the compiler to invoke the C library, forgetting whatever it
thinks it knows about the function. Is it possible strcmp() is
actually "inline" in your glibc headers?
Another idea: I suspect it would be pretty simple to hack the
partial-loads-ok logic to mark the "extra" bytes as defined rather
than undefined. Search memcheck/mc_main.c for "partial-loads-ok"; the
change should be just a line or two in mc_LOADV_128_or_256_slow().
...
Specifically, you can try commenting out this "for" loop on lines
1354-1355 of mc_main.c:
for (j = szL-1; j >= 0; j--)
res[j] |= pessim[j];
(Note: It has been a long time since I was in this code, so I could be
completely wrong. If so, hopefully Julian will correct me...)
Of course, this does risk masking real errors. But it should be a
quick way to move forward while exploring other options. And this sort
of partially-valid aligned load is probably not terribly common apart
from this sort of optimization.
That said, -fno-builtin-strcmp really should have caused GCC to emit a
call to the C library strcmp(). I suggest investigating why it didn't.
- Pat
|
|
From: Ivo R. <iv...@iv...> - 2017-04-26 01:04:47
|
2017-04-26 1:21 GMT+02:00 Patrick J. LoPresti <lop...@gm...>: > On Tue, Apr 25, 2017 at 12:22 PM, Carl E. Love <ce...@us...> wrote: >> >> I did try recompiling the test case with -fno-builtin-strcmp and running without any >> additional Valgrind flags and still got the issue. > > Hm. You are sure the warning is still from the application code and > not the C library? I'd suggest to try gdb+vgdb combo, as described nicely for example here: http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver Start your application with: --vgdb=full --vgdb-error=1 [--vgdb-shadow-registers=yes] and after attaching with gdb you'll get exact location which is causing all the trouble. I. |
|
From: Julian S. <js...@ac...> - 2017-04-26 14:11:25
|
Carl,
There are various refinements in memcheck/mc_translate.c that are related
to accurate definedness tracking in this kind of case, for
Iop_Cmp{32,64}, Iop_{Add,Sub}{32,64}, and Iop_CmpORD{32,64}{S,U}, the
last of which are PPC64 specials. But they don't seem to have had any
effect here.
To diagnose this we really need to see the front end translation of the
assembly fragment you showed, plus the uninstrumented, optimised IR
relating to it. You should be able to get those with --tool=none
--trace-flags=10001000 --trace-notbelow=<whatever>. Please get those
and then we can see what's going on. Maybe!
J
|
|
From: Carl E. L. <ce...@us...> - 2017-04-26 16:03:32
Attachments:
bug80497-debug-trimmed
|
On Wed, 2017-04-26 at 16:11 +0200, Julian Seward wrote:
> Carl,
>
> There are various refinements in memcheck/mc_translate.c that are related
> to accurate definedness tracking in this kind of case, for
> Iop_Cmp{32,64}, Iop_{Add,Sub}{32,64}, and Iop_CmpORD{32,64}{S,U}, the
> last of which are PPC64 specials. But they don't seem to have had any
> effect here.
>
> To diagnose this we really need to see the front end translation of the
> assembly fragment you showed, plus the uninstrumented, optimised IR
> relating to it. You should be able to get those with --tool=none
> --trace-flags=10001000 --trace-notbelow=<whatever>. Please get those
> and then we can see what's going on. Maybe!
>
> J
>
Julian:
Here is the test program.
#include <string.h>
int main ()
{
char str1[15];
char str2[15];
strcpy(str1, "abcdef");
strcpy(str2, "ABCDEF");
return strcmp(str1, str2);
}
Here is a dump of the Power binary produced by gcc7 for the code we are
interested in.
0000000010000460 <main>:
#include <string.h>
int main ()
{
10000460: 02 10 40 3c lis r2,4098
10000464: 00 7f 42 38 addi r2,r2,32512
char str1[15];
char str2[15];
strcpy(str1, "abcdef");
strcpy(str2, "ABCDEF");
10000468: fe ff 22 3d addis r9,r2,-2
1000046c: 40 8a 29 81 lwz r9,-30144(r9)
10000470: fe ff 42 3d addis r10,r2,-2
10000474: 46 8a 4a 89 lbz r10,-30138(r10)
{
10000478: c1 ff 21 f8 stdu r1,-64(r1)
strcpy(str1, "abcdef");
1000047c: fe ff a2 3c addis r5,r2,-2
10000480: 38 8a a5 80 lwz r5,-30152(r5)
10000484: fe ff c2 3c addis r6,r2,-2
10000488: 3c 8a c6 a0 lhz r6,-30148(r6)
1000048c: fe ff e2 3c addis r7,r2,-2
10000490: 3e 8a e7 88 lbz r7,-30146(r7)
strcpy(str2, "ABCDEF");
10000494: fe ff 02 3d addis r8,r2,-2
10000498: 44 8a 08 a1 lhz r8,-30140(r8)
1000049c: 20 00 21 91 stw r9,32(r1)
100004a0: 26 00 41 99 stb r10,38(r1)
return strcmp(str1, str2);
100004a4: 30 00 21 39 addi r9,r1,48
100004a8: 20 00 41 39 addi r10,r1,32
strcpy(str1, "abcdef");
100004ac: 30 00 a1 90 stw r5,48(r1)
100004b0: 34 00 c1 b0 sth r6,52(r1)
100004b4: 36 00 e1 98 stb r7,54(r1)
strcpy(str2, "ABCDEF");
100004b8: 24 00 01 b1 sth r8,36(r1)
return strcmp(str1, str2);
100004bc: 28 4c 20 7d ldbrx r9,0,r9
100004c0: 28 54 40 7d ldbrx r10,0,r10
100004c4: 51 48 6a 7c subf. r3,r10,r9
100004c8: 1c 00 82 40 bne 100004e4 <main+0x84>
100004cc: f8 1b 2a 7d cmpb r10,r9,r3
100004d0: 00 00 aa 2f cmpdi cr7,r10,0
100004d4: 38 00 9e 41 beq cr7,1000050c <main+0xac>
}
100004d8: b4 07 63 7c extsw r3,r3
100004dc: 40 00 21 38 addi r1,r1,64
100004e0: 20 00 80 4e blr
return strcmp(str1, str2);
100004e4: 00 00 00 39 li r8,0
100004e8: f8 53 23 7d cmpb r3,r9,r10
100004ec: f8 43 28 7d cmpb r8,r9,r8
100004f0: 38 1b 03 7d orc r3,r8,r3
100004f4: 74 00 63 7c cntlzd r3,r3
100004f8: 08 00 63 38 addi r3,r3,8
100004fc: 30 1e 29 79 rldcl r9,r9,r3,56
10000500: 30 1e 4a 79 rldcl r10,r10,r3,56
10000504: 50 48 6a 7c subf r3,r10,r9
10000508: d0 ff ff 4b b 100004d8 <main+0x78>
The loads in question are at addresses 100004bc and 100004c0. The
optimization loads these partially ininitialized values. The compiler
uses the cmpb instruction to make sure it really only looks at the valid
bytes, but as we said Valgrind doesn't know all that.
I ran valgrind as:
valgrind --tool=none --trace-flags=10001000
--trace-notbelow=1408 ./bug80497-gcc7 > bug80497-debug 2>&1
Took a little playing but it looks like SB 1408 corresponds to the
beginning of main and the above assembly code runs thru SB1409. I
edited down the valgrind output to just SB1408 and SB1409. I have
attached it as a file. I have the complete output if I threw away too
much. Thanks for your help on this.
Carl Love
|
|
From: Julian S. <js...@ac...> - 2017-04-27 15:02:49
|
The important IR bits are these:
100004bc: 28 4c 20 7d ldbrx r9,0,r9
100004c0: 28 54 40 7d ldbrx r10,0,r10
// t143 is r9 after the load
// t166 is r10 after the load
100004c4: 51 48 6a 7c subf. r3,r10,r9
t69 = Sub64(t143,t166) // r3
t167 = 64to8(CmpORD64S(t69,0x0:I64))
100004c8: 1c 00 82 40 bne 100004e4 <main+0x84>
if (CmpNE32(Xor32(And32(8Uto32(t167),0x2:I32),0x2:I32),0x0:I32))
{ PUT(1296) = 0x100004E4:I64; exit-Boring }
My best guess is, this is a problem with the instrumentation of the
CmpORD64S. What that does is to compare t69 against zero, and produce
a 3 bit value like this:
8 if t69 < 0, 4 if t69 > 0, 2 if t69 == 0
A bit strange but that's how the Power integer condition codes work.
And you can see that the branch condition in the IR tests for 2, meaning
equal.
This is handled by doCmpORD in mc_translate.c. It special cases the case
where the second arg is zero, but only for the < (8) case: that bit is a
copy of the most significant bit of t69, so it doesn't matter if all the
other bits are undefined. But it doesn't special case the == (2) case, and
so that bit is regarded as undefined if any bit in t69 is undefined. But
that's too pessimistic. In fact the "is t69 == 0" question is a defined
"no" if we can find any bit in t69 which is nonzero and defined. The
general logic is explained a bit more in the comment further up, "Accurate
interpretation of CmpEQ/CmpNE."
Applying that to this problem would fix it, I suspect.
J
|
|
From: Carl E. L. <ce...@us...> - 2017-05-01 19:58:40
|
On Thu, 2017-04-27 at 17:02 +0200, Julian Seward wrote:
> The important IR bits are these:
>
> 100004bc: 28 4c 20 7d ldbrx r9,0,r9
> 100004c0: 28 54 40 7d ldbrx r10,0,r10
> // t143 is r9 after the load
> // t166 is r10 after the load
>
> 100004c4: 51 48 6a 7c subf. r3,r10,r9
> t69 = Sub64(t143,t166) // r3
> t167 = 64to8(CmpORD64S(t69,0x0:I64))
>
> 100004c8: 1c 00 82 40 bne 100004e4 <main+0x84>
> if (CmpNE32(Xor32(And32(8Uto32(t167),0x2:I32),0x2:I32),0x0:I32))
> { PUT(1296) = 0x100004E4:I64; exit-Boring }
>
> My best guess is, this is a problem with the instrumentation of the
> CmpORD64S. What that does is to compare t69 against zero, and produce
> a 3 bit value like this:
>
> 8 if t69 < 0, 4 if t69 > 0, 2 if t69 == 0
>
> A bit strange but that's how the Power integer condition codes work.
> And you can see that the branch condition in the IR tests for 2, meaning
> equal.
>
> This is handled by doCmpORD in mc_translate.c. It special cases the case
> where the second arg is zero, but only for the < (8) case: that bit is a
> copy of the most significant bit of t69, so it doesn't matter if all the
> other bits are undefined. But it doesn't special case the == (2) case, and
> so that bit is regarded as undefined if any bit in t69 is undefined. But
> that's too pessimistic. In fact the "is t69 == 0" question is a defined
> "no" if we can find any bit in t69 which is nonzero and defined. The
> general logic is explained a bit more in the comment further up, "Accurate
> interpretation of CmpEQ/CmpNE."
>
> Applying that to this problem would fix it, I suspect.
doCmpORD should be calculating the shadow bit values for the LT, EQ and
GE bits. The LT bit is 1 if the MSB of xxhash is a 1. I figured the EQ
bit should be 1 if all of the xxhash bits are undefined since we don't
have any valid bits in xx to determine if the valid bits in xx are equal
to zero. Similarly, the GT bit would be undefined if there were no
defined bits in xxhash. This implementation generated more warnings
then the original code. So either my understanding of what doCmpORD is
calculating is wrong or my algorithm for setting the EQ and GT bits is
wrong.
Just for grins, I had doCmpORD return all zeros. The comparison with
uninitialized bits goes away but I get the error "Syscall param
exit_group(status) contains uninitialised byte". Not sure why the
syscall now has uninitialized data.
Anyway, am I wrong on what doCmpORD is doing? Thoughts on how to
determine the Eq and GT bits for this case? Thanks for the help.
Carl
|
|
From: Carl E. L. <ce...@us...> - 2017-05-03 16:38:07
|
On Mon, 2017-05-01 at 12:58 -0700, Carl E. Love wrote:
> On Thu, 2017-04-27 at 17:02 +0200, Julian Seward wrote:
> > The important IR bits are these:
> >
> > 100004bc: 28 4c 20 7d ldbrx r9,0,r9
> > 100004c0: 28 54 40 7d ldbrx r10,0,r10
> > // t143 is r9 after the load
> > // t166 is r10 after the load
> >
> > 100004c4: 51 48 6a 7c subf. r3,r10,r9
> > t69 = Sub64(t143,t166) // r3
> > t167 = 64to8(CmpORD64S(t69,0x0:I64))
> >
> > 100004c8: 1c 00 82 40 bne 100004e4 <main+0x84>
> > if (CmpNE32(Xor32(And32(8Uto32(t167),0x2:I32),0x2:I32),0x0:I32))
> > { PUT(1296) = 0x100004E4:I64; exit-Boring }
> >
> > My best guess is, this is a problem with the instrumentation of the
> > CmpORD64S. What that does is to compare t69 against zero, and produce
> > a 3 bit value like this:
> >
> > 8 if t69 < 0, 4 if t69 > 0, 2 if t69 == 0
> >
> > A bit strange but that's how the Power integer condition codes work.
> > And you can see that the branch condition in the IR tests for 2, meaning
> > equal.
> >
> > This is handled by doCmpORD in mc_translate.c. It special cases the case
> > where the second arg is zero, but only for the < (8) case: that bit is a
> > copy of the most significant bit of t69, so it doesn't matter if all the
> > other bits are undefined. But it doesn't special case the == (2) case, and
> > so that bit is regarded as undefined if any bit in t69 is undefined. But
> > that's too pessimistic. In fact the "is t69 == 0" question is a defined
> > "no" if we can find any bit in t69 which is nonzero and defined. The
> > general logic is explained a bit more in the comment further up, "Accurate
> > interpretation of CmpEQ/CmpNE."
> >
> > Applying that to this problem would fix it, I suspect.
>
Here is my patch with two attempts to fix the issue. I think attempt 1
is actually flawed as I think I was returning the value of the condition
code bits not if they were define. I think fix 2 is the right one, but
I still get the same error plus an additional error about a syscall with
uninitialized data.
Carl Love
>From e7ee8ff9ac7efded61e2288698e09781d37e5520 Mon Sep 17 00:00:00 2001
From: Carl Love <ce...@us...>
Date: Wed, 3 May 2017 11:33:22 -0500
Subject: [PATCH] Attempted fix for compare of uninitialized data.
---
memcheck/mc_translate.c | 148 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 128 insertions(+), 20 deletions(-)
diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index 0b91840..5f08acd 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -1108,8 +1108,13 @@ static IRAtom* doCmpORD ( MCEnv* mce,
{
Bool m64 = cmp_op == Iop_CmpORD64S || cmp_op == Iop_CmpORD64U;
Bool syned = cmp_op == Iop_CmpORD64S || cmp_op == Iop_CmpORD32S;
+ IROp opNOT = m64 ? Iop_Not64 : Iop_Not32;
IROp opOR = m64 ? Iop_Or64 : Iop_Or32;
+ IROp opXOR = m64 ? Iop_Xor64 : Iop_Xor32;
IROp opAND = m64 ? Iop_And64 : Iop_And32;
+ IROp opEQ = m64 ? Iop_CmpEQ64 : Iop_CmpEQ32;
+ IROp opNE = m64 ? Iop_CmpNE64 : Iop_CmpNE32;
+ IROp opWIDEN = m64 ? Iop_1Uto64 : Iop_1Uto32;
IROp opSHL = m64 ? Iop_Shl64 : Iop_Shl32;
IROp opSHR = m64 ? Iop_Shr64 : Iop_Shr32;
IRType ty = m64 ? Ity_I64 : Ity_I32;
@@ -1119,6 +1124,10 @@ static IRAtom* doCmpORD ( MCEnv* mce,
IRAtom* threeLeft1 = NULL;
IRAtom* sevenLeft1 = NULL;
+ IRAtom* lt_bit = NULL;
+ IRAtom* gt_bit = NULL;
+ IRAtom* eq_bit = NULL;
+ IRAtom* tmp_bit = NULL;
tl_assert(isShadowAtom(mce,xxhash));
tl_assert(isShadowAtom(mce,yyhash));
@@ -1139,26 +1148,125 @@ static IRAtom* doCmpORD ( MCEnv* mce,
/* if yy is zero, then it must be fully defined (zero#). */
tl_assert(isZero(yyhash));
threeLeft1 = m64 ? mkU64(3<<1) : mkU32(3<<1);
- return
- binop(
- opOR,
- assignNew(
- 'V', mce,ty,
- binop(
- opAND,
- mkPCastTo(mce,ty, xxhash),
- threeLeft1
- )),
- assignNew(
- 'V', mce,ty,
- binop(
- opSHL,
- assignNew(
- 'V', mce,ty,
- binop(opSHR, xxhash, mkU8(width-1))),
- mkU8(3)
- ))
- );
+#if 0
+ /* Approach 1. What I did here was calculate the value of the eq and gt
+ bits for the comparison versus zero, i.e. set the condition code
+ result bit. We use the xxhash bits as a mask to get the valid xx bits
+ and then compare that with zero. The returned value for EQ is
+ NOT(NOT(xxhash) AND (xx)) EQ 0) return value should be 0 if the bits
+ are defined and equal to zero. The value for GT is
+ NOT(NOT(xxhash) AND (xx)) NE 0). When comparing to zero NE implies GT.
+
+ THIS APPROACH IS REALLY CALCULATING THE ACTUAL CONDITION CODE VALUES
+ NOT IF BITS ARE VALID
+
+ This approach results in 384 condigional jump/move on uninitialized
+ values instead of just 1 for the test case.
+ */
+ tmp_bit = assignNew('V', mce,ty,
+ binop(opAND,
+ assignNew('V', mce,ty,
+ unop( opNOT,
+ assignNew('V', mce,ty,
+ mkPCastTo(mce,ty, xxhash)))),
+ assignNew('V', mce,ty, xx )));
+
+ /* The comparison gives 1 for EQ which implies value is defined. But "defined" says the shadow bit must be zero so we have
+ to NOT the result.
+ */
+ eq_bit = assignNew('V', mce,ty,
+ unop(opNOT,
+ assignNew('V', mce,ty,
+ unop(opWIDEN,
+ assignNew('V', mce,Ity_I1,
+ binop(opEQ, tmp_bit,
+ (m64 ? mkU64(0) : mkU32(0))))))));
+
+ /* since we are comparing with zero, NE implies xx must be GT zero */
+ gt_bit = assignNew('V', mce,ty,
+ unop(opNOT,
+ assignNew('V', mce,ty,
+ unop(opWIDEN,
+ assignNew('V', mce,Ity_I1,
+ binop(opNE, tmp_bit,
+ (m64 ? mkU64(0) : mkU32(0))))))));
+
+ // old gt calc assignNew('V', mce,ty, binop(opXOR, lt_bit, assignNew('V', mce,ty, unop(opNOT, eq_bit))));
+ lt_bit = assignNew('V', mce,ty, binop(opSHR, xxhash, mkU8(width-1)));
+
+ return binop( opOR,
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opAND,
+ mkPCastTo(mce,ty, xxhash),
+ threeLeft1
+ )),
+ assignNew(
+ 'V', mce,ty,
+ binop( opOR,
+ assignNew(
+ 'V', mce,ty,
+ binop( opOR,
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opSHL, lt_bit,
+ mkU8(3))),
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opSHL, eq_bit,
+ mkU8(2))))),
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opSHL, gt_bit,
+ mkU8(1))))));
+#else
+ /* Approach 2 is to return 0 for EQ AND GT if there are some xx bits that
+ are defined for use in the comparison. Or in other words if everything
+ is undefined, there are no valid bits that can be compared. EQ and GT
+ shadow bits are set to 1, i.e. undefined */
+ lt_bit = assignNew('V', mce,ty, binop(opSHR, xxhash, mkU8(width-1)));
+
+ /* Comparison is 1 if TRUE (i.e. bits are all undefined).
+
+ * THIS APPROACH STILL RESULTS IN 1 CONDITIONAL JUMP/MOVE DEPENDS ON
+ * UNINITIALIZED VALUE INSTEAD PLUS "Syscall param exit_group
+ * (status) contains uninitialized byte" WHICH DID NOT OCCUR
+ * BEFORE
+ */
+ tmp_bit = assignNew('V', mce,ty,
+ unop(opWIDEN,
+ assignNew('V', mce,Ity_I1,
+ binop( opEQ, (m64 ? mkU64(0xFFFFFFFFFFFFFFFF) : mkU32(0xFFFFFFFF)),
+ assignNew('V', mce,ty,
+ mkPCastTo(mce,ty,
+ xxhash))))));
+ return binop( opOR,
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opAND,
+ mkPCastTo(mce,ty, xxhash),
+ threeLeft1
+ )),
+ assignNew(
+ 'V', mce,ty,
+ binop( opOR,
+ assignNew(
+ 'V', mce,ty,
+ binop( opAND,
+ tmp_bit,
+ (m64 ? mkU64(0x6) : mkU32(0x6)))),
+ assignNew(
+ 'V', mce,ty,
+ binop(
+ opSHL, lt_bit,
+ mkU8(3))))));
+#endif
+
} else {
/* standard interpretation */
sevenLeft1 = m64 ? mkU64(7<<1) : mkU32(7<<1);
--
2.11.0
|
|
From: Julian S. <js...@ac...> - 2017-05-08 14:01:53
|
Hmm, this bug looks suspiciously similar to https://bugs.kde.org/show_bug.cgi?id=352364 Carl, do you know if it is identical? J |
|
From: Carl E. L. <ce...@us...> - 2017-05-08 15:12:20
|
On Mon, 2017-05-08 at 16:01 +0200, Julian Seward wrote: > Hmm, this bug looks suspiciously similar to > https://bugs.kde.org/show_bug.cgi?id=352364 > > Carl, do you know if it is identical? > > J > My understanding is this bug does a comparison against a constant that is not zero. The bug I was working on compares with zero so they take different paths in the CmpORD function. But, the underlying issues seem to be similar. This bug talks about completely redoing the PPC condition code handling to make it more compatible with the other architectures which presumably would fix both issues. Not sure exactly what all would be involved in doing this. Any thoughts on the patch I sent? It didn't seem to work as hoped and not sure why. Carl Love |