From: Nicholas N. <n.n...@gm...> - 2012-05-21 23:25:14
|
On Mon, May 21, 2012 at 7:22 AM, Julian Seward <js...@ac...> wrote:
>
> I recently added some AVX support to V, and as a result added a new type
> of 32-byte values (Ity_V256) to IR.  Loads and stores of such values
> cause Cachegrind and Callgrind to assert, because the size (32 bytes)
> is larger than MIN_LINE_SIZE, which is 16.
>
> As they currently are, both tools refuse to process memory accesses
> bigger than 16, on the basis that the minimum possible line size is 16,
> and so a 16 byte access could access 2 adjacent lines, which is a
> situation they are prepared to handle.  But not 3 lines, which is a
> possible case for a 32 byte access w/ 16 byte lines (something for
> which I'm sure no hardware actually exists).

Can we model the 32-byte accesses correctly and just assert if the
3-line case occurs?  (I wrote that simulation code so long ago that I
don't have any additional insight compared to someone looking at it
afresh...)

Nick |
|
From: <sv...@va...> - 2012-05-21 22:53:13
|
sewardj 2012-05-21 23:53:06 +0100 (Mon, 21 May 2012)
New Revision: 12574
Log:
Fix fallout from recent AVX commit (guest_XMMn no longer exist;
use guest_YMMn instead)
Modified files:
trunk/coregrind/m_syswrap/syswrap-amd64-darwin.c
Modified: trunk/coregrind/m_syswrap/syswrap-amd64-darwin.c (+34 -4)
===================================================================
--- trunk/coregrind/m_syswrap/syswrap-amd64-darwin.c 2012-05-21 22:57:05 +01:00 (rev 12573)
+++ trunk/coregrind/m_syswrap/syswrap-amd64-darwin.c 2012-05-21 23:53:06 +01:00 (rev 12574)
@@ -97,8 +97,23 @@
VexGuestAMD64State *vex)
{
// DDD: #warning GrP fixme fp state
-
- VG_(memcpy)(&mach->__fpu_xmm0, &vex->guest_XMM0, 16 * sizeof(mach->__fpu_xmm0));
+ // JRS: what about the YMMHI bits? Are they important?
+ VG_(memcpy)(&mach->__fpu_xmm0, &vex->guest_YMM0, sizeof(mach->__fpu_xmm0));
+ VG_(memcpy)(&mach->__fpu_xmm1, &vex->guest_YMM1, sizeof(mach->__fpu_xmm1));
+ VG_(memcpy)(&mach->__fpu_xmm2, &vex->guest_YMM2, sizeof(mach->__fpu_xmm2));
+ VG_(memcpy)(&mach->__fpu_xmm3, &vex->guest_YMM3, sizeof(mach->__fpu_xmm3));
+ VG_(memcpy)(&mach->__fpu_xmm4, &vex->guest_YMM4, sizeof(mach->__fpu_xmm4));
+ VG_(memcpy)(&mach->__fpu_xmm5, &vex->guest_YMM5, sizeof(mach->__fpu_xmm5));
+ VG_(memcpy)(&mach->__fpu_xmm6, &vex->guest_YMM6, sizeof(mach->__fpu_xmm6));
+ VG_(memcpy)(&mach->__fpu_xmm7, &vex->guest_YMM7, sizeof(mach->__fpu_xmm7));
+ VG_(memcpy)(&mach->__fpu_xmm8, &vex->guest_YMM8, sizeof(mach->__fpu_xmm8));
+ VG_(memcpy)(&mach->__fpu_xmm9, &vex->guest_YMM9, sizeof(mach->__fpu_xmm9));
+ VG_(memcpy)(&mach->__fpu_xmm10, &vex->guest_YMM10, sizeof(mach->__fpu_xmm10));
+ VG_(memcpy)(&mach->__fpu_xmm11, &vex->guest_YMM11, sizeof(mach->__fpu_xmm11));
+ VG_(memcpy)(&mach->__fpu_xmm12, &vex->guest_YMM12, sizeof(mach->__fpu_xmm12));
+ VG_(memcpy)(&mach->__fpu_xmm13, &vex->guest_YMM13, sizeof(mach->__fpu_xmm13));
+ VG_(memcpy)(&mach->__fpu_xmm14, &vex->guest_YMM14, sizeof(mach->__fpu_xmm14));
+ VG_(memcpy)(&mach->__fpu_xmm15, &vex->guest_YMM15, sizeof(mach->__fpu_xmm15));
}
@@ -159,8 +174,23 @@
VexGuestAMD64State *vex)
{
// DDD: #warning GrP fixme fp state
-
- VG_(memcpy)(&vex->guest_XMM0, &mach->__fpu_xmm0, 16 * sizeof(mach->__fpu_xmm0));
+ // JRS: what about the YMMHI bits? Are they important?
+ VG_(memcpy)(&vex->guest_YMM0, &mach->__fpu_xmm0, sizeof(mach->__fpu_xmm0));
+ VG_(memcpy)(&vex->guest_YMM1, &mach->__fpu_xmm1, sizeof(mach->__fpu_xmm1));
+ VG_(memcpy)(&vex->guest_YMM2, &mach->__fpu_xmm2, sizeof(mach->__fpu_xmm2));
+ VG_(memcpy)(&vex->guest_YMM3, &mach->__fpu_xmm3, sizeof(mach->__fpu_xmm3));
+ VG_(memcpy)(&vex->guest_YMM4, &mach->__fpu_xmm4, sizeof(mach->__fpu_xmm4));
+ VG_(memcpy)(&vex->guest_YMM5, &mach->__fpu_xmm5, sizeof(mach->__fpu_xmm5));
+ VG_(memcpy)(&vex->guest_YMM6, &mach->__fpu_xmm6, sizeof(mach->__fpu_xmm6));
+ VG_(memcpy)(&vex->guest_YMM7, &mach->__fpu_xmm7, sizeof(mach->__fpu_xmm7));
+ VG_(memcpy)(&vex->guest_YMM8, &mach->__fpu_xmm8, sizeof(mach->__fpu_xmm8));
+ VG_(memcpy)(&vex->guest_YMM9, &mach->__fpu_xmm9, sizeof(mach->__fpu_xmm9));
+ VG_(memcpy)(&vex->guest_YMM10, &mach->__fpu_xmm10, sizeof(mach->__fpu_xmm10));
+ VG_(memcpy)(&vex->guest_YMM11, &mach->__fpu_xmm11, sizeof(mach->__fpu_xmm11));
+ VG_(memcpy)(&vex->guest_YMM12, &mach->__fpu_xmm12, sizeof(mach->__fpu_xmm12));
+ VG_(memcpy)(&vex->guest_YMM13, &mach->__fpu_xmm13, sizeof(mach->__fpu_xmm13));
+ VG_(memcpy)(&vex->guest_YMM14, &mach->__fpu_xmm14, sizeof(mach->__fpu_xmm14));
+ VG_(memcpy)(&vex->guest_YMM15, &mach->__fpu_xmm15, sizeof(mach->__fpu_xmm15));
}
|
|
From: <sv...@va...> - 2012-05-21 21:57:14
|
sewardj 2012-05-21 22:57:05 +0100 (Mon, 21 May 2012)
New Revision: 12573
Log:
Fix VALGRIND_MINOR/VALGRIND_MAJOR symbols. This got forgotten about
in 3.7.0 (oops).
Modified files:
trunk/NEWS
trunk/include/valgrind.h
Modified: trunk/NEWS (+2 -0)
===================================================================
--- trunk/NEWS 2012-05-21 17:18:23 +01:00 (rev 12572)
+++ trunk/NEWS 2012-05-21 22:57:05 +01:00 (rev 12573)
@@ -1,6 +1,8 @@
Release 3.8.0 (????)
~~~~~~~~~~~~~~~~~~~~
+xxx Don't forget to update VALGRIND_MAJOR/MINOR before release
+
* ================== PLATFORM CHANGES =================
* Support for intel AES instructions (AESKEYGENASSIST, AESENC, AESENCLAST,
Modified: trunk/include/valgrind.h (+1 -1)
===================================================================
--- trunk/include/valgrind.h 2012-05-21 17:18:23 +01:00 (rev 12572)
+++ trunk/include/valgrind.h 2012-05-21 22:57:05 +01:00 (rev 12573)
@@ -89,7 +89,7 @@
|| (__VALGRIND_MAJOR__ == 3 && __VALGRIND_MINOR__ >= 6))
*/
#define __VALGRIND_MAJOR__ 3
-#define __VALGRIND_MINOR__ 6
+#define __VALGRIND_MINOR__ 8
#include <stdarg.h>
|
|
From: <sv...@va...> - 2012-05-21 21:51:44
|
sewardj 2012-05-21 22:51:36 +0100 (Mon, 21 May 2012)
New Revision: 2335
Log:
Enable FCOMS/FCOMPS on amd64. Fixes #300414.
(Eliot Moss, mo...@cs...)
Modified files:
trunk/priv/guest_amd64_toIR.c
Modified: trunk/priv/guest_amd64_toIR.c (+36 -30)
===================================================================
--- trunk/priv/guest_amd64_toIR.c 2012-05-21 17:16:13 +01:00 (rev 2334)
+++ trunk/priv/guest_amd64_toIR.c 2012-05-21 22:51:36 +01:00 (rev 2335)
@@ -4999,37 +4999,43 @@
fp_do_op_mem_ST_0 ( addr, "mul", dis_buf, Iop_MulF64, False );
break;
-//.. case 2: /* FCOM single-real */
-//.. DIP("fcoms %s\n", dis_buf);
-//.. /* This forces C1 to zero, which isn't right. */
-//.. put_C3210(
-//.. binop( Iop_And32,
-//.. binop(Iop_Shl32,
-//.. binop(Iop_CmpF64,
-//.. get_ST(0),
-//.. unop(Iop_F32toF64,
-//.. loadLE(Ity_F32,mkexpr(addr)))),
-//.. mkU8(8)),
-//.. mkU32(0x4500)
-//.. ));
-//.. break;
-//..
-//.. case 3: /* FCOMP single-real */
-//.. DIP("fcomps %s\n", dis_buf);
-//.. /* This forces C1 to zero, which isn't right. */
-//.. put_C3210(
-//.. binop( Iop_And32,
-//.. binop(Iop_Shl32,
-//.. binop(Iop_CmpF64,
-//.. get_ST(0),
-//.. unop(Iop_F32toF64,
-//.. loadLE(Ity_F32,mkexpr(addr)))),
-//.. mkU8(8)),
-//.. mkU32(0x4500)
-//.. ));
-//.. fp_pop();
-//.. break;
+ case 2: /* FCOM single-real */
+ DIP("fcoms %s\n", dis_buf);
+ /* This forces C1 to zero, which isn't right. */
+ /* The AMD documentation suggests that forcing C1 to
+ zero is correct (Eliot Moss) */
+ put_C3210(
+ unop( Iop_32Uto64,
+ binop( Iop_And32,
+ binop(Iop_Shl32,
+ binop(Iop_CmpF64,
+ get_ST(0),
+ unop(Iop_F32toF64,
+ loadLE(Ity_F32,mkexpr(addr)))),
+ mkU8(8)),
+ mkU32(0x4500)
+ )));
+ break;
+ case 3: /* FCOMP single-real */
+ /* The AMD documentation suggests that forcing C1 to
+ zero is correct (Eliot Moss) */
+ DIP("fcomps %s\n", dis_buf);
+ /* This forces C1 to zero, which isn't right. */
+ put_C3210(
+ unop( Iop_32Uto64,
+ binop( Iop_And32,
+ binop(Iop_Shl32,
+ binop(Iop_CmpF64,
+ get_ST(0),
+ unop(Iop_F32toF64,
+ loadLE(Ity_F32,mkexpr(addr)))),
+ mkU8(8)),
+ mkU32(0x4500)
+ )));
+ fp_pop();
+ break;
+
case 4: /* FSUB single-real */
fp_do_op_mem_ST_0 ( addr, "sub", dis_buf, Iop_SubF64, False );
break;
|
|
From: Philippe W. <phi...@sk...> - 2012-05-21 21:11:26
|
On Mon, 2012-05-21 at 17:08 -0400, Florian Krohm wrote:
> I used to have access to an x86 but no more...

If you have a 64-bit Intel machine, then --enable-only32bit will build
V for x86 only, and all the regtests will be run in this mode.  That
should be equivalent to testing natively on x86, I guess.

Philippe |
|
From: Florian K. <br...@ac...> - 2012-05-21 21:08:50
|
On 05/21/2012 04:51 PM, Julian Seward wrote:
> The rationale is:
>
> mc_main.c contains the expensive bit of Memcheck (the helper functions
> that get called millions of times per second).  On x86 it's important
> to build this with -fomit-frame-pointer so as to maximise performance.
>
> mc_replace_strmem.c runs on the simulated CPU, and it often appears
> in stack traces shown to the user.  It is built with
> -fno-omit-frame-pointer so as to guarantee robust backtraces on x86,
> on which CFI based unwinding is not the "normal" case and so is
> sometimes fragile.
>
>> Removing those did not make a difference on amd64, ppc, and s390x.
>> Any objections to removing those?
>
> I'd like to preserve these settings.  IIUC, recent changes to
> Makefile.all.am (??) cause mc_main.c to be built -fomit-frame-pointer
> on all platforms anyway (is that correct?

No, only x86, amd64, and s390x have -fomit-frame-pointer universally
enabled.  When we discussed this on IRC, you weren't sure whether it
was the right thing to do for ppc.  Also, ARM does not have
-fomit-frame-pointer enabled.

> I didn't follow all the
> details of the discussion on it).  So that can be rm'd.

Not quite, I guess.

> -fno-omit-frame-pointer for mc_replace_strmem.c, I'd like to
> preserve, for the reason above.

Yeah, makes sense.  I presume you won't object if I add your
explanation as a comment in the Makefile :)

> It's easier to make sense of this stuff when experimenting with x86,
> since that's the one platform on which -fomit-frame-pointer does make
> a sometimes dramatic difference (between stack traces and no stack
> traces).

I used to have access to an x86 but no more...

Florian |
|
From: Julian S. <js...@ac...> - 2012-05-21 20:53:45
|
On Monday, May 21, 2012, Florian Krohm wrote:
> While we're on the topic...
> There's one more place where fomit-frame-pointer is fiddled with:
>
> ./memcheck/Makefile.am:mc_main.o: CFLAGS += -fomit-frame-pointer
> ./memcheck/Makefile.am:mc_replace_strmem.o: CFLAGS +=
> -fno-omit-frame-pointer
>
> These were added in r3540 and r1774, respectively.  With no commentary
> that would shed some light as to why it appeared necessary.

The rationale is:

mc_main.c contains the expensive bit of Memcheck (the helper functions
that get called millions of times per second).  On x86 it's important
to build this with -fomit-frame-pointer so as to maximise performance.

mc_replace_strmem.c runs on the simulated CPU, and it often appears
in stack traces shown to the user.  It is built with
-fno-omit-frame-pointer so as to guarantee robust backtraces on x86,
on which CFI based unwinding is not the "normal" case and so is
sometimes fragile.

> Removing those did not make a difference on amd64, ppc, and s390x.
> Any objections to removing those?

I'd like to preserve these settings.  IIUC, recent changes to
Makefile.all.am (??) cause mc_main.c to be built -fomit-frame-pointer
on all platforms anyway (is that correct?  I didn't follow all the
details of the discussion on it).  So that can be rm'd.

-fno-omit-frame-pointer for mc_replace_strmem.c, I'd like to preserve,
for the reason above.

Yell if my understanding of what happened recently with these changes
is wrong .. it could well be.

It's easier to make sense of this stuff when experimenting with x86,
since that's the one platform on which -fomit-frame-pointer does make
a sometimes dramatic difference (between stack traces and no stack
traces).

J |
|
From: Julian S. <js...@ac...> - 2012-05-21 20:42:42
|
On Monday, May 21, 2012, Eliot Moss wrote:
> Dear developers -- I have a program (IBM's Java)
> that seems to use FCOMP, yet that instruction is
> commented out in guest_amd64_toIR.c.  Can anyone
> offer insight as to why?  (Yet another instruction
> it seems that I need to add to get JVMs to work
> under valgrind ...)

The reason you are seeing this phenomenon (viz, apparently perfectly
reasonable instructions are not supported) is that we tend to
implement insns on demand.  In practice that means the set of
implemented instructions is driven by whatever "dialect" gcc and/or
GNU as emit.  When you bring along a new compiler (this JVM) and it
ventures outside that dialect, things break.

iow, this FCOMP variant is not implemented because gcc has never
generated it, and no other compiler (till now) has "demanded it".

Should be pretty simple to do: basically uncomment the relevant clause
(which is from the x86 front end) and wrap the missing
unop(Iop_32Uto64, ...) function application around it.  Then send the
patch this way ..

J |
|
From: Florian K. <br...@ac...> - 2012-05-21 19:48:47
|
While we're on the topic... there's one more place where
-fomit-frame-pointer is fiddled with:

./memcheck/Makefile.am:mc_main.o: CFLAGS += -fomit-frame-pointer
./memcheck/Makefile.am:mc_replace_strmem.o: CFLAGS += -fno-omit-frame-pointer

These were added in r3540 and r1774, respectively, with no commentary
that would shed light on why they appeared necessary.

Removing those did not make a difference on amd64, ppc, and s390x.
Any objections to removing them?

Florian |
|
From: John R. <jr...@bi...> - 2012-05-21 17:50:19
|
> Dear developers -- I have a program (IBM's Java)
> that seems to use FCOMP, yet that instruction is
> commented out in guest_amd64_toIR.c. Can anyone
> offer insight as to why? (Yet another instruction
> it seems that I need to add to get JVMs to work
> under valgrind ...)
The probable cause is the comment:
//.. /* This forces C1 to zero, which isn't right. */
Note that there are 8 instances of FCOMP (namely, {single,double}*
{register,memory}*{x86,amd64}) yet only one of those instances is
commented out, despite the multiple comments about C1 being bad:
----- guest_amd64_toIR.c
//.. case 3: /* FCOMP single-real */
case 0xD8 ... 0xDF: /* FCOMP %st(?),%st(0) */
case 3: /* FCOMP double-real */
case 0xD9: /* FCOMPP %st(0),%st(1) */
-----
----- guest_x86_toIR.c
case 3: /* FCOMP single-real */
case 0xD8 ... 0xDF: /* FCOMP %st(?),%st(0) */
case 3: /* FCOMP double-real */
case 0xD9: /* FCOMPP %st(0),%st(1) */
-----
--
|
|
From: Eliot M. <mo...@cs...> - 2012-05-21 17:16:49
|
Dear developers -- I have a program (IBM's Java) that seems to use
FCOMP, yet that instruction is commented out in guest_amd64_toIR.c.
Can anyone offer insight as to why?  (Yet another instruction it seems
that I need to add to get JVMs to work under valgrind ...)

Regards -- Eliot Moss |
|
From: <sv...@va...> - 2012-05-21 16:18:35
|
florian 2012-05-21 17:18:23 +0100 (Mon, 21 May 2012)
New Revision: 12572
Log:
Add -fomit-frame-pointer for s390. The GCC maintainer was telling me that
this has been the preferred way to compile for quite a while. So let's follow
suit. The perf bucket did not reveal any measurable difference.
Modified files:
trunk/Makefile.all.am
Modified: trunk/Makefile.all.am (+1 -1)
===================================================================
--- trunk/Makefile.all.am 2012-05-21 14:44:54 +01:00 (rev 12571)
+++ trunk/Makefile.all.am 2012-05-21 17:18:23 +01:00 (rev 12572)
@@ -169,7 +169,7 @@
AM_CCASFLAGS_AMD64_DARWIN = -arch x86_64 -g
AM_FLAG_M3264_S390X_LINUX = @FLAG_M64@
-AM_CFLAGS_S390X_LINUX = @FLAG_M64@ $(AM_CFLAGS_BASE)
+AM_CFLAGS_S390X_LINUX = @FLAG_M64@ $(AM_CFLAGS_BASE) -fomit-frame-pointer
AM_CCASFLAGS_S390X_LINUX = @FLAG_M64@ -g -mzarch -march=z900
|
|
From: <sv...@va...> - 2012-05-21 16:16:25
|
sewardj 2012-05-21 17:16:13 +0100 (Mon, 21 May 2012)
New Revision: 2334
Log:
Fix feature recognition on AMD Bulldozer following the recent AVX
commits. Fixes #300389.
Modified files:
trunk/priv/main_main.c
Modified: trunk/priv/main_main.c (+3 -0)
===================================================================
--- trunk/priv/main_main.c 2012-05-21 16:45:34 +01:00 (rev 2333)
+++ trunk/priv/main_main.c 2012-05-21 17:16:13 +01:00 (rev 2334)
@@ -1075,6 +1075,9 @@
case VEX_HWCAPS_AMD64_SSE3 | VEX_HWCAPS_AMD64_CX16
| VEX_HWCAPS_AMD64_AVX:
return "amd64-sse3-cx16-avx";
+ case VEX_HWCAPS_AMD64_SSE3 | VEX_HWCAPS_AMD64_CX16
+ | VEX_HWCAPS_AMD64_LZCNT | VEX_HWCAPS_AMD64_AVX:
+ return "amd64-sse3-cx16-lzcnt-avx";
default:
return NULL;
}
|
|
From: <sv...@va...> - 2012-05-21 15:45:42
|
sewardj 2012-05-21 16:45:34 +0100 (Mon, 21 May 2012)
New Revision: 2333
Log:
Ensure s390x guest state size is 32-byte aligned, as per increase in
alignment requirements resulting from r12569/r2330.
(Christian Borntraeger <bor...@de...>)
Modified files:
trunk/priv/guest_s390_helpers.c
trunk/pub/libvex_guest_s390x.h
Modified: trunk/priv/guest_s390_helpers.c (+2 -0)
===================================================================
--- trunk/priv/guest_s390_helpers.c 2012-05-21 12:21:50 +01:00 (rev 2332)
+++ trunk/priv/guest_s390_helpers.c 2012-05-21 16:45:34 +01:00 (rev 2333)
@@ -141,6 +141,8 @@
state->guest_CC_DEP1 = 0;
state->guest_CC_DEP2 = 0;
state->guest_CC_NDEP = 0;
+ state->padding1 = 0;
+ state->padding2 = 0;
}
Modified: trunk/pub/libvex_guest_s390x.h (+4 -3)
===================================================================
--- trunk/pub/libvex_guest_s390x.h 2012-05-21 12:21:50 +01:00 (rev 2332)
+++ trunk/pub/libvex_guest_s390x.h 2012-05-21 16:45:34 +01:00 (rev 2333)
@@ -149,11 +149,12 @@
/* 424 */ ULong host_EvC_FAILADDR;
/*------------------------------------------------------------*/
-/*--- Force alignment to 16 bytes ---*/
+/*--- Force alignment to 32 bytes ---*/
/*------------------------------------------------------------*/
- /* No padding needed */
+ /* 432 */ ULong padding1;
+ /* 440 */ ULong padding2;
- /* 432 */ /* This is the size of the guest state */
+ /* 448 */ /* This is the size of the guest state */
} VexGuestS390XState;
|
|
From: Julian S. <js...@ac...> - 2012-05-21 14:24:13
|
Hi Nick, Josef,
I recently added some AVX support to V, and as a result added a new type
of 32-byte values (Ity_V256) to IR. Loads and stores of such values
cause Cachegrind and Callgrind to assert, because the size (32 bytes)
is larger than MIN_LINE_SIZE, which is 16.
As they currently are, both tools refuse to process memory accesses
bigger than 16, on the basis that the minimum possible line size is 16,
and so a 16 byte access could access 2 adjacent lines, which is a
situation they are prepared to handle. But not 3 lines, which is a
possible case for a 32 byte access w/ 16 byte lines (something for
which I'm sure no hardware actually exists).
So I was wondering if you had any views on how to fix it properly?
I hacked up the patch shown below (w/ equivalent for Callgrind), but
obviously it's not a good long term solution.
Options, all non-ideal:
* if we process 32 byte accesses "properly" then need to handle the
case of an access going over 3 16-byte lines (clearly would never
happen in real h/w); slow, complex, unrealistic
* or we can change MIN_LINE_SIZE to 32, but then we can't accurately
simulate caches with 16 byte lines
* or we can handle 32 byte accesses as 2 x 16 byte accesses, but then
we get incorrect access count numbers
* or we can change MIN_LINE_SIZE at run time to be 32 on AVX-capable
platforms, but that will slow the simulator down (maybe a lot) since
the shifting/masking can't then be baked in at V-build time
* or we could use the kludge in the patch below, but then we might
miss some misses (joke not intended)
Urrr. Any opinions? Other options?
Also, any opinions on me committing the the kludge temporarily? Current
situation is now that both tools assert any time they hit AVX code, which
is kinda ungood.
Thanks.
J
Index: cachegrind/cg_main.c
===================================================================
--- cachegrind/cg_main.c (revision 12570)
+++ cachegrind/cg_main.c (working copy)
@@ -1030,10 +1030,14 @@
IRExpr* data = st->Ist.WrTmp.data;
if (data->tag == Iex_Load) {
IRExpr* aexpr = data->Iex.Load.addr;
+ Int dataSize = sizeofIRType(data->Iex.Load.ty);
+ /* BEGIN AVX kludge */
+ if (dataSize > MIN_LINE_SIZE)
+ dataSize = MIN_LINE_SIZE;
+ /* END AVX kludge */
// Note also, endianness info is ignored. I guess
// that's not interesting.
- addEvent_Dr( &cgs, curr_inode, sizeofIRType(data->Iex.Load.ty),
- aexpr );
+ addEvent_Dr( &cgs, curr_inode, dataSize, aexpr );
}
break;
}
@@ -1041,8 +1045,12 @@
case Ist_Store: {
IRExpr* data = st->Ist.Store.data;
IRExpr* aexpr = st->Ist.Store.addr;
- addEvent_Dw( &cgs, curr_inode,
- sizeofIRType(typeOfIRExpr(tyenv, data)), aexpr );
+ Int dataSize = sizeofIRType(typeOfIRExpr(tyenv, data));
+ /* BEGIN AVX kludge */
+ if (dataSize > MIN_LINE_SIZE)
+ dataSize = MIN_LINE_SIZE;
+ /* END AVX kludge */
+ addEvent_Dw( &cgs, curr_inode, dataSize, aexpr );
break;
}
|
|
From: <sv...@va...> - 2012-05-21 13:45:06
|
sewardj 2012-05-21 14:44:54 +0100 (Mon, 21 May 2012)
New Revision: 12571
Log:
Handle 32-byte loads/stores, as created by recently added AVX support.
Modified files:
trunk/lackey/lk_main.c
Modified: trunk/lackey/lk_main.c (+3 -1)
===================================================================
--- trunk/lackey/lk_main.c 2012-05-21 12:01:35 +01:00 (rev 12570)
+++ trunk/lackey/lk_main.c 2012-05-21 14:44:54 +01:00 (rev 12571)
@@ -301,7 +301,7 @@
/* --- Types --- */
-#define N_TYPES 10
+#define N_TYPES 11
static Int type2index ( IRType ty )
{
@@ -316,6 +316,7 @@
case Ity_F64: return 7;
case Ity_F128: return 8;
case Ity_V128: return 9;
+ case Ity_V256: return 10;
default: tl_assert(0);
}
}
@@ -333,6 +334,7 @@
case 7: return "F64"; break;
case 8: return "F128"; break;
case 9: return "V128"; break;
+ case 10: return "V256"; break;
default: tl_assert(0);
}
}
|
|
From: Christian B. <bor...@de...> - 2012-05-21 12:08:06
|
A similar patch is necessary for s390.
Index: VEX/priv/guest_s390_helpers.c
===================================================================
--- VEX/priv/guest_s390_helpers.c (revision 2332)
+++ VEX/priv/guest_s390_helpers.c (working copy)
@@ -141,6 +141,8 @@
state->guest_CC_DEP1 = 0;
state->guest_CC_DEP2 = 0;
state->guest_CC_NDEP = 0;
+ state->padding1 = 0;
+ state->padding2 = 0;
}
Index: VEX/pub/libvex_guest_s390x.h
===================================================================
--- VEX/pub/libvex_guest_s390x.h (revision 2332)
+++ VEX/pub/libvex_guest_s390x.h (working copy)
@@ -149,11 +149,12 @@
/* 424 */ ULong host_EvC_FAILADDR;
/*------------------------------------------------------------*/
-/*--- Force alignment to 16 bytes ---*/
+/*--- Force alignment to 32 bytes ---*/
/*------------------------------------------------------------*/
- /* No padding needed */
+ /* 432 */ ULong padding1;
+ /* 440 */ ULong padding2;
- /* 432 */ /* This is the size of the guest state */
+ /* 448 */ /* This is the size of the guest state */
} VexGuestS390XState;
|
|
From: <sv...@va...> - 2012-05-21 11:22:00
|
sewardj 2012-05-21 12:21:50 +0100 (Mon, 21 May 2012)
New Revision: 2332
Log:
Ensure arm guest state size is 32-byte aligned, as per increase in
alignment requirements resulting from r12569/r2330.
Modified files:
trunk/priv/guest_arm_helpers.c
trunk/pub/libvex_guest_arm.h
Modified: trunk/priv/guest_arm_helpers.c (+4 -0)
===================================================================
--- trunk/priv/guest_arm_helpers.c 2012-05-21 12:00:41 +01:00 (rev 2331)
+++ trunk/priv/guest_arm_helpers.c 2012-05-21 12:21:50 +01:00 (rev 2332)
@@ -1029,6 +1029,10 @@
vex_state->guest_ITSTATE = 0;
vex_state->padding1 = 0;
+ vex_state->padding2 = 0;
+ vex_state->padding3 = 0;
+ vex_state->padding4 = 0;
+ vex_state->padding5 = 0;
}
Modified: trunk/pub/libvex_guest_arm.h (+5 -1)
===================================================================
--- trunk/pub/libvex_guest_arm.h 2012-05-21 12:00:41 +01:00 (rev 2331)
+++ trunk/pub/libvex_guest_arm.h 2012-05-21 12:21:50 +01:00 (rev 2332)
@@ -194,8 +194,12 @@
*/
UInt guest_ITSTATE;
- /* Padding to make it have an 16-aligned size */
+ /* Padding to make it have an 32-aligned size */
UInt padding1;
+ UInt padding2;
+ UInt padding3;
+ UInt padding4;
+ UInt padding5;
}
VexGuestARMState;
|
|
From: <sv...@va...> - 2012-05-21 11:01:43
|
sewardj 2012-05-21 12:01:35 +0100 (Mon, 21 May 2012)
New Revision: 12570
Log:
Handle increase in ppc64 guest state size resulting from r2331.
Modified files:
trunk/memcheck/mc_main.c
Modified: trunk/memcheck/mc_main.c (+1 -1)
===================================================================
--- trunk/memcheck/mc_main.c 2012-05-21 11:18:10 +01:00 (rev 12569)
+++ trunk/memcheck/mc_main.c 2012-05-21 12:01:35 +01:00 (rev 12570)
@@ -3943,7 +3943,7 @@
static void mc_post_reg_write ( CorePart part, ThreadId tid,
PtrdiffT offset, SizeT size)
{
-# define MAX_REG_WRITE_SIZE 1680
+# define MAX_REG_WRITE_SIZE 1696
UChar area[MAX_REG_WRITE_SIZE];
tl_assert(size <= MAX_REG_WRITE_SIZE);
VG_(memset)(area, V_BITS8_DEFINED, size);
|
|
From: <sv...@va...> - 2012-05-21 11:00:54
|
sewardj 2012-05-21 12:00:41 +0100 (Mon, 21 May 2012)
New Revision: 2331
Log:
Ensure ppc64 guest state size is 32-byte aligned, as per increase in
alignment requirements resulting from r12569/r2330.
Modified files:
trunk/priv/guest_ppc_helpers.c
trunk/pub/libvex_guest_ppc64.h
Modified: trunk/priv/guest_ppc_helpers.c (+2 -0)
===================================================================
--- trunk/priv/guest_ppc_helpers.c 2012-05-21 11:18:49 +01:00 (rev 2330)
+++ trunk/priv/guest_ppc_helpers.c 2012-05-21 12:00:41 +01:00 (rev 2331)
@@ -677,6 +677,8 @@
vex_state->guest_SPRG3_RO = 0;
vex_state->padding2 = 0;
+ vex_state->padding3 = 0;
+ vex_state->padding4 = 0;
}
Modified: trunk/pub/libvex_guest_ppc64.h (+3 -0)
===================================================================
--- trunk/pub/libvex_guest_ppc64.h 2012-05-21 11:18:49 +01:00 (rev 2330)
+++ trunk/pub/libvex_guest_ppc64.h 2012-05-21 12:00:41 +01:00 (rev 2331)
@@ -280,8 +280,11 @@
threading on AIX. */
/* 1648 */ ULong guest_SPRG3_RO;
+ /* offsets in comments are wrong ..*/
/* Padding to make it have an 16-aligned size */
/* 1656 */ ULong padding2;
+ /* 16XX */ ULong padding3;
+ /* 16XX */ ULong padding4;
}
VexGuestPPC64State;
|
|
From: <sv...@va...> - 2012-05-21 10:19:00
|
sewardj 2012-05-21 11:18:49 +0100 (Mon, 21 May 2012)
New Revision: 2330
Log:
Add initial support for Intel AVX instructions (VEX side).
Tracker bug is #273475.
Modified files:
trunk/priv/guest_amd64_helpers.c
trunk/priv/guest_amd64_toIR.c
trunk/priv/guest_x86_helpers.c
trunk/priv/host_amd64_defs.c
trunk/priv/host_amd64_defs.h
trunk/priv/host_amd64_isel.c
trunk/priv/host_generic_reg_alloc2.c
trunk/priv/host_generic_regs.c
trunk/priv/host_generic_regs.h
trunk/priv/ir_defs.c
trunk/priv/ir_opt.c
trunk/priv/main_main.c
trunk/pub/libvex_basictypes.h
trunk/pub/libvex_guest_amd64.h
trunk/pub/libvex_guest_x86.h
trunk/pub/libvex_ir.h
Modified: trunk/pub/libvex_guest_amd64.h (+19 -19)
===================================================================
--- trunk/pub/libvex_guest_amd64.h 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/pub/libvex_guest_amd64.h 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -96,28 +96,28 @@
associated with a %fs value of zero. */
/* 200 */ ULong guest_FS_ZERO;
- /* XMM registers. Note that these must be allocated
+ /* YMM registers. Note that these must be allocated
consecutively in order that the SSE4.2 PCMP{E,I}STR{I,M}
- helpers can treat them as an array. XMM16 is a fake reg used
+ helpers can treat them as an array. YMM16 is a fake reg used
as an intermediary in handling aforementioned insns. */
/* 208 */ULong guest_SSEROUND;
- /* 216 */U128 guest_XMM0;
- U128 guest_XMM1;
- U128 guest_XMM2;
- U128 guest_XMM3;
- U128 guest_XMM4;
- U128 guest_XMM5;
- U128 guest_XMM6;
- U128 guest_XMM7;
- U128 guest_XMM8;
- U128 guest_XMM9;
- U128 guest_XMM10;
- U128 guest_XMM11;
- U128 guest_XMM12;
- U128 guest_XMM13;
- U128 guest_XMM14;
- U128 guest_XMM15;
- U128 guest_XMM16;
+ /* 216 */U256 guest_YMM0;
+ U256 guest_YMM1;
+ U256 guest_YMM2;
+ U256 guest_YMM3;
+ U256 guest_YMM4;
+ U256 guest_YMM5;
+ U256 guest_YMM6;
+ U256 guest_YMM7;
+ U256 guest_YMM8;
+ U256 guest_YMM9;
+ U256 guest_YMM10;
+ U256 guest_YMM11;
+ U256 guest_YMM12;
+ U256 guest_YMM13;
+ U256 guest_YMM14;
+ U256 guest_YMM15;
+ U256 guest_YMM16;
/* FPU */
/* Note. Setting guest_FTOP to be ULong messes up the
Modified: trunk/priv/guest_amd64_helpers.c (+59 -58)
===================================================================
--- trunk/priv/guest_amd64_helpers.c 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/guest_amd64_helpers.c 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -1723,22 +1723,22 @@
_dst[2] = _src[2]; _dst[3] = _src[3]; } \
while (0)
- COPY_U128( xmm[0], gst->guest_XMM0 );
- COPY_U128( xmm[1], gst->guest_XMM1 );
- COPY_U128( xmm[2], gst->guest_XMM2 );
- COPY_U128( xmm[3], gst->guest_XMM3 );
- COPY_U128( xmm[4], gst->guest_XMM4 );
- COPY_U128( xmm[5], gst->guest_XMM5 );
- COPY_U128( xmm[6], gst->guest_XMM6 );
- COPY_U128( xmm[7], gst->guest_XMM7 );
- COPY_U128( xmm[8], gst->guest_XMM8 );
- COPY_U128( xmm[9], gst->guest_XMM9 );
- COPY_U128( xmm[10], gst->guest_XMM10 );
- COPY_U128( xmm[11], gst->guest_XMM11 );
- COPY_U128( xmm[12], gst->guest_XMM12 );
- COPY_U128( xmm[13], gst->guest_XMM13 );
- COPY_U128( xmm[14], gst->guest_XMM14 );
- COPY_U128( xmm[15], gst->guest_XMM15 );
+ COPY_U128( xmm[0], gst->guest_YMM0 );
+ COPY_U128( xmm[1], gst->guest_YMM1 );
+ COPY_U128( xmm[2], gst->guest_YMM2 );
+ COPY_U128( xmm[3], gst->guest_YMM3 );
+ COPY_U128( xmm[4], gst->guest_YMM4 );
+ COPY_U128( xmm[5], gst->guest_YMM5 );
+ COPY_U128( xmm[6], gst->guest_YMM6 );
+ COPY_U128( xmm[7], gst->guest_YMM7 );
+ COPY_U128( xmm[8], gst->guest_YMM8 );
+ COPY_U128( xmm[9], gst->guest_YMM9 );
+ COPY_U128( xmm[10], gst->guest_YMM10 );
+ COPY_U128( xmm[11], gst->guest_YMM11 );
+ COPY_U128( xmm[12], gst->guest_YMM12 );
+ COPY_U128( xmm[13], gst->guest_YMM13 );
+ COPY_U128( xmm[14], gst->guest_YMM14 );
+ COPY_U128( xmm[15], gst->guest_YMM15 );
# undef COPY_U128
}
@@ -1766,22 +1766,22 @@
_dst[2] = _src[2]; _dst[3] = _src[3]; } \
while (0)
- COPY_U128( gst->guest_XMM0, xmm[0] );
- COPY_U128( gst->guest_XMM1, xmm[1] );
- COPY_U128( gst->guest_XMM2, xmm[2] );
- COPY_U128( gst->guest_XMM3, xmm[3] );
- COPY_U128( gst->guest_XMM4, xmm[4] );
- COPY_U128( gst->guest_XMM5, xmm[5] );
- COPY_U128( gst->guest_XMM6, xmm[6] );
- COPY_U128( gst->guest_XMM7, xmm[7] );
- COPY_U128( gst->guest_XMM8, xmm[8] );
- COPY_U128( gst->guest_XMM9, xmm[9] );
- COPY_U128( gst->guest_XMM10, xmm[10] );
- COPY_U128( gst->guest_XMM11, xmm[11] );
- COPY_U128( gst->guest_XMM12, xmm[12] );
- COPY_U128( gst->guest_XMM13, xmm[13] );
- COPY_U128( gst->guest_XMM14, xmm[14] );
- COPY_U128( gst->guest_XMM15, xmm[15] );
+ COPY_U128( gst->guest_YMM0, xmm[0] );
+ COPY_U128( gst->guest_YMM1, xmm[1] );
+ COPY_U128( gst->guest_YMM2, xmm[2] );
+ COPY_U128( gst->guest_YMM3, xmm[3] );
+ COPY_U128( gst->guest_YMM4, xmm[4] );
+ COPY_U128( gst->guest_YMM5, xmm[5] );
+ COPY_U128( gst->guest_YMM6, xmm[6] );
+ COPY_U128( gst->guest_YMM7, xmm[7] );
+ COPY_U128( gst->guest_YMM8, xmm[8] );
+ COPY_U128( gst->guest_YMM9, xmm[9] );
+ COPY_U128( gst->guest_YMM10, xmm[10] );
+ COPY_U128( gst->guest_YMM11, xmm[11] );
+ COPY_U128( gst->guest_YMM12, xmm[12] );
+ COPY_U128( gst->guest_YMM13, xmm[13] );
+ COPY_U128( gst->guest_YMM14, xmm[14] );
+ COPY_U128( gst->guest_YMM15, xmm[15] );
# undef COPY_U128
@@ -3129,11 +3129,10 @@
// In all cases, the new OSZACP value is the lowest 16 of
// the return value.
if (isxSTRM) {
- /* gst->guest_XMM0 = resV; */ // gcc don't like that
- gst->guest_XMM0[0] = resV.w32[0];
- gst->guest_XMM0[1] = resV.w32[1];
- gst->guest_XMM0[2] = resV.w32[2];
- gst->guest_XMM0[3] = resV.w32[3];
+ gst->guest_YMM0[0] = resV.w32[0];
+ gst->guest_YMM0[1] = resV.w32[1];
+ gst->guest_YMM0[2] = resV.w32[2];
+ gst->guest_YMM0[3] = resV.w32[3];
return resOSZACP & 0x8D5;
} else {
UInt newECX = resV.w32[0] & 0xFFFF;
@@ -3507,29 +3506,31 @@
/* Initialise the simulated FPU */
amd64g_dirtyhelper_FINIT( vex_state );
- /* Initialise the SSE state. */
-# define SSEZERO(_xmm) _xmm[0]=_xmm[1]=_xmm[2]=_xmm[3] = 0;
-
+ /* Initialise the AVX state. */
+# define AVXZERO(_ymm) \
+ do { _ymm[0]=_ymm[1]=_ymm[2]=_ymm[3] = 0; \
+ _ymm[4]=_ymm[5]=_ymm[6]=_ymm[7] = 0; \
+ } while (0)
vex_state->guest_SSEROUND = (ULong)Irrm_NEAREST;
- SSEZERO(vex_state->guest_XMM0);
- SSEZERO(vex_state->guest_XMM1);
- SSEZERO(vex_state->guest_XMM2);
- SSEZERO(vex_state->guest_XMM3);
- SSEZERO(vex_state->guest_XMM4);
- SSEZERO(vex_state->guest_XMM5);
- SSEZERO(vex_state->guest_XMM6);
- SSEZERO(vex_state->guest_XMM7);
- SSEZERO(vex_state->guest_XMM8);
- SSEZERO(vex_state->guest_XMM9);
- SSEZERO(vex_state->guest_XMM10);
- SSEZERO(vex_state->guest_XMM11);
- SSEZERO(vex_state->guest_XMM12);
- SSEZERO(vex_state->guest_XMM13);
- SSEZERO(vex_state->guest_XMM14);
- SSEZERO(vex_state->guest_XMM15);
- SSEZERO(vex_state->guest_XMM16);
+ AVXZERO(vex_state->guest_YMM0);
+ AVXZERO(vex_state->guest_YMM1);
+ AVXZERO(vex_state->guest_YMM2);
+ AVXZERO(vex_state->guest_YMM3);
+ AVXZERO(vex_state->guest_YMM4);
+ AVXZERO(vex_state->guest_YMM5);
+ AVXZERO(vex_state->guest_YMM6);
+ AVXZERO(vex_state->guest_YMM7);
+ AVXZERO(vex_state->guest_YMM8);
+ AVXZERO(vex_state->guest_YMM9);
+ AVXZERO(vex_state->guest_YMM10);
+ AVXZERO(vex_state->guest_YMM11);
+ AVXZERO(vex_state->guest_YMM12);
+ AVXZERO(vex_state->guest_YMM13);
+ AVXZERO(vex_state->guest_YMM14);
+ AVXZERO(vex_state->guest_YMM15);
+ AVXZERO(vex_state->guest_YMM16);
-# undef SSEZERO
+# undef AVXZERO
vex_state->guest_EMWARN = EmWarn_NONE;
Modified: trunk/priv/main_main.c (+1 -0)
===================================================================
--- trunk/priv/main_main.c 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/main_main.c 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -1055,6 +1055,7 @@
very stupid. We should add strings independently based on
feature bits, but then it would be hard to return a string that
didn't need deallocating by the caller.) */
+ /* FIXME: show_hwcaps_s390x is a much better way to do this. */
switch (hwcaps) {
case 0:
return "amd64-sse2";
Modified: trunk/priv/ir_opt.c (+2 -1)
===================================================================
--- trunk/priv/ir_opt.c 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/ir_opt.c 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -5045,7 +5045,8 @@
case Ity_I1: case Ity_I8: case Ity_I16:
case Ity_I32: case Ity_I64: case Ity_I128:
break;
- case Ity_F32: case Ity_F64: case Ity_F128: case Ity_V128:
+ case Ity_F32: case Ity_F64: case Ity_F128:
+ case Ity_V128: case Ity_V256:
*hasVorFtemps = True;
break;
case Ity_D32: case Ity_D64: case Ity_D128:
Modified: trunk/pub/libvex_guest_x86.h (+2 -2)
===================================================================
--- trunk/pub/libvex_guest_x86.h 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/pub/libvex_guest_x86.h 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -221,8 +221,8 @@
been interrupted by a signal. */
UInt guest_IP_AT_SYSCALL;
- /* Padding to make it have an 16-aligned size */
- UInt padding1;
+ /* Padding to make it have a 32-aligned size */
+ UInt padding[5];
}
VexGuestX86State;
Modified: trunk/priv/host_amd64_isel.c (+279 -526)
===================================================================
--- trunk/priv/host_amd64_isel.c 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/host_amd64_isel.c 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -163,8 +163,8 @@
return env->vregmap[tmp];
}
-static void lookupIRTemp128 ( HReg* vrHI, HReg* vrLO,
- ISelEnv* env, IRTemp tmp )
+static void lookupIRTempPair ( HReg* vrHI, HReg* vrLO,
+ ISelEnv* env, IRTemp tmp )
{
vassert(tmp >= 0);
vassert(tmp < env->n_vregmap);
@@ -189,13 +189,6 @@
return reg;
}
-//.. static HReg newVRegF ( ISelEnv* env )
-//.. {
-//.. HReg reg = mkHReg(env->vreg_ctr, HRcFlt64, True/*virtual reg*/);
-//.. env->vreg_ctr++;
-//.. return reg;
-//.. }
-
static HReg newVRegV ( ISelEnv* env )
{
HReg reg = mkHReg(env->vreg_ctr, HRcVec128, True/*virtual reg*/);
@@ -203,7 +196,14 @@
return reg;
}
+static HReg newVRegDV ( ISelEnv* env )
+{
+ HReg reg = mkHReg(env->vreg_ctr, HRcVec256, True/*virtual reg*/);
+ env->vreg_ctr++;
+ return reg;
+}
+
/*---------------------------------------------------------*/
/*--- ISEL: Forward declarations ---*/
/*---------------------------------------------------------*/
@@ -229,9 +229,9 @@
static AMD64AMode* iselIntExpr_AMode_wrk ( ISelEnv* env, IRExpr* e );
static AMD64AMode* iselIntExpr_AMode ( ISelEnv* env, IRExpr* e );
-static void iselInt128Expr_wrk ( HReg* rHi, HReg* rLo,
+static void iselInt128Expr_wrk ( /*OUT*/HReg* rHi, HReg* rLo,
ISelEnv* env, IRExpr* e );
-static void iselInt128Expr ( HReg* rHi, HReg* rLo,
+static void iselInt128Expr ( /*OUT*/HReg* rHi, HReg* rLo,
ISelEnv* env, IRExpr* e );
static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, IRExpr* e );
@@ -246,7 +246,15 @@
static HReg iselVecExpr_wrk ( ISelEnv* env, IRExpr* e );
static HReg iselVecExpr ( ISelEnv* env, IRExpr* e );
+static HReg iselV256Expr_wrk ( ISelEnv* env, IRExpr* e );
+static HReg iselV256Expr ( ISelEnv* env, IRExpr* e );
+static void iselDVecExpr_wrk ( /*OUT*/HReg* rHi, HReg* rLo,
+ ISelEnv* env, IRExpr* e );
+static void iselDVecExpr ( /*OUT*/HReg* rHi, HReg* rLo,
+ ISelEnv* env, IRExpr* e );
+
+
/*---------------------------------------------------------*/
/*--- ISEL: Misc helpers ---*/
/*---------------------------------------------------------*/
@@ -308,7 +316,7 @@
return AMD64Instr_Alu64R(Aalu_MOV, AMD64RMI_Reg(src), dst);
}
-/* Make a vector reg-reg move. */
+/* Make a vector (128 bit) reg-reg move. */
static AMD64Instr* mk_vMOVsd_RR ( HReg src, HReg dst )
{
@@ -317,6 +325,15 @@
return AMD64Instr_SseReRg(Asse_MOV, src, dst);
}
+/* Make a double-vector (256 bit) reg-reg move. */
+
+static AMD64Instr* mk_dvMOVsd_RR ( HReg src, HReg dst )
+{
+ vassert(hregClass(src) == HRcVec256);
+ vassert(hregClass(dst) == HRcVec256);
+ return AMD64Instr_AvxReRg(Asse_MOV, src, dst);
+}
+
/* Advance/retreat %rsp by n. */
static void add_to_rsp ( ISelEnv* env, Int n )
@@ -350,47 +367,7 @@
}
}
-//.. /* Given an amode, return one which references 4 bytes further
-//.. along. */
-//..
-//.. static X86AMode* advance4 ( X86AMode* am )
-//.. {
-//.. X86AMode* am4 = dopyX86AMode(am);
-//.. switch (am4->tag) {
-//.. case Xam_IRRS:
-//.. am4->Xam.IRRS.imm += 4; break;
-//.. case Xam_IR:
-//.. am4->Xam.IR.imm += 4; break;
-//.. default:
-//.. vpanic("advance4(x86,host)");
-//.. }
-//.. return am4;
-//.. }
-//..
-//..
-//.. /* Push an arg onto the host stack, in preparation for a call to a
-//.. helper function of some kind. Returns the number of 32-bit words
-//.. pushed. */
-//..
-//.. static Int pushArg ( ISelEnv* env, IRExpr* arg )
-//.. {
-//.. IRType arg_ty = typeOfIRExpr(env->type_env, arg);
-//.. if (arg_ty == Ity_I32) {
-//.. addInstr(env, X86Instr_Push(iselIntExpr_RMI(env, arg)));
-//.. return 1;
-//.. } else
-//.. if (arg_ty == Ity_I64) {
-//.. HReg rHi, rLo;
-//.. iselInt64Expr(&rHi, &rLo, env, arg);
-//.. addInstr(env, X86Instr_Push(X86RMI_Reg(rHi)));
-//.. addInstr(env, X86Instr_Push(X86RMI_Reg(rLo)));
-//.. return 2;
-//.. }
-//.. ppIRExpr(arg);
-//.. vpanic("pushArg(x86): can't handle arg of this type");
-//.. }
-
/* Used only in doHelperCall. If possible, produce a single
instruction which computes 'e' into 'dst'. If not possible, return
NULL. */
@@ -579,11 +556,11 @@
/* SLOW SCHEME; move via temporaries */
slowscheme:
-#if 0
-if (n_args > 0) {for (i = 0; args[i]; i++) {
-ppIRExpr(args[i]); vex_printf(" "); }
-vex_printf("\n");}
-#endif
+# if 0 /* debug only */
+ if (n_args > 0) {for (i = 0; args[i]; i++) {
+ ppIRExpr(args[i]); vex_printf(" "); }
+ vex_printf("\n");}
+# endif
argreg = 0;
if (passBBP) {
@@ -819,23 +796,6 @@
}
-//.. /* Round an x87 FPU value to 53-bit-mantissa precision, to be used
-//.. after most non-simple FPU operations (simple = +, -, *, / and
-//.. sqrt).
-//..
-//.. This could be done a lot more efficiently if needed, by loading
-//.. zero and adding it to the value to be rounded (fldz ; faddp?).
-//.. */
-//.. static void roundToF64 ( ISelEnv* env, HReg reg )
-//.. {
-//.. X86AMode* zero_esp = X86AMode_IR(0, hregX86_ESP());
-//.. sub_from_esp(env, 8);
-//.. addInstr(env, X86Instr_FpLdSt(False/*store*/, 8, reg, zero_esp));
-//.. addInstr(env, X86Instr_FpLdSt(True/*load*/, 8, reg, zero_esp));
-//.. add_to_esp(env, 8);
-//.. }
-
-
/*---------------------------------------------------------*/
/*--- ISEL: Integer expressions (64/32/16/8 bit) ---*/
/*---------------------------------------------------------*/
@@ -1325,68 +1285,6 @@
return dst;
}
-//.. if (e->Iex.Binop.op == Iop_F64toI32 || e->Iex.Binop.op == Iop_F64toI16) {
-//.. Int sz = e->Iex.Binop.op == Iop_F64toI16 ? 2 : 4;
-//.. HReg rf = iselDblExpr(env, e->Iex.Binop.arg2);
-//.. HReg dst = newVRegI(env);
-//..
-//.. /* Used several times ... */
-//.. X86AMode* zero_esp = X86AMode_IR(0, hregX86_ESP());
-//..
-//.. /* rf now holds the value to be converted, and rrm holds the
-//.. rounding mode value, encoded as per the IRRoundingMode
-//.. enum. The first thing to do is set the FPU's rounding
-//.. mode accordingly. */
-//..
-//.. /* Create a space for the format conversion. */
-//.. /* subl $4, %esp */
-//.. sub_from_esp(env, 4);
-//..
-//.. /* Set host rounding mode */
-//.. set_FPU_rounding_mode( env, e->Iex.Binop.arg1 );
-//..
-//.. /* gistw/l %rf, 0(%esp) */
-//.. addInstr(env, X86Instr_FpLdStI(False/*store*/, sz, rf, zero_esp));
-//..
-//.. if (sz == 2) {
-//.. /* movzwl 0(%esp), %dst */
-//.. addInstr(env, X86Instr_LoadEX(2,False,zero_esp,dst));
-//.. } else {
-//.. /* movl 0(%esp), %dst */
-//.. vassert(sz == 4);
-//.. addInstr(env, X86Instr_Alu32R(
-//.. Xalu_MOV, X86RMI_Mem(zero_esp), dst));
-//.. }
-//..
-//.. /* Restore default FPU rounding. */
-//.. set_FPU_rounding_default( env );
-//..
-//.. /* addl $4, %esp */
-//.. add_to_esp(env, 4);
-//.. return dst;
-//.. }
-//..
-//.. /* C3210 flags following FPU partial remainder (fprem), both
-//.. IEEE compliant (PREM1) and non-IEEE compliant (PREM). */
-//.. if (e->Iex.Binop.op == Iop_PRemC3210F64
-//.. || e->Iex.Binop.op == Iop_PRem1C3210F64) {
-//.. HReg junk = newVRegF(env);
-//.. HReg dst = newVRegI(env);
-//.. HReg srcL = iselDblExpr(env, e->Iex.Binop.arg1);
-//.. HReg srcR = iselDblExpr(env, e->Iex.Binop.arg2);
-//.. addInstr(env, X86Instr_FpBinary(
-//.. e->Iex.Binop.op==Iop_PRemC3210F64
-//.. ? Xfp_PREM : Xfp_PREM1,
-//.. srcL,srcR,junk
-//.. ));
-//.. /* The previous pseudo-insn will have left the FPU's C3210
-//.. flags set correctly. So bag them. */
-//.. addInstr(env, X86Instr_FpStSW_AX());
-//.. addInstr(env, mk_iMOVsd_RR(hregX86_EAX(), dst));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_AND, X86RMI_Imm(0x4700), dst));
-//.. return dst;
-//.. }
-
break;
}
@@ -1523,16 +1421,6 @@
addInstr(env, AMD64Instr_Unary64(Aun_NOT,dst));
return dst;
}
-//.. case Iop_64HIto32: {
-//.. HReg rHi, rLo;
-//.. iselInt64Expr(&rHi,&rLo, env, e->Iex.Unop.arg);
-//.. return rHi; /* and abandon rLo .. poor wee thing :-) */
-//.. }
-//.. case Iop_64to32: {
-//.. HReg rHi, rLo;
-//.. iselInt64Expr(&rHi,&rLo, env, e->Iex.Unop.arg);
-//.. return rLo; /* similar stupid comment to the above ... */
-//.. }
case Iop_16HIto8:
case Iop_32HIto16:
case Iop_64HIto32: {
@@ -1640,19 +1528,45 @@
/* V128{HI}to64 */
case Iop_V128HIto64:
case Iop_V128to64: {
- Int off = e->Iex.Unop.op==Iop_V128HIto64 ? 8 : 0;
HReg dst = newVRegI(env);
+ Int off = e->Iex.Unop.op==Iop_V128HIto64 ? -8 : -16;
+ HReg rsp = hregAMD64_RSP();
HReg vec = iselVecExpr(env, e->Iex.Unop.arg);
- AMD64AMode* rsp0 = AMD64AMode_IR(0, hregAMD64_RSP());
- AMD64AMode* rspN = AMD64AMode_IR(off, hregAMD64_RSP());
- sub_from_rsp(env, 16);
- addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vec, rsp0));
+ AMD64AMode* m16_rsp = AMD64AMode_IR(-16, rsp);
+ AMD64AMode* off_rsp = AMD64AMode_IR(off, rsp);
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/,
+ 16, vec, m16_rsp));
addInstr(env, AMD64Instr_Alu64R( Aalu_MOV,
- AMD64RMI_Mem(rspN), dst ));
- add_to_rsp(env, 16);
+ AMD64RMI_Mem(off_rsp), dst ));
return dst;
}
+ case Iop_V256to64_0: case Iop_V256to64_1:
+ case Iop_V256to64_2: case Iop_V256to64_3: {
+ HReg vHi, vLo, vec;
+ iselDVecExpr(&vHi, &vLo, env, e->Iex.Unop.arg);
+ /* Do the first part of the selection by deciding which of
+ the 128 bit registers do look at, and second part using
+ the same scheme as for V128{HI}to64 above. */
+ Int off = 0;
+ switch (e->Iex.Unop.op) {
+ case Iop_V256to64_0: vec = vLo; off = -16; break;
+ case Iop_V256to64_1: vec = vLo; off = -8; break;
+ case Iop_V256to64_2: vec = vHi; off = -16; break;
+ case Iop_V256to64_3: vec = vHi; off = -8; break;
+ default: vassert(0);
+ }
+ HReg dst = newVRegI(env);
+ HReg rsp = hregAMD64_RSP();
+ AMD64AMode* m16_rsp = AMD64AMode_IR(-16, rsp);
+ AMD64AMode* off_rsp = AMD64AMode_IR(off, rsp);
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/,
+ 16, vec, m16_rsp));
+ addInstr(env, AMD64Instr_Alu64R( Aalu_MOV,
+ AMD64RMI_Mem(off_rsp), dst ));
+ return dst;
+ }
+
/* ReinterpF64asI64(e) */
/* Given an IEEE754 double, produce an I64 with the same bit
pattern. */
@@ -2388,95 +2302,15 @@
static void iselInt128Expr_wrk ( HReg* rHi, HReg* rLo,
ISelEnv* env, IRExpr* e )
{
-//.. HWord fn = 0; /* helper fn for most SIMD64 stuff */
vassert(e);
vassert(typeOfIRExpr(env->type_env,e) == Ity_I128);
-//.. /* 64-bit literal */
-//.. if (e->tag == Iex_Const) {
-//.. ULong w64 = e->Iex.Const.con->Ico.U64;
-//.. UInt wHi = ((UInt)(w64 >> 32)) & 0xFFFFFFFF;
-//.. UInt wLo = ((UInt)w64) & 0xFFFFFFFF;
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. vassert(e->Iex.Const.con->tag == Ico_U64);
-//.. addInstr(env, X86Instr_Alu32R(Xalu_MOV, X86RMI_Imm(wHi), tHi));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_MOV, X86RMI_Imm(wLo), tLo));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-
/* read 128-bit IRTemp */
if (e->tag == Iex_RdTmp) {
- lookupIRTemp128( rHi, rLo, env, e->Iex.RdTmp.tmp);
+ lookupIRTempPair( rHi, rLo, env, e->Iex.RdTmp.tmp);
return;
}
-//.. /* 64-bit load */
-//.. if (e->tag == Iex_LDle) {
-//.. HReg tLo, tHi;
-//.. X86AMode *am0, *am4;
-//.. vassert(e->Iex.LDle.ty == Ity_I64);
-//.. tLo = newVRegI(env);
-//.. tHi = newVRegI(env);
-//.. am0 = iselIntExpr_AMode(env, e->Iex.LDle.addr);
-//.. am4 = advance4(am0);
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am0), tLo ));
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am4), tHi ));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* 64-bit GET */
-//.. if (e->tag == Iex_Get) {
-//.. X86AMode* am = X86AMode_IR(e->Iex.Get.offset, hregX86_EBP());
-//.. X86AMode* am4 = advance4(am);
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am), tLo ));
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am4), tHi ));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* 64-bit GETI */
-//.. if (e->tag == Iex_GetI) {
-//.. X86AMode* am
-//.. = genGuestArrayOffset( env, e->Iex.GetI.descr,
-//.. e->Iex.GetI.ix, e->Iex.GetI.bias );
-//.. X86AMode* am4 = advance4(am);
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am), tLo ));
-//.. addInstr(env, X86Instr_Alu32R( Xalu_MOV, X86RMI_Mem(am4), tHi ));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* 64-bit Mux0X */
-//.. if (e->tag == Iex_Mux0X) {
-//.. HReg e0Lo, e0Hi, eXLo, eXHi, r8;
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. iselInt64Expr(&e0Hi, &e0Lo, env, e->Iex.Mux0X.expr0);
-//.. iselInt64Expr(&eXHi, &eXLo, env, e->Iex.Mux0X.exprX);
-//.. addInstr(env, mk_iMOVsd_RR(eXHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(eXLo, tLo));
-//.. r8 = iselIntExpr_R(env, e->Iex.Mux0X.cond);
-//.. addInstr(env, X86Instr_Test32(X86RI_Imm(0xFF), X86RM_Reg(r8)));
-//.. /* This assumes the first cmov32 doesn't trash the condition
-//.. codes, so they are still available for the second cmov32 */
-//.. addInstr(env, X86Instr_CMov32(Xcc_Z,X86RM_Reg(e0Hi),tHi));
-//.. addInstr(env, X86Instr_CMov32(Xcc_Z,X86RM_Reg(e0Lo),tLo));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-
/* --------- BINARY ops --------- */
if (e->tag == Iex_Binop) {
switch (e->Iex.Binop.op) {
@@ -2528,276 +2362,11 @@
*rLo = iselIntExpr_R(env, e->Iex.Binop.arg2);
return;
-//.. /* Or64/And64/Xor64 */
-//.. case Iop_Or64:
-//.. case Iop_And64:
-//.. case Iop_Xor64: {
-//.. HReg xLo, xHi, yLo, yHi;
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. X86AluOp op = e->Iex.Binop.op==Iop_Or64 ? Xalu_OR
-//.. : e->Iex.Binop.op==Iop_And64 ? Xalu_AND
-//.. : Xalu_XOR;
-//.. iselInt64Expr(&xHi, &xLo, env, e->Iex.Binop.arg1);
-//.. addInstr(env, mk_iMOVsd_RR(xHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(xLo, tLo));
-//.. iselInt64Expr(&yHi, &yLo, env, e->Iex.Binop.arg2);
-//.. addInstr(env, X86Instr_Alu32R(op, X86RMI_Reg(yHi), tHi));
-//.. addInstr(env, X86Instr_Alu32R(op, X86RMI_Reg(yLo), tLo));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* Add64/Sub64 */
-//.. case Iop_Add64:
-//.. case Iop_Sub64: {
-//.. HReg xLo, xHi, yLo, yHi;
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. iselInt64Expr(&xHi, &xLo, env, e->Iex.Binop.arg1);
-//.. addInstr(env, mk_iMOVsd_RR(xHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(xLo, tLo));
-//.. iselInt64Expr(&yHi, &yLo, env, e->Iex.Binop.arg2);
-//.. if (e->Iex.Binop.op==Iop_Add64) {
-//.. addInstr(env, X86Instr_Alu32R(Xalu_ADD, X86RMI_Reg(yLo), tLo));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_ADC, X86RMI_Reg(yHi), tHi));
-//.. } else {
-//.. addInstr(env, X86Instr_Alu32R(Xalu_SUB, X86RMI_Reg(yLo), tLo));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_SBB, X86RMI_Reg(yHi), tHi));
-//.. }
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* 32HLto64(e1,e2) */
-//.. case Iop_32HLto64:
-//.. *rHi = iselIntExpr_R(env, e->Iex.Binop.arg1);
-//.. *rLo = iselIntExpr_R(env, e->Iex.Binop.arg2);
-//.. return;
-//..
-//.. /* 64-bit shifts */
-//.. case Iop_Shl64: {
-//.. /* We use the same ingenious scheme as gcc. Put the value
-//.. to be shifted into %hi:%lo, and the shift amount into
-//.. %cl. Then (dsts on right, a la ATT syntax):
-//..
-//.. shldl %cl, %lo, %hi -- make %hi be right for the
-//.. -- shift amt %cl % 32
-//.. shll %cl, %lo -- make %lo be right for the
-//.. -- shift amt %cl % 32
-//..
-//.. Now, if (shift amount % 64) is in the range 32 .. 63,
-//.. we have to do a fixup, which puts the result low half
-//.. into the result high half, and zeroes the low half:
-//..
-//.. testl $32, %ecx
-//..
-//.. cmovnz %lo, %hi
-//.. movl $0, %tmp -- sigh; need yet another reg
-//.. cmovnz %tmp, %lo
-//.. */
-//.. HReg rAmt, sHi, sLo, tHi, tLo, tTemp;
-//.. tLo = newVRegI(env);
-//.. tHi = newVRegI(env);
-//.. tTemp = newVRegI(env);
-//.. rAmt = iselIntExpr_R(env, e->Iex.Binop.arg2);
-//.. iselInt64Expr(&sHi,&sLo, env, e->Iex.Binop.arg1);
-//.. addInstr(env, mk_iMOVsd_RR(rAmt, hregX86_ECX()));
-//.. addInstr(env, mk_iMOVsd_RR(sHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(sLo, tLo));
-//.. /* Ok. Now shift amt is in %ecx, and value is in tHi/tLo
-//.. and those regs are legitimately modifiable. */
-//.. addInstr(env, X86Instr_Sh3232(Xsh_SHL, 0/*%cl*/, tLo, tHi));
-//.. addInstr(env, X86Instr_Sh32(Xsh_SHL, 0/*%cl*/, X86RM_Reg(tLo)));
-//.. addInstr(env, X86Instr_Test32(X86RI_Imm(32),
-//.. X86RM_Reg(hregX86_ECX())));
-//.. addInstr(env, X86Instr_CMov32(Xcc_NZ, X86RM_Reg(tLo), tHi));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_MOV, X86RMI_Imm(0), tTemp));
-//.. addInstr(env, X86Instr_CMov32(Xcc_NZ, X86RM_Reg(tTemp), tLo));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. case Iop_Shr64: {
-//.. /* We use the same ingenious scheme as gcc. Put the value
-//.. to be shifted into %hi:%lo, and the shift amount into
-//.. %cl. Then:
-//..
-//.. shrdl %cl, %hi, %lo -- make %lo be right for the
-//.. -- shift amt %cl % 32
-//.. shrl %cl, %hi -- make %hi be right for the
-//.. -- shift amt %cl % 32
-//..
-//.. Now, if (shift amount % 64) is in the range 32 .. 63,
-//.. we have to do a fixup, which puts the result high half
-//.. into the result low half, and zeroes the high half:
-//..
-//.. testl $32, %ecx
-//..
-//.. cmovnz %hi, %lo
-//.. movl $0, %tmp -- sigh; need yet another reg
-//.. cmovnz %tmp, %hi
-//.. */
-//.. HReg rAmt, sHi, sLo, tHi, tLo, tTemp;
-//.. tLo = newVRegI(env);
-//.. tHi = newVRegI(env);
-//.. tTemp = newVRegI(env);
-//.. rAmt = iselIntExpr_R(env, e->Iex.Binop.arg2);
-//.. iselInt64Expr(&sHi,&sLo, env, e->Iex.Binop.arg1);
-//.. addInstr(env, mk_iMOVsd_RR(rAmt, hregX86_ECX()));
-//.. addInstr(env, mk_iMOVsd_RR(sHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(sLo, tLo));
-//.. /* Ok. Now shift amt is in %ecx, and value is in tHi/tLo
-//.. and those regs are legitimately modifiable. */
-//.. addInstr(env, X86Instr_Sh3232(Xsh_SHR, 0/*%cl*/, tHi, tLo));
-//.. addInstr(env, X86Instr_Sh32(Xsh_SHR, 0/*%cl*/, X86RM_Reg(tHi)));
-//.. addInstr(env, X86Instr_Test32(X86RI_Imm(32),
-//.. X86RM_Reg(hregX86_ECX())));
-//.. addInstr(env, X86Instr_CMov32(Xcc_NZ, X86RM_Reg(tHi), tLo));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_MOV, X86RMI_Imm(0), tTemp));
-//.. addInstr(env, X86Instr_CMov32(Xcc_NZ, X86RM_Reg(tTemp), tHi));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* F64 -> I64 */
-//.. /* Sigh, this is an almost exact copy of the F64 -> I32/I16
-//.. case. Unfortunately I see no easy way to avoid the
-//.. duplication. */
-//.. case Iop_F64toI64: {
-//.. HReg rf = iselDblExpr(env, e->Iex.Binop.arg2);
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//..
-//.. /* Used several times ... */
-//.. /* Careful ... this sharing is only safe because
-//.. zero_esp/four_esp do not hold any registers which the
-//.. register allocator could attempt to swizzle later. */
-//.. X86AMode* zero_esp = X86AMode_IR(0, hregX86_ESP());
-//.. X86AMode* four_esp = X86AMode_IR(4, hregX86_ESP());
-//..
-//.. /* rf now holds the value to be converted, and rrm holds
-//.. the rounding mode value, encoded as per the
-//.. IRRoundingMode enum. The first thing to do is set the
-//.. FPU's rounding mode accordingly. */
-//..
-//.. /* Create a space for the format conversion. */
-//.. /* subl $8, %esp */
-//.. sub_from_esp(env, 8);
-//..
-//.. /* Set host rounding mode */
-//.. set_FPU_rounding_mode( env, e->Iex.Binop.arg1 );
-//..
-//.. /* gistll %rf, 0(%esp) */
-//.. addInstr(env, X86Instr_FpLdStI(False/*store*/, 8, rf, zero_esp));
-//..
-//.. /* movl 0(%esp), %dstLo */
-//.. /* movl 4(%esp), %dstHi */
-//.. addInstr(env, X86Instr_Alu32R(
-//.. Xalu_MOV, X86RMI_Mem(zero_esp), tLo));
-//.. addInstr(env, X86Instr_Alu32R(
-//.. Xalu_MOV, X86RMI_Mem(four_esp), tHi));
-//..
-//.. /* Restore default FPU rounding. */
-//.. set_FPU_rounding_default( env );
-//..
-//.. /* addl $8, %esp */
-//.. add_to_esp(env, 8);
-//..
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
default:
break;
}
} /* if (e->tag == Iex_Binop) */
-
-//.. /* --------- UNARY ops --------- */
-//.. if (e->tag == Iex_Unop) {
-//.. switch (e->Iex.Unop.op) {
-//..
-//.. /* 32Sto64(e) */
-//.. case Iop_32Sto64: {
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. HReg src = iselIntExpr_R(env, e->Iex.Unop.arg);
-//.. addInstr(env, mk_iMOVsd_RR(src,tHi));
-//.. addInstr(env, mk_iMOVsd_RR(src,tLo));
-//.. addInstr(env, X86Instr_Sh32(Xsh_SAR, 31, X86RM_Reg(tHi)));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* 32Uto64(e) */
-//.. case Iop_32Uto64: {
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. HReg src = iselIntExpr_R(env, e->Iex.Unop.arg);
-//.. addInstr(env, mk_iMOVsd_RR(src,tLo));
-//.. addInstr(env, X86Instr_Alu32R(Xalu_MOV, X86RMI_Imm(0), tHi));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-
-//.. /* could do better than this, but for now ... */
-//.. case Iop_1Sto64: {
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. X86CondCode cond = iselCondCode(env, e->Iex.Unop.arg);
-//.. addInstr(env, X86Instr_Set32(cond,tLo));
-//.. addInstr(env, X86Instr_Sh32(Xsh_SHL, 31, X86RM_Reg(tLo)));
-//.. addInstr(env, X86Instr_Sh32(Xsh_SAR, 31, X86RM_Reg(tLo)));
-//.. addInstr(env, mk_iMOVsd_RR(tLo, tHi));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. /* Not64(e) */
-//.. case Iop_Not64: {
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//.. HReg sHi, sLo;
-//.. iselInt64Expr(&sHi, &sLo, env, e->Iex.Unop.arg);
-//.. addInstr(env, mk_iMOVsd_RR(sHi, tHi));
-//.. addInstr(env, mk_iMOVsd_RR(sLo, tLo));
-//.. addInstr(env, X86Instr_Unary32(Xun_NOT,X86RM_Reg(tHi)));
-//.. addInstr(env, X86Instr_Unary32(Xun_NOT,X86RM_Reg(tLo)));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-//..
-//.. default:
-//.. break;
-//.. }
-//.. } /* if (e->tag == Iex_Unop) */
-//..
-//..
-//.. /* --------- CCALL --------- */
-//.. if (e->tag == Iex_CCall) {
-//.. HReg tLo = newVRegI(env);
-//.. HReg tHi = newVRegI(env);
-//..
-//.. /* Marshal args, do the call, clear stack. */
-//.. doHelperCall( env, False, NULL, e->Iex.CCall.cee, e->Iex.CCall.args );
-//..
-//.. addInstr(env, mk_iMOVsd_RR(hregX86_EDX(), tHi));
-//.. addInstr(env, mk_iMOVsd_RR(hregX86_EAX(), tLo));
-//.. *rHi = tHi;
-//.. *rLo = tLo;
-//.. return;
-//.. }
-
ppIRExpr(e);
vpanic("iselInt128Expr");
}
@@ -3379,8 +2948,6 @@
return dst;
}
-//.. case Iop_Recip64Fx2: op = Xsse_RCPF; goto do_64Fx2_unary;
-//.. case Iop_RSqrt64Fx2: op = Asse_RSQRTF; goto do_64Fx2_unary;
case Iop_Sqrt64Fx2: op = Asse_SQRTF; goto do_64Fx2_unary;
do_64Fx2_unary:
{
@@ -3408,8 +2975,6 @@
return dst;
}
-//.. case Iop_Recip64F0x2: op = Xsse_RCPF; goto do_64F0x2_unary;
-//.. case Iop_RSqrt64F0x2: op = Xsse_RSQRTF; goto do_64F0x2_unary;
case Iop_Sqrt64F0x2: op = Asse_SQRTF; goto do_64F0x2_unary;
do_64F0x2_unary:
{
@@ -3453,6 +3018,7 @@
if (e->tag == Iex_Binop) {
switch (e->Iex.Binop.op) {
+ /* FIXME: could we generate MOVQ here? */
case Iop_SetV128lo64: {
HReg dst = newVRegV(env);
HReg srcV = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3464,6 +3030,7 @@
return dst;
}
+ /* FIXME: could we generate MOVD here? */
case Iop_SetV128lo32: {
HReg dst = newVRegV(env);
HReg srcV = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3476,13 +3043,16 @@
}
case Iop_64HLtoV128: {
- AMD64AMode* rsp = AMD64AMode_IR(0, hregAMD64_RSP());
+ HReg rsp = hregAMD64_RSP();
+ AMD64AMode* m8_rsp = AMD64AMode_IR(-8, rsp);
+ AMD64AMode* m16_rsp = AMD64AMode_IR(-16, rsp);
+ AMD64RI* qHi = iselIntExpr_RI(env, e->Iex.Binop.arg1);
+ AMD64RI* qLo = iselIntExpr_RI(env, e->Iex.Binop.arg2);
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, qHi, m8_rsp));
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, qLo, m16_rsp));
HReg dst = newVRegV(env);
- /* do this via the stack (easy, convenient, etc) */
- addInstr(env, AMD64Instr_Push(iselIntExpr_RMI(env, e->Iex.Binop.arg1)));
- addInstr(env, AMD64Instr_Push(iselIntExpr_RMI(env, e->Iex.Binop.arg2)));
- addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, dst, rsp));
- add_to_rsp(env, 16);
+ /* One store-forwarding stall coming up, oh well :-( */
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, dst, m16_rsp));
return dst;
}
@@ -3811,6 +3381,153 @@
/*---------------------------------------------------------*/
+/*--- ISEL: SIMD (V256) expressions, 256 bit. ---*/
+/*---------------------------------------------------------*/
+
+static HReg iselV256Expr ( ISelEnv* env, IRExpr* e )
+{
+ HReg r = iselV256Expr_wrk( env, e );
+# if 0
+ vex_printf("\n"); ppIRExpr(e); vex_printf("\n");
+# endif
+ vassert(hregClass(r) == HRcVec256);
+ vassert(hregIsVirtual(r));
+ return r;
+}
+
+
+/* DO NOT CALL THIS DIRECTLY */
+static HReg iselV256Expr_wrk ( ISelEnv* env, IRExpr* e )
+{
+ //HWord fn = 0; /* address of helper fn, if required */
+ //Bool arg1isEReg = False;
+ //AMD64SseOp op = Asse_INVALID;
+ IRType ty = typeOfIRExpr(env->type_env,e);
+ vassert(e);
+ vassert(ty == Ity_V256);
+#if 0
+ if (e->tag == Iex_RdTmp) {
+ return lookupIRTemp(env, e->Iex.RdTmp.tmp);
+ }
+
+ if (e->tag == Iex_Get) {
+ HReg dst = newVRegDV(env);
+ addInstr(env, AMD64Instr_AvxLdSt(
+ True/*load*/,
+ dst,
+ AMD64AMode_IR(e->Iex.Get.offset, hregAMD64_RBP())
+ )
+ );
+ return dst;
+ }
+
+ if (e->tag == Iex_Load && e->Iex.Load.end == Iend_LE) {
+ HReg dst = newVRegDV(env);
+ AMD64AMode* am = iselIntExpr_AMode(env, e->Iex.Load.addr);
+ addInstr(env, AMD64Instr_AvxLdSt( True/*load*/, dst, am ));
+ return dst;
+ }
+#endif
+ //avx_fail:
+ vex_printf("iselV256Expr (amd64, subarch = %s): can't reduce\n",
+ LibVEX_ppVexHwCaps(VexArchAMD64, env->hwcaps));
+ ppIRExpr(e);
+ vpanic("iselV256Expr_wrk");
+}
+
+
+/*---------------------------------------------------------*/
+/*--- ISEL: SIMD (V256) expressions, into 2 XMM regs. --*/
+/*---------------------------------------------------------*/
+
+static void iselDVecExpr ( /*OUT*/HReg* rHi, HReg* rLo,
+ ISelEnv* env, IRExpr* e )
+{
+ iselDVecExpr_wrk( rHi, rLo, env, e );
+# if 0
+ vex_printf("\n"); ppIRExpr(e); vex_printf("\n");
+# endif
+ vassert(hregClass(*rHi) == HRcVec128);
+ vassert(hregClass(*rLo) == HRcVec128);
+ vassert(hregIsVirtual(*rHi));
+ vassert(hregIsVirtual(*rLo));
+}
+
+
+/* DO NOT CALL THIS DIRECTLY */
+static void iselDVecExpr_wrk ( /*OUT*/HReg* rHi, HReg* rLo,
+ ISelEnv* env, IRExpr* e )
+{
+ vassert(e);
+ IRType ty = typeOfIRExpr(env->type_env,e);
+ vassert(ty == Ity_V256);
+
+ /* read 256-bit IRTemp */
+ if (e->tag == Iex_RdTmp) {
+ lookupIRTempPair( rHi, rLo, env, e->Iex.RdTmp.tmp);
+ return;
+ }
+
+ if (e->tag == Iex_Get) {
+ HReg vHi = newVRegV(env);
+ HReg vLo = newVRegV(env);
+ HReg rbp = hregAMD64_RBP();
+ AMD64AMode* am0 = AMD64AMode_IR(e->Iex.Get.offset + 0, rbp);
+ AMD64AMode* am16 = AMD64AMode_IR(e->Iex.Get.offset + 16, rbp);
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vLo, am0));
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vHi, am16));
+ *rHi = vHi;
+ *rLo = vLo;
+ return;
+ }
+
+ if (e->tag == Iex_Load) {
+ HReg vHi = newVRegV(env);
+ HReg vLo = newVRegV(env);
+ HReg rA = iselIntExpr_R(env, e->Iex.Load.addr);
+ AMD64AMode* am0 = AMD64AMode_IR(0, rA);
+ AMD64AMode* am16 = AMD64AMode_IR(16, rA);
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vLo, am0));
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vHi, am16));
+ *rHi = vHi;
+ *rLo = vLo;
+ return;
+ }
+
+ if (e->tag == Iex_Qop && e->Iex.Qop.op == Iop_64x4toV256) {
+ HReg rsp = hregAMD64_RSP();
+ HReg vHi = newVRegV(env);
+ HReg vLo = newVRegV(env);
+ AMD64AMode* m8_rsp = AMD64AMode_IR(-8, rsp);
+ AMD64AMode* m16_rsp = AMD64AMode_IR(-16, rsp);
+ /* arg1 is the most significant (Q3), arg4 the least (Q0) */
+ /* Get all the args into regs, before messing with the stack. */
+ AMD64RI* q3 = iselIntExpr_RI(env, e->Iex.Qop.arg1);
+ AMD64RI* q2 = iselIntExpr_RI(env, e->Iex.Qop.arg2);
+ AMD64RI* q1 = iselIntExpr_RI(env, e->Iex.Qop.arg3);
+ AMD64RI* q0 = iselIntExpr_RI(env, e->Iex.Qop.arg4);
+ /* less significant lane (Q2) at the lower address (-16(rsp)) */
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, q3, m8_rsp));
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, q2, m16_rsp));
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vHi, m16_rsp));
+ /* and then the lower half .. */
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, q1, m8_rsp));
+ addInstr(env, AMD64Instr_Alu64M(Aalu_MOV, q0, m16_rsp));
+ addInstr(env, AMD64Instr_SseLdSt(True/*load*/, 16, vLo, m16_rsp));
+ *rHi = vHi;
+ *rLo = vLo;
+ return;
+ }
+
+ //avx_fail:
+ vex_printf("iselDVecExpr (amd64, subarch = %s): can't reduce\n",
+ LibVEX_ppVexHwCaps(VexArchAMD64, env->hwcaps));
+ ppIRExpr(e);
+ vpanic("iselDVecExpr_wrk");
+}
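+
+Since no 256-bit register class is actually used by this selector, a V256 value lives in two V128 virtual registers, with the less significant half at the lower address. A hedged sketch of that layout (illustrative names, not VEX code):
+
```c
#include <stdint.h>
#include <string.h>

/* Illustrative model of how iselDVecExpr splits a 256-bit load:
   vLo holds bytes 0..15 (lanes Q0,Q1), vHi holds bytes 16..31
   (lanes Q2,Q3). */
typedef struct { uint64_t q[2]; } V128Model;

void model_load_V256(const uint8_t mem[32],
                     V128Model* vHi, V128Model* vLo)
{
    memcpy(vLo, mem + 0,  16);   /* SseLdSt load, 16, vLo, 0(rA)  */
    memcpy(vHi, mem + 16, 16);   /* SseLdSt load, 16, vHi, 16(rA) */
}
```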
+
+
+/*---------------------------------------------------------*/
/*--- ISEL: Statements ---*/
/*---------------------------------------------------------*/
@@ -3865,6 +3582,16 @@
addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, r, am));
return;
}
+ if (tyd == Ity_V256) {
+ HReg rA = iselIntExpr_R(env, stmt->Ist.Store.addr);
+ AMD64AMode* am0 = AMD64AMode_IR(0, rA);
+ AMD64AMode* am16 = AMD64AMode_IR(16, rA);
+ HReg vHi, vLo;
+ iselDVecExpr(&vHi, &vLo, env, stmt->Ist.Store.data);
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vLo, am0));
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vHi, am16));
+ return;
+ }
break;
}
@@ -3893,13 +3620,6 @@
hregAMD64_RBP())));
return;
}
- if (ty == Ity_V128) {
- HReg vec = iselVecExpr(env, stmt->Ist.Put.data);
- AMD64AMode* am = AMD64AMode_IR(stmt->Ist.Put.offset,
- hregAMD64_RBP());
- addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vec, am));
- return;
- }
if (ty == Ity_F32) {
HReg f32 = iselFltExpr(env, stmt->Ist.Put.data);
AMD64AMode* am = AMD64AMode_IR(stmt->Ist.Put.offset, hregAMD64_RBP());
@@ -3914,6 +3634,23 @@
addInstr(env, AMD64Instr_SseLdSt( False/*store*/, 8, f64, am ));
return;
}
+ if (ty == Ity_V128) {
+ HReg vec = iselVecExpr(env, stmt->Ist.Put.data);
+ AMD64AMode* am = AMD64AMode_IR(stmt->Ist.Put.offset,
+ hregAMD64_RBP());
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vec, am));
+ return;
+ }
+ if (ty == Ity_V256) {
+ HReg vHi, vLo;
+ iselDVecExpr(&vHi, &vLo, env, stmt->Ist.Put.data);
+ HReg rbp = hregAMD64_RBP();
+ AMD64AMode* am0 = AMD64AMode_IR(stmt->Ist.Put.offset + 0, rbp);
+ AMD64AMode* am16 = AMD64AMode_IR(stmt->Ist.Put.offset + 16, rbp);
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vLo, am0));
+ addInstr(env, AMD64Instr_SseLdSt(False/*store*/, 16, vHi, am16));
+ return;
+ }
break;
}
@@ -3981,7 +3718,7 @@
if (ty == Ity_I128) {
HReg rHi, rLo, dstHi, dstLo;
iselInt128Expr(&rHi,&rLo, env, stmt->Ist.WrTmp.data);
- lookupIRTemp128( &dstHi, &dstLo, env, tmp);
+ lookupIRTempPair( &dstHi, &dstLo, env, tmp);
addInstr(env, mk_iMOVsd_RR(rHi,dstHi) );
addInstr(env, mk_iMOVsd_RR(rLo,dstLo) );
return;
@@ -4010,6 +3747,14 @@
addInstr(env, mk_vMOVsd_RR(src, dst));
return;
}
+ if (ty == Ity_V256) {
+ HReg rHi, rLo, dstHi, dstLo;
+ iselDVecExpr(&rHi,&rLo, env, stmt->Ist.WrTmp.data);
+ lookupIRTempPair( &dstHi, &dstLo, env, tmp);
+ addInstr(env, mk_vMOVsd_RR(rHi,dstHi) );
+ addInstr(env, mk_vMOVsd_RR(rLo,dstLo) );
+ return;
+ }
break;
}
@@ -4358,17 +4103,25 @@
hregHI = hreg = INVALID_HREG;
switch (bb->tyenv->types[i]) {
case Ity_I1:
- case Ity_I8:
- case Ity_I16:
- case Ity_I32:
- case Ity_I64: hreg = mkHReg(j++, HRcInt64, True); break;
- case Ity_I128: hreg = mkHReg(j++, HRcInt64, True);
- hregHI = mkHReg(j++, HRcInt64, True); break;
+ case Ity_I8: case Ity_I16: case Ity_I32: case Ity_I64:
+ hreg = mkHReg(j++, HRcInt64, True);
+ break;
+ case Ity_I128:
+ hreg = mkHReg(j++, HRcInt64, True);
+ hregHI = mkHReg(j++, HRcInt64, True);
+ break;
case Ity_F32:
case Ity_F64:
- case Ity_V128: hreg = mkHReg(j++, HRcVec128, True); break;
- default: ppIRType(bb->tyenv->types[i]);
- vpanic("iselBB(amd64): IRTemp type");
+ case Ity_V128:
+ hreg = mkHReg(j++, HRcVec128, True);
+ break;
+ case Ity_V256:
+ hreg = mkHReg(j++, HRcVec128, True);
+ hregHI = mkHReg(j++, HRcVec128, True);
+ break;
+ default:
+ ppIRType(bb->tyenv->types[i]);
+ vpanic("iselBB(amd64): IRTemp type");
}
env->vregmap[i] = hreg;
env->vregmapHI[i] = hregHI;
Modified: trunk/pub/libvex_ir.h (+14 -2)
===================================================================
--- trunk/pub/libvex_ir.h 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/pub/libvex_ir.h 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -231,7 +231,8 @@
Ity_D64, /* 64-bit Decimal floating point */
Ity_D128, /* 128-bit Decimal floating point */
Ity_F128, /* 128-bit floating point; implementation defined */
- Ity_V128 /* 128-bit SIMD */
+ Ity_V128, /* 128-bit SIMD */
+ Ity_V256 /* 256-bit SIMD */
}
IRType;
@@ -1407,7 +1408,18 @@
/* Vector Reciprocal Estimate and Vector Reciprocal Square Root Estimate
     See floating-point equivalents for details. */
- Iop_Recip32x4, Iop_Rsqrte32x4
+ Iop_Recip32x4, Iop_Rsqrte32x4,
+
+ /* ------------------ 256-bit SIMD Integer. ------------------ */
+
+ /* Pack/unpack */
+ Iop_V256to64_0, // V256 -> I64, extract least significant lane
+ Iop_V256to64_1,
+ Iop_V256to64_2,
+ Iop_V256to64_3, // V256 -> I64, extract most significant lane
+
+ Iop_64x4toV256 // (I64,I64,I64,I64)->V256
+ // first arg is most significant lane
}
IROp;
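
The new pack/unpack ops can be given reference semantics in a few lines. This is an illustrative model only (lane 0 least significant, names hypothetical), not part of the patch:

```c
#include <stdint.h>

typedef struct { uint64_t w64[4]; } V256Model;

/* Iop_64x4toV256: the first argument is the most significant lane. */
static V256Model iop_64x4toV256(uint64_t q3, uint64_t q2,
                                uint64_t q1, uint64_t q0)
{
   V256Model r = { { q0, q1, q2, q3 } };
   return r;
}

/* Iop_V256to64_n for n = 0..3: extract lane n (0 = least significant). */
static uint64_t iop_V256to64(V256Model v, int n)
{
   return v.w64[n];
}
```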
Modified: trunk/priv/host_amd64_defs.h (+20 -4)
===================================================================
--- trunk/priv/host_amd64_defs.h 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/host_amd64_defs.h 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -71,7 +71,6 @@
extern HReg hregAMD64_XMM0 ( void );
extern HReg hregAMD64_XMM1 ( void );
-extern HReg hregAMD64_XMM2 ( void );
extern HReg hregAMD64_XMM3 ( void );
extern HReg hregAMD64_XMM4 ( void );
extern HReg hregAMD64_XMM5 ( void );
@@ -82,11 +81,13 @@
extern HReg hregAMD64_XMM10 ( void );
extern HReg hregAMD64_XMM11 ( void );
extern HReg hregAMD64_XMM12 ( void );
-extern HReg hregAMD64_XMM13 ( void );
-extern HReg hregAMD64_XMM14 ( void );
-extern HReg hregAMD64_XMM15 ( void );
+extern HReg hregAMD64_YMM2 ( void );
+extern HReg hregAMD64_YMM13 ( void );
+extern HReg hregAMD64_YMM14 ( void );
+extern HReg hregAMD64_YMM15 ( void );
+
/* --------- Condition codes, AMD encoding. --------- */
typedef
@@ -399,6 +400,9 @@
Ain_SseReRg, /* SSE binary general reg-reg, Re, Rg */
Ain_SseCMov, /* SSE conditional move */
Ain_SseShuf, /* SSE2 shuffle (pshufd) */
+ Ain_AvxLdSt, /* AVX load/store 256 bits,
+ no alignment constraints */
+ Ain_AvxReRg, /* AVX binary general reg-reg, Re, Rg */
Ain_EvCheck, /* Event check */
Ain_ProfInc /* 64-bit profile counter increment */
}
@@ -665,6 +669,16 @@
HReg dst;
} SseShuf;
struct {
+ Bool isLoad;
+ HReg reg;
+ AMD64AMode* addr;
+ } AvxLdSt;
+ struct {
+ AMD64SseOp op;
+ HReg src;
+ HReg dst;
+ } AvxReRg;
+ struct {
AMD64AMode* amCounter;
AMD64AMode* amFailAddr;
} EvCheck;
@@ -726,6 +740,8 @@
extern AMD64Instr* AMD64Instr_SseReRg ( AMD64SseOp, HReg, HReg );
extern AMD64Instr* AMD64Instr_SseCMov ( AMD64CondCode, HReg src, HReg dst );
extern AMD64Instr* AMD64Instr_SseShuf ( Int order, HReg src, HReg dst );
+extern AMD64Instr* AMD64Instr_AvxLdSt ( Bool isLoad, HReg, AMD64AMode* );
+extern AMD64Instr* AMD64Instr_AvxReRg ( AMD64SseOp, HReg, HReg );
extern AMD64Instr* AMD64Instr_EvCheck ( AMD64AMode* amCounter,
AMD64AMode* amFailAddr );
extern AMD64Instr* AMD64Instr_ProfInc ( void );
Modified: trunk/priv/host_amd64_defs.c (+206 -26)
===================================================================
--- trunk/priv/host_amd64_defs.c 2012-05-12 17:14:08 +01:00 (rev 2329)
+++ trunk/priv/host_amd64_defs.c 2012-05-21 11:18:49 +01:00 (rev 2330)
@@ -72,6 +72,11 @@
vassert(r >= 0 && r < 16);
vex_printf("%%xmm%d", r);
return;
+ case HRcVec256:
+ r = hregNumber(reg);
+ vassert(r >= 0 && r < 16);
+ vex_printf("%%ymm%d", r);
+ return;
default:
vpanic("ppHRegAMD64");
}
@@ -120,7 +125,6 @@
HReg hregAMD64_XMM0 ( void ) { return mkHReg( 0, HRcVec128, False); }
HReg hregAMD64_XMM1 ( void ) { return mkHReg( 1, HRcVec128, False); }
-HReg hregAMD64_XMM2 ( void ) { return mkHReg( 2, HRcVec128, False); }
HReg hregAMD64_XMM3 ( void ) { return mkHReg( 3, HRcVec128, False); }
HReg hregAMD64_XMM4 ( void ) { return mkHReg( 4, HRcVec128, False); }
HReg hregAMD64_XMM5 ( void ) { return mkHReg( 5, HRcVec128, False); }
@@ -131,11 +135,13 @@
HReg hregAMD64_XMM10 ( void ) { return mkHReg(10, HRcVec128, False); }
HReg hregAMD64_XMM11 ( void ) { return mkHReg(11, HRcVec128, False); }
HReg hregAMD64_XMM12 ( void ) { return mkHReg(12, HRcVec128, False); }
-HReg hregAMD64_XMM13 ( void ) { return mkHReg(13, HRcVec128, False); }
-HReg hregAMD64_XMM14 ( void ) { return mkHReg(14, HRcVec128, False); }
-HReg hregAMD64_XMM15 ( void ) { return mkHReg(15, HRcVec128, False); }
+HReg hregAMD64_YMM2 ( void ) { return mkHReg( 2, HRcVec256, False); }
+HReg hregAMD64_YMM13 ( void ) { return mkHReg(13, HRcVec256, False); }
+HReg hregAMD64_YMM14 ( void ) { return mkHReg(14, HRcVec256, False); }
+HReg hregAMD64_YMM15 ( void ) { return mkHReg(15, HRcVec256, False); }
+
void getAllocableRegs_AMD64 ( Int* nregs, HReg** arr )
{
#if 0
@@ -980,6 +986,23 @@
vassert(order >= 0 && order <= 0xFF);
return i;
}
+AMD64Instr* AMD64Instr_AvxLdSt ( Bool isLoad,
+ HReg reg, AMD64AMode* addr ) {
+ AMD64Instr* i = LibVEX_Alloc(sizeof(AMD64Instr));
+ i->tag = Ain_AvxLdSt;
+ i->Ain.AvxLdSt.isLoad = isLoad;
+ i->Ain.AvxLdSt.reg = reg;
+ i->Ain.AvxLdSt.addr = addr;
+ return i;
+}
+AMD64Instr* AMD64Instr_AvxReRg ( AMD64SseOp op, HReg re, HReg rg ) {
+ AMD64Instr* i = LibVEX_Alloc(sizeof(AMD64Instr));
+ i->tag = Ain_AvxReRg;
+ i->Ain.AvxReRg.op = op;
+ i->Ain.AvxReRg.src = re;
+ i->Ain.AvxReRg.dst = rg;
+ return i;
+}
AMD64Instr* AMD64Instr_EvCheck ( AMD64AMode* amCounter,
AMD64AMode* amFailAddr ) {
AMD64Instr* i = LibVEX_Alloc(sizeof(AMD64Instr));
@@ -1275,6 +1298,25 @@
vex_printf(",");
ppHRegAMD64(i->Ain.SseShuf.dst);
return;
+
+ case Ain_AvxLdSt:
+ vex_printf("vmovups ");
+ if (i->Ain.AvxLdSt.isLoad) {
+ ppAMD64AMode(i->Ain.AvxLdSt.addr);
+ vex_printf(",");
+ ppHRegAMD64(i->Ain.AvxLdSt.reg);
+ } else {
+ ppHRegAMD64(i->Ain.AvxLdSt.reg);
+ vex_printf(",");
+ ppAMD64AMode(i->Ain.AvxLdSt.addr);
+ }
+ return;
+ case Ain_AvxReRg:
+ vex_printf("v%s ", showAMD64SseOp(i->Ain.AvxReRg.op));
+ ppHRegAMD64(i->Ain.AvxReRg.src);
+ vex_printf(",");
+ ppHRegAMD64(i->Ain.AvxReRg.dst);
+ return;
case Ain_EvCheck:
vex_printf("(evCheck) decl ");
ppAMD64AMode(i->Ain.EvCheck.amCounter);
@@ -1360,7 +1402,7 @@
/* First off, claim it trashes all the caller-saved regs
which fall within the register allocator's jurisdiction.
These I believe to be: rax rcx rdx rsi rdi r8 r9 r10 r11
- and all the xmm registers.
+ and all the xmm/ymm registers.
*/
addHRegUse(u, HRmWrite, hregAMD64_RAX());
addHRegUse(u, HRmWrite, hregAMD64_RCX());
@@ -1373,7 +1415,6 @@
addHRegUse(u, HRmWrite, hregAMD64_R11());
addHRegUse(u, HRmWrite, hregAMD64_XMM0());
addHRegUse(u, HRmWrite, hregAMD64_XMM1());
- addHRegUse(u, HRmWrite, hregAMD64_XMM2());
addHRegUse(u, HRmWrite, hregAMD64_XMM3());
addHRegUse(u, HRmWrite, hregAMD64_XMM4());
addHRegUse(u, HRmWrite, hregAMD64_XMM5());
@@ -1384,9 +1425,10 @@
addHRegUse(u, HRmWrite, hregAMD64_XMM10());
addHRegUse(u, HRmWrite, hregAMD64_XMM11());
addHRegUse(u, HRmWrite, hregAMD64_XMM12());
- addHRegUse(u, HRmWrite, hregAMD64_XMM13());
- addHRegUse(u, HRmWrite, hregAMD64_XMM14());
- addHRegUse(u, HRmWrite, hregAMD64_XMM15());
+ addHRegUse(u, HRmWrite, hregAMD64_YMM2());
+ addHRegUse(u, HRmWrite, hregAMD64_YMM13());
+ addHRegUse(u, HRmWrite, hregAMD64_YMM14());
+ addHRegUse(u, HRmWrite, hregAMD64_YMM15());
/* Now we have to state any parameter-carrying registers
which might be read. This depends on the regparmness. */
@@ -1567,6 +1609,24 @@
addHRegUse(u, HRmRead, i->Ain.SseShuf.src);
addHRegUse(u, HRmWrite, i->Ain.SseShuf.dst);
return;
+ case Ain_AvxLdSt:
+ addRegUsage_AMD64AMode(u, i->Ain.AvxLdSt.addr);
+ addHRegUse(u, i->Ain.AvxLdSt.isLoad ? HRmWrite : HRmRead,
+ i->Ain.AvxLdSt.reg);
+ return;
+ case Ain_AvxReRg:
+ if ( (i->Ain.AvxReRg.op == Asse_XOR
+ || i->Ain.AvxReRg.op == Asse_CMPEQ32)
+ && i->Ain.AvxReRg.src == i->Ain.AvxReRg.dst) {
+ /* See comments on the case for Ain_SseReRg. */
+ addHRegUse(u, HRmWrite, i->Ain.AvxReRg.dst);
+ } else {
+ addHRegUse(u, HRmRead, i->Ain.AvxReRg.src);
+ addHRegUse(u, i->Ain.AvxReRg.op == Asse_MOV
+ ? HRmWrite : HRmModify,
+ i->Ain.AvxReRg.dst);
+ }
+ return;
case Ain_EvCheck:
/* We expect both amodes only to mention %rbp, so this is in
fact pointless, since %rbp isn't allocatable, but anyway.. */
@@ -1742,6 +1802,14 @@
mapReg(m, &i->Ain.SseShuf.src);
mapReg(m, &i->Ain.SseShuf.dst);
return;
+ case Ain_AvxLdSt:
+ mapReg(m, &i->Ain.AvxLdSt.reg);
+ mapRegs_AMD64AMode(m, i->Ain.AvxLdSt.addr);
+ break;
+ case Ain_AvxReRg:
+ mapReg(m, &i->Ain.AvxReRg.src);
+ mapReg(m, &i->Ain.AvxReRg.dst);
+ return;
case Ain_EvCheck:
/* We expect both amodes only to mention %rbp, so this is in
fact pointless, since %rbp isn't allocatable, but anyway.. */
@@ -1763,25 +1831,34 @@
*/
Bool isMove_AMD64Instr ( AMD64Instr* i, HReg* src, HReg* dst )
{
- /* Moves between integer regs */
- if (i->tag == Ain_Alu64R) {
- if (i->Ain.Alu64R.op != Aalu_MOV)
+ switch (i->tag) {
+ case Ain_Alu64R:
+ /* Moves between integer regs */
+ if (i->Ain.Alu64R.op != Aalu_MOV)
+ ret...
[truncated message content] |
From: <sv...@va...> - 2012-05-21 10:18:24
sewardj 2012-05-21 11:18:10 +0100 (Mon, 21 May 2012)
New Revision: 12569
Log:
Add initial support for Intel AVX instructions (Valgrind side).
Tracker bug is #273475.
Added files:
trunk/docs/internals/avx-notes.txt
trunk/none/tests/amd64/avx-1.c
Modified files:
trunk/coregrind/m_coredump/coredump-elf.c
trunk/coregrind/m_gdbserver/valgrind-low-amd64.c
trunk/coregrind/m_scheduler/scheduler.c
trunk/coregrind/pub_core_threadstate.h
trunk/docs/Makefile.am
trunk/memcheck/mc_include.h
trunk/memcheck/mc_machine.c
trunk/memcheck/mc_main.c
trunk/memcheck/mc_translate.c
Modified: trunk/coregrind/m_coredump/coredump-elf.c (+2 -1)
===================================================================
--- trunk/coregrind/m_coredump/coredump-elf.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/coregrind/m_coredump/coredump-elf.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -417,7 +417,8 @@
//:: fpu->mxcsr_mask = ?;
//:: fpu->st_space = ?;
-# define DO(n) VG_(memcpy)(fpu->xmm_space + n * 4, &arch->vex.guest_XMM##n, sizeof(arch->vex.guest_XMM##n))
+# define DO(n) VG_(memcpy)(fpu->xmm_space + n * 4, \
+ &arch->vex.guest_YMM##n[0], 16)
DO(0); DO(1); DO(2); DO(3); DO(4); DO(5); DO(6); DO(7);
DO(8); DO(9); DO(10); DO(11); DO(12); DO(13); DO(14); DO(15);
# undef DO
Modified: trunk/memcheck/mc_machine.c (+17 -17)
===================================================================
--- trunk/memcheck/mc_machine.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/memcheck/mc_machine.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -611,23 +611,23 @@
if (o == GOF(FC3210) && szB == 8) return -1;
/* XMM registers */
- if (o >= GOF(XMM0) && o+sz <= GOF(XMM0) +SZB(XMM0)) return GOF(XMM0);
- if (o >= GOF(XMM1) && o+sz <= GOF(XMM1) +SZB(XMM1)) return GOF(XMM1);
- if (o >= GOF(XMM2) && o+sz <= GOF(XMM2) +SZB(XMM2)) return GOF(XMM2);
- if (o >= GOF(XMM3) && o+sz <= GOF(XMM3) +SZB(XMM3)) return GOF(XMM3);
- if (o >= GOF(XMM4) && o+sz <= GOF(XMM4) +SZB(XMM4)) return GOF(XMM4);
- if (o >= GOF(XMM5) && o+sz <= GOF(XMM5) +SZB(XMM5)) return GOF(XMM5);
- if (o >= GOF(XMM6) && o+sz <= GOF(XMM6) +SZB(XMM6)) return GOF(XMM6);
- if (o >= GOF(XMM7) && o+sz <= GOF(XMM7) +SZB(XMM7)) return GOF(XMM7);
- if (o >= GOF(XMM8) && o+sz <= GOF(XMM8) +SZB(XMM8)) return GOF(XMM8);
- if (o >= GOF(XMM9) && o+sz <= GOF(XMM9) +SZB(XMM9)) return GOF(XMM9);
- if (o >= GOF(XMM10) && o+sz <= GOF(XMM10)+SZB(XMM10)) return GOF(XMM10);
- if (o >= GOF(XMM11) && o+sz <= GOF(XMM11)+SZB(XMM11)) return GOF(XMM11);
- if (o >= GOF(XMM12) && o+sz <= GOF(XMM12)+SZB(XMM12)) return GOF(XMM12);
- if (o >= GOF(XMM13) && o+sz <= GOF(XMM13)+SZB(XMM13)) return GOF(XMM13);
- if (o >= GOF(XMM14) && o+sz <= GOF(XMM14)+SZB(XMM14)) return GOF(XMM14);
- if (o >= GOF(XMM15) && o+sz <= GOF(XMM15)+SZB(XMM15)) return GOF(XMM15);
- if (o >= GOF(XMM16) && o+sz <= GOF(XMM16)+SZB(XMM16)) return GOF(XMM16);
+ if (o >= GOF(YMM0) && o+sz <= GOF(YMM0) +SZB(YMM0)) return GOF(YMM0);
+ if (o >= GOF(YMM1) && o+sz <= GOF(YMM1) +SZB(YMM1)) return GOF(YMM1);
+ if (o >= GOF(YMM2) && o+sz <= GOF(YMM2) +SZB(YMM2)) return GOF(YMM2);
+ if (o >= GOF(YMM3) && o+sz <= GOF(YMM3) +SZB(YMM3)) return GOF(YMM3);
+ if (o >= GOF(YMM4) && o+sz <= GOF(YMM4) +SZB(YMM4)) return GOF(YMM4);
+ if (o >= GOF(YMM5) && o+sz <= GOF(YMM5) +SZB(YMM5)) return GOF(YMM5);
+ if (o >= GOF(YMM6) && o+sz <= GOF(YMM6) +SZB(YMM6)) return GOF(YMM6);
+ if (o >= GOF(YMM7) && o+sz <= GOF(YMM7) +SZB(YMM7)) return GOF(YMM7);
+ if (o >= GOF(YMM8) && o+sz <= GOF(YMM8) +SZB(YMM8)) return GOF(YMM8);
+ if (o >= GOF(YMM9) && o+sz <= GOF(YMM9) +SZB(YMM9)) return GOF(YMM9);
+ if (o >= GOF(YMM10) && o+sz <= GOF(YMM10)+SZB(YMM10)) return GOF(YMM10);
+ if (o >= GOF(YMM11) && o+sz <= GOF(YMM11)+SZB(YMM11)) return GOF(YMM11);
+ if (o >= GOF(YMM12) && o+sz <= GOF(YMM12)+SZB(YMM12)) return GOF(YMM12);
+ if (o >= GOF(YMM13) && o+sz <= GOF(YMM13)+SZB(YMM13)) return GOF(YMM13);
+ if (o >= GOF(YMM14) && o+sz <= GOF(YMM14)+SZB(YMM14)) return GOF(YMM14);
+ if (o >= GOF(YMM15) && o+sz <= GOF(YMM15)+SZB(YMM15)) return GOF(YMM15);
+ if (o >= GOF(YMM16) && o+sz <= GOF(YMM16)+SZB(YMM16)) return GOF(YMM16);
/* MMX accesses to FP regs. Need to allow for 32-bit references
due to dirty helpers for frstor etc, which reference the entire
Modified: trunk/memcheck/mc_main.c (+17 -0)
===================================================================
--- trunk/memcheck/mc_main.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/memcheck/mc_main.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -5860,7 +5860,17 @@
return (UWord)oBoth;
}
+UWord VG_REGPARM(1) MC_(helperc_b_load32)( Addr a ) {
+ UInt oQ0 = (UInt)MC_(helperc_b_load8)( a + 0 );
+ UInt oQ1 = (UInt)MC_(helperc_b_load8)( a + 8 );
+ UInt oQ2 = (UInt)MC_(helperc_b_load8)( a + 16 );
+ UInt oQ3 = (UInt)MC_(helperc_b_load8)( a + 24 );
+ UInt oAll = merge_origins(merge_origins(oQ0, oQ1),
+ merge_origins(oQ2, oQ3));
+ return (UWord)oAll;
+}
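+
+The new helper simply decomposes a 32-byte origin lookup into four 8-byte lookups and merges the results pairwise. A hedged model of that decomposition (load8_ and merge_ are stand-ins; the merge policy shown, keeping the first nonzero origin, is one plausible choice and not necessarily Memcheck's):
+
```c
#include <stdint.h>

static uint32_t origin_map[4];   /* one origin tag per 8-byte chunk */

static uint32_t load8_(uintptr_t a) { return origin_map[(a / 8) % 4]; }
static uint32_t merge_(uint32_t x, uint32_t y) { return x ? x : y; }

/* Model of helperc_b_load32: four 8-byte origin loads, merged
   pairwise exactly as in the helper above. */
uint32_t model_b_load32(uintptr_t a)
{
   uint32_t q0 = load8_(a + 0),  q1 = load8_(a + 8);
   uint32_t q2 = load8_(a + 16), q3 = load8_(a + 24);
   return merge_(merge_(q0, q1), merge_(q2, q3));
}
```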
+
/*--------------------------------------------*/
/*--- Origin tracking: store handlers ---*/
/*--------------------------------------------*/
@@ -5972,7 +5982,14 @@
MC_(helperc_b_store8)( a + 8, d32 );
}
+void VG_REGPARM(2) MC_(helperc_b_store32)( Addr a, UWord d32 ) {
+ MC_(helperc_b_store8)( a + 0, d32 );
+ MC_(helperc_b_store8)( a + 8, d32 );
+ MC_(helperc_b_store8)( a + 16, d32 );
+ MC_(helperc_b_store8)( a + 24, d32 );
+}
+
/*--------------------------------------------*/
/*--- Origin tracking: sarp handlers ---*/
/*--------------------------------------------*/
Modified: trunk/coregrind/m_gdbserver/valgrind-low-amd64.c (+16 -16)
===================================================================
--- trunk/coregrind/m_gdbserver/valgrind-low-amd64.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/coregrind/m_gdbserver/valgrind-low-amd64.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -251,22 +251,22 @@
case 37: *mod = False; break; // GDBTD ??? equivalent of foseg
case 38: *mod = False; break; // GDBTD ??? equivalent of fooff
case 39: *mod = False; break; // GDBTD ??? equivalent of fop
- case 40: VG_(transfer) (&amd64->guest_XMM0, buf, dir, size, mod); break;
- case 41: VG_(transfer) (&amd64->guest_XMM1, buf, dir, size, mod); break;
- case 42: VG_(transfer) (&amd64->guest_XMM2, buf, dir, size, mod); break;
- case 43: VG_(transfer) (&amd64->guest_XMM3, buf, dir, size, mod); break;
- case 44: VG_(transfer) (&amd64->guest_XMM4, buf, dir, size, mod); break;
- case 45: VG_(transfer) (&amd64->guest_XMM5, buf, dir, size, mod); break;
- case 46: VG_(transfer) (&amd64->guest_XMM6, buf, dir, size, mod); break;
- case 47: VG_(transfer) (&amd64->guest_XMM7, buf, dir, size, mod); break;
- case 48: VG_(transfer) (&amd64->guest_XMM8, buf, dir, size, mod); break;
- case 49: VG_(transfer) (&amd64->guest_XMM9, buf, dir, size, mod); break;
- case 50: VG_(transfer) (&amd64->guest_XMM10, buf, dir, size, mod); break;
- case 51: VG_(transfer) (&amd64->guest_XMM11, buf, dir, size, mod); break;
- case 52: VG_(transfer) (&amd64->guest_XMM12, buf, dir, size, mod); break;
- case 53: VG_(transfer) (&amd64->guest_XMM13, buf, dir, size, mod); break;
- case 54: VG_(transfer) (&amd64->guest_XMM14, buf, dir, size, mod); break;
- case 55: VG_(transfer) (&amd64->guest_XMM15, buf, dir, size, mod); break;
+ case 40: VG_(transfer) (&amd64->guest_YMM0[0], buf, dir, size, mod); break;
+ case 41: VG_(transfer) (&amd64->guest_YMM1[0], buf, dir, size, mod); break;
+ case 42: VG_(transfer) (&amd64->guest_YMM2[0], buf, dir, size, mod); break;
+ case 43: VG_(transfer) (&amd64->guest_YMM3[0], buf, dir, size, mod); break;
+ case 44: VG_(transfer) (&amd64->guest_YMM4[0], buf, dir, size, mod); break;
+ case 45: VG_(transfer) (&amd64->guest_YMM5[0], buf, dir, size, mod); break;
+ case 46: VG_(transfer) (&amd64->guest_YMM6[0], buf, dir, size, mod); break;
+ case 47: VG_(transfer) (&amd64->guest_YMM7[0], buf, dir, size, mod); break;
+ case 48: VG_(transfer) (&amd64->guest_YMM8[0], buf, dir, size, mod); break;
+ case 49: VG_(transfer) (&amd64->guest_YMM9[0], buf, dir, size, mod); break;
+ case 50: VG_(transfer) (&amd64->guest_YMM10[0], buf, dir, size, mod); break;
+ case 51: VG_(transfer) (&amd64->guest_YMM11[0], buf, dir, size, mod); break;
+ case 52: VG_(transfer) (&amd64->guest_YMM12[0], buf, dir, size, mod); break;
+ case 53: VG_(transfer) (&amd64->guest_YMM13[0], buf, dir, size, mod); break;
+ case 54: VG_(transfer) (&amd64->guest_YMM14[0], buf, dir, size, mod); break;
+ case 55: VG_(transfer) (&amd64->guest_YMM15[0], buf, dir, size, mod); break;
case 56:
if (dir == valgrind_to_gdbserver) {
// vex only models the rounding bits (see libvex_guest_x86.h)
Modified: trunk/memcheck/mc_translate.c (+108 -11)
===================================================================
--- trunk/memcheck/mc_translate.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/memcheck/mc_translate.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -358,7 +358,7 @@
/* Shadow state is always accessed using integer types. This returns
an integer type with the same size (as per sizeofIRType) as the
given type. The only valid shadow types are Bit, I8, I16, I32,
- I64, I128, V128. */
+ I64, I128, V128, V256. */
static IRType shadowTypeV ( IRType ty )
{
@@ -376,6 +376,7 @@
case Ity_F128: return Ity_I128;
case Ity_D128: return Ity_I128;
case Ity_V128: return Ity_V128;
+ case Ity_V256: return Ity_V256;
default: ppIRType(ty);
VG_(tool_panic)("memcheck:shadowTypeV");
}
@@ -461,15 +462,18 @@
/*------------------------------------------------------------*/
/*--- Helper functions for 128-bit ops ---*/
/*------------------------------------------------------------*/
+
static IRExpr *i128_const_zero(void)
{
- return binop(Iop_64HLto128, IRExpr_Const(IRConst_U64(0)),
- IRExpr_Const(IRConst_U64(0)));
+ IRAtom* z64 = IRExpr_Const(IRConst_U64(0));
+ return binop(Iop_64HLto128, z64, z64);
}
-/* There are no 128-bit loads and/or stores. So we do not need to worry
- about that in expr2vbits_Load */
+/* There are no I128-bit loads and/or stores [as generated by any
+ current front ends]. So we do not need to worry about that in
+ expr2vbits_Load */
+
/*------------------------------------------------------------*/
/*--- Constructing definedness primitive ops ---*/
/*------------------------------------------------------------*/
@@ -3716,7 +3720,6 @@
IREndness end, IRType ty,
IRAtom* addr, UInt bias )
{
- IRAtom *v64hi, *v64lo;
tl_assert(end == Iend_LE || end == Iend_BE);
switch (shadowTypeV(ty)) {
case Ity_I8:
@@ -3724,17 +3727,33 @@
case Ity_I32:
case Ity_I64:
return expr2vbits_Load_WRK(mce, end, ty, addr, bias);
- case Ity_V128:
+ case Ity_V128: {
+ IRAtom *v64hi, *v64lo;
if (end == Iend_LE) {
- v64lo = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias);
+ v64lo = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+0);
v64hi = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+8);
} else {
- v64hi = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias);
+ v64hi = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+0);
v64lo = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+8);
}
return assignNew( 'V', mce,
Ity_V128,
binop(Iop_64HLtoV128, v64hi, v64lo));
+ }
+ case Ity_V256: {
+ /* V256-bit case -- phrased in terms of 64 bit units (Qs),
+ with Q3 being the most significant lane. */
+ if (end == Iend_BE) goto unhandled;
+ IRAtom* v64Q0 = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+0);
+ IRAtom* v64Q1 = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+8);
+ IRAtom* v64Q2 = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+16);
+ IRAtom* v64Q3 = expr2vbits_Load_WRK(mce, end, Ity_I64, addr, bias+24);
+ return assignNew( 'V', mce,
+ Ity_V256,
+ IRExpr_Qop(Iop_64x4toV256,
+ v64Q3, v64Q2, v64Q1, v64Q0));
+ }
+ unhandled:
default:
VG_(tool_panic)("expr2vbits_Load");
}
@@ -3934,7 +3953,8 @@
// shadow computation ops that precede it.
if (MC_(clo_mc_level) == 1) {
switch (ty) {
- case Ity_V128: // V128 weirdness
+ case Ity_V256: // V256 weirdness -- used four times
+ case Ity_V128: // V128 weirdness -- used twice
c = IRConst_V128(V_BITS16_DEFINED); break;
case Ity_I64: c = IRConst_U64 (V_BITS64_DEFINED); break;
case Ity_I32: c = IRConst_U32 (V_BITS32_DEFINED); break;
@@ -3953,6 +3973,7 @@
bits into shadow memory. */
if (end == Iend_LE) {
switch (ty) {
+ case Ity_V256: /* we'll use the helper four times */
case Ity_V128: /* we'll use the helper twice */
case Ity_I64: helper = &MC_(helperc_STOREV64le);
hname = "MC_(helperc_STOREV64le)";
@@ -3983,12 +4004,82 @@
case Ity_I8: helper = &MC_(helperc_STOREV8);
hname = "MC_(helperc_STOREV8)";
break;
+ /* Note: there is no V256 case here, because no big-endian
+ target that we support has 256-bit vectors. */
default: VG_(tool_panic)("memcheck:do_shadow_Store(BE)");
}
}
- if (ty == Ity_V128) {
+ if (UNLIKELY(ty == Ity_V256)) {
+ /* V256-bit case -- phrased in terms of 64 bit units (Qs), with
+ Q3 being the most significant lane. */
+ /* These are the offsets of the Qs in memory. */
+ Int offQ0, offQ1, offQ2, offQ3;
+
+ /* Various bits for constructing the 4 lane helper calls */
+ IRDirty *diQ0, *diQ1, *diQ2, *diQ3;
+ IRAtom *addrQ0, *addrQ1, *addrQ2, *addrQ3;
+ IRAtom *vdataQ0, *vdataQ1, *vdataQ2, *vdataQ3;
+ IRAtom *eBiasQ0, *eBiasQ1, *eBiasQ2, *eBiasQ3;
+
+ if (end == Iend_LE) {
+ offQ0 = 0; offQ1 = 8; offQ2 = 16; offQ3 = 24;
+ } else {
+ offQ3 = 0; offQ2 = 8; offQ1 = 16; offQ0 = 24;
+ }
+
+ eBiasQ0 = tyAddr==Ity_I32 ? mkU32(bias+offQ0) : mkU64(bias+offQ0);
+ addrQ0 = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasQ0) );
+ vdataQ0 = assignNew('V', mce, Ity_I64, unop(Iop_V256to64_0, vdata));
+ diQ0 = unsafeIRDirty_0_N(
+ 1/*regparms*/,
+ hname, VG_(fnptr_to_fnentry)( helper ),
+ mkIRExprVec_2( addrQ0, vdataQ0 )
+ );
+
+ eBiasQ1 = tyAddr==Ity_I32 ? mkU32(bias+offQ1) : mkU64(bias+offQ1);
+ addrQ1 = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasQ1) );
+ vdataQ1 = assignNew('V', mce, Ity_I64, unop(Iop_V256to64_1, vdata));
+ diQ1 = unsafeIRDirty_0_N(
+ 1/*regparms*/,
+ hname, VG_(fnptr_to_fnentry)( helper ),
+ mkIRExprVec_2( addrQ1, vdataQ1 )
+ );
+
+ eBiasQ2 = tyAddr==Ity_I32 ? mkU32(bias+offQ2) : mkU64(bias+offQ2);
+ addrQ2 = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasQ2) );
+ vdataQ2 = assignNew('V', mce, Ity_I64, unop(Iop_V256to64_2, vdata));
+ diQ2 = unsafeIRDirty_0_N(
+ 1/*regparms*/,
+ hname, VG_(fnptr_to_fnentry)( helper ),
+ mkIRExprVec_2( addrQ2, vdataQ2 )
+ );
+
+ eBiasQ3 = tyAddr==Ity_I32 ? mkU32(bias+offQ3) : mkU64(bias+offQ3);
+ addrQ3 = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasQ3) );
+ vdataQ3 = assignNew('V', mce, Ity_I64, unop(Iop_V256to64_3, vdata));
+ diQ3 = unsafeIRDirty_0_N(
+ 1/*regparms*/,
+ hname, VG_(fnptr_to_fnentry)( helper ),
+ mkIRExprVec_2( addrQ3, vdataQ3 )
+ );
+
+ if (guard)
+ diQ0->guard = diQ1->guard = diQ2->guard = diQ3->guard = guard;
+
+ setHelperAnns( mce, diQ0 );
+ setHelperAnns( mce, diQ1 );
+ setHelperAnns( mce, diQ2 );
+ setHelperAnns( mce, diQ3 );
+ stmt( 'V', mce, IRStmt_Dirty(diQ0) );
+ stmt( 'V', mce, IRStmt_Dirty(diQ1) );
+ stmt( 'V', mce, IRStmt_Dirty(diQ2) );
+ stmt( 'V', mce, IRStmt_Dirty(diQ3) );
+
+ }
+ else if (UNLIKELY(ty == Ity_V128)) {
+
/* V128-bit case */
/* See comment in next clause re 64-bit regparms */
/* also, need to be careful about endianness */
@@ -5449,6 +5540,9 @@
case 16: hFun = (void*)&MC_(helperc_b_load16);
hName = "MC_(helperc_b_load16)";
break;
+ case 32: hFun = (void*)&MC_(helperc_b_load32);
+ hName = "MC_(helperc_b_load32)";
+ break;
default:
VG_(printf)("mc_translate.c: gen_load_b: unhandled szB == %d\n", szB);
tl_assert(0);
@@ -5511,6 +5605,9 @@
case 16: hFun = (void*)&MC_(helperc_b_store16);
hName = "MC_(helperc_b_store16)";
break;
+ case 32: hFun = (void*)&MC_(helperc_b_store32);
+ hName = "MC_(helperc_b_store32)";
+ break;
default:
tl_assert(0);
}
Modified: trunk/memcheck/mc_include.h (+2 -0)
===================================================================
--- trunk/memcheck/mc_include.h 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/memcheck/mc_include.h 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -549,11 +549,13 @@
VG_REGPARM(2) void MC_(helperc_b_store4) ( Addr a, UWord d32 );
VG_REGPARM(2) void MC_(helperc_b_store8) ( Addr a, UWord d32 );
VG_REGPARM(2) void MC_(helperc_b_store16)( Addr a, UWord d32 );
+VG_REGPARM(2) void MC_(helperc_b_store32)( Addr a, UWord d32 );
VG_REGPARM(1) UWord MC_(helperc_b_load1) ( Addr a );
VG_REGPARM(1) UWord MC_(helperc_b_load2) ( Addr a );
VG_REGPARM(1) UWord MC_(helperc_b_load4) ( Addr a );
VG_REGPARM(1) UWord MC_(helperc_b_load8) ( Addr a );
VG_REGPARM(1) UWord MC_(helperc_b_load16)( Addr a );
+VG_REGPARM(1) UWord MC_(helperc_b_load32)( Addr a );
/* Functions defined in mc_translate.c */
IRSB* MC_(instrument) ( VgCallbackClosure* closure,
Added: trunk/none/tests/amd64/avx-1.c (+344 -0)
===================================================================
--- trunk/none/tests/amd64/avx-1.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/none/tests/amd64/avx-1.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -0,0 +1,344 @@
+ /* VMOVSD m64, xmm1 = VEX.LIG.F2.0F.WIG 10 /r */
+ /* VMOVSS m32, xmm1 = VEX.LIG.F3.0F.WIG 10 /r */
+ /* VMOVSD xmm1, m64 = VEX.LIG.F2.0F.WIG 11 /r */
+ /* VMOVSS xmm1, m32 = VEX.LIG.F3.0F.WIG 11 /r */
+ /* VMOVUPD xmm1, xmm2/m128 = VEX.128.66.0F.WIG 11 /r */
+ /* VMOVAPD xmm2/m128, xmm1 = VEX.128.66.0F.WIG 28 /r */
+ /* VMOVAPD ymm2/m256, ymm1 = VEX.256.66.0F.WIG 28 /r */
+ /* VMOVAPS xmm2/m128, xmm1 = VEX.128.0F.WIG 28 /r */
+ /* VMOVAPS xmm1, xmm2/m128 = VEX.128.0F.WIG 29 /r */
+ /* VMOVAPD xmm1, xmm2/m128 = VEX.128.66.0F.WIG 29 /r */
+
+/* . VCVTSI2SD r/m32, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.W0 2A /r */
+/* . VCVTSI2SD r/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.W1 2A /r */
+/* . VCVTSI2SS r/m64, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.W1 2A /r */
+/* . VCVTTSD2SI xmm1/m64, r32 = VEX.LIG.F2.0F.W0 2C /r */
+/* VCVTTSD2SI xmm1/m64, r64 = VEX.LIG.F2.0F.W1 2C /r */
+/* VUCOMISD xmm2/m64, xmm1 = VEX.LIG.66.0F.WIG 2E /r */
+/* VUCOMISS xmm2/m32, xmm1 = VEX.LIG.0F.WIG 2E /r */
+/* . VSQRTSD xmm3/m64(E), xmm2(V), xmm1(G) = VEX.NDS.LIG.F2.0F.WIG 51 /r */
+/* VANDPD r/m, rV, r ::: r = rV & r/m (MVR format) */
+/* VANDNPD r/m, rV, r ::: r = (not rV) & r/m (MVR format) */
+/* VORPD r/m, rV, r ::: r = rV | r/m (MVR format) */
+/* VXORPD r/m, rV, r ::: r = rV ^ r/m (MVR format) */
+/* VXORPS r/m, rV, r ::: r = rV ^ r/m (MVR format) */
+/* VADDSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 58 /r */
+/* VMULSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 59 /r */
+/* VCVTPS2PD xmm2/m64, xmm1 = VEX.128.0F.WIG 5A /r */
+/* VSUBSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 5C /r */
+/* VMINSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 5D /r */
+/* VDIVSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 5E /r */
+/* VMAXSD xmm3/m64, xmm2, xmm1 = VEX.NDS.LIG.F2.0F.WIG 5F /r */
+
+ /* VMOVD r32/m32, xmm1 = VEX.128.66.0F.W0 6E */
+ /* VMOVDQA ymm2/m256, ymm1 = VEX.256.66.0F.WIG 6F */
+ /* VMOVDQA xmm2/m128, xmm1 = VEX.128.66.0F.WIG 6F */
+ /* VMOVDQU xmm2/m128, xmm1 = VEX.128.F3.0F.WIG 6F */
+
+/* VPSHUFD imm8, xmm2/m128, xmm1 = VEX.128.66.0F.WIG 70 /r ib */
+/* VPSLLD imm8, xmm2, xmm1 = VEX.128.66.0F.WIG 72 /6 ib */
+/* VPSRLDQ VEX.NDD.128.66.0F.WIG 73 /3 ib */
+/* VPCMPEQD r/m, rV, r ::: r = rV `eq-by-32s` r/m (MVR format) */
+
+ /* VMOVDQA ymm1, ymm2/m256 = VEX.256.66.0F.WIG 7F */
+ /* VMOVDQA xmm1, xmm2/m128 = VEX.128.66.0F.WIG 7F */
+ /* VMOVDQU xmm1, xmm2/m128 = VEX.128.F3.0F.WIG 7F */
+
+/* . VCMPSD xmm3/m64(E=argL), xmm2(V=argR), xmm1(G) */
+/* . VPOR = VEX.NDS.128.66.0F.WIG EB /r */
+/* . VPXOR = VEX.NDS.128.66.0F.WIG EF /r */
+/* . VPSUBB = VEX.NDS.128.66.0F.WIG F8 /r */
+/* . VPSUBD = VEX.NDS.128.66.0F.WIG FA /r */
+/* . VPADDD = VEX.NDS.128.66.0F.WIG FE /r */
+/* . VPSHUFB r/m, rV, r ::: r = shuf(rV, r/m) (MVR format) */
+/* . VPMOVZXBW = VEX.128.66.0F38.WIG 30 /r */
+/* . VPMOVZXWD = VEX.128.66.0F38.WIG 33 /r */
+/* . VPMINSD = VEX.NDS.128.66.0F38.WIG 39 /r */
+/* . VPMAXSD = VEX.NDS.128.66.0F38.WIG 3D /r */
+ /* VPEXTRD imm8, r32/m32, xmm2 */
+ /* VINSERTF128 r/m, rV, rD */
+ /* VEXTRACTF128 rS, r/m */
+
+/* . VPBLENDVB xmmG, xmmE/memE, xmmV, xmmIS4 */
 + /* VEX.128.F2.0F.WIG 12 /r = MOVDDUP xmm2/m64, xmm1 */
+ /* VCVTPD2PS xmm2/m128, xmm1 = VEX.128.66.0F.WIG 5A /r */
+/* . VMULSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 59 /r */
+/* . VSUBSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 5C /r */
+/* . VADDSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 58 /r */
+/* . VDIVSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 5E /r */
+/* . VUNPCKLPS xmm3/m128, xmm2, xmm1 = VEX.NDS.128.0F.WIG 14 /r */
+/* . VCVTSI2SS r/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.W0 2A /r */
+/* . VANDPS = VEX.NDS.128.0F.WIG 54 /r */
+/* . VMINSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 5D /r */
+/* . VMAXSS xmm3/m32, xmm2, xmm1 = VEX.NDS.LIG.F3.0F.WIG 5F /r */
+
+/* really needs testing -- Intel docs don't make sense */
+/* VMOVQ xmm2/m64, xmm1 = VEX.128.F3.0F.W0 */
+
+/* really needs testing -- Intel docs don't make sense */
+/* of the form vmovq %xmm0,-0x8(%rsp) */
+
+/* VCMPSS xmm3/m32(E=argL), xmm2(V=argR), xmm1(G) */
+/* . VANDNPS = VEX.NDS.128.0F.WIG 55 /r */
+/* . VORPS = VEX.NDS.128.0F.WIG 56 /r */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <malloc.h>
+
+typedef unsigned char UChar;
+typedef unsigned int UInt;
+typedef unsigned long int UWord;
+typedef unsigned long long int ULong;
+
+#define IS_32_ALIGNED(_ptr) (0 == (0x1F & (UWord)(_ptr)))
+
+typedef union { UChar u8[32]; UInt u32[8]; } YMM;
+
+typedef struct { YMM a1; YMM a2; YMM a3; YMM a4; ULong u64; } Block;
+
+void showYMM ( YMM* vec )
+{
+ int i;
+ assert(IS_32_ALIGNED(vec));
+ for (i = 31; i >= 0; i--) {
+ printf("%02x", (UInt)vec->u8[i]);
+ if (i > 0 && 0 == ((i+0) & 7)) printf(".");
+ }
+}
+
+void showBlock ( char* msg, Block* block )
+{
+ printf(" %s\n", msg);
+ printf(" "); showYMM(&block->a1); printf("\n");
+ printf(" "); showYMM(&block->a2); printf("\n");
+ printf(" "); showYMM(&block->a3); printf("\n");
+ printf(" "); showYMM(&block->a4); printf("\n");
+ printf(" %016llx\n", block->u64);
+}
+
+UChar randUChar ( void )
+{
+ static UInt seed = 80021;
+ seed = 1103515245 * seed + 12345;
+ return (seed >> 17) & 0xFF;
+}
+
+void randBlock ( Block* b )
+{
+ int i;
+ UChar* p = (UChar*)b;
+ for (i = 0; i < sizeof(Block); i++)
+ p[i] = randUChar();
+}
+
+
+/* Generate a function test_NAME, that tests the given insn, in both
+ its mem and reg forms. The reg form of the insn may mention, as
+ operands only %ymm6, %ymm7, %ymm8, %ymm9 and %r14. The mem form of
+ the insn may mention as operands only (%rax), %ymm7, %ymm8, %ymm9
+ and %r14. */
+
+#define GEN_test_RandM(_name, _reg_form, _mem_form) \
+ \
+ static void test_##_name ( void ) \
+ { \
+ Block* b = memalign(32, sizeof(Block)); \
+ randBlock(b); \
+ printf("%s(reg)\n", #_name); \
+ showBlock("before", b); \
+ __asm__ __volatile__( \
+ "vmovdqa 0(%0),%%ymm7" "\n\t" \
+ "vmovdqa 32(%0),%%ymm8" "\n\t" \
+ "vmovdqa 64(%0),%%ymm6" "\n\t" \
+ "vmovdqa 96(%0),%%ymm9" "\n\t" \
+ "movq 128(%0),%%r14" "\n\t" \
+ _reg_form "\n\t" \
+ "vmovdqa %%ymm7, 0(%0)" "\n\t" \
+ "vmovdqa %%ymm8, 32(%0)" "\n\t" \
+ "vmovdqa %%ymm6, 64(%0)" "\n\t" \
+ "vmovdqa %%ymm9, 96(%0)" "\n\t" \
+ "movq %%r14, 128(%0)" "\n\t" \
+ : /*OUT*/ \
+ : /*IN*/"r"(b) \
+ : /*TRASH*/"xmm7","xmm8","xmm6","xmm9","r14","memory","cc" \
+ ); \
+ showBlock("after", b); \
+ randBlock(b); \
+ printf("%s(mem)\n", #_name); \
+ showBlock("before", b); \
+ __asm__ __volatile__( \
+ "leaq 0(%0),%%rax" "\n\t" \
+ "vmovdqa 32(%0),%%ymm8" "\n\t" \
+ "vmovdqa 64(%0),%%ymm7" "\n\t" \
+ "vmovdqa 96(%0),%%ymm9" "\n\t" \
+ "movq 128(%0),%%r14" "\n\t" \
+ _mem_form "\n\t" \
+ "vmovdqa %%ymm8, 32(%0)" "\n\t" \
+ "vmovdqa %%ymm7, 64(%0)" "\n\t" \
+ "vmovdqa %%ymm9, 96(%0)" "\n\t" \
+ "movq %%r14, 128(%0)" "\n\t" \
+ : /*OUT*/ \
+ : /*IN*/"r"(b) \
+ : /*TRASH*/"xmm8","xmm7","xmm9","r14","rax","memory","cc" \
+ ); \
+ showBlock("after", b); \
+ printf("\n"); \
+ free(b); \
+ }
+
+GEN_test_RandM(VPOR_128,
+ "vpor %%xmm6, %%xmm8, %%xmm7",
+ "vpor (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPXOR_128,
+ "vpxor %%xmm6, %%xmm8, %%xmm7",
+ "vpxor (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPSUBB_128,
+ "vpsubb %%xmm6, %%xmm8, %%xmm7",
+ "vpsubb (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPSUBD_128,
+ "vpsubd %%xmm6, %%xmm8, %%xmm7",
+ "vpsubd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPADDD_128,
+ "vpaddd %%xmm6, %%xmm8, %%xmm7",
+ "vpaddd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPMOVZXWD_128,
+ "vpmovzxwd %%xmm6, %%xmm8",
+ "vpmovzxwd (%%rax), %%xmm8")
+
+GEN_test_RandM(VPMOVZXBW_128,
+ "vpmovzxbw %%xmm6, %%xmm8",
+ "vpmovzxbw (%%rax), %%xmm8")
+
+GEN_test_RandM(VPBLENDVB_128,
+ "vpblendvb %%xmm9, %%xmm6, %%xmm8, %%xmm7",
+ "vpblendvb %%xmm9, (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPMINSD_128,
+ "vpminsd %%xmm6, %%xmm8, %%xmm7",
+ "vpminsd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VPMAXSD_128,
+ "vpmaxsd %%xmm6, %%xmm8, %%xmm7",
+ "vpmaxsd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VANDPD_128,
+ "vandpd %%xmm6, %%xmm8, %%xmm7",
+ "vandpd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCVTSI2SD_32,
+ "vcvtsi2sdl %%r14d, %%xmm8, %%xmm7",
+ "vcvtsi2sdl (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCVTSI2SD_64,
+ "vcvtsi2sdq %%r14, %%xmm8, %%xmm7",
+ "vcvtsi2sdq (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCVTSI2SS_64,
+ "vcvtsi2ssq %%r14, %%xmm8, %%xmm7",
+ "vcvtsi2ssq (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCVTTSD2SI_32,
+ "vcvttsd2si %%xmm8, %%r14d",
+ "vcvttsd2si (%%rax), %%r14d")
+
+GEN_test_RandM(VPSHUFB_128,
+ "vpshufb %%xmm6, %%xmm8, %%xmm7",
+ "vpshufb (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCMPSD_128_0x0,
+ "vcmpsd $0, %%xmm6, %%xmm8, %%xmm7",
+ "vcmpsd $0, (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCMPSD_128_0xD,
+ "vcmpsd $0xd, %%xmm6, %%xmm8, %%xmm7",
+ "vcmpsd $0xd, (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VSQRTSD_128,
+ "vsqrtsd %%xmm6, %%xmm8, %%xmm7",
+ "vsqrtsd (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VORPS_128,
+ "vorps %%xmm6, %%xmm8, %%xmm7",
+ "vorps (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VANDNPS_128,
+ "vandnps %%xmm6, %%xmm8, %%xmm7",
+ "vandnps (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VMAXSS_128,
+ "vmaxss %%xmm6, %%xmm8, %%xmm7",
+ "vmaxss (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VMINSS_128,
+ "vminss %%xmm6, %%xmm8, %%xmm7",
+ "vminss (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VANDPS_128,
+ "vandps %%xmm6, %%xmm8, %%xmm7",
+ "vandps (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VCVTSI2SS_128,
+ "vcvtsi2ssl %%r14d, %%xmm8, %%xmm7",
+ "vcvtsi2ssl (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VUNPCKLPS_128,
+ "vunpcklps %%xmm6, %%xmm8, %%xmm7",
+ "vunpcklps (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VDIVSS_128,
+ "vdivss %%xmm6, %%xmm8, %%xmm7",
+ "vdivss (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VADDSS_128,
+ "vaddss %%xmm6, %%xmm8, %%xmm7",
+ "vaddss (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VSUBSS_128,
+ "vsubss %%xmm6, %%xmm8, %%xmm7",
+ "vsubss (%%rax), %%xmm8, %%xmm7")
+
+GEN_test_RandM(VMULSS_128,
+ "vmulss %%xmm6, %%xmm8, %%xmm7",
+ "vmulss (%%rax), %%xmm8, %%xmm7")
+
+int main ( void )
+{
+ test_VMULSS_128();
+ test_VSUBSS_128();
+ test_VADDSS_128();
+ test_VDIVSS_128();
+ test_VUNPCKLPS_128();
+ test_VCVTSI2SS_128();
+ test_VANDPS_128();
+ test_VMINSS_128();
+ test_VMAXSS_128();
+ test_VANDNPS_128();
+ test_VORPS_128();
+ test_VSQRTSD_128();
+ // test_VCMPSD_128_0xD(); BORKED
+ test_VCMPSD_128_0x0();
+ test_VPSHUFB_128();
+ test_VCVTTSD2SI_32();
+ test_VCVTSI2SS_64();
+ test_VCVTSI2SD_64();
+ test_VCVTSI2SD_32();
+ test_VPOR_128();
+ test_VPXOR_128();
+ test_VPSUBB_128();
+ test_VPSUBD_128();
+ test_VPADDD_128();
+ test_VPMOVZXBW_128();
+ test_VPMOVZXWD_128();
+ test_VPBLENDVB_128();
+ test_VPMINSD_128();
+ test_VPMAXSD_128();
+ test_VANDPD_128();
+ return 0;
+}
Added: trunk/docs/internals/avx-notes.txt (+28 -0)
===================================================================
--- trunk/docs/internals/avx-notes.txt 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/docs/internals/avx-notes.txt 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -0,0 +1,28 @@
+
+Cleanups
+~~~~~~~~
+
+FXSAVE/FXRSTOR: can no longer say (w.r.t the guest state
+effects declaration) that the SSE regs are written/read
+in one single block. Instead need to make a declaration
+for each bottom-half independently :-(
+
+in fact, re-check everything that assumes the XMM regs form
+an array, because they no longer do.  Done: PCMPISTRI et al.,
+also AESENC et al.
+
+* guest state alignment, all targets -- will probably fail now
+
+* FXSAVE/FXRSTOR on amd64, as noted above
+
+* tools other than memcheck -- now fail w/ AVX insns
+
+* remove regclass HRc256
+
+* disable Avx insns in backend (or rm this code, will we
+ ever need it?)
+
+* change amd64 getAllocableRegs back to what it was originally
+ [DONE]
+
+* fix up none/tests/amd64/avx-1.c
Modified: trunk/coregrind/m_scheduler/scheduler.c (+13 -13)
===================================================================
--- trunk/coregrind/m_scheduler/scheduler.c 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/coregrind/m_scheduler/scheduler.c 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -699,15 +699,15 @@
(void*)a_vexsh2, sz_vexsh2,
(void*)a_spill, sz_spill );
- vg_assert(VG_IS_16_ALIGNED(sz_vex));
- vg_assert(VG_IS_16_ALIGNED(sz_vexsh1));
- vg_assert(VG_IS_16_ALIGNED(sz_vexsh2));
- vg_assert(VG_IS_16_ALIGNED(sz_spill));
+ vg_assert(VG_IS_32_ALIGNED(sz_vex));
+ vg_assert(VG_IS_32_ALIGNED(sz_vexsh1));
+ vg_assert(VG_IS_32_ALIGNED(sz_vexsh2));
+ vg_assert(VG_IS_32_ALIGNED(sz_spill));
- vg_assert(VG_IS_16_ALIGNED(a_vex));
- vg_assert(VG_IS_16_ALIGNED(a_vexsh1));
- vg_assert(VG_IS_16_ALIGNED(a_vexsh2));
- vg_assert(VG_IS_16_ALIGNED(a_spill));
+ vg_assert(VG_IS_32_ALIGNED(a_vex));
+ vg_assert(VG_IS_32_ALIGNED(a_vexsh1));
+ vg_assert(VG_IS_32_ALIGNED(a_vexsh2));
+ vg_assert(VG_IS_32_ALIGNED(a_spill));
/* Check that the guest state and its two shadows have the same
size, and that there are no holes in between. The latter is
@@ -739,14 +739,14 @@
# endif
# if defined(VGA_amd64)
- /* amd64 XMM regs must form an array, ie, have no holes in
+ /* amd64 YMM regs must form an array, ie, have no holes in
between. */
vg_assert(
- (offsetof(VexGuestAMD64State,guest_XMM16)
- - offsetof(VexGuestAMD64State,guest_XMM0))
- == (17/*#regs*/-1) * 16/*bytes per reg*/
+ (offsetof(VexGuestAMD64State,guest_YMM16)
+ - offsetof(VexGuestAMD64State,guest_YMM0))
+ == (17/*#regs*/-1) * 32/*bytes per reg*/
);
- vg_assert(VG_IS_16_ALIGNED(offsetof(VexGuestAMD64State,guest_XMM0)));
+ vg_assert(VG_IS_32_ALIGNED(offsetof(VexGuestAMD64State,guest_YMM0)));
vg_assert(VG_IS_8_ALIGNED(offsetof(VexGuestAMD64State,guest_FPREG)));
vg_assert(16 == offsetof(VexGuestAMD64State,guest_RAX));
vg_assert(VG_IS_8_ALIGNED(offsetof(VexGuestAMD64State,guest_RAX)));
Modified: trunk/docs/Makefile.am (+1 -0)
===================================================================
--- trunk/docs/Makefile.am 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/docs/Makefile.am 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -26,6 +26,7 @@
internals/3_4_BUGSTATUS.txt \
internals/3_5_BUGSTATUS.txt \
internals/arm_thumb_notes_gdbserver.txt \
+ internals/avx-notes.txt \
internals/BIG_APP_NOTES.txt \
internals/Darwin-notes.txt \
internals/SPEC-notes.txt \
Modified: trunk/coregrind/pub_core_threadstate.h (+5 -5)
===================================================================
--- trunk/coregrind/pub_core_threadstate.h 2012-05-18 17:48:20 +01:00 (rev 12568)
+++ trunk/coregrind/pub_core_threadstate.h 2012-05-21 11:18:10 +01:00 (rev 12569)
@@ -102,19 +102,19 @@
/* Note that for code generation reasons, we require that the
guest state area, its two shadows, and the spill area, are
- 16-aligned and have 16-aligned sizes, and there are no holes
+ 32-aligned and have 32-aligned sizes, and there are no holes
in between. This is checked by do_pre_run_checks() in
scheduler.c. */
/* Saved machine context. */
- VexGuestArchState vex __attribute__((aligned(16)));
+ VexGuestArchState vex __attribute__((aligned(32)));
/* Saved shadow context (2 copies). */
- VexGuestArchState vex_shadow1 __attribute__((aligned(16)));
- VexGuestArchState vex_shadow2 __attribute__((aligned(16)));
+ VexGuestArchState vex_shadow1 __attribute__((aligned(32)));
+ VexGuestArchState vex_shadow2 __attribute__((aligned(32)));
/* Spill area. */
- UChar vex_spill[LibVEX_N_SPILL_BYTES] __attribute__((aligned(16)));
+ UChar vex_spill[LibVEX_N_SPILL_BYTES] __attribute__((aligned(32)));
/* --- END vex-mandated guest state --- */
}