Author: sewardj
Date: Sun Jan 26 19:11:14 2014
New Revision: 2810
Log:
Improve front and back end support for SIMD instructions on Arm64.
Implement the following instructions -- for some, but not necessarily
all, laneage combinations:
LD1 {vT.2d}, [Xn|SP]
ST1 {vT.2d}, [Xn|SP]
LD1 {vT.4s}, [Xn|SP]
ST1 {vT.4s}, [Xn|SP]
LD1 {vT.8h}, [Xn|SP]
ST1 {vT.8h}, [Xn|SP]
LD1 {vT.16b}, [Xn|SP]
ST1 {vT.16b}, [Xn|SP]
LD1 {vT.1d}, [Xn|SP]
ST1 {vT.1d}, [Xn|SP]
LD1 {vT.2s}, [Xn|SP]
ST1 {vT.2s}, [Xn|SP]
LD1 {vT.4h}, [Xn|SP]
ST1 {vT.4h}, [Xn|SP]
LD1 {vT.8b}, [Xn|SP]
ST1 {vT.8b}, [Xn|SP]
ST1 {vT.2d}, [xN|SP], #16
LD1 {vT.2d}, [xN|SP], #16
ST1 {vT.4s}, [xN|SP], #16
ST1 {vT.8h}, [xN|SP], #16
ST1 {vT.2s}, [xN|SP], #8
SCVTF Vd, Vn
UCVTF Vd, Vn
FADD Vd,Vn,Vm 1
FSUB Vd,Vn,Vm 2
FMUL Vd,Vn,Vm 3
FDIV Vd,Vn,Vm 4
FMLA Vd,Vn,Vm 5
FMLS Vd,Vn,Vm 6
ADD Vd.T, Vn.T, Vm.T
SUB Vd.T, Vn.T, Vm.T
XTN {,2}
DUP Vd.T, Vn.Ts[index]
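The load/store and lane-access work in this commit all rests on one piece of arithmetic: on a little-endian host, lane 0 of a Q register sits at the lowest byte offset, so lane N of size S lives at offset N*S, provided the lane fits inside the 16-byte register. A minimal standalone sketch of that rule (hypothetical helper name, not VEX code):

```c
#include <assert.h>

/* Standalone model of the little-endian lane-offset rule used by the
   offsetQRegLane change below: lane 0 is at the lowest address, so
   lane laneNo of size laneSzB bytes lives at base + laneNo*laneSzB.
   The lane must lie entirely within the 16-byte Q register. */
static int qreg_lane_offset(int base, unsigned laneSzB, unsigned laneNo)
{
   unsigned minOff = laneNo * laneSzB;
   unsigned maxOff = minOff + laneSzB - 1;
   assert(maxOff < 16);   /* whole lane fits inside the Qreg */
   return base + (int)minOff;
}
```

For example, the upper 64-bit half of a Qreg is lane 1 of size 8, at offset base + 8, which is exactly the relationship the offsetQRegHI64 rewrite in this commit relies on.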
Modified:
trunk/priv/guest_arm64_toIR.c
trunk/priv/host_arm64_defs.c
trunk/priv/host_arm64_defs.h
trunk/priv/host_arm64_isel.c
trunk/priv/ir_defs.c
trunk/pub/libvex_ir.h
Modified: trunk/priv/guest_arm64_toIR.c
==============================================================================
--- trunk/priv/guest_arm64_toIR.c (original)
+++ trunk/priv/guest_arm64_toIR.c Sun Jan 26 19:11:14 2014
@@ -896,7 +896,6 @@
}
}
-
/* Write to a complete Qreg. */
static void putQReg128 ( UInt qregNo, IRExpr* e )
{
@@ -929,54 +928,61 @@
}
}
-/* Find the offset of the szB'th least significant bytes of the given
- Qreg. This requires knowing the endianness of the host. */
-static Int offsetQReg ( UInt szB, UInt qregNo )
+/* Find the offset of the laneNo'th lane of type laneTy in the given
+ Qreg. Since the host is little-endian, the least significant lane
+ has the lowest offset. */
+static Int offsetQRegLane ( UInt qregNo, IRType laneTy, UInt laneNo )
{
vassert(!host_is_bigendian);
Int base = offsetQReg128(qregNo);
- /* Since we're dealing with a little-endian host, all of the
- sub-parts will have the same offset as the base register. But
- we still need to check that szB is valid. */
- switch (szB) {
- case 1: case 2: case 4: case 8: case 16: break;
- default: vassert(0);
- }
- return base;
+ /* Since the host is little-endian, the least significant lane
+ will be at the lowest address. */
+ /* Restrict this to known types, so as to avoid silently accepting
+ stupid types. */
+ UInt laneSzB = 0;
+ switch (laneTy) {
+ case Ity_F32: case Ity_I32: laneSzB = 4; break;
+ case Ity_F64: case Ity_I64: laneSzB = 8; break;
+ case Ity_V128: laneSzB = 16; break;
+ default: break;
+ }
+ vassert(laneSzB > 0);
+ UInt minOff = laneNo * laneSzB;
+ UInt maxOff = minOff + laneSzB - 1;
+ vassert(maxOff < 16);
+ return base + minOff;
}
-static void putQReg ( UInt qregNo, IRExpr* e )
+/* Put to the least significant lane of a Qreg. */
+static void putQRegLO ( UInt qregNo, IRExpr* e )
{
IRType ty = typeOfIRExpr(irsb->tyenv, e);
- Int off = offsetQReg(sizeofIRType(ty), qregNo);
+ Int off = offsetQRegLane(qregNo, ty, 0);
switch (ty) {
- case Ity_I8: break;
- case Ity_I16: break;
- case Ity_I32: break;
- case Ity_F32: break;
- case Ity_I64: break;
- case Ity_F64: break;
- case Ity_V128: break;
- default: vassert(0); // Other cases are ATC
+ case Ity_I8: case Ity_I16: case Ity_I32: case Ity_I64:
+ case Ity_F32: case Ity_F64: case Ity_V128:
+ break;
+ default:
+ vassert(0); // Other cases are probably invalid
}
stmt(IRStmt_Put(off, e));
}
-static IRExpr* getQReg ( IRType ty, UInt qregNo )
+/* Get from the least significant lane of a Qreg. */
+static IRExpr* getQRegLO ( UInt qregNo, IRType ty )
{
- Int off = offsetQReg(sizeofIRType(ty), qregNo);
+ Int off = offsetQRegLane(qregNo, ty, 0);
switch (ty) {
- case Ity_I32: break;
- case Ity_F32: break;
- case Ity_I64: break;
- case Ity_F64: break;
- case Ity_V128: break;
- default: vassert(0); // Other cases are ATC
+ case Ity_I32: case Ity_I64:
+ case Ity_F32: case Ity_F64: case Ity_V128:
+ break;
+ default:
+ vassert(0); // Other cases are ATC
}
return IRExpr_Get(off, ty);
}
-static const HChar* nameQReg ( UInt szB, UInt qregNo )
+static const HChar* nameQRegLO ( UInt qregNo, IRType laneTy )
{
static const HChar* namesQ[32]
= { "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",
@@ -1004,7 +1010,7 @@
"b16", "b17", "b18", "b19", "b20", "b21", "b22", "b23",
"b24", "b25", "b26", "b27", "b28", "b29", "b30", "b31" };
vassert(qregNo < 32);
- switch (szB) {
+ switch (sizeofIRType(laneTy)) {
case 1: return namesB[qregNo];
case 2: return namesH[qregNo];
case 4: return namesS[qregNo];
@@ -1015,34 +1021,64 @@
/*NOTREACHED*/
}
+static const HChar* nameQReg128 ( UInt qregNo )
+{
+ return nameQRegLO(qregNo, Ity_V128);
+}
+
/* Find the offset of the most significant half (8 bytes) of the given
Qreg. This requires knowing the endianness of the host. */
-static Int offsetQReg64HI ( UInt qregNo )
+static Int offsetQRegHI64 ( UInt qregNo )
{
- vassert(!host_is_bigendian);
- Int base = offsetQReg128(qregNo);
- /* Since the host is little endian, the least significant half is
- at the lower offset. So add 8 to get the MS half offset. */
- return base+8;
+ return offsetQRegLane(qregNo, Ity_I64, 1);
}
-static IRExpr* getQReg64HI ( UInt qregNo )
+static IRExpr* getQRegHI64 ( UInt qregNo )
{
- return IRExpr_Get(offsetQReg64HI(qregNo), Ity_I64);
+ return IRExpr_Get(offsetQRegHI64(qregNo), Ity_I64);
}
-static void putQReg64HI ( UInt qregNo, IRExpr* e )
+static void putQRegHI64 ( UInt qregNo, IRExpr* e )
{
IRType ty = typeOfIRExpr(irsb->tyenv, e);
- Int off = offsetQReg64HI(qregNo);
+ Int off = offsetQRegHI64(qregNo);
switch (ty) {
- case Ity_I64: break;
- case Ity_F64: break;
- default: vassert(0); // Other cases are plain wrong
+ case Ity_I64: case Ity_F64:
+ break;
+ default:
+ vassert(0); // Other cases are plain wrong
}
stmt(IRStmt_Put(off, e));
}
+/* Put to a specified lane of a Qreg. */
+static void putQRegLane ( UInt qregNo, UInt laneNo, IRExpr* e )
+{
+ IRType laneTy = typeOfIRExpr(irsb->tyenv, e);
+ Int off = offsetQRegLane(qregNo, laneTy, laneNo);
+ switch (laneTy) {
+ case Ity_F64: case Ity_I64:
+ break;
+ default:
+ vassert(0); // Other cases are ATC
+ }
+ stmt(IRStmt_Put(off, e));
+}
+
+/* Get from the least significant lane of a Qreg. */
+static IRExpr* getQRegLane ( UInt qregNo, UInt laneNo, IRType laneTy )
+{
+ Int off = offsetQRegLane(qregNo, laneTy, laneNo);
+ switch (laneTy) {
+ case Ity_I64: case Ity_I32:
+ break;
+ default:
+ vassert(0); // Other cases are ATC
+ }
+ return IRExpr_Get(off, laneTy);
+}
+
+
//ZZ /* ---------------- Misc registers ---------------- */
//ZZ
//ZZ static void putMiscReg32 ( UInt gsoffset,
@@ -1533,6 +1569,45 @@
}
+/* Duplicates the bits at the bottom of the given word to fill the
+ whole word. src :: Ity_I64 is assumed to have zeroes everywhere
+ except for the bottom bits. */
+static IRTemp math_DUP_TO_64 ( IRTemp src, IRType srcTy )
+{
+ if (srcTy == Ity_I8) {
+ IRTemp t16 = newTemp(Ity_I64);
+ assign(t16, binop(Iop_Or64, mkexpr(src),
+ binop(Iop_Shl64, mkexpr(src), mkU8(8))));
+ IRTemp t32 = newTemp(Ity_I64);
+ assign(t32, binop(Iop_Or64, mkexpr(t16),
+ binop(Iop_Shl64, mkexpr(t16), mkU8(16))));
+ IRTemp t64 = newTemp(Ity_I64);
+ assign(t64, binop(Iop_Or64, mkexpr(t32),
+ binop(Iop_Shl64, mkexpr(t32), mkU8(32))));
+ return t64;
+ }
+ if (srcTy == Ity_I16) {
+ IRTemp t32 = newTemp(Ity_I64);
+ assign(t32, binop(Iop_Or64, mkexpr(src),
+ binop(Iop_Shl64, mkexpr(src), mkU8(16))));
+ IRTemp t64 = newTemp(Ity_I64);
+ assign(t64, binop(Iop_Or64, mkexpr(t32),
+ binop(Iop_Shl64, mkexpr(t32), mkU8(32))));
+ return t64;
+ }
+ if (srcTy == Ity_I32) {
+ IRTemp t64 = newTemp(Ity_I64);
+ assign(t64, binop(Iop_Or64, mkexpr(src),
+ binop(Iop_Shl64, mkexpr(src), mkU8(32))));
+ return t64;
+ }
+ if (srcTy == Ity_I64) {
+ return src;
+ }
+ vassert(0);
+}
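The shift-and-OR cascade that math_DUP_TO_64 builds in IR can be checked on plain integers. A standalone sketch of the same doubling trick (this is ordinary C, not IR; it assumes, as the function above does, that the input is zero above its significant bits):

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low srcBits bits of x across all 64 bits by
   repeatedly OR-ing the value with a left-shifted copy of itself --
   the same cascade math_DUP_TO_64 emits as IR.  x must be zero in
   all bits at and above srcBits. */
static uint64_t dup_to_64(uint64_t x, unsigned srcBits)
{
   assert(srcBits == 8 || srcBits == 16 || srcBits == 32 || srcBits == 64);
   for (unsigned sh = srcBits; sh < 64; sh *= 2)
      x |= x << sh;
   return x;
}
```

Each iteration doubles the width of the replicated pattern, so an 8-bit source needs three steps, a 32-bit source one, and a 64-bit source none.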
+
+
/*------------------------------------------------------------*/
/*--- FP comparison helpers ---*/
/*------------------------------------------------------------*/
@@ -3535,15 +3610,15 @@
}
if (isLD) {
- putQReg(tt1,
- loadLE(ty, binop(Iop_Add64, mkexpr(tTA), mkU64(0))));
- putQReg(tt2,
- loadLE(ty, binop(Iop_Add64, mkexpr(tTA), mkU64(szB))));
+ putQRegLO(tt1,
+ loadLE(ty, binop(Iop_Add64, mkexpr(tTA), mkU64(0))));
+ putQRegLO(tt2,
+ loadLE(ty, binop(Iop_Add64, mkexpr(tTA), mkU64(szB))));
} else {
storeLE(binop(Iop_Add64, mkexpr(tTA), mkU64(0)),
- getQReg(ty, tt1));
+ getQRegLO(tt1, ty));
storeLE(binop(Iop_Add64, mkexpr(tTA), mkU64(szB)),
- getQReg(ty, tt2));
+ getQRegLO(tt2, ty));
}
if (wBack)
@@ -3564,7 +3639,7 @@
vassert(0);
}
DIP(fmt_str, isLD ? "ld" : "st",
- nameQReg(szB, tt1), nameQReg(szB, tt2),
+ nameQRegLO(tt1, ty), nameQRegLO(tt2, ty),
nameIReg64orSP(nn), simm7);
return True;
}
@@ -3598,43 +3673,43 @@
case 0: /* 8 bit */
if (isLD) {
putQReg128(tt, mkV128(0x0000));
- putQReg(tt, loadLE(Ity_I8, mkexpr(ea)));
- DIP("ldr %s, %s\n", nameQReg(1, tt), dis_buf);
+ putQRegLO(tt, loadLE(Ity_I8, mkexpr(ea)));
+ DIP("ldr %s, %s\n", nameQRegLO(tt, Ity_I8), dis_buf);
} else {
vassert(0); //ATC
- storeLE(mkexpr(ea), getQReg(Ity_I8, tt));
- DIP("str %s, %s\n", nameQReg(1, tt), dis_buf);
+ storeLE(mkexpr(ea), getQRegLO(tt, Ity_I8));
+ DIP("str %s, %s\n", nameQRegLO(tt, Ity_I8), dis_buf);
}
break;
case 1:
if (isLD) {
putQReg128(tt, mkV128(0x0000));
- putQReg(tt, loadLE(Ity_I16, mkexpr(ea)));
- DIP("ldr %s, %s\n", nameQReg(2, tt), dis_buf);
+ putQRegLO(tt, loadLE(Ity_I16, mkexpr(ea)));
+ DIP("ldr %s, %s\n", nameQRegLO(tt, Ity_I16), dis_buf);
} else {
vassert(0); //ATC
- storeLE(mkexpr(ea), getQReg(Ity_I16, tt));
- DIP("str %s, %s\n", nameQReg(2, tt), dis_buf);
+ storeLE(mkexpr(ea), getQRegLO(tt, Ity_I16));
+ DIP("str %s, %s\n", nameQRegLO(tt, Ity_I16), dis_buf);
}
break;
case 2: /* 32 bit */
if (isLD) {
putQReg128(tt, mkV128(0x0000));
- putQReg(tt, loadLE(Ity_I32, mkexpr(ea)));
- DIP("ldr %s, %s\n", nameQReg(4, tt), dis_buf);
+ putQRegLO(tt, loadLE(Ity_I32, mkexpr(ea)));
+ DIP("ldr %s, %s\n", nameQRegLO(tt, Ity_I32), dis_buf);
} else {
- storeLE(mkexpr(ea), getQReg(Ity_I32, tt));
- DIP("str %s, %s\n", nameQReg(4, tt), dis_buf);
+ storeLE(mkexpr(ea), getQRegLO(tt, Ity_I32));
+ DIP("str %s, %s\n", nameQRegLO(tt, Ity_I32), dis_buf);
}
break;
case 3: /* 64 bit */
if (isLD) {
putQReg128(tt, mkV128(0x0000));
- putQReg(tt, loadLE(Ity_I64, mkexpr(ea)));
- DIP("ldr %s, %s\n", nameQReg(8, tt), dis_buf);
+ putQRegLO(tt, loadLE(Ity_I64, mkexpr(ea)));
+ DIP("ldr %s, %s\n", nameQRegLO(tt, Ity_I64), dis_buf);
} else {
- storeLE(mkexpr(ea), getQReg(Ity_I64, tt));
- DIP("str %s, %s\n", nameQReg(8, tt), dis_buf);
+ storeLE(mkexpr(ea), getQRegLO(tt, Ity_I64));
+ DIP("str %s, %s\n", nameQRegLO(tt, Ity_I64), dis_buf);
}
break;
case 4: return False; //ATC
@@ -3727,13 +3802,13 @@
if (szLg2 < 4) {
putQReg128(tt, mkV128(0x0000));
}
- putQReg(tt, loadLE(ty, mkexpr(tEA)));
+ putQRegLO(tt, loadLE(ty, mkexpr(tEA)));
} else {
- storeLE(mkexpr(tEA), getQReg(ty, tt));
+ storeLE(mkexpr(tEA), getQRegLO(tt, ty));
}
DIP("%s %s, [%s, #%u]\n",
isLD ? "ldr" : "str",
- nameQReg(1 << szLg2, tt), nameIReg64orSP(nn), pimm12);
+ nameQRegLO(tt, ty), nameIReg64orSP(nn), pimm12);
return True;
}
@@ -3778,14 +3853,14 @@
if (szLg2 < 4) {
putQReg128(tt, mkV128(0x0000));
}
- putQReg(tt, loadLE(ty, mkexpr(tTA)));
+ putQRegLO(tt, loadLE(ty, mkexpr(tTA)));
} else {
- storeLE(mkexpr(tTA), getQReg(ty, tt));
+ storeLE(mkexpr(tTA), getQRegLO(tt, ty));
}
putIReg64orSP(nn, mkexpr(tEA));
DIP(atRN ? "%s %s, [%s], #%lld\n" : "%s %s, [%s, #%lld]!\n",
isLD ? "ldr" : "str",
- nameQReg(1 << szLg2, tt), nameIReg64orSP(nn), simm9);
+ nameQRegLO(tt, ty), nameIReg64orSP(nn), simm9);
return True;
}
@@ -3816,16 +3891,16 @@
IRType ty = preferredVectorSubTypeFromSize(1 << szLg2);
assign(tEA, binop(Iop_Add64, getIReg64orSP(nn), mkU64(simm9)));
if (isLD) {
- if (szLg2 < 4) {
- putQReg128(tt, mkV128(0x0000));
- }
- putQReg(tt, loadLE(ty, mkexpr(tEA)));
+ if (szLg2 < 4) {
+ putQReg128(tt, mkV128(0x0000));
+ }
+ putQRegLO(tt, loadLE(ty, mkexpr(tEA)));
} else {
- storeLE(mkexpr(tEA), getQReg(ty, tt));
+ storeLE(mkexpr(tEA), getQRegLO(tt, ty));
}
DIP("%s %s, [%s, #%lld]\n",
isLD ? "ldur" : "stur",
- nameQReg(1 << szLg2, tt), nameIReg64orSP(nn), (Long)simm9);
+ nameQRegLO(tt, ty), nameIReg64orSP(nn), (Long)simm9);
return True;
}
@@ -3841,49 +3916,98 @@
UInt tt = INSN(4,0);
ULong ea = guest_PC_curr_instr + sx_to_64(imm19 << 2, 21);
IRType ty = preferredVectorSubTypeFromSize(szB);
- putQReg(tt, loadLE(ty, mkU64(ea)));
- DIP("ldr %s, 0x%llx (literal)\n", nameQReg(szB, tt), ea);
+ putQReg128(tt, mkV128(0x0000));
+ putQRegLO(tt, loadLE(ty, mkU64(ea)));
+ DIP("ldr %s, 0x%llx (literal)\n", nameQRegLO(tt, ty), ea);
return True;
}
- /* FIXME Temporary hacks to get through ld.so FIXME */
-
- /* ------------------ ST1 variants ------------------ */
- /* st1 {vT.2d}, [<xN|SP>], #16.
- Note that #16 is implied and cannot be set to any
- other value.
- 0100 1100 1001 1111 0111 11 N T
- FIXME doesn't this assume that the host is little endian?
+ /* ---------- LD1/ST1 (single structure, no offset) ---------- */
+ /* 31 23
+ 0100 1100 0100 0000 0111 11 N T LD1 {vT.2d}, [Xn|SP]
+ 0100 1100 0000 0000 0111 11 N T ST1 {vT.2d}, [Xn|SP]
+ 0100 1100 0100 0000 0111 10 N T LD1 {vT.4s}, [Xn|SP]
+ 0100 1100 0000 0000 0111 10 N T ST1 {vT.4s}, [Xn|SP]
+ 0100 1100 0100 0000 0111 01 N T LD1 {vT.8h}, [Xn|SP]
+ 0100 1100 0000 0000 0111 01 N T ST1 {vT.8h}, [Xn|SP]
+ 0100 1100 0100 0000 0111 00 N T LD1 {vT.16b}, [Xn|SP]
+ 0100 1100 0000 0000 0111 00 N T ST1 {vT.16b}, [Xn|SP]
+ FIXME does this assume that the host is little endian?
*/
- if ((insn & 0xFFFFFC00) == 0x4C9F7C00) {
- UInt rN = INSN(9,5);
- UInt vT = INSN(4,0);
- IRTemp tEA = newTemp(Ity_I64);
+ if ( (insn & 0xFFFFF000) == 0x4C407000 // LD1 cases
+ || (insn & 0xFFFFF000) == 0x4C007000 // ST1 cases
+ ) {
+ Bool isLD = INSN(22,22) == 1;
+ UInt rN = INSN(9,5);
+ UInt vT = INSN(4,0);
+ IRTemp tEA = newTemp(Ity_I64);
+ const HChar* names[4] = { "2d", "4s", "8h", "16b" };
+ const HChar* name = names[INSN(11,10)];
assign(tEA, getIReg64orSP(rN));
if (rN == 31) { /* FIXME generate stack alignment check */ }
- storeLE(mkexpr(tEA), getQReg128(vT));
- putIReg64orSP(rN, binop(Iop_Add64, mkexpr(tEA), mkU64(16)));
- DIP("st1 {v%u.2d}, [%s], #16\n", vT, nameIReg64orSP(rN));
+ if (isLD) {
+ putQReg128(vT, loadLE(Ity_V128, mkexpr(tEA)));
+ } else {
+ storeLE(mkexpr(tEA), getQReg128(vT));
+ }
+ DIP("%s {v%u.%s}, [%s]\n", isLD ? "ld1" : "st1",
+ vT, name, nameIReg64orSP(rN));
return True;
}
- /* ------------------ LD1 variants ------------------ */
/* 31 23
- 0100 1100 0100 0000 0111 11 N T LD1 {vT.2d}, [Xn|SP]
- 0100 1100 0000 0000 0111 11 N T ST1 {vT.2d}, [Xn|SP]
- 0100 1100 0100 0000 0111 00 N T LD1 {vT.16b}, [Xn|SP]
- 0100 1100 0000 0000 0111 00 N T ST1 {vT.16b}, [Xn|SP]
- FIXME doesn't this assume that the host is little endian?
+ 0000 1100 0100 0000 0111 11 N T LD1 {vT.1d}, [Xn|SP]
+ 0000 1100 0000 0000 0111 11 N T ST1 {vT.1d}, [Xn|SP]
+ 0000 1100 0100 0000 0111 10 N T LD1 {vT.2s}, [Xn|SP]
+ 0000 1100 0000 0000 0111 10 N T ST1 {vT.2s}, [Xn|SP]
+ 0000 1100 0100 0000 0111 01 N T LD1 {vT.4h}, [Xn|SP]
+ 0000 1100 0000 0000 0111 01 N T ST1 {vT.4h}, [Xn|SP]
+ 0000 1100 0100 0000 0111 00 N T LD1 {vT.8b}, [Xn|SP]
+ 0000 1100 0000 0000 0111 00 N T ST1 {vT.8b}, [Xn|SP]
+ FIXME does this assume that the host is little endian?
*/
- if ( (insn & 0xFFFFFC00) == 0x4C407C00 // LD1 {vT.2d}, [Xn|SP]
- || (insn & 0xFFFFFC00) == 0x4C007C00 // ST1 {vT.2d}, [Xn|SP]
- || (insn & 0xFFFFFC00) == 0x4C407000 // LD1 {vT.16b}, [Xn|SP]
- || (insn & 0xFFFFFC00) == 0x4C007000 // ST1 {vT.16b}, [Xn|SP]
+ if ( (insn & 0xFFFFF000) == 0x0C407000 // LD1 cases
+ || (insn & 0xFFFFF000) == 0x0C007000 // ST1 cases
) {
Bool isLD = INSN(22,22) == 1;
UInt rN = INSN(9,5);
UInt vT = INSN(4,0);
IRTemp tEA = newTemp(Ity_I64);
+ const HChar* names[4] = { "1d", "2s", "4h", "8b" };
+ const HChar* name = names[INSN(11,10)];
+ assign(tEA, getIReg64orSP(rN));
+ if (rN == 31) { /* FIXME generate stack alignment check */ }
+ if (isLD) {
+ putQRegLane(vT, 0, loadLE(Ity_I64, mkexpr(tEA)));
+ putQRegLane(vT, 1, mkU64(0));
+ } else {
+ storeLE(mkexpr(tEA), getQRegLane(vT, 0, Ity_I64));
+ }
+ DIP("%s {v%u.%s}, [%s]\n", isLD ? "ld1" : "st1",
+ vT, name, nameIReg64orSP(rN));
+ return True;
+ }
+
+ /* ---------- LD1/ST1 (single structure, post index) ---------- */
+ /* 31 23
+ 0100 1100 1001 1111 0111 11 N T ST1 {vT.2d}, [xN|SP], #16
+ 0100 1100 1101 1111 0111 11 N T LD1 {vT.2d}, [xN|SP], #16
+ 0100 1100 1001 1111 0111 10 N T ST1 {vT.4s}, [xN|SP], #16
+ 0100 1100 1001 1111 0111 01 N T ST1 {vT.8h}, [xN|SP], #16
+ Note that #16 is implied and cannot be any other value.
+ FIXME does this assume that the host is little endian?
+ */
+ if ( (insn & 0xFFFFFC00) == 0x4C9F7C00 // ST1 {vT.2d}, [xN|SP], #16
+ || (insn & 0xFFFFFC00) == 0x4CDF7C00 // LD1 {vT.2d}, [xN|SP], #16
+ || (insn & 0xFFFFFC00) == 0x4C9F7800 // ST1 {vT.4s}, [xN|SP], #16
+ || (insn & 0xFFFFFC00) == 0x4C9F7400 // ST1 {vT.8h}, [xN|SP], #16
+ ) {
+ Bool isLD = INSN(22,22) == 1;
+ UInt rN = INSN(9,5);
+ UInt vT = INSN(4,0);
+ IRTemp tEA = newTemp(Ity_I64);
+ const HChar* names[4] = { "2d", "4s", "8h", "16b" };
+ const HChar* name = names[INSN(11,10)];
assign(tEA, getIReg64orSP(rN));
if (rN == 31) { /* FIXME generate stack alignment check */ }
if (isLD) {
@@ -3891,12 +4015,34 @@
} else {
storeLE(mkexpr(tEA), getQReg128(vT));
}
- DIP("%s {v%u.%s}, [%s]\n", isLD ? "ld1" : "st1",
- vT, INSN(11,10) == BITS2(0,0) ? "16b" : "2d",
- nameIReg64orSP(rN));
+ putIReg64orSP(rN, binop(Iop_Add64, mkexpr(tEA), mkU64(16)));
+ DIP("%s {v%u.%s}, [%s], #16\n", isLD ? "ld1" : "st1",
+ vT, name, nameIReg64orSP(rN));
return True;
}
+ /*
+ 0000 1100 1001 1111 0111 10 N T ST1 {vT.2s}, [xN|SP], #8
+ Note that #8 is implied and cannot be any other value.
+ FIXME does this assume that the host is little endian?
+ */
+ if ( (insn & 0xFFFFFC00) == 0x0C9F7800 // st1 {vT.2s}, [xN|SP], #8
+ ) {
+ UInt rN = INSN(9,5);
+ UInt vT = INSN(4,0);
+ IRTemp tEA = newTemp(Ity_I64);
+ const HChar* names[4] = { "1d", "2s", "4h", "8b" };
+ const HChar* name = names[INSN(11,10)];
+ assign(tEA, getIReg64orSP(rN));
+ if (rN == 31) { /* FIXME generate stack alignment check */ }
+ storeLE(mkexpr(tEA), getQRegLane(vT, 0, Ity_I64));
+ putIReg64orSP(rN, binop(Iop_Add64, mkexpr(tEA), mkU64(8)));
+ DIP("st1 {v%u.%s}, [%s], #8\n", vT, name, nameIReg64orSP(rN));
+ return True;
+ }
+
+ /* FIXME Temporary hacks to get through ld.so FIXME */
+
/* -------------------- LD{A}XR -------------------- */
/* FIXME: this is a hack; needs real atomicity stuff. */
/* 31 29 20 19 9 4
@@ -4216,36 +4362,102 @@
/* Generate N copies of |bit| in the bottom of a ULong. */
static ULong Replicate ( ULong bit, Int N )
{
- vassert(bit <= 1 && N >= 1 && N < 64);
- if (bit == 0) {
- return 0;
- } else {
- /* Careful. This won't work for N == 64. */
- return (1ULL << N) - 1;
- }
+ vassert(bit <= 1 && N >= 1 && N < 64);
+ if (bit == 0) {
+ return 0;
+ } else {
+ /* Careful. This won't work for N == 64. */
+ return (1ULL << N) - 1;
+ }
}
static ULong VFPExpandImm ( ULong imm8, Int N )
{
- vassert(imm8 <= 0xFF);
- vassert(N == 32 || N == 64);
- Int E = ((N == 32) ? 8 : 11) - 2; // The spec incorrectly omits the -2.
- Int F = N - E - 1;
- ULong imm8_6 = (imm8 >> 6) & 1;
- /* sign: 1 bit */
- /* exp: E bits */
- /* frac: F bits */
- ULong sign = (imm8 >> 7) & 1;
- ULong exp = ((imm8_6 ^ 1) << (E-1)) | Replicate(imm8_6, E-1);
- ULong frac = ((imm8 & 63) << (F-6)) | Replicate(0, F-6);
- vassert(sign < (1ULL << 1));
- vassert(exp < (1ULL << E));
- vassert(frac < (1ULL << F));
- vassert(1 + E + F == N);
- ULong res = (sign << (E+F)) | (exp << F) | frac;
- return res;
+ vassert(imm8 <= 0xFF);
+ vassert(N == 32 || N == 64);
+ Int E = ((N == 32) ? 8 : 11) - 2; // The spec incorrectly omits the -2.
+ Int F = N - E - 1;
+ ULong imm8_6 = (imm8 >> 6) & 1;
+ /* sign: 1 bit */
+ /* exp: E bits */
+ /* frac: F bits */
+ ULong sign = (imm8 >> 7) & 1;
+ ULong exp = ((imm8_6 ^ 1) << (E-1)) | Replicate(imm8_6, E-1);
+ ULong frac = ((imm8 & 63) << (F-6)) | Replicate(0, F-6);
+ vassert(sign < (1ULL << 1));
+ vassert(exp < (1ULL << E));
+ vassert(frac < (1ULL << F));
+ vassert(1 + E + F == N);
+ ULong res = (sign << (E+F)) | (exp << F) | frac;
+ return res;
}
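VFPExpandImm is easy to sanity-check standalone. This sketch ports the same arithmetic (including the "-2" exponent-width correction noted in the comment above, and the Replicate helper it depends on) so it can be compared against known FMOV immediate encodings; it is a test harness, not the VEX code itself:

```c
#include <assert.h>
#include <stdint.h>

/* N copies of |bit| in the bottom of a uint64_t (mirrors Replicate;
   n must be in 1..63). */
static uint64_t replicate(uint64_t bit, int n)
{
   return bit == 0 ? 0 : (1ULL << n) - 1;
}

/* Expand an 8-bit FP immediate to an N-bit (32 or 64) IEEE bit
   pattern, following the same steps as VFPExpandImm above:
   sign | ~b6,Replicate(b6) exponent | imm8[5:0] fraction. */
static uint64_t vfp_expand_imm(uint64_t imm8, int n)
{
   int e = ((n == 32) ? 8 : 11) - 2;   /* spec omits the -2 */
   int f = n - e - 1;
   uint64_t b6   = (imm8 >> 6) & 1;
   uint64_t sign = (imm8 >> 7) & 1;
   uint64_t exp  = ((b6 ^ 1) << (e - 1)) | replicate(b6, e - 1);
   uint64_t frac = (imm8 & 63) << (f - 6);
   return (sign << (e + f)) | (exp << f) | frac;
}
```

With imm8 = 0x70 the result is the bit pattern of 1.0 in both widths (0x3FF0000000000000 as a double, 0x3F800000 as a float), matching what `fmov d0, #1.0` encodes.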
+/* Help a bit for decoding laneage for vector operations that can be
+ of the form 4x32, 2x64 or 2x32-and-zero-upper-half, as encoded by Q
+ and SZ bits, typically for vector floating point. */
+static Bool getLaneInfo_Q_SZ ( /*OUT*/IRType* tyI, /*OUT*/IRType* tyF,
+ /*OUT*/UInt* nLanes, /*OUT*/Bool* zeroUpper,
+ /*OUT*/const HChar** arrSpec,
+ Bool bitQ, Bool bitSZ )
+{
+ vassert(bitQ == True || bitQ == False);
+ vassert(bitSZ == True || bitSZ == False);
+ if (bitQ && bitSZ) { // 2x64
+ if (tyI) *tyI = Ity_I64;
+ if (tyF) *tyF = Ity_F64;
+ if (nLanes) *nLanes = 2;
+ if (zeroUpper) *zeroUpper = False;
+ if (arrSpec) *arrSpec = "2d";
+ return True;
+ }
+ if (bitQ && !bitSZ) { // 4x32
+ if (tyI) *tyI = Ity_I32;
+ if (tyF) *tyF = Ity_F32;
+ if (nLanes) *nLanes = 4;
+ if (zeroUpper) *zeroUpper = False;
+ if (arrSpec) *arrSpec = "4s";
+ return True;
+ }
+ if (!bitQ && !bitSZ) { // 2x32
+ if (tyI) *tyI = Ity_I32;
+ if (tyF) *tyF = Ity_F32;
+ if (nLanes) *nLanes = 2;
+ if (zeroUpper) *zeroUpper = True;
+ if (arrSpec) *arrSpec = "2s";
+ return True;
+ }
+ // Else impliedly 1x64, which isn't allowed.
+ return False;
+}
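The Q/SZ decoding above reduces to a three-entry table: Q and SZ select 2x64 ("2d"), 4x32 ("4s"), or 2x32 with the upper half zeroed ("2s"), while Q=0/SZ=1 would imply the disallowed 1x64 arrangement. A flattened standalone sketch (hypothetical helper name, returning only the fields needed to show the mapping):

```c
#include <stddef.h>

/* Standalone model of the Q/SZ laneage table in getLaneInfo_Q_SZ.
   Returns the arrangement specifier, or NULL for the invalid 1x64
   case; nLanes and zeroUpper echo the corresponding OUT params. */
static const char* laneage_q_sz(int bitQ, int bitSZ,
                                unsigned* nLanes, int* zeroUpper)
{
   if (bitQ && bitSZ)   { *nLanes = 2; *zeroUpper = 0; return "2d"; }
   if (bitQ && !bitSZ)  { *nLanes = 4; *zeroUpper = 0; return "4s"; }
   if (!bitQ && !bitSZ) { *nLanes = 2; *zeroUpper = 1; return "2s"; }
   return NULL;   /* Q=0, SZ=1: impliedly 1x64, not allowed */
}
```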
+
+/* Helper for decoding laneage for simple vector operations,
+ eg integer add. */
+static Bool getLaneInfo_SIMPLE ( /*OUT*/Bool* zeroUpper,
+ /*OUT*/const HChar** arrSpec,
+ Bool bitQ, UInt szBlg2 )
+{
+ vassert(bitQ == True || bitQ == False);
+ vassert(szBlg2 < 4);
+ Bool zu = False;
+ const HChar* as = NULL;
+ switch ((szBlg2 << 1) | (bitQ ? 1 : 0)) {
+ case 0: zu = True; as = "8b"; break;
+ case 1: zu = False; as = "16b"; break;
+ case 2: zu = True; as = "4h"; break;
+ case 3: zu = False; as = "8h"; break;
+ case 4: zu = True; as = "2s"; break;
+ case 5: zu = False; as = "4s"; break;
+ case 6: return False; // impliedly 1x64
+ case 7: zu = False; as = "2d"; break;
+ default: vassert(0);
+ }
+ vassert(as);
+ if (arrSpec) *arrSpec = as;
+ if (zeroUpper) *zeroUpper = zu;
+ return True;
+}
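Likewise, the (szBlg2, Q) switch in getLaneInfo_SIMPLE is an eight-entry lookup, where Q=0 forms use only the low 64 bits (so the upper half is zeroed) and index 6 -- 64-bit lanes with Q=0, i.e. 1x64 -- is invalid. A table-driven sketch of the same mapping (hypothetical helper name):

```c
#include <stddef.h>

/* Standalone model of getLaneInfo_SIMPLE's mapping: index is
   (szBlg2 << 1) | Q.  Even indices (Q=0) are the 64-bit forms that
   zero the upper half; entry 6 (1x64) has no valid arrangement. */
static const char* laneage_simple(unsigned szBlg2, int bitQ)
{
   static const char* specs[8] =
      { "8b", "16b", "4h", "8h", "2s", "4s", NULL /*1x64*/, "2d" };
   return szBlg2 < 4 ? specs[(szBlg2 << 1) | (bitQ ? 1 : 0)] : NULL;
}
```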
+
+
static
Bool dis_ARM64_simd_and_fp(/*MB_OUT*/DisResult* dres, UInt insn)
{
@@ -4294,28 +4506,28 @@
switch (ix) {
case 1:
putQReg128(dd, mkV128(0));
- putQReg(dd, getIReg32orZR(nn));
+ putQRegLO(dd, getIReg32orZR(nn));
DIP("fmov s%u, w%u\n", dd, nn);
break;
case 2:
putQReg128(dd, mkV128(0));
- putQReg(dd, getIReg64orZR(nn));
+ putQRegLO(dd, getIReg64orZR(nn));
DIP("fmov d%u, x%u\n", dd, nn);
break;
case 3:
- putQReg64HI(dd, getIReg64orZR(nn));
+ putQRegHI64(dd, getIReg64orZR(nn));
DIP("fmov v%u.d[1], x%u\n", dd, nn);
break;
case 4:
- putIReg32orZR(dd, getQReg(Ity_I32, nn));
+ putIReg32orZR(dd, getQRegLO(nn, Ity_I32));
DIP("fmov w%u, s%u\n", dd, nn);
break;
case 5:
- putIReg64orZR(dd, getQReg(Ity_I64, nn));
+ putIReg64orZR(dd, getQRegLO(nn, Ity_I64));
DIP("fmov x%u, d%u\n", dd, nn);
break;
case 6:
- putIReg64orZR(dd, getQReg64HI(nn));
+ putIReg64orZR(dd, getQRegHI64(nn));
DIP("fmov x%u, v%u.d[1]\n", dd, nn);
break;
default:
@@ -4341,8 +4553,9 @@
vassert(0 == (imm & 0xFFFFFFFF00000000ULL));
}
putQReg128(dd, mkV128(0));
- putQReg(dd, isD ? mkU64(imm) : mkU32(imm & 0xFFFFFFFFULL));
- DIP("fmov %s, #0x%llx\n", nameQReg(isD ? 8 : 4, dd), imm);
+ putQRegLO(dd, isD ? mkU64(imm) : mkU32(imm & 0xFFFFFFFFULL));
+ DIP("fmov %s, #0x%llx\n",
+ nameQRegLO(dd, isD ? Ity_F64 : Ity_F32), imm);
return True;
}
@@ -4377,9 +4590,9 @@
? unop(ops[ix], src)
: binop(ops[ix], mkexpr(mk_get_IR_rounding_mode()), src);
putQReg128(dd, mkV128(0));
- putQReg(dd, res);
+ putQRegLO(dd, res);
DIP("%ccvtf %s, %s\n",
- isU ? 'u' : 's', nameQReg(isF64 ? 8 : 4, dd),
+ isU ? 'u' : 's', nameQRegLO(dd, isF64 ? Ity_F64 : Ity_F32),
nameIRegOrZR(isI64, nn));
return True;
}
@@ -4402,7 +4615,6 @@
UInt dd = INSN(4,0);
IROp iop = Iop_INVALID;
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
Bool neg = False;
const HChar* nm = "???";
switch (op) {
@@ -4416,13 +4628,13 @@
}
vassert(iop != Iop_INVALID);
IRExpr* resE = triop(iop, mkexpr(mk_get_IR_rounding_mode()),
- getQReg(ty, nn), getQReg(ty, mm));
+ getQRegLO(nn, ty), getQRegLO(mm, ty));
IRTemp res = newTemp(ty);
assign(res, neg ? unop(mkNEGF(ty),resE) : resE);
putQReg128(dd, mkV128(0));
- putQReg(dd, mkexpr(res));
+ putQRegLO(dd, mkexpr(res));
DIP("%s %s, %s, %s\n",
- nm, nameQReg(szB, dd), nameQReg(szB, nn), nameQReg(szB, mm));
+ nm, nameQRegLO(dd, ty), nameQRegLO(nn, ty), nameQRegLO(mm, ty));
return True;
}
@@ -4442,32 +4654,32 @@
UInt nn = INSN(9,5);
UInt dd = INSN(4,0);
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
IRTemp res = newTemp(ty);
if (opc == BITS2(0,0)) {
- assign(res, getQReg(ty, nn));
+ assign(res, getQRegLO(nn, ty));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
- DIP("fmov %s, %s\n", nameQReg(szB, dd), nameQReg(szB, nn));
+ putQRegLO(dd, mkexpr(res));
+ DIP("fmov %s, %s\n",
+ nameQRegLO(dd, ty), nameQRegLO(nn, ty));
return True;
}
if (opc == BITS2(1,0) || opc == BITS2(0,1)) {
Bool isAbs = opc == BITS2(0,1);
IROp op = isAbs ? mkABSF(ty) : mkNEGF(ty);
- assign(res, unop(op, getQReg(ty, nn)));
+ assign(res, unop(op, getQRegLO(nn, ty)));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
+ putQRegLO(dd, mkexpr(res));
DIP("%s %s, %s\n", isAbs ? "fabs" : "fneg",
- nameQReg(szB, dd), nameQReg(szB, nn));
+ nameQRegLO(dd, ty), nameQRegLO(nn, ty));
return True;
}
if (opc == BITS2(1,1)) {
assign(res,
binop(mkSQRTF(ty),
- mkexpr(mk_get_IR_rounding_mode()), getQReg(ty, nn)));
+ mkexpr(mk_get_IR_rounding_mode()), getQRegLO(nn, ty)));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
- DIP("fsqrt %s, %s\n", nameQReg(szB, dd), nameQReg(szB, nn));
+ putQRegLO(dd, mkexpr(res));
+ DIP("fsqrt %s, %s\n", nameQRegLO(dd, ty), nameQRegLO(nn, ty));
return True;
}
/* else fall through; other cases are ATC */
@@ -4498,26 +4710,25 @@
Bool isCMPE = INSN(4,4) == 1;
Bool cmpZero = INSN(3,3) == 1;
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
Bool valid = True;
if (cmpZero && mm != 0) valid = False;
if (valid) {
IRTemp argL = newTemp(ty);
IRTemp argR = newTemp(ty);
IRTemp irRes = newTemp(Ity_I32);
- assign(argL, getQReg(ty, nn));
+ assign(argL, getQRegLO(nn, ty));
assign(argR,
cmpZero
? (IRExpr_Const(isD ? IRConst_F64i(0) : IRConst_F32i(0)))
- : getQReg(ty, mm));
+ : getQRegLO(mm, ty));
assign(irRes, binop(isD ? Iop_CmpF64 : Iop_CmpF32,
mkexpr(argL), mkexpr(argR)));
IRTemp nzcv = mk_convert_IRCmpF64Result_to_NZCV(irRes);
IRTemp nzcv_28x0 = newTemp(Ity_I64);
assign(nzcv_28x0, binop(Iop_Shl64, mkexpr(nzcv), mkU8(28)));
setFlags_COPY(nzcv_28x0);
- DIP("fcmp%s %s, %s\n", isCMPE ? "e" : "",
- nameQReg(szB, nn), cmpZero ? "#0.0" : nameQReg(szB, mm));
+ DIP("fcmp%s %s, %s\n", isCMPE ? "e" : "", nameQRegLO(nn, ty),
+ cmpZero ? "#0.0" : nameQRegLO(mm, ty));
return True;
}
}
@@ -4544,15 +4755,14 @@
UInt dd = INSN(4,0);
UInt ix = (INSN(21,21) << 1) | INSN(15,15);
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
IROp opADD = mkADDF(ty);
IROp opSUB = mkSUBF(ty);
IROp opMUL = mkMULF(ty);
IROp opNEG = mkNEGF(ty);
IRTemp res = newTemp(ty);
- IRExpr* eA = getQReg(ty, aa);
- IRExpr* eN = getQReg(ty, nn);
- IRExpr* eM = getQReg(ty, mm);
+ IRExpr* eA = getQRegLO(aa, ty);
+ IRExpr* eN = getQRegLO(nn, ty);
+ IRExpr* eM = getQRegLO(mm, ty);
IRExpr* rm = mkexpr(mk_get_IR_rounding_mode());
IRExpr* eNxM = triop(opMUL, rm, eN, eM);
switch (ix) {
@@ -4563,11 +4773,11 @@
default: vassert(0);
}
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
+ putQRegLO(dd, mkexpr(res));
const HChar* names[4] = { "fmadd", "fmsub", "fnmadd", "fnmsub" };
DIP("%s %s, %s, %s, %s\n",
- names[ix], nameQReg(szB, dd), nameQReg(szB, nn),
- nameQReg(szB, mm), nameQReg(szB, aa));
+ names[ix], nameQRegLO(dd, ty), nameQRegLO(nn, ty),
+ nameQRegLO(mm, ty), nameQRegLO(aa, ty));
return True;
}
@@ -4642,16 +4852,15 @@
} else {
return False;
}
- UInt srcSzB = isF64 ? 8 : 4;
IRType srcTy = isF64 ? Ity_F64 : Ity_F32;
IRType dstTy = isI64 ? Ity_I64 : Ity_I32;
IRTemp src = newTemp(srcTy);
IRTemp dst = newTemp(dstTy);
- assign(src, getQReg(srcTy, nn));
+ assign(src, getQRegLO(nn, srcTy));
assign(dst, binop(op, mkU32(irrm), mkexpr(src)));
putIRegOrZR(isI64, dd, mkexpr(dst));
DIP("fcvt%c%c %s, %s\n", ch, isU ? 'u' : 's',
- nameIRegOrZR(isI64, dd), nameQReg(srcSzB, nn));
+ nameIRegOrZR(isI64, dd), nameQRegLO(nn, srcTy));
return True;
}
@@ -4677,7 +4886,6 @@
UInt nn = INSN(9,5);
UInt dd = INSN(4,0);
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
IRExpr* irrmE = NULL;
UChar ch = '?';
switch (rm) {
@@ -4689,12 +4897,13 @@
if (irrmE) {
IRTemp src = newTemp(ty);
IRTemp dst = newTemp(ty);
- assign(src, getQReg(ty, nn));
+ assign(src, getQRegLO(nn, ty));
assign(dst, binop(isD ? Iop_RoundF64toInt : Iop_RoundF32toInt,
irrmE, mkexpr(src)));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(dst));
- DIP("frint%c %s, %s\n", ch, nameQReg(szB, dd), nameQReg(szB, nn));
+ putQRegLO(dd, mkexpr(dst));
+ DIP("frint%c %s, %s\n",
+ ch, nameQRegLO(dd, ty), nameQRegLO(nn, ty));
return True;
}
/* else unhandled rounding mode case -- fall through */
@@ -4720,20 +4929,22 @@
if (b2322 == BITS2(0,0) && b1615 == BITS2(0,1)) {
/* Convert S to D */
IRTemp res = newTemp(Ity_F64);
- assign(res, unop(Iop_F32toF64, getQReg(Ity_F32, nn)));
+ assign(res, unop(Iop_F32toF64, getQRegLO(nn, Ity_F32)));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
- DIP("fcvt %s, %s\n", nameQReg(8, dd), nameQReg(4, nn));
+ putQRegLO(dd, mkexpr(res));
+ DIP("fcvt %s, %s\n",
+ nameQRegLO(dd, Ity_F64), nameQRegLO(nn, Ity_F32));
return True;
}
if (b2322 == BITS2(0,1) && b1615 == BITS2(0,0)) {
/* Convert D to S */
IRTemp res = newTemp(Ity_F32);
assign(res, binop(Iop_F64toF32, mkexpr(mk_get_IR_rounding_mode()),
- getQReg(Ity_F64, nn)));
+ getQRegLO(nn, Ity_F64)));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
- DIP("fcvt %s, %s\n", nameQReg(4, dd), nameQReg(8, nn));
+ putQRegLO(dd, mkexpr(res));
+ DIP("fcvt %s, %s\n",
+ nameQRegLO(dd, Ity_F32), nameQRegLO(nn, Ity_F64));
return True;
}
/* else unhandled */
@@ -4751,18 +4962,242 @@
UInt nn = INSN(9,5);
UInt dd = INSN(4,0);
IRType ty = isD ? Ity_F64 : Ity_F32;
- UInt szB = isD ? 8 : 4;
IRTemp res = newTemp(ty);
- assign(res, unop(mkABSF(ty), triop(mkSUBF(ty),
- mkexpr(mk_get_IR_rounding_mode()),
- getQReg(ty,nn), getQReg(ty,mm))));
+ assign(res, unop(mkABSF(ty),
+ triop(mkSUBF(ty),
+ mkexpr(mk_get_IR_rounding_mode()),
+ getQRegLO(nn,ty), getQRegLO(mm,ty))));
putQReg128(dd, mkV128(0x0000));
- putQReg(dd, mkexpr(res));
+ putQRegLO(dd, mkexpr(res));
DIP("fabd %s, %s, %s\n",
- nameQReg(szB, dd), nameQReg(szB, nn), nameQReg(szB, mm));
+ nameQRegLO(dd, ty), nameQRegLO(nn, ty), nameQRegLO(mm, ty));
return True;
}
+ /* -------------- {S,U}CVTF (vector, integer) -------------- */
+ /* 31 28 22 21 15 9 4
+ 0q0 01110 0 sz 1 00001 110110 n d SCVTF Vd, Vn
+ 0q1 01110 0 sz 1 00001 110110 n d UCVTF Vd, Vn
+ with laneage:
+ case sz:Q of 00 -> 2S, zero upper, 01 -> 4S, 10 -> illegal, 11 -> 2D
+ */
+ if (INSN(31,31) == 0 && INSN(28,23) == BITS6(0,1,1,1,0,0)
+ && INSN(21,16) == BITS6(1,0,0,0,0,1)
+ && INSN(15,10) == BITS6(1,1,0,1,1,0)) {
+ Bool isQ = INSN(30,30) == 1;
+ Bool isU = INSN(29,29) == 1;
+ Bool isF64 = INSN(22,22) == 1;
+ UInt nn = INSN(9,5);
+ UInt dd = INSN(4,0);
+ if (isQ || !isF64) {
+ IRType tyF = Ity_INVALID, tyI = Ity_INVALID;
+ UInt nLanes = 0;
+ Bool zeroHI = False;
+ const HChar* arrSpec = NULL;
+ Bool ok = getLaneInfo_Q_SZ(&tyI, &tyF, &nLanes, &zeroHI, &arrSpec,
+ isQ, isF64 );
+ IROp op = isU ? (isF64 ? Iop_I64UtoF64 : Iop_I32UtoF32)
+ : (isF64 ? Iop_I64StoF64 : Iop_I32StoF32);
+ IRTemp rm = mk_get_IR_rounding_mode();
+ UInt i;
+ vassert(ok); /* the 'if' above should ensure this */
+ for (i = 0; i < nLanes; i++) {
+ putQRegLane(dd, i,
+ binop(op, mkexpr(rm), getQRegLane(nn, i, tyI)));
+ }
+ if (zeroHI) {
+ putQRegLane(dd, 1, mkU64(0));
+ }
+ DIP("%ccvtf %s.%s, %s.%s\n", isU ? 'u' : 's',
+ nameQReg128(dd), arrSpec, nameQReg128(nn), arrSpec);
+ return True;
+ }
+ /* else fall through */
+ }
+
+ /* ---------- F{ADD,SUB,MUL,DIV,MLA,MLS} (vector) ---------- */
+ /* 31 28 22 21 20 15 9 4 case
+ 0q0 01110 0 sz 1 m 110101 n d FADD Vd,Vn,Vm 1
+ 0q0 01110 1 sz 1 m 110101 n d FSUB Vd,Vn,Vm 2
+ 0q1 01110 0 sz 1 m 110111 n d FMUL Vd,Vn,Vm 3
+ 0q1 01110 0 sz 1 m 111111 n d FDIV Vd,Vn,Vm 4
+ 0q0 01110 0 sz 1 m 110011 n d FMLA Vd,Vn,Vm 5
+ 0q0 01110 1 sz 1 m 110011 n d FMLS Vd,Vn,Vm 6
+ */
+ if (INSN(31,31) == 0
+ && INSN(28,24) == BITS5(0,1,1,1,0) && INSN(21,21) == 1) {
+ Bool isQ = INSN(30,30) == 1;
+ UInt b29 = INSN(29,29);
+ UInt b23 = INSN(23,23);
+ Bool isF64 = INSN(22,22) == 1;
+ UInt mm = INSN(20,16);
+ UInt b1510 = INSN(15,10);
+ UInt nn = INSN(9,5);
+ UInt dd = INSN(4,0);
+ UInt ix = 0;
+ /**/ if (b29 == 0 && b23 == 0 && b1510 == BITS6(1,1,0,1,0,1)) ix = 1;
+ else if (b29 == 0 && b23 == 1 && b1510 == BITS6(1,1,0,1,0,1)) ix = 2;
+ else if (b29 == 1 && b23 == 0 && b1510 == BITS6(1,1,0,1,1,1)) ix = 3;
+ else if (b29 == 1 && b23 == 0 && b1510 == BITS6(1,1,1,1,1,1)) ix = 4;
+ else if (b29 == 0 && b23 == 0 && b1510 == BITS6(1,1,0,0,1,1)) ix = 5;
+ else if (b29 == 0 && b23 == 1 && b1510 == BITS6(1,1,0,0,1,1)) ix = 6;
+ IRType laneTy = Ity_INVALID;
+ Bool zeroHI = False;
+ const HChar* arr = "??";
+ Bool ok
+ = getLaneInfo_Q_SZ(NULL, &laneTy, NULL, &zeroHI, &arr, isQ, isF64);
+ /* Skip MLA/MLS for the time being */
+ if (ok && ix >= 1 && ix <= 4) {
+ const IROp ops64[4]
+ = { Iop_Add64Fx2, Iop_Sub64Fx2, Iop_Mul64Fx2, Iop_Div64Fx2 };
+ const IROp ops32[4]
+ = { Iop_Add32Fx4, Iop_Sub32Fx4, Iop_Mul32Fx4, Iop_Div32Fx4 };
+ const HChar* names[4]
+ = { "fadd", "fsub", "fmul", "fdiv" };
+ IROp op = laneTy==Ity_F64 ? ops64[ix-1] : ops32[ix-1];
+ IRTemp rm = mk_get_IR_rounding_mode();
+ IRTemp t1 = newTemp(Ity_V128);
+ IRTemp t2 = newTemp(Ity_V128);
+ assign(t1, triop(op, mkexpr(rm), getQReg128(nn), getQReg128(mm)));
+ assign(t2, zeroHI ? unop(Iop_ZeroHI64, mkexpr(t1)) : mkexpr(t1));
+ putQReg128(dd, mkexpr(t2));
+ DIP("%s %s.%s, %s.%s, %s.%s\n", names[ix-1],
+ nameQReg128(dd), arr, nameQReg128(nn), arr, nameQReg128(mm), arr);
+ return True;
+ }
+ }
+
+ /* ---------------- ADD/SUB (vector) ---------------- */
+ /* 31 28 23 21 20 15 9 4
+ 0q0 01110 size 1 m 100001 n d ADD Vd.T, Vn.T, Vm.T
+ 0q1 01110 size 1 m 100001 n d SUB Vd.T, Vn.T, Vm.T
+ */
+ if (INSN(31,31) == 0 && INSN(28,24) == BITS5(0,1,1,1,0)
+ && INSN(21,21) == 1 && INSN(15,10) == BITS6(1,0,0,0,0,1)) {
+ Bool isQ = INSN(30,30) == 1;
+ UInt szBlg2 = INSN(23,22);
+ Bool isSUB = INSN(29,29) == 1;
+ UInt mm = INSN(20,16);
+ UInt nn = INSN(9,5);
+ UInt dd = INSN(4,0);
+ Bool zeroHI = False;
+ const HChar* arrSpec = "";
+ Bool ok = getLaneInfo_SIMPLE(&zeroHI, &arrSpec, isQ, szBlg2 );
+ if (ok) {
+ const IROp opADD[4]
+ = { Iop_Add8x16, Iop_Add16x8, Iop_Add32x4, Iop_Add64x2 };
+ const IROp opSUB[4]
+ = { Iop_Sub8x16, Iop_Sub16x8, Iop_Sub32x4, Iop_Sub64x2 };
+ vassert(szBlg2 < 4);
+ IROp op = isSUB ? opSUB[szBlg2] : opADD[szBlg2];
+ IRTemp t = newTemp(Ity_V128);
+ assign(t, binop(op, getQReg128(nn), getQReg128(mm)));
+ putQReg128(dd, zeroHI ? unop(Iop_ZeroHI64, mkexpr(t)) : mkexpr(t));
+ const HChar* nm = isSUB ? "sub" : "add";
+ DIP("%s %s.%s, %s.%s, %s.%s\n", nm,
+ nameQReg128(dd), arrSpec,
+ nameQReg128(nn), arrSpec, nameQReg128(mm), arrSpec);
+ return True;
+ }
+ /* else fall through */
+ }
+
+ /* -------------------- XTN{,2} -------------------- */
+ /* 31 28 23 21 15 9 4
+ 0q0 01110 size 100001 001010 n d
+ */
+ if (INSN(31,31) == 0 && INSN(29,24) == BITS6(0,0,1,1,1,0)
+ && INSN(21,16) == BITS6(1,0,0,0,0,1)
+ && INSN(15,10) == BITS6(0,0,1,0,1,0)) {
+ Bool isQ = INSN(30,30) == 1;
+ UInt size = INSN(23,22);
+ UInt nn = INSN(9,5);
+ UInt dd = INSN(4,0);
+ IROp op = Iop_INVALID;
+ const HChar* tb = NULL;
+ const HChar* ta = NULL;
+ switch ((size << 1) | (isQ ? 1 : 0)) {
+ case 0: tb = "8b"; ta = "8h"; op = Iop_NarrowUn16to8x8; break;
+ case 1: tb = "16b"; ta = "8h"; op = Iop_NarrowUn16to8x8; break;
+ case 2: tb = "4h"; ta = "4s"; op = Iop_NarrowUn32to16x4; break;
+ case 3: tb = "8h"; ta = "4s"; op = Iop_NarrowUn32to16x4; break;
+ case 4: tb = "2s"; ta = "2d"; op = Iop_NarrowUn64to32x2; break;
+ case 5: tb = "4s"; ta = "2d"; op = Iop_NarrowUn64to32x2; break;
+ case 6: break;
+ case 7: break;
+ default: vassert(0);
+ }
+ if (op != Iop_INVALID) {
+ if (!isQ) {
+ putQRegLane(dd, 1, mkU64(0));
+ }
+ putQRegLane(dd, isQ ? 1 : 0, unop(op, getQReg128(nn)));
+ DIP("xtn%s %s.%s, %s.%s\n", isQ ? "2" : "",
+ nameQReg128(dd), tb, nameQReg128(nn), ta);
+ return True;
+ }
+ /* else fall through */
+ }
+
+ /* ---------------- DUP (element, vector) ---------------- */
+ /* 31 28 20 15 9 4
+ 0q0 01110000 imm5 000001 n d DUP Vd.T, Vn.Ts[index]
+ */
+ if (INSN(31,31) == 0 && INSN(29,21) == BITS9(0,0,1,1,1,0,0,0,0)
+ && INSN(15,10) == BITS6(0,0,0,0,0,1)) {
+ Bool isQ = INSN(30,30) == 1;
+ UInt imm5 = INSN(20,16);
+ UInt nn = INSN(9,5);
+ UInt dd = INSN(4,0);
+ IRTemp w0 = newTemp(Ity_I64);
+ const HChar* arT = "??";
+ const HChar* arTs = "??";
+ IRType laneTy = Ity_INVALID;
+ UInt laneNo = 16; /* invalid */
+ if (imm5 & 1) {
+ arT = isQ ? "16b" : "8b";
+ arTs = "b";
+ laneNo = (imm5 >> 1) & 15;
+ laneTy = Ity_I8;
+ assign(w0, unop(Iop_8Uto64, getQRegLane(nn, laneNo, laneTy)));
+ }
+ else if (imm5 & 2) {
+ arT = isQ ? "8h" : "4h";
+ arTs = "h";
+ laneNo = (imm5 >> 2) & 7;
+ laneTy = Ity_I16;
+ assign(w0, unop(Iop_16Uto64, getQRegLane(nn, laneNo, laneTy)));
+ }
+ else if (imm5 & 4) {
+ arT = isQ ? "4s" : "2s";
+ arTs = "s";
+ laneNo = (imm5 >> 3) & 3;
+ laneTy = Ity_I32;
+ assign(w0, unop(Iop_32Uto64, getQRegLane(nn, laneNo, laneTy)));
+ }
+ else if ((imm5 & 8) && isQ) {
+ arT = "2d";
+ arTs = "d";
+ laneNo = (imm5 >> 4) & 1;
+ laneTy = Ity_I64;
+ assign(w0, getQRegLane(nn, laneNo, laneTy));
+ }
+ else {
+ /* invalid; leave laneTy unchanged. */
+ }
+ /* */
+ if (laneTy != Ity_INVALID) {
+ vassert(laneNo < 16);
+ IRTemp w1 = math_DUP_TO_64(w0, laneTy);
+ putQReg128(dd, binop(Iop_64HLtoV128,
+ isQ ? mkexpr(w1) : mkU64(0), mkexpr(w1)));
+ DIP("dup %s.%s, %s.%s[%u]\n",
+ nameQReg128(dd), arT, nameQReg128(nn), arTs, laneNo);
+ return True;
+ }
+ /* else fall through */
+ }
+
/* FIXME Temporary hacks to get through ld.so FIXME */
/* ------------------ movi vD.4s, #0x0 ------------------ */
Modified: trunk/priv/host_arm64_defs.c
==============================================================================
--- trunk/priv/host_arm64_defs.c (original)
+++ trunk/priv/host_arm64_defs.c Sun Jan 26 19:11:14 2014
@@ -848,6 +848,23 @@
}
}
+static void showARM64VecBinOp(/*OUT*/const HChar** nm,
+ /*OUT*/const HChar** ar, ARM64VecBinOp op ) {
+ switch (op) {
+ case ARM64vecb_ADD64x2: *nm = "add "; *ar = "2d"; return;
+ case ARM64vecb_ADD32x4: *nm = "add "; *ar = "4s"; return;
+ case ARM64vecb_ADD16x8: *nm = "add "; *ar = "8h"; return;
+ case ARM64vecb_SUB64x2: *nm = "sub "; *ar = "2d"; return;
+ case ARM64vecb_SUB32x4: *nm = "sub "; *ar = "4s"; return;
+ case ARM64vecb_SUB16x8: *nm = "sub "; *ar = "8h"; return;
+ case ARM64vecb_FADD64x2: *nm = "fadd"; *ar = "2d"; return;
+ case ARM64vecb_FSUB64x2: *nm = "fsub"; *ar = "2d"; return;
+ case ARM64vecb_FMUL64x2: *nm = "fmul"; *ar = "2d"; return;
+ case ARM64vecb_FDIV64x2: *nm = "fdiv"; *ar = "2d"; return;
+ default: vpanic("showARM64VecBinOp");
+ }
+}
+
//ZZ const HChar* showARMNeonBinOp ( ARMNeonBinOp op ) {
//ZZ switch (op) {
//ZZ case ARMneon_VAND: return "vand";
@@ -1512,6 +1529,25 @@
i->ARM64in.FPCR.iReg = iReg;
return i;
}
+ARM64Instr* ARM64Instr_VBinV ( ARM64VecBinOp op,
+ HReg dst, HReg argL, HReg argR ) {
+ ARM64Instr* i = LibVEX_Alloc(sizeof(ARM64Instr));
+ i->tag = ARM64in_VBinV;
+ i->ARM64in.VBinV.op = op;
+ i->ARM64in.VBinV.dst = dst;
+ i->ARM64in.VBinV.argL = argL;
+ i->ARM64in.VBinV.argR = argR;
+ return i;
+}
+ARM64Instr* ARM64Instr_VNarrowV ( UInt dszBlg2, HReg dst, HReg src ) {
+ ARM64Instr* i = LibVEX_Alloc(sizeof(ARM64Instr));
+ i->tag = ARM64in_VNarrowV;
+ i->ARM64in.VNarrowV.dszBlg2 = dszBlg2;
+ i->ARM64in.VNarrowV.dst = dst;
+ i->ARM64in.VNarrowV.src = src;
+ vassert(dszBlg2 == 0 || dszBlg2 == 1 || dszBlg2 == 2);
+ return i;
+}
//ZZ ARMInstr* ARMInstr_VAluS ( ARMVfpOp op, HReg dst, HReg argL, HReg argR ) {
//ZZ ARMInstr* i = LibVEX_Alloc(sizeof(ARMInstr));
//ZZ i->tag = ARMin_VAluS;
@@ -2104,6 +2140,30 @@
vex_printf(", fpcr");
}
return;
+ case ARM64in_VBinV: {
+ const HChar* nm = "??";
+ const HChar* ar = "??";
+ showARM64VecBinOp(&nm, &ar, i->ARM64in.VBinV.op);
+ vex_printf("%s ", nm);
+ ppHRegARM64(i->ARM64in.VBinV.dst);
+ vex_printf(".%s, ", ar);
+ ppHRegARM64(i->ARM64in.VBinV.argL);
+ vex_printf(".%s, ", ar);
+ ppHRegARM64(i->ARM64in.VBinV.argR);
+ vex_printf(".%s", ar);
+ return;
+ }
+ case ARM64in_VNarrowV: {
+ UInt dszBlg2 = i->ARM64in.VNarrowV.dszBlg2;
+ const HChar* darr[3] = { "8b", "4h", "2s" };
+ const HChar* sarr[3] = { "8h", "4s", "2d" };
+ vex_printf("xtn ");
+ ppHRegARM64(i->ARM64in.VNarrowV.dst);
+ vex_printf(".%s, ", dszBlg2 < 3 ? darr[dszBlg2] : "??");
+ ppHRegARM64(i->ARM64in.VNarrowV.src);
+ vex_printf(".%s", dszBlg2 < 3 ? sarr[dszBlg2] : "??");
+ return;
+ }
//ZZ case ARMin_VAluS:
//ZZ vex_printf("f%-3ss ", showARMVfpOp(i->ARMin.VAluS.op));
//ZZ ppHRegARM(i->ARMin.VAluS.dst);
@@ -2567,6 +2627,15 @@
else
addHRegUse(u, HRmWrite, i->ARM64in.FPCR.iReg);
return;
+ case ARM64in_VBinV:
+ addHRegUse(u, HRmWrite, i->ARM64in.VBinV.dst);
+ addHRegUse(u, HRmRead, i->ARM64in.VBinV.argL);
+ addHRegUse(u, HRmRead, i->ARM64in.VBinV.argR);
+ return;
+ case ARM64in_VNarrowV:
+ addHRegUse(u, HRmWrite, i->ARM64in.VNarrowV.dst);
+ addHRegUse(u, HRmRead, i->ARM64in.VNarrowV.src);
+ return;
//ZZ case ARMin_VAluS:
//ZZ addHRegUse(u, HRmWrite, i->ARMin.VAluS.dst);
//ZZ addHRegUse(u, HRmRead, i->ARMin.VAluS.argL);
@@ -2842,6 +2911,15 @@
case ARM64in_FPCR:
i->ARM64in.FPCR.iReg = lookupHRegRemap(m, i->ARM64in.FPCR.iReg);
return;
+ case ARM64in_VBinV:
+ i->ARM64in.VBinV.dst = lookupHRegRemap(m, i->ARM64in.VBinV.dst);
+ i->ARM64in.VBinV.argL = lookupHRegRemap(m, i->ARM64in.VBinV.argL);
+ i->ARM64in.VBinV.argR = lookupHRegRemap(m, i->ARM64in.VBinV.argR);
+ return;
+ case ARM64in_VNarrowV:
+ i->ARM64in.VNarrowV.dst = lookupHRegRemap(m, i->ARM64in.VNarrowV.dst);
+ i->ARM64in.VNarrowV.src = lookupHRegRemap(m, i->ARM64in.VNarrowV.src);
+ return;
//ZZ case ARMin_VAluS:
//ZZ i->ARMin.VAluS.dst = lookupHRegRemap(m, i->ARMin.VAluS.dst);
//ZZ i->ARMin.VAluS.argL = lookupHRegRemap(m, i->ARMin.VAluS.argL);
@@ -3117,15 +3195,16 @@
#define X110 BITS4(0, 1,1,0)
#define X111 BITS4(0, 1,1,1)
-#define BITS8(zzb7,zzb6,zzb5,zzb4,zzb3,zzb2,zzb1,zzb0) \
- ((BITS4(zzb7,zzb6,zzb5,zzb4) << 4) | BITS4(zzb3,zzb2,zzb1,zzb0))
-
#define X0000 BITS4(0,0,0,0)
#define X0001 BITS4(0,0,0,1)
#define X0010 BITS4(0,0,1,0)
#define X0011 BITS4(0,0,1,1)
+#define BITS8(zzb7,zzb6,zzb5,zzb4,zzb3,zzb2,zzb1,zzb0) \
+ ((BITS4(zzb7,zzb6,zzb5,zzb4) << 4) | BITS4(zzb3,zzb2,zzb1,zzb0))
+
#define X00000 BITS8(0,0,0, 0,0,0,0,0)
+#define X00001 BITS8(0,0,0, 0,0,0,0,1)
#define X00111 BITS8(0,0,0, 0,0,1,1,1)
#define X01000 BITS8(0,0,0, 0,1,0,0,0)
#define X10000 BITS8(0,0,0, 1,0,0,0,0)
@@ -3143,14 +3222,18 @@
#define X010001 BITS8(0,0, 0,1,0,0,0,1)
#define X011010 BITS8(0,0, 0,1,1,0,1,0)
#define X011111 BITS8(0,0, 0,1,1,1,1,1)
+#define X100001 BITS8(0,0, 1,0,0,0,0,1)
#define X100100 BITS8(0,0, 1,0,0,1,0,0)
#define X100101 BITS8(0,0, 1,0,0,1,0,1)
#define X100110 BITS8(0,0, 1,0,0,1,1,0)
#define X110000 BITS8(0,0, 1,1,0,0,0,0)
#define X110001 BITS8(0,0, 1,1,0,0,0,1)
+#define X110101 BITS8(0,0, 1,1,0,1,0,1)
+#define X110111 BITS8(0,0, 1,1,0,1,1,1)
#define X111000 BITS8(0,0, 1,1,1,0,0,0)
#define X111001 BITS8(0,0, 1,1,1,0,0,1)
#define X111101 BITS8(0,0, 1,1,1,1,0,1)
+#define X111111 BITS8(0,0, 1,1,1,1,1,1)
#define X00100000 BITS8(0,0,1,0,0,0,0,0)
#define X00100001 BITS8(0,0,1,0,0,0,0,1)
@@ -3165,6 +3248,10 @@
#define X01100010 BITS8(0,1,1,0,0,0,1,0)
#define X01100011 BITS8(0,1,1,0,0,0,1,1)
#define X01110000 BITS8(0,1,1,1,0,0,0,0)
+#define X01110001 BITS8(0,1,1,1,0,0,0,1)
+#define X01110011 BITS8(0,1,1,1,0,0,1,1)
+#define X01110101 BITS8(0,1,1,1,0,1,0,1)
+#define X01110111 BITS8(0,1,1,1,0,1,1,1)
#define X11000001 BITS8(1,1,0,0,0,0,0,1)
#define X11000011 BITS8(1,1,0,0,0,0,1,1)
#define X11010100 BITS8(1,1,0,1,0,1,0,0)
@@ -4418,7 +4505,7 @@
/* 31 28 23 21 20 18 15 9 4
000 11110 00 1 00 010 000000 n d SCVTF Sd, Wn
000 11110 01 1 00 010 000000 n d SCVTF Dd, Wn
- 100 11110 00 1 00 010 000000 n d SCVTF Sd, Xn x
+ 100 11110 00 1 00 010 000000 n d SCVTF Sd, Xn
100 11110 01 1 00 010 000000 n d SCVTF Dd, Xn
000 11110 00 1 00 011 000000 n d UCVTF Sd, Wn
000 11110 01 1 00 011 000000 n d UCVTF Dd, Wn
@@ -4521,16 +4608,6 @@
}
goto done;
}
- case ARM64in_FPCR: {
- Bool toFPCR = i->ARM64in.FPCR.toFPCR;
- UInt iReg = iregNo(i->ARM64in.FPCR.iReg);
- if (toFPCR) {
- /* 0xD51B44 000 Rt MSR fpcr, rT */
- *p++ = 0xD51B4400 | (iReg & 0x1F);
- goto done;
- }
- goto bad; // FPCR -> iReg case currently ATC
- }
case ARM64in_VUnaryD: {
/* 31 23 21 16 14 9 4
000,11110 01 1,0000 0,0 10000 n d FMOV Dd, Dn (not handled)
@@ -4653,6 +4730,75 @@
*p++ = X_3_8_5_6_5_5(X000, X11110001, sM, X001000, sN, X00000);
goto done;
}
+ case ARM64in_FPCR: {
+ Bool toFPCR = i->ARM64in.FPCR.toFPCR;
+ UInt iReg = iregNo(i->ARM64in.FPCR.iReg);
+ if (toFPCR) {
+ /* 0xD51B44 000 Rt MSR fpcr, rT */
+ *p++ = 0xD51B4400 | (iReg & 0x1F);
+ goto done;
+ }
+ goto bad; // FPCR -> iReg case currently ATC
+ }
+ case ARM64in_VBinV: {
+ /* 31 23 20 15 9 4
+ 010 01110 11 1 m 100001 n d ADD Vd.2d, Vn.2d, Vm.2d
+ 010 01110 10 1 m 100001 n d ADD Vd.4s, Vn.4s, Vm.4s
+ 011 01110 11 1 m 100001 n d SUB Vd.2d, Vn.2d, Vm.2d
+ 011 01110 10 1 m 100001 n d SUB Vd.4s, Vn.4s, Vm.4s
+ 011 01110 01 1 m 100001 n d SUB Vd.8h, Vn.8h, Vm.8h
+ 010 01110 01 1 m 110101 n d FADD Vd.2d, Vn.2d, Vm.2d
+ 010 01110 11 1 m 110101 n d FSUB Vd.2d, Vn.2d, Vm.2d
+ 011 01110 01 1 m 110111 n d FMUL Vd.2d, Vn.2d, Vm.2d
+ 011 01110 01 1 m 111111 n d FDIV Vd.2d, Vn.2d, Vm.2d
+ */
+ UInt vD = qregNo(i->ARM64in.VBinV.dst);
+ UInt vN = qregNo(i->ARM64in.VBinV.argL);
+ UInt vM = qregNo(i->ARM64in.VBinV.argR);
+ switch (i->ARM64in.VBinV.op) {
+ case ARM64vecb_ADD64x2:
+ *p++ = X_3_8_5_6_5_5(X010, X01110111, vM, X100001, vN, vD);
+ break;
+ case ARM64vecb_SUB64x2:
+ *p++ = X_3_8_5_6_5_5(X011, X01110111, vM, X100001, vN, vD);
+ break;
+ case ARM64vecb_SUB32x4:
+ *p++ = X_3_8_5_6_5_5(X011, X01110101, vM, X100001, vN, vD);
+ break;
+ case ARM64vecb_SUB16x8:
+ *p++ = X_3_8_5_6_5_5(X011, X01110011, vM, X100001, vN, vD);
+ break;
+ case ARM64vecb_FADD64x2:
+ *p++ = X_3_8_5_6_5_5(X010, X01110011, vM, X110101, vN, vD);
+ break;
+ case ARM64vecb_FSUB64x2:
+ *p++ = X_3_8_5_6_5_5(X010, X01110111, vM, X110101, vN, vD);
+ break;
+ case ARM64vecb_FMUL64x2:
+ *p++ = X_3_8_5_6_5_5(X011, X01110011, vM, X110111, vN, vD);
+ break;
+ case ARM64vecb_FDIV64x2:
+ *p++ = X_3_8_5_6_5_5(X011, X01110011, vM, X111111, vN, vD);
+ break;
+ default:
+ goto bad;
+ }
+ goto done;
+ }
+ case ARM64in_VNarrowV: {
+ /* 31 23 21 15 9 4
+ 000 01110 00 1,00001 001010 n d XTN Vd.8b, Vn.8h
+ 000 01110 01 1,00001 001010 n d XTN Vd.4h, Vn.4s
+ 000 01110 10 1,00001 001010 n d XTN Vd.2s, Vn.2d
+ */
+ UInt vD = qregNo(i->ARM64in.VNarrowV.dst);
+ UInt vN = qregNo(i->ARM64in.VNarrowV.src);
+ UInt dszBlg2 = i->ARM64in.VNarrowV.dszBlg2;
+ vassert(dszBlg2 >= 0 && dszBlg2 <= 2);
+ *p++ = X_3_8_5_6_5_5(X000, X01110001 | (dszBlg2 << 1),
+ X00001, X001010, vN, vD);
+ goto done;
+ }
//ZZ case ARMin_VAluS: {
//ZZ UInt dN = fregNo(i->ARMin.VAluS.argL);
//ZZ UInt dD = fregNo(i->ARMin.VAluS.dst);
Modified: trunk/priv/host_arm64_defs.h
==============================================================================
--- trunk/priv/host_arm64_defs.h (original)
+++ trunk/priv/host_arm64_defs.h Sun Jan 26 19:11:14 2014
@@ -119,7 +119,7 @@
typedef
enum {
- ARM64am_RI9=1, /* reg + simm9 */
+ ARM64am_RI9=10, /* reg + simm9 */
ARM64am_RI12, /* reg + uimm12 * szB (iow, scaled by access size) */
ARM64am_RR /* reg1 + reg2 */
}
@@ -155,8 +155,8 @@
typedef
enum {
- ARM64riA_I12=4, /* uimm12 << 0 or 12 only */
- ARM64riA_R /* reg */
+ ARM64riA_I12=20, /* uimm12 << 0 or 12 only */
+ ARM64riA_R /* reg */
}
ARM64RIATag;
@@ -212,7 +212,7 @@
typedef
enum {
- ARM64ri6_I6=8, /* uimm6, 1 .. 63 only */
+ ARM64ri6_I6=30, /* uimm6, 1 .. 63 only */
ARM64ri6_R /* reg */
}
ARM64RI6Tag;
@@ -239,7 +239,7 @@
typedef
enum {
- ARM64lo_AND=10,
+ ARM64lo_AND=40,
ARM64lo_OR,
ARM64lo_XOR
}
@@ -247,7 +247,7 @@
typedef
enum {
- ARM64sh_SHL=13,
+ ARM64sh_SHL=50,
ARM64sh_SHR,
ARM64sh_SAR
}
@@ -255,7 +255,7 @@
typedef
enum {
- ARM64un_NEG=16,
+ ARM64un_NEG=60,
ARM64un_NOT,
ARM64un_CLZ,
}
@@ -263,7 +263,7 @@
typedef
enum {
- ARM64mul_PLAIN=60, /* lo64(64 * 64) */
+ ARM64mul_PLAIN=70, /* lo64(64 * 64) */
ARM64mul_ZX, /* hi64(64 *u 64) */
ARM64mul_SX /* hi64(64 *s 64) */
}
@@ -273,7 +273,7 @@
/* These characterise an integer-FP conversion, but don't imply any
particular direction. */
enum {
- ARM64cvt_F32_I32S=65,
+ ARM64cvt_F32_I32S=80,
ARM64cvt_F64_I32S,
ARM64cvt_F32_I64S,
ARM64cvt_F64_I64S,
@@ -287,7 +287,7 @@
typedef
enum {
- ARM64fpb_ADD=75,
+ ARM64fpb_ADD=100,
ARM64fpb_SUB,
ARM64fpb_MUL,
ARM64fpb_DIV,
@@ -297,7 +297,7 @@
typedef
enum {
- ARM64fpu_NEG=82,
+ ARM64fpu_NEG=110,
ARM64fpu_ABS,
ARM64fpu_SQRT,
ARM64fpu_RINT,
@@ -305,6 +305,22 @@
}
ARM64FpUnaryOp;
+typedef
+ enum {
+ ARM64vecb_ADD64x2=120,
+ ARM64vecb_ADD32x4,
+ ARM64vecb_ADD16x8,
+ ARM64vecb_SUB64x2,
+ ARM64vecb_SUB32x4,
+ ARM64vecb_SUB16x8,
+ ARM64vecb_FADD64x2,
+ ARM64vecb_FSUB64x2,
+ ARM64vecb_FMUL64x2,
+ ARM64vecb_FDIV64x2,
+ ARM64vecb_INVALID
+ }
+ ARM64VecBinOp;
+
//ZZ extern const HChar* showARMVfpUna...
[truncated message content] |

From: <sv...@va...> - 2014-01-26 18:37:05
Author: sewardj
Date: Sun Jan 26 18:36:52 2014
New Revision: 13781
Log:
Handle and instrument an extra rounding-mode argument as added by
vex r2809 to the following primops:
Iop_Add32Fx4, Iop_Sub32Fx4, Iop_Mul32Fx4, Iop_Div32Fx4,
Iop_Add64Fx2, Iop_Sub64Fx2, Iop_Mul64Fx2, Iop_Div64Fx2,
Iop_Add64Fx4, Iop_Sub64Fx4, Iop_Mul64Fx4, Iop_Div64Fx4,
Iop_Add32Fx8, Iop_Sub32Fx8, Iop_Mul32Fx8, Iop_Div32Fx8,
Modified:
trunk/memcheck/mc_translate.c
Modified: trunk/memcheck/mc_translate.c
==============================================================================
--- trunk/memcheck/mc_translate.c (original)
+++ trunk/memcheck/mc_translate.c Sun Jan 26 18:36:52 2014
@@ -398,6 +398,7 @@
case Ity_I64: return IRExpr_Const(IRConst_U64(0));
case Ity_I128: return i128_const_zero();
case Ity_V128: return IRExpr_Const(IRConst_V128(0x0000));
+ case Ity_V256: return IRExpr_Const(IRConst_V256(0x00000000));
default: VG_(tool_panic)("memcheck:definedOfType");
}
}
@@ -767,6 +768,21 @@
return assignNew('V', mce, Ity_I64, binop(Iop_32HLto64, tmp, tmp));
}
+ if (src_ty == Ity_I32 && dst_ty == Ity_V128) {
+ /* PCast the arg, then clone it 4 times. */
+ IRAtom* tmp = assignNew('V', mce, Ity_I32, unop(Iop_CmpwNEZ32, vbits));
+ tmp = assignNew('V', mce, Ity_I64, binop(Iop_32HLto64, tmp, tmp));
+ return assignNew('V', mce, Ity_V128, binop(Iop_64HLtoV128, tmp, tmp));
+ }
+
+ if (src_ty == Ity_I32 && dst_ty == Ity_V256) {
+ /* PCast the arg, then clone it 8 times. */
+ IRAtom* tmp = assignNew('V', mce, Ity_I32, unop(Iop_CmpwNEZ32, vbits));
+ tmp = assignNew('V', mce, Ity_I64, binop(Iop_32HLto64, tmp, tmp));
+ tmp = assignNew('V', mce, Ity_V128, binop(Iop_64HLtoV128, tmp, tmp));
+ return assignNew('V', mce, Ity_V256, binop(Iop_V128HLtoV256, tmp, tmp));
+ }
+
if (src_ty == Ity_I64 && dst_ty == Ity_I32) {
/* PCast the arg. This gives all 0s or all 1s. Then throw away
the top half. */
@@ -2244,6 +2260,69 @@
return at;
}
+/* --- 64Fx2 binary FP ops, with rounding mode --- */
+
+static
+IRAtom* binary64Fx2_w_rm ( MCEnv* mce, IRAtom* vRM,
+ IRAtom* vatomX, IRAtom* vatomY )
+{
+ /* This is the same as binary64Fx2, except that we subsequently
+ pessimise vRM (definedness of the rounding mode), widen to 128
+ bits and UifU it into the result. As with the scalar cases, if
+ the RM is a constant then it is defined and so this extra bit
+ will get constant-folded out later. */
+ // "do" the vector args
+ IRAtom* t1 = binary64Fx2(mce, vatomX, vatomY);
+ // PCast the RM, and widen it to 128 bits
+ IRAtom* t2 = mkPCastTo(mce, Ity_V128, vRM);
+ // Roll it into the result
+ t1 = mkUifUV128(mce, t1, t2);
+ return t1;
+}
+
+/* --- ... and ... 32Fx4 versions of the same --- */
+
+static
+IRAtom* binary32Fx4_w_rm ( MCEnv* mce, IRAtom* vRM,
+ IRAtom* vatomX, IRAtom* vatomY )
+{
+ IRAtom* t1 = binary32Fx4(mce, vatomX, vatomY);
+ // PCast the RM, and widen it to 128 bits
+ IRAtom* t2 = mkPCastTo(mce, Ity_V128, vRM);
+ // Roll it into the result
+ t1 = mkUifUV128(mce, t1, t2);
+ return t1;
+}
+
+/* --- ... and ... 64Fx4 versions of the same --- */
+
+static
+IRAtom* binary64Fx4_w_rm ( MCEnv* mce, IRAtom* vRM,
+ IRAtom* vatomX, IRAtom* vatomY )
+{
+ IRAtom* t1 = binary64Fx4(mce, vatomX, vatomY);
+ // PCast the RM, and widen it to 256 bits
+ IRAtom* t2 = mkPCastTo(mce, Ity_V256, vRM);
+ // Roll it into the result
+ t1 = mkUifUV256(mce, t1, t2);
+ return t1;
+}
+
+/* --- ... and ... 32Fx8 versions of the same --- */
+
+static
+IRAtom* binary32Fx8_w_rm ( MCEnv* mce, IRAtom* vRM,
+ IRAtom* vatomX, IRAtom* vatomY )
+{
+ IRAtom* t1 = binary32Fx8(mce, vatomX, vatomY);
+ // PCast the RM, and widen it to 256 bits
+ IRAtom* t2 = mkPCastTo(mce, Ity_V256, vRM);
+ // Roll it into the result
+ t1 = mkUifUV256(mce, t1, t2);
+ return t1;
+}
+
+
/* --- --- Vector saturated narrowing --- --- */
/* We used to do something very clever here, but on closer inspection
@@ -2715,6 +2794,31 @@
complainIfUndefined(mce, atom3, NULL);
return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, atom3));
+ /* Vector FP with rounding mode as the first arg */
+ case Iop_Add64Fx2:
+ case Iop_Sub64Fx2:
+ case Iop_Mul64Fx2:
+ case Iop_Div64Fx2:
+ return binary64Fx2_w_rm(mce, vatom1, vatom2, vatom3);
+
+ case Iop_Add32Fx4:
+ case Iop_Sub32Fx4:
+ case Iop_Mul32Fx4:
+ case Iop_Div32Fx4:
+ return binary32Fx4_w_rm(mce, vatom1, vatom2, vatom3);
+
+ case Iop_Add64Fx4:
+ case Iop_Sub64Fx4:
+ case Iop_Mul64Fx4:
+ case Iop_Div64Fx4:
+ return binary64Fx4_w_rm(mce, vatom1, vatom2, vatom3);
+
+ case Iop_Add32Fx8:
+ case Iop_Sub32Fx8:
+ case Iop_Mul32Fx8:
+ case Iop_Div32Fx8:
+ return binary32Fx8_w_rm(mce, vatom1, vatom2, vatom3);
+
default:
ppIROp(op);
VG_(tool_panic)("memcheck:expr2vbits_Triop");
@@ -3175,16 +3279,12 @@
case Iop_QNarrowBin16Sto8Ux16:
return vectorNarrowBinV128(mce, op, vatom1, vatom2);
- case Iop_Sub64Fx2:
- case Iop_Mul64Fx2:
case Iop_Min64Fx2:
case Iop_Max64Fx2:
- case Iop_Div64Fx2:
case Iop_CmpLT64Fx2:
case Iop_CmpLE64Fx2:
case Iop_CmpEQ64Fx2:
case Iop_CmpUN64Fx2:
- case Iop_Add64Fx2:
return binary64Fx2(mce, vatom1, vatom2);
case Iop_Sub64F0x2:
@@ -3199,18 +3299,14 @@
case Iop_Add64F0x2:
return binary64F0x2(mce, vatom1, vatom2);
- case Iop_Sub32Fx4:
- case Iop_Mul32Fx4:
case Iop_Min32Fx4:
case Iop_Max32Fx4:
- case Iop_Div32Fx4:
case Iop_CmpLT32Fx4:
case Iop_CmpLE32Fx4:
case Iop_CmpEQ32Fx4:
case Iop_CmpUN32Fx4:
case Iop_CmpGT32Fx4:
case Iop_CmpGE32Fx4:
- case Iop_Add32Fx4:
case Iop_Recps32Fx4:
case Iop_Rsqrts32Fx4:
return binary32Fx4(mce, vatom1, vatom2);
@@ -3417,18 +3513,10 @@
/* V256-bit SIMD */
- case Iop_Add64Fx4:
- case Iop_Sub64Fx4:
- case Iop_Mul64Fx4:
- case Iop_Div64Fx4:
case Iop_Max64Fx4:
case Iop_Min64Fx4:
return binary64Fx4(mce, vatom1, vatom2);
- case Iop_Add32Fx8:
- case Iop_Sub32Fx8:
- case Iop_Mul32Fx8:
- case Iop_Div32Fx8:
case Iop_Max32Fx8:
case Iop_Min32Fx8:
return binary32Fx8(mce, vatom1, vatom2);
@@ -5730,6 +5818,7 @@
case Ico_F32i: return False;
case Ico_F64i: return False;
case Ico_V128: return False;
+ case Ico_V256: return False;
default: ppIRExpr(at); tl_assert(0);
}
/* VG_(printf)("%llx\n", n); */
Author: sewardj
Date: Sun Jan 26 18:34:23 2014
New Revision: 2809
Log:
Make the following primops take a third (initial) argument to
indicate the rounding mode to use, like their scalar cousins do:
Iop_Add32Fx4 Iop_Sub32Fx4 Iop_Mul32Fx4 Iop_Div32Fx4
Iop_Add64Fx2 Iop_Sub64Fx2 Iop_Mul64Fx2 Iop_Div64Fx2
Iop_Add64Fx4 Iop_Sub64Fx4 Iop_Mul64Fx4 Iop_Div64Fx4
Iop_Add32Fx8 Iop_Sub32Fx8 Iop_Mul32Fx8 Iop_Div32Fx8
Fix up the x86 and amd64 front ends to add fake rounding modes
(Irrm_NEAREST) when generating expressions using these primops.
Fix up the x86 and amd64 back ends to accept these as triops
rather than as binops, and ignore the first arg.
Add three more ir_opt folding rules to remove memcheck
instrumentation arising from instrumentation of known-defined
rounding modes.
Overall functional and performance effects should be zero.
Modified:
trunk/priv/guest_amd64_toIR.c
trunk/priv/guest_x86_toIR.c
trunk/priv/host_amd64_isel.c
trunk/priv/host_x86_isel.c
trunk/priv/ir_defs.c
trunk/priv/ir_opt.c
trunk/pub/libvex_ir.h
Modified: trunk/priv/guest_amd64_toIR.c
==============================================================================
--- trunk/priv/guest_amd64_toIR.c (original)
+++ trunk/priv/guest_amd64_toIR.c Sun Jan 26 18:34:23 2014
@@ -8548,6 +8548,32 @@
/*--- SSE/SSE2/SSE3 helpers ---*/
/*------------------------------------------------------------*/
+/* Indicates whether the op requires a rounding-mode argument. Note
+ that this covers only vector floating point arithmetic ops, and
+ omits the scalar ones that need rounding modes. Note also that
+ inconsistencies here will get picked up later by the IR sanity
+ checker, so this isn't correctness-critical. */
+static Bool requiresRMode ( IROp op )
+{
+ switch (op) {
+ /* 128 bit ops */
+ case Iop_Add32Fx4: case Iop_Sub32Fx4:
+ case Iop_Mul32Fx4: case Iop_Div32Fx4:
+ case Iop_Add64Fx2: case Iop_Sub64Fx2:
+ case Iop_Mul64Fx2: case Iop_Div64Fx2:
+ /* 256 bit ops */
+ case Iop_Add32Fx8: case Iop_Sub32Fx8:
+ case Iop_Mul32Fx8: case Iop_Div32Fx8:
+ case Iop_Add64Fx4: case Iop_Sub64Fx4:
+ case Iop_Mul64Fx4: case Iop_Div64Fx4:
+ return True;
+ default:
+ break;
+ }
+ return False;
+}
+
+
/* Worker function; do not call directly.
Handles full width G = G `op` E and G = (not G) `op` E.
*/
@@ -8563,22 +8589,35 @@
Int alen;
IRTemp addr;
UChar rm = getUChar(delta);
+ Bool needsRMode = requiresRMode(op);
IRExpr* gpart
= invertG ? unop(Iop_NotV128, getXMMReg(gregOfRexRM(pfx,rm)))
: getXMMReg(gregOfRexRM(pfx,rm));
if (epartIsReg(rm)) {
- putXMMReg( gregOfRexRM(pfx,rm),
- binop(op, gpart,
- getXMMReg(eregOfRexRM(pfx,rm))) );
+ putXMMReg(
+ gregOfRexRM(pfx,rm),
+ needsRMode
+ ? triop(op, get_FAKE_roundingmode(), /* XXXROUNDINGFIXME */
+ gpart,
+ getXMMReg(eregOfRexRM(pfx,rm)))
+ : binop(op, gpart,
+ getXMMReg(eregOfRexRM(pfx,rm)))
+ );
DIP("%s %s,%s\n", opname,
nameXMMReg(eregOfRexRM(pfx,rm)),
nameXMMReg(gregOfRexRM(pfx,rm)) );
return delta+1;
} else {
addr = disAMode ( &alen, vbi, pfx, delta, dis_buf, 0 );
- putXMMReg( gregOfRexRM(pfx,rm),
- binop(op, gpart,
- loadLE(Ity_V128, mkexpr(addr))) );
+ putXMMReg(
+ gregOfRexRM(pfx,rm),
+ needsRMode
+ ? triop(op, get_FAKE_roundingmode(), /* XXXROUNDINGFIXME */
+ gpart,
+ loadLE(Ity_V128, mkexpr(addr)))
+ : binop(op, gpart,
+ loadLE(Ity_V128, mkexpr(addr)))
+ );
DIP("%s %s,%s\n", opname,
dis_buf,
nameXMMReg(gregOfRexRM(pfx,rm)) );
@@ -10982,9 +11021,11 @@
IRTemp subV = newTemp(Ity_V128);
IRTemp a1 = newTemp(Ity_I64);
IRTemp s0 = newTemp(Ity_I64);
+ IRTemp rm = newTemp(Ity_I32);
- assign( addV, binop(Iop_Add64Fx2, mkexpr(dV), mkexpr(sV)) );
- assign( subV, binop(Iop_Sub64Fx2, mkexpr(dV), mkexpr(sV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add64Fx2, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
+ assign( subV, triop(Iop_Sub64Fx2, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
assign( a1, unop(Iop_V128HIto64, mkexpr(addV) ));
assign( s0, unop(Iop_V128to64, mkexpr(subV) ));
@@ -11000,10 +11041,12 @@
IRTemp a3, a2, a1, a0, s3, s2, s1, s0;
IRTemp addV = newTemp(Ity_V256);
IRTemp subV = newTemp(Ity_V256);
+ IRTemp rm = newTemp(Ity_I32);
a3 = a2 = a1 = a0 = s3 = s2 = s1 = s0 = IRTemp_INVALID;
- assign( addV, binop(Iop_Add64Fx4, mkexpr(dV), mkexpr(sV)) );
- assign( subV, binop(Iop_Sub64Fx4, mkexpr(dV), mkexpr(sV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add64Fx4, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
+ assign( subV, triop(Iop_Sub64Fx4, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
breakupV256to64s( addV, &a3, &a2, &a1, &a0 );
breakupV256to64s( subV, &s3, &s2, &s1, &s0 );
@@ -11019,10 +11062,12 @@
IRTemp a3, a2, a1, a0, s3, s2, s1, s0;
IRTemp addV = newTemp(Ity_V128);
IRTemp subV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
a3 = a2 = a1 = a0 = s3 = s2 = s1 = s0 = IRTemp_INVALID;
- assign( addV, binop(Iop_Add32Fx4, mkexpr(dV), mkexpr(sV)) );
- assign( subV, binop(Iop_Sub32Fx4, mkexpr(dV), mkexpr(sV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add32Fx4, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
+ assign( subV, triop(Iop_Sub32Fx4, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
breakupV128to32s( addV, &a3, &a2, &a1, &a0 );
breakupV128to32s( subV, &s3, &s2, &s1, &s0 );
@@ -11039,11 +11084,13 @@
IRTemp s7, s6, s5, s4, s3, s2, s1, s0;
IRTemp addV = newTemp(Ity_V256);
IRTemp subV = newTemp(Ity_V256);
+ IRTemp rm = newTemp(Ity_I32);
a7 = a6 = a5 = a4 = a3 = a2 = a1 = a0 = IRTemp_INVALID;
s7 = s6 = s5 = s4 = s3 = s2 = s1 = s0 = IRTemp_INVALID;
- assign( addV, binop(Iop_Add32Fx8, mkexpr(dV), mkexpr(sV)) );
- assign( subV, binop(Iop_Sub32Fx8, mkexpr(dV), mkexpr(sV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add32Fx8, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
+ assign( subV, triop(Iop_Sub32Fx8, mkexpr(rm), mkexpr(dV), mkexpr(sV)) );
breakupV256to32s( addV, &a7, &a6, &a5, &a4, &a3, &a2, &a1, &a0 );
breakupV256to32s( subV, &s7, &s6, &s5, &s4, &s3, &s2, &s1, &s0 );
@@ -14594,6 +14641,7 @@
IRTemp s3, s2, s1, s0, d3, d2, d1, d0;
IRTemp leftV = newTemp(Ity_V128);
IRTemp rightV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
s3 = s2 = s1 = s0 = d3 = d2 = d1 = d0 = IRTemp_INVALID;
breakupV128to32s( sV, &s3, &s2, &s1, &s0 );
@@ -14603,8 +14651,9 @@
assign( rightV, mkV128from32s( s3, s1, d3, d1 ) );
IRTemp res = newTemp(Ity_V128);
- assign( res, binop(isAdd ? Iop_Add32Fx4 : Iop_Sub32Fx4,
- mkexpr(leftV), mkexpr(rightV) ) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( res, triop(isAdd ? Iop_Add32Fx4 : Iop_Sub32Fx4,
+ mkexpr(rm), mkexpr(leftV), mkexpr(rightV) ) );
return res;
}
@@ -14614,6 +14663,7 @@
IRTemp s1, s0, d1, d0;
IRTemp leftV = newTemp(Ity_V128);
IRTemp rightV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
s1 = s0 = d1 = d0 = IRTemp_INVALID;
breakupV128to64s( sV, &s1, &s0 );
@@ -14623,8 +14673,9 @@
assign( rightV, binop(Iop_64HLtoV128, mkexpr(s1), mkexpr(d1)) );
IRTemp res = newTemp(Ity_V128);
- assign( res, binop(isAdd ? Iop_Add64Fx2 : Iop_Sub64Fx2,
- mkexpr(leftV), mkexpr(rightV) ) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( res, triop(isAdd ? Iop_Add64Fx2 : Iop_Sub64Fx2,
+ mkexpr(rm), mkexpr(leftV), mkexpr(rightV) ) );
return res;
}
@@ -18271,8 +18322,11 @@
UShort imm8_perms[4] = { 0x0000, 0x00FF, 0xFF00, 0xFFFF };
IRTemp and_vec = newTemp(Ity_V128);
IRTemp sum_vec = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
assign( and_vec, binop( Iop_AndV128,
- binop( Iop_Mul64Fx2,
+ triop( Iop_Mul64Fx2,
+ mkexpr(rm),
mkexpr(dst_vec), mkexpr(src_vec) ),
mkV128( imm8_perms[ ((imm8 >> 4) & 3) ] ) ) );
@@ -18296,6 +18350,7 @@
IRTemp tmp_prod_vec = newTemp(Ity_V128);
IRTemp prod_vec = newTemp(Ity_V128);
IRTemp sum_vec = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
IRTemp v3, v2, v1, v0;
v3 = v2 = v1 = v0 = IRTemp_INVALID;
UShort imm8_perms[16] = { 0x0000, 0x000F, 0x00F0, 0x00FF, 0x0F00,
@@ -18303,15 +18358,17 @@
0xF0F0, 0xF0FF, 0xFF00, 0xFF0F, 0xFFF0,
0xFFFF };
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
assign( tmp_prod_vec,
binop( Iop_AndV128,
- binop( Iop_Mul32Fx4, mkexpr(dst_vec),
- mkexpr(src_vec) ),
+ triop( Iop_Mul32Fx4,
+ mkexpr(rm), mkexpr(dst_vec), mkexpr(src_vec) ),
mkV128( imm8_perms[((imm8 >> 4)& 15)] ) ) );
breakupV128to32s( tmp_prod_vec, &v3, &v2, &v1, &v0 );
assign( prod_vec, mkV128from32s( v3, v1, v2, v0 ) );
- assign( sum_vec, binop( Iop_Add32Fx4,
+ assign( sum_vec, triop( Iop_Add32Fx4,
+ mkexpr(rm),
binop( Iop_InterleaveHI32x4,
mkexpr(prod_vec), mkexpr(prod_vec) ),
binop( Iop_InterleaveLO32x4,
@@ -18319,7 +18376,8 @@
IRTemp res = newTemp(Ity_V128);
assign( res, binop( Iop_AndV128,
- binop( Iop_Add32Fx4,
+ triop( Iop_Add32Fx4,
+ mkexpr(rm),
binop( Iop_InterleaveHI32x4,
mkexpr(sum_vec), mkexpr(sum_vec) ),
binop( Iop_InterleaveLO32x4,
@@ -21898,8 +21956,17 @@
if (op != Iop_INVALID) {
vassert(opFn == NULL);
res = newTemp(Ity_V128);
- assign(res, swapArgs ? binop(op, mkexpr(tSR), mkexpr(tSL))
- : binop(op, mkexpr(tSL), mkexpr(tSR)));
+ if (requiresRMode(op)) {
+ IRTemp rm = newTemp(Ity_I32);
+ assign(rm, get_FAKE_roundingmode()); /* XXXROUNDINGFIXME */
+ assign(res, swapArgs
+ ? triop(op, mkexpr(rm), mkexpr(tSR), mkexpr(tSL))
+ : triop(op, mkexpr(rm), mkexpr(tSL), mkexpr(tSR)));
+ } else {
+ assign(res, swapArgs
+ ? binop(op, mkexpr(tSR), mkexpr(tSL))
+ : binop(op, mkexpr(tSL), mkexpr(tSR)));
+ }
} else {
vassert(opFn != NULL);
res = swapArgs ? opFn(tSR, tSL) : opFn(tSL, tSR);
@@ -22802,8 +22869,17 @@
if (op != Iop_INVALID) {
vassert(opFn == NULL);
res = newTemp(Ity_V256);
- assign(res, swapArgs ? binop(op, mkexpr(tSR), mkexpr(tSL))
- : binop(op, mkexpr(tSL), mkexpr(tSR)));
+ if (requiresRMode(op)) {
+ IRTemp rm = newTemp(Ity_I32);
+ assign(rm, get_FAKE_roundingmode()); /* XXXROUNDINGFIXME */
+ assign(res, swapArgs
+ ? triop(op, mkexpr(rm), mkexpr(tSR), mkexpr(tSL))
+ : triop(op, mkexpr(rm), mkexpr(tSL), mkexpr(tSR)));
+ } else {
+ assign(res, swapArgs
+ ? binop(op, mkexpr(tSR), mkexpr(tSL))
+ : binop(op, mkexpr(tSL), mkexpr(tSR)));
+ }
} else {
vassert(opFn != NULL);
res = swapArgs ? opFn(tSR, tSL) : opFn(tSL, tSR);
Modified: trunk/priv/guest_x86_toIR.c
==============================================================================
--- trunk/priv/guest_x86_toIR.c (original)
+++ trunk/priv/guest_x86_toIR.c Sun Jan 26 18:34:23 2014
@@ -6856,6 +6856,27 @@
/*--- SSE/SSE2/SSE3 helpers ---*/
/*------------------------------------------------------------*/
+/* Indicates whether the op requires a rounding-mode argument. Note
+ that this covers only vector floating point arithmetic ops, and
+ omits the scalar ones that need rounding modes. Note also that
+ inconsistencies here will get picked up later by the IR sanity
+ checker, so this isn't correctness-critical. */
+static Bool requiresRMode ( IROp op )
+{
+ switch (op) {
+ /* 128 bit ops */
+ case Iop_Add32Fx4: case Iop_Sub32Fx4:
+ case Iop_Mul32Fx4: case Iop_Div32Fx4:
+ case Iop_Add64Fx2: case Iop_Sub64Fx2:
+ case Iop_Mul64Fx2: case Iop_Div64Fx2:
+ return True;
+ default:
+ break;
+ }
+ return False;
+}
+
+
/* Worker function; do not call directly.
Handles full width G = G `op` E and G = (not G) `op` E.
*/
@@ -6874,18 +6895,30 @@
= invertG ? unop(Iop_NotV128, getXMMReg(gregOfRM(rm)))
: getXMMReg(gregOfRM(rm));
if (epartIsReg(rm)) {
- putXMMReg( gregOfRM(rm),
- binop(op, gpart,
- getXMMReg(eregOfRM(rm))) );
+ putXMMReg(
+ gregOfRM(rm),
+ requiresRMode(op)
+ ? triop(op, get_FAKE_roundingmode(), /* XXXROUNDINGFIXME */
+ gpart,
+ getXMMReg(eregOfRM(rm)))
+ : binop(op, gpart,
+ getXMMReg(eregOfRM(rm)))
+ );
DIP("%s %s,%s\n", opname,
nameXMMReg(eregOfRM(rm)),
nameXMMReg(gregOfRM(rm)) );
return delta+1;
} else {
addr = disAMode ( &alen, sorb, delta, dis_buf );
- putXMMReg( gregOfRM(rm),
- binop(op, gpart,
- loadLE(Ity_V128, mkexpr(addr))) );
+ putXMMReg(
+ gregOfRM(rm),
+ requiresRMode(op)
+ ? triop(op, get_FAKE_roundingmode(), /* XXXROUNDINGFIXME */
+ gpart,
+ loadLE(Ity_V128, mkexpr(addr)))
+ : binop(op, gpart,
+ loadLE(Ity_V128, mkexpr(addr)))
+ );
DIP("%s %s,%s\n", opname,
dis_buf,
nameXMMReg(gregOfRM(rm)) );
@@ -11712,6 +11745,7 @@
IRTemp gV = newTemp(Ity_V128);
IRTemp addV = newTemp(Ity_V128);
IRTemp subV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
a3 = a2 = a1 = a0 = s3 = s2 = s1 = s0 = IRTemp_INVALID;
modrm = insn[3];
@@ -11730,8 +11764,9 @@
assign( gV, getXMMReg(gregOfRM(modrm)) );
- assign( addV, binop(Iop_Add32Fx4, mkexpr(gV), mkexpr(eV)) );
- assign( subV, binop(Iop_Sub32Fx4, mkexpr(gV), mkexpr(eV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add32Fx4, mkexpr(rm), mkexpr(gV), mkexpr(eV)) );
+ assign( subV, triop(Iop_Sub32Fx4, mkexpr(rm), mkexpr(gV), mkexpr(eV)) );
breakup128to32s( addV, &a3, &a2, &a1, &a0 );
breakup128to32s( subV, &s3, &s2, &s1, &s0 );
@@ -11748,6 +11783,7 @@
IRTemp subV = newTemp(Ity_V128);
IRTemp a1 = newTemp(Ity_I64);
IRTemp s0 = newTemp(Ity_I64);
+ IRTemp rm = newTemp(Ity_I32);
modrm = insn[2];
if (epartIsReg(modrm)) {
@@ -11765,8 +11801,9 @@
assign( gV, getXMMReg(gregOfRM(modrm)) );
- assign( addV, binop(Iop_Add64Fx2, mkexpr(gV), mkexpr(eV)) );
- assign( subV, binop(Iop_Sub64Fx2, mkexpr(gV), mkexpr(eV)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
+ assign( addV, triop(Iop_Add64Fx2, mkexpr(rm), mkexpr(gV), mkexpr(eV)) );
+ assign( subV, triop(Iop_Sub64Fx2, mkexpr(rm), mkexpr(gV), mkexpr(eV)) );
assign( a1, unop(Iop_V128HIto64, mkexpr(addV) ));
assign( s0, unop(Iop_V128to64, mkexpr(subV) ));
@@ -11785,6 +11822,7 @@
IRTemp gV = newTemp(Ity_V128);
IRTemp leftV = newTemp(Ity_V128);
IRTemp rightV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
Bool isAdd = insn[2] == 0x7C;
const HChar* str = isAdd ? "add" : "sub";
e3 = e2 = e1 = e0 = g3 = g2 = g1 = g0 = IRTemp_INVALID;
@@ -11811,9 +11849,10 @@
assign( leftV, mk128from32s( e2, e0, g2, g0 ) );
assign( rightV, mk128from32s( e3, e1, g3, g1 ) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
putXMMReg( gregOfRM(modrm),
- binop(isAdd ? Iop_Add32Fx4 : Iop_Sub32Fx4,
- mkexpr(leftV), mkexpr(rightV) ) );
+ triop(isAdd ? Iop_Add32Fx4 : Iop_Sub32Fx4,
+ mkexpr(rm), mkexpr(leftV), mkexpr(rightV) ) );
goto decode_success;
}
@@ -11828,6 +11867,7 @@
IRTemp gV = newTemp(Ity_V128);
IRTemp leftV = newTemp(Ity_V128);
IRTemp rightV = newTemp(Ity_V128);
+ IRTemp rm = newTemp(Ity_I32);
Bool isAdd = insn[1] == 0x7C;
const HChar* str = isAdd ? "add" : "sub";
@@ -11855,9 +11895,10 @@
assign( leftV, binop(Iop_64HLtoV128, mkexpr(e0),mkexpr(g0)) );
assign( rightV, binop(Iop_64HLtoV128, mkexpr(e1),mkexpr(g1)) );
+ assign( rm, get_FAKE_roundingmode() ); /* XXXROUNDINGFIXME */
putXMMReg( gregOfRM(modrm),
- binop(isAdd ? Iop_Add64Fx2 : Iop_Sub64Fx2,
- mkexpr(leftV), mkexpr(rightV) ) );
+ triop(isAdd ? Iop_Add64Fx2 : Iop_Sub64Fx2,
+ mkexpr(rm), mkexpr(leftV), mkexpr(rightV) ) );
goto decode_success;
}
Modified: trunk/priv/host_amd64_isel.c
==============================================================================
--- trunk/priv/host_amd64_isel.c (original)
+++ trunk/priv/host_amd64_isel.c Sun Jan 26 18:34:23 2014
@@ -3355,12 +3355,8 @@
case Iop_CmpLT32Fx4: op = Asse_CMPLTF; goto do_32Fx4;
case Iop_CmpLE32Fx4: op = Asse_CMPLEF; goto do_32Fx4;
case Iop_CmpUN32Fx4: op = Asse_CMPUNF; goto do_32Fx4;
- case Iop_Add32Fx4: op = Asse_ADDF; goto do_32Fx4;
- case Iop_Div32Fx4: op = Asse_DIVF; goto do_32Fx4;
case Iop_Max32Fx4: op = Asse_MAXF; goto do_32Fx4;
case Iop_Min32Fx4: op = Asse_MINF; goto do_32Fx4;
- case Iop_Mul32Fx4: op = Asse_MULF; goto do_32Fx4;
- case Iop_Sub32Fx4: op = Asse_SUBF; goto do_32Fx4;
do_32Fx4:
{
HReg argL = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3375,12 +3371,8 @@
case Iop_CmpLT64Fx2: op = Asse_CMPLTF; goto do_64Fx2;
case Iop_CmpLE64Fx2: op = Asse_CMPLEF; goto do_64Fx2;
case Iop_CmpUN64Fx2: op = Asse_CMPUNF; goto do_64Fx2;
- case Iop_Add64Fx2: op = Asse_ADDF; goto do_64Fx2;
- case Iop_Div64Fx2: op = Asse_DIVF; goto do_64Fx2;
case Iop_Max64Fx2: op = Asse_MAXF; goto do_64Fx2;
case Iop_Min64Fx2: op = Asse_MINF; goto do_64Fx2;
- case Iop_Mul64Fx2: op = Asse_MULF; goto do_64Fx2;
- case Iop_Sub64Fx2: op = Asse_SUBF; goto do_64Fx2;
do_64Fx2:
{
HReg argL = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3660,6 +3652,47 @@
} /* switch (e->Iex.Binop.op) */
} /* if (e->tag == Iex_Binop) */
+ if (e->tag == Iex_Triop) {
+ IRTriop *triop = e->Iex.Triop.details;
+ switch (triop->op) {
+
+ case Iop_Add64Fx2: op = Asse_ADDF; goto do_64Fx2_w_rm;
+ case Iop_Sub64Fx2: op = Asse_SUBF; goto do_64Fx2_w_rm;
+ case Iop_Mul64Fx2: op = Asse_MULF; goto do_64Fx2_w_rm;
+ case Iop_Div64Fx2: op = Asse_DIVF; goto do_64Fx2_w_rm;
+ do_64Fx2_w_rm:
+ {
+ HReg argL = iselVecExpr(env, triop->arg2);
+ HReg argR = iselVecExpr(env, triop->arg3);
+ HReg dst = newVRegV(env);
+ addInstr(env, mk_vMOVsd_RR(argL, dst));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, AMD64Instr_Sse64Fx2(op, argR, dst));
+ return dst;
+ }
+
+ case Iop_Add32Fx4: op = Asse_ADDF; goto do_32Fx4_w_rm;
+ case Iop_Sub32Fx4: op = Asse_SUBF; goto do_32Fx4_w_rm;
+ case Iop_Mul32Fx4: op = Asse_MULF; goto do_32Fx4_w_rm;
+ case Iop_Div32Fx4: op = Asse_DIVF; goto do_32Fx4_w_rm;
+ do_32Fx4_w_rm:
+ {
+ HReg argL = iselVecExpr(env, triop->arg2);
+ HReg argR = iselVecExpr(env, triop->arg3);
+ HReg dst = newVRegV(env);
+ addInstr(env, mk_vMOVsd_RR(argL, dst));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, AMD64Instr_Sse32Fx4(op, argR, dst));
+ return dst;
+ }
+
+ default:
+ break;
+ } /* switch (triop->op) */
+ } /* if (e->tag == Iex_Triop) */
+
if (e->tag == Iex_ITE) { // VFD
HReg r1 = iselVecExpr(env, e->Iex.ITE.iftrue);
HReg r0 = iselVecExpr(env, e->Iex.ITE.iffalse);
@@ -3851,10 +3884,6 @@
if (e->tag == Iex_Binop) {
switch (e->Iex.Binop.op) {
- case Iop_Add64Fx4: op = Asse_ADDF; goto do_64Fx4;
- case Iop_Sub64Fx4: op = Asse_SUBF; goto do_64Fx4;
- case Iop_Mul64Fx4: op = Asse_MULF; goto do_64Fx4;
- case Iop_Div64Fx4: op = Asse_DIVF; goto do_64Fx4;
case Iop_Max64Fx4: op = Asse_MAXF; goto do_64Fx4;
case Iop_Min64Fx4: op = Asse_MINF; goto do_64Fx4;
do_64Fx4:
@@ -3873,10 +3902,6 @@
return;
}
- case Iop_Add32Fx8: op = Asse_ADDF; goto do_32Fx8;
- case Iop_Sub32Fx8: op = Asse_SUBF; goto do_32Fx8;
- case Iop_Mul32Fx8: op = Asse_MULF; goto do_32Fx8;
- case Iop_Div32Fx8: op = Asse_DIVF; goto do_32Fx8;
case Iop_Max32Fx8: op = Asse_MAXF; goto do_32Fx8;
case Iop_Min32Fx8: op = Asse_MINF; goto do_32Fx8;
do_32Fx8:
@@ -4145,6 +4170,60 @@
} /* switch (e->Iex.Binop.op) */
} /* if (e->tag == Iex_Binop) */
+ if (e->tag == Iex_Triop) {
+ IRTriop *triop = e->Iex.Triop.details;
+ switch (triop->op) {
+
+ case Iop_Add64Fx4: op = Asse_ADDF; goto do_64Fx4_w_rm;
+ case Iop_Sub64Fx4: op = Asse_SUBF; goto do_64Fx4_w_rm;
+ case Iop_Mul64Fx4: op = Asse_MULF; goto do_64Fx4_w_rm;
+ case Iop_Div64Fx4: op = Asse_DIVF; goto do_64Fx4_w_rm;
+ do_64Fx4_w_rm:
+ {
+ HReg argLhi, argLlo, argRhi, argRlo;
+ iselDVecExpr(&argLhi, &argLlo, env, triop->arg2);
+ iselDVecExpr(&argRhi, &argRlo, env, triop->arg3);
+ HReg dstHi = newVRegV(env);
+ HReg dstLo = newVRegV(env);
+ addInstr(env, mk_vMOVsd_RR(argLhi, dstHi));
+ addInstr(env, mk_vMOVsd_RR(argLlo, dstLo));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, AMD64Instr_Sse64Fx2(op, argRhi, dstHi));
+ addInstr(env, AMD64Instr_Sse64Fx2(op, argRlo, dstLo));
+ *rHi = dstHi;
+ *rLo = dstLo;
+ return;
+ }
+
+ case Iop_Add32Fx8: op = Asse_ADDF; goto do_32Fx8_w_rm;
+ case Iop_Sub32Fx8: op = Asse_SUBF; goto do_32Fx8_w_rm;
+ case Iop_Mul32Fx8: op = Asse_MULF; goto do_32Fx8_w_rm;
+ case Iop_Div32Fx8: op = Asse_DIVF; goto do_32Fx8_w_rm;
+ do_32Fx8_w_rm:
+ {
+ HReg argLhi, argLlo, argRhi, argRlo;
+ iselDVecExpr(&argLhi, &argLlo, env, triop->arg2);
+ iselDVecExpr(&argRhi, &argRlo, env, triop->arg3);
+ HReg dstHi = newVRegV(env);
+ HReg dstLo = newVRegV(env);
+ addInstr(env, mk_vMOVsd_RR(argLhi, dstHi));
+ addInstr(env, mk_vMOVsd_RR(argLlo, dstLo));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, AMD64Instr_Sse32Fx4(op, argRhi, dstHi));
+ addInstr(env, AMD64Instr_Sse32Fx4(op, argRlo, dstLo));
+ *rHi = dstHi;
+ *rLo = dstLo;
+ return;
+ }
+
+ default:
+ break;
+ } /* switch (triop->op) */
+ } /* if (e->tag == Iex_Triop) */
+
+
if (e->tag == Iex_Qop && e->Iex.Qop.details->op == Iop_64x4toV256) {
HReg rsp = hregAMD64_RSP();
HReg vHi = newVRegV(env);
Modified: trunk/priv/host_x86_isel.c
==============================================================================
--- trunk/priv/host_x86_isel.c (original)
+++ trunk/priv/host_x86_isel.c Sun Jan 26 18:34:23 2014
@@ -3554,12 +3554,8 @@
case Iop_CmpLT32Fx4: op = Xsse_CMPLTF; goto do_32Fx4;
case Iop_CmpLE32Fx4: op = Xsse_CMPLEF; goto do_32Fx4;
case Iop_CmpUN32Fx4: op = Xsse_CMPUNF; goto do_32Fx4;
- case Iop_Add32Fx4: op = Xsse_ADDF; goto do_32Fx4;
- case Iop_Div32Fx4: op = Xsse_DIVF; goto do_32Fx4;
case Iop_Max32Fx4: op = Xsse_MAXF; goto do_32Fx4;
case Iop_Min32Fx4: op = Xsse_MINF; goto do_32Fx4;
- case Iop_Mul32Fx4: op = Xsse_MULF; goto do_32Fx4;
- case Iop_Sub32Fx4: op = Xsse_SUBF; goto do_32Fx4;
do_32Fx4:
{
HReg argL = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3574,12 +3570,8 @@
case Iop_CmpLT64Fx2: op = Xsse_CMPLTF; goto do_64Fx2;
case Iop_CmpLE64Fx2: op = Xsse_CMPLEF; goto do_64Fx2;
case Iop_CmpUN64Fx2: op = Xsse_CMPUNF; goto do_64Fx2;
- case Iop_Add64Fx2: op = Xsse_ADDF; goto do_64Fx2;
- case Iop_Div64Fx2: op = Xsse_DIVF; goto do_64Fx2;
case Iop_Max64Fx2: op = Xsse_MAXF; goto do_64Fx2;
case Iop_Min64Fx2: op = Xsse_MINF; goto do_64Fx2;
- case Iop_Mul64Fx2: op = Xsse_MULF; goto do_64Fx2;
- case Iop_Sub64Fx2: op = Xsse_SUBF; goto do_64Fx2;
do_64Fx2:
{
HReg argL = iselVecExpr(env, e->Iex.Binop.arg1);
@@ -3790,6 +3782,50 @@
} /* switch (e->Iex.Binop.op) */
} /* if (e->tag == Iex_Binop) */
+
+ if (e->tag == Iex_Triop) {
+ IRTriop *triop = e->Iex.Triop.details;
+ switch (triop->op) {
+
+ case Iop_Add32Fx4: op = Xsse_ADDF; goto do_32Fx4_w_rm;
+ case Iop_Sub32Fx4: op = Xsse_SUBF; goto do_32Fx4_w_rm;
+ case Iop_Mul32Fx4: op = Xsse_MULF; goto do_32Fx4_w_rm;
+ case Iop_Div32Fx4: op = Xsse_DIVF; goto do_32Fx4_w_rm;
+ do_32Fx4_w_rm:
+ {
+ HReg argL = iselVecExpr(env, triop->arg2);
+ HReg argR = iselVecExpr(env, triop->arg3);
+ HReg dst = newVRegV(env);
+ addInstr(env, mk_vMOVsd_RR(argL, dst));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, X86Instr_Sse32Fx4(op, argR, dst));
+ return dst;
+ }
+
+ case Iop_Add64Fx2: op = Xsse_ADDF; goto do_64Fx2_w_rm;
+ case Iop_Sub64Fx2: op = Xsse_SUBF; goto do_64Fx2_w_rm;
+ case Iop_Mul64Fx2: op = Xsse_MULF; goto do_64Fx2_w_rm;
+ case Iop_Div64Fx2: op = Xsse_DIVF; goto do_64Fx2_w_rm;
+ do_64Fx2_w_rm:
+ {
+ HReg argL = iselVecExpr(env, triop->arg2);
+ HReg argR = iselVecExpr(env, triop->arg3);
+ HReg dst = newVRegV(env);
+ REQUIRE_SSE2;
+ addInstr(env, mk_vMOVsd_RR(argL, dst));
+ /* XXXROUNDINGFIXME */
+ /* set roundingmode here */
+ addInstr(env, X86Instr_Sse64Fx2(op, argR, dst));
+ return dst;
+ }
+
+ default:
+ break;
+ } /* switch (triop->op) */
+ } /* if (e->tag == Iex_Triop) */
+
+
if (e->tag == Iex_ITE) { // VFD
HReg r1 = iselVecExpr(env, e->Iex.ITE.iftrue);
HReg r0 = iselVecExpr(env, e->Iex.ITE.iffalse);
Modified: trunk/priv/ir_defs.c
==============================================================================
--- trunk/priv/ir_defs.c (original)
+++ trunk/priv/ir_defs.c Sun Jan 26 18:34:23 2014
@@ -2789,19 +2789,19 @@
case Iop_CmpEQ64F0x2: case Iop_CmpLT64F0x2:
case Iop_CmpLE32F0x4: case Iop_CmpUN32F0x4:
case Iop_CmpLE64F0x2: case Iop_CmpUN64F0x2:
- case Iop_Add32Fx4: case Iop_Add32F0x4:
- case Iop_Add64Fx2: case Iop_Add64F0x2:
- case Iop_Div32Fx4: case Iop_Div32F0x4:
- case Iop_Div64Fx2: case Iop_Div64F0x2:
+ case Iop_Add32F0x4:
+ case Iop_Add64F0x2:
+ case Iop_Div32F0x4:
+ case Iop_Div64F0x2:
case Iop_Max32Fx4: case Iop_Max32F0x4:
case Iop_PwMax32Fx4: case Iop_PwMin32Fx4:
case Iop_Max64Fx2: case Iop_Max64F0x2:
case Iop_Min32Fx4: case Iop_Min32F0x4:
case Iop_Min64Fx2: case Iop_Min64F0x2:
- case Iop_Mul32Fx4: case Iop_Mul32F0x4:
- case Iop_Mul64Fx2: case Iop_Mul64F0x2:
- case Iop_Sub32Fx4: case Iop_Sub32F0x4:
- case Iop_Sub64Fx2: case Iop_Sub64F0x2:
+ case Iop_Mul32F0x4:
+ case Iop_Mul64F0x2:
+ case Iop_Sub32F0x4:
+ case Iop_Sub64F0x2:
case Iop_AndV128: case Iop_OrV128: case Iop_XorV128:
case Iop_Add8x16: case Iop_Add16x8:
case Iop_Add32x4: case Iop_Add64x2:
@@ -2966,7 +2966,7 @@
case Iop_QDMulLong16Sx4: case Iop_QDMulLong32Sx2:
BINARY(Ity_I64, Ity_I64, Ity_V128);
- /* s390 specific */
+ /* s390 specific */
case Iop_MAddF32:
case Iop_MSubF32:
QUATERNARY(ity_RMode,Ity_F32,Ity_F32,Ity_F32, Ity_F32);
@@ -2984,6 +2984,18 @@
case Iop_DivF128:
TERNARY(ity_RMode,Ity_F128,Ity_F128, Ity_F128);
+ case Iop_Add64Fx2: case Iop_Sub64Fx2:
+ case Iop_Mul64Fx2: case Iop_Div64Fx2:
+ case Iop_Add32Fx4: case Iop_Sub32Fx4:
+ case Iop_Mul32Fx4: case Iop_Div32Fx4:
+ TERNARY(ity_RMode,Ity_V128,Ity_V128, Ity_V128);
+
+ case Iop_Add64Fx4: case Iop_Sub64Fx4:
+ case Iop_Mul64Fx4: case Iop_Div64Fx4:
+ case Iop_Add32Fx8: case Iop_Sub32Fx8:
+ case Iop_Mul32Fx8: case Iop_Div32Fx8:
+ TERNARY(ity_RMode,Ity_V256,Ity_V256, Ity_V256);
+
case Iop_NegF128:
case Iop_AbsF128:
UNARY(Ity_F128, Ity_F128);
@@ -3203,10 +3215,6 @@
case Iop_64x4toV256:
QUATERNARY(Ity_I64, Ity_I64, Ity_I64, Ity_I64, Ity_V256);
- case Iop_Add64Fx4: case Iop_Sub64Fx4:
- case Iop_Mul64Fx4: case Iop_Div64Fx4:
- case Iop_Add32Fx8: case Iop_Sub32Fx8:
- case Iop_Mul32Fx8: case Iop_Div32Fx8:
case Iop_AndV256: case Iop_OrV256:
case Iop_XorV256:
case Iop_Max32Fx8: case Iop_Min32Fx8:
Modified: trunk/priv/ir_opt.c
==============================================================================
--- trunk/priv/ir_opt.c (original)
+++ trunk/priv/ir_opt.c Sun Jan 26 18:34:23 2014
@@ -1186,6 +1186,22 @@
&& e->Iex.Const.con->Ico.U64 == 0);
}
+/* Is this literally IRExpr_Const(IRConst_V128(0)) ? */
+static Bool isZeroV128 ( IRExpr* e )
+{
+ return toBool( e->tag == Iex_Const
+ && e->Iex.Const.con->tag == Ico_V128
+ && e->Iex.Const.con->Ico.V128 == 0x0000);
+}
+
+/* Is this literally IRExpr_Const(IRConst_V256(0)) ? */
+static Bool isZeroV256 ( IRExpr* e )
+{
+ return toBool( e->tag == Iex_Const
+ && e->Iex.Const.con->tag == Ico_V256
+ && e->Iex.Const.con->Ico.V256 == 0x00000000);
+}
+
/* Is this an integer constant with value 0 ? */
static Bool isZeroU ( IRExpr* e )
{
@@ -1999,6 +2015,17 @@
}
break;
}
+ /* Same reasoning for the 256-bit version. */
+ case Iop_V128HLtoV256: {
+ IRExpr* argHi = e->Iex.Binop.arg1;
+ IRExpr* argLo = e->Iex.Binop.arg2;
+ if (isZeroV128(argHi) && isZeroV128(argLo)) {
+ e2 = IRExpr_Const(IRConst_V256(0));
+ } else {
+ goto unhandled;
+ }
+ break;
+ }
/* -- V128 stuff -- */
case Iop_InterleaveLO8x16: {
@@ -2175,6 +2202,29 @@
e2 = e->Iex.Binop.arg1;
break;
}
+ /* OrV128(t,0) ==> t */
+ if (e->Iex.Binop.op == Iop_OrV128) {
+ if (isZeroV128(e->Iex.Binop.arg2)) {
+ e2 = e->Iex.Binop.arg1;
+ break;
+ }
+ if (isZeroV128(e->Iex.Binop.arg1)) {
+ e2 = e->Iex.Binop.arg2;
+ break;
+ }
+ }
+ /* OrV256(t,0) ==> t */
+ if (e->Iex.Binop.op == Iop_OrV256) {
+ if (isZeroV256(e->Iex.Binop.arg2)) {
+ e2 = e->Iex.Binop.arg1;
+ break;
+ }
+ //Disabled because there's no known test case right now.
+ //if (isZeroV256(e->Iex.Binop.arg1)) {
+ // e2 = e->Iex.Binop.arg2;
+ // break;
+ //}
+ }
break;
case Iop_Xor8:
Modified: trunk/pub/libvex_ir.h
==============================================================================
--- trunk/pub/libvex_ir.h (original)
+++ trunk/pub/libvex_ir.h Sun Jan 26 18:34:23 2014
@@ -1242,8 +1242,8 @@
/* BCD arithmetic instructions, (V128, V128) -> V128
* The BCD format is the same as that used in the BCD<->DPB conversion
- * routines, except using 124 digits (vs 60) plus the trailing 4-bit signed code.
- * */
+ * routines, except using 124 digits (vs 60) plus the trailing 4-bit
+ * signed code. */
Iop_BCDAdd, Iop_BCDSub,
/* Conversion I64 -> D64 */
@@ -1256,8 +1256,10 @@
/* --- 32x4 vector FP --- */
- /* binary */
+ /* ternary :: IRRoundingMode(I32) x V128 x V128 -> V128 */
Iop_Add32Fx4, Iop_Sub32Fx4, Iop_Mul32Fx4, Iop_Div32Fx4,
+
+ /* binary */
Iop_Max32Fx4, Iop_Min32Fx4,
Iop_Add32Fx2, Iop_Sub32Fx2,
/* Note: For the following compares, the ppc and arm front-ends assume a
@@ -1328,8 +1330,10 @@
/* --- 64x2 vector FP --- */
- /* binary */
+ /* ternary :: IRRoundingMode(I32) x V128 x V128 -> V128 */
Iop_Add64Fx2, Iop_Sub64Fx2, Iop_Mul64Fx2, Iop_Div64Fx2,
+
+ /* binary */
Iop_Max64Fx2, Iop_Min64Fx2,
Iop_CmpEQ64Fx2, Iop_CmpLT64Fx2, Iop_CmpLE64Fx2, Iop_CmpUN64Fx2,
@@ -1660,14 +1664,10 @@
Iop_SHA512, Iop_SHA256,
/* ------------------ 256-bit SIMD FP. ------------------ */
- Iop_Add64Fx4,
- Iop_Sub64Fx4,
- Iop_Mul64Fx4,
- Iop_Div64Fx4,
- Iop_Add32Fx8,
- Iop_Sub32Fx8,
- Iop_Mul32Fx8,
- Iop_Div32Fx8,
+
+ /* ternary :: IRRoundingMode(I32) x V256 x V256 -> V256 */
+ Iop_Add64Fx4, Iop_Sub64Fx4, Iop_Mul64Fx4, Iop_Div64Fx4,
+ Iop_Add32Fx8, Iop_Sub32Fx8, Iop_Mul32Fx8, Iop_Div32Fx8,
Iop_Sqrt32Fx8,
Iop_Sqrt64Fx4,
From: Ivan S. <van...@gm...> - 2014-01-26 10:32:36
Currently valgrind prints lots of false positives on any alsa mixer program, due to the unimplemented VKI_SNDRV_CTL_IOCTL_TLV_READ ioctl:

==2862== Conditional jump or move depends on uninitialised value(s)
==2862==    at 0x4E72307: snd_tlv_get_dB_range (tlv.c:170)
==2862==    by 0x4E8A74D: get_dB_range (simple_none.c:1162)
==2862==    by 0x4E8A7EE: get_dB_range_ops (simple_none.c:1176)
==2862==    by 0x4E8559C: snd_mixer_selem_get_playback_dB_range (simple.c:298)
==2862== Uninitialised value was created by a heap allocation
==2862==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2862==    by 0x4E8A57E: init_db_range (simple_none.c:1114)
==2862==    by 0x4E8A714: get_dB_range (simple_none.c:1159)
==2862==    by 0x4E8A7EE: get_dB_range_ops (simple_none.c:1176)
==2862==    by 0x4E8559C: snd_mixer_selem_get_playback_dB_range (simple.c:298)

The problem is that this ioctl uses a flexible array member, and valgrind doesn't know how many elements of that array are initialized. I implemented a patch to fix this problem: https://github.com/sorokin/valgrind/commit/610b24f0668a373451da82b9fd948c674a2583c6

I also implemented a few more ioctls; they are not strictly necessary (at least for the program I tested on), so I can remove them from the patch.

While implementing the new ioctls, I discovered that on my x86-64 system the 'cmd' argument of 'sys_ioctl' sometimes has its 32 most significant bits all 0 and sometimes all 1. As I understand it, the kernel receives syscall arguments as if they had type '[unsigned] long', but the signature of 'sys_ioctl' is

long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg);

so the 32 most significant bits of 'cmd' are discarded in the kernel. Therefore ioctl(15, 0xffffffffc008551a, 0x1041bf0) and ioctl(15, 0x00000000c008551a, 0x1041bf0) are the same ioctl.
The patch to ignore these 32 bits in valgrind is here: https://github.com/sorokin/valgrind/commit/426ceb042b3bc04bea249ab5fb7931b452ee6bca

Could you please review these patches? If they are OK, I will submit them to KDE bugzilla, as described at http://valgrind.org/support/summary.html.