From: Julian S. <js...@ac...> - 2020-01-02 13:30:12

I landed a fix, 2a7d3ae768f9e5b29acd5cb743c3fb13640a391c.  It all seems a bit
dubious to me, but given that the resulting code actually works, I don't see
that we have much option here.

J

From: Julian S. <se...@so...> - 2020-01-02 13:27:58

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=2a7d3ae768f9e5b29acd5cb743c3fb13640a391c

commit 2a7d3ae768f9e5b29acd5cb743c3fb13640a391c
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 14:27:24 2020 +0100

    sys_statx: don't complain if both |filename| and |buf| are NULL.

    So as to work around the Rust library's dubious use of statx.

Diff:
---
 coregrind/m_syswrap/syswrap-linux.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c
index 87c513a..96c309e 100644
--- a/coregrind/m_syswrap/syswrap-linux.c
+++ b/coregrind/m_syswrap/syswrap-linux.c
@@ -3692,10 +3692,19 @@ PRE(sys_statx)
    PRINT("sys_statx ( %ld, %#" FMT_REGWORD "x(%s), %ld, %ld, %#" FMT_REGWORD "x )",
          (Word)ARG1,ARG2,(char*)(Addr)ARG2,(Word)ARG3,(Word)ARG4,ARG5);
    PRE_REG_READ5(long, "statx",
-                 int, dirfd, char *, file_name, int, flags,
+                 int, dirfd, char *, filename, int, flags,
                  unsigned int, mask, struct statx *, buf);
-   PRE_MEM_RASCIIZ( "statx(file_name)", ARG2 );
-   PRE_MEM_WRITE( "statx(buf)", ARG5, sizeof(struct vki_statx) );
+   // Work around Rust's dubious use of statx, as described here:
+   //   https://github.com/rust-lang/rust/blob/
+   //     ccd238309f9dce92a05a23c2959e2819668c69a4/
+   //     src/libstd/sys/unix/fs.rs#L128-L142
+   // in which it passes NULL for both filename and buf, and then looks at the
+   // return value, so as to determine whether or not this syscall is supported.
+   Bool both_filename_and_buf_are_null = ARG2 == 0 && ARG5 == 0;
+   if (!both_filename_and_buf_are_null) {
+      PRE_MEM_RASCIIZ( "statx(filename)", ARG2 );
+      PRE_MEM_WRITE( "statx(buf)", ARG5, sizeof(struct vki_statx) );
+   }
 }
 POST(sys_statx)
 {

From: Tom H. <to...@co...> - 2020-01-02 11:01:41

On 02/01/2020 09:48, Julian Seward wrote:
> Ryan, I am seeing this problem also when running Gecko now, as compiled
> with rustc 1.40.
>
>> `statx` is a relatively new system call, only appearing in Linux 4.11.
>> Rust uses a trick where it calls `statx` with two NULL pointers to see
>> if the kernel/glibc support it because it's faster than calling it with
>> a real filename. If it returns `EFAULT` the system call is supported,
>> otherwise it returns `ENOSYS`. The comments in the Rust source code
>> imply this is a known trick.
>
> Can you point me at the comments in the Rust source code?  I read the man
> page for statx pretty carefully, and saw nothing there implying that NULL
> is an acceptable value for the |filename| argument.  I also didn't see
> anything like that in the kernel sources (but I could easily have missed
> it, it's a twisty maze in there).

Well, it sounds like it's relying on side effects of the implementation.  No
doubt the kernel will try to call copy_from_user or something similar on the
filename argument, and that will trigger EFAULT if the address is not a valid
address in the process's address space, which NULL won't be.  So EFAULT means
the system call is implemented, as the kernel has tried to access the
filename, and ENOSYS means it isn't implemented.  You could do much the same
thing with any system call that takes a user space address as an argument.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/

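[The probe Tom describes is easy to demonstrate outside of Rust.  Below is a
minimal, hypothetical C sketch of the same feature test -- not code from Rust
or Valgrind, and it assumes Linux headers new enough (4.11+) to define
SYS_statx.  It invokes statx(2) through syscall(2) with NULL for both the
filename and the result buffer, then distinguishes EFAULT (the call exists and
faulted copying the filename) from ENOSYS (the call is not implemented).]

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical re-creation of the probe: call statx(2) with NULL for both
   the filename and the output buffer.  A kernel that implements statx
   faults while copying the filename from user space and fails with EFAULT;
   a kernel that doesn't fails with ENOSYS before touching any argument. */
static int statx_is_supported(void)
{
   long r = syscall(SYS_statx, 0 /*dirfd*/, NULL /*filename*/,
                    0 /*flags*/, 0 /*mask*/, NULL /*buf*/);
   return r == -1 && errno == EFAULT;
}

int main(void)
{
   printf("statx supported: %s\n", statx_is_supported() ? "yes" : "no");
   return 0;
}
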
From: Julian S. <js...@ac...> - 2020-01-02 09:48:48

Ryan, I am seeing this problem also when running Gecko now, as compiled
with rustc 1.40.

> `statx` is a relatively new system call, only appearing in Linux 4.11.
> Rust uses a trick where it calls `statx` with two NULL pointers to see
> if the kernel/glibc support it because it's faster than calling it with
> a real filename. If it returns `EFAULT` the system call is supported,
> otherwise it returns `ENOSYS`. The comments in the Rust source code
> imply this is a known trick.

Can you point me at the comments in the Rust source code?  I read the man
page for statx pretty carefully, and saw nothing there implying that NULL
is an acceptable value for the |filename| argument.  I also didn't see
anything like that in the kernel sources (but I could easily have missed
it, it's a twisty maze in there).

J

From: Julian S. <se...@so...> - 2020-01-02 08:35:10

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=cadd90993504678607a4f95dfe5d1df5207c1eb0

commit cadd90993504678607a4f95dfe5d1df5207c1eb0
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 09:32:19 2020 +0100

    amd64 insn selector: improved handling of Or1/And1 trees.

    This splits function iselCondCode into iselCondCode_C and iselCondCode_R,
    the former of which is the old one that computes boolean expressions into
    an amd64 condition code, but the latter being new, and computes boolean
    expressions into the lowest bit of an integer register.

    This enables much better code generation for Or1/And1 trees, which now
    result quite commonly from the new &&-recovery machinery in the front end.

Diff:
---
 VEX/priv/host_amd64_isel.c | 132 ++++++++++++++++++++++++++++++---------------
 1 file changed, 90 insertions(+), 42 deletions(-)

diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c
index 6b70e54..c3cd61c 100644
--- a/VEX/priv/host_amd64_isel.c
+++ b/VEX/priv/host_amd64_isel.c
@@ -234,8 +234,11 @@ static void iselInt128Expr_wrk ( /*OUT*/HReg* rHi, HReg* rLo,
 static void iselInt128Expr ( /*OUT*/HReg* rHi, HReg* rLo,
                              ISelEnv* env, const IRExpr* e );

-static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e );
-static AMD64CondCode iselCondCode     ( ISelEnv* env, const IRExpr* e );
+static AMD64CondCode iselCondCode_C_wrk ( ISelEnv* env, const IRExpr* e );
+static AMD64CondCode iselCondCode_C     ( ISelEnv* env, const IRExpr* e );
+
+static HReg iselCondCode_R_wrk ( ISelEnv* env, const IRExpr* e );
+static HReg iselCondCode_R     ( ISelEnv* env, const IRExpr* e );

 static HReg iselDblExpr_wrk ( ISelEnv* env, const IRExpr* e );
 static HReg iselDblExpr     ( ISelEnv* env, const IRExpr* e );
@@ -649,7 +652,7 @@ void doHelperCall ( /*OUT*/UInt* stackAdjustAfterCall,
           && guard->Iex.Const.con->Ico.U1 == True) {
          /* unconditional -- do nothing */
       } else {
-         cc = iselCondCode( env, guard );
+         cc = iselCondCode_C( env, guard );
       }
    }
@@ -1611,7 +1614,7 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
       case Iop_1Uto32:
       case Iop_1Uto8: {
          HReg dst           = newVRegI(env);
-         AMD64CondCode cond = iselCondCode(env, e->Iex.Unop.arg);
+         AMD64CondCode cond = iselCondCode_C(env, e->Iex.Unop.arg);
          addInstr(env, AMD64Instr_Set64(cond,dst));
          return dst;
       }
@@ -1619,10 +1622,9 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
       case Iop_1Sto16:
       case Iop_1Sto32:
       case Iop_1Sto64: {
-         /* could do better than this, but for now ... */
-         HReg dst           = newVRegI(env);
-         AMD64CondCode cond = iselCondCode(env, e->Iex.Unop.arg);
-         addInstr(env, AMD64Instr_Set64(cond,dst));
+         HReg dst = newVRegI(env);
+         HReg tmp = iselCondCode_R(env, e->Iex.Unop.arg);
+         addInstr(env, mk_iMOVsd_RR(tmp, dst));
          addInstr(env, AMD64Instr_Sh64(Ash_SHL, 63, dst));
          addInstr(env, AMD64Instr_Sh64(Ash_SAR, 63, dst));
          return dst;
@@ -1955,7 +1957,7 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
          HReg r0  = iselIntExpr_R(env, e->Iex.ITE.iffalse);
          HReg dst = newVRegI(env);
          addInstr(env, mk_iMOVsd_RR(r1,dst));
-         AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+         AMD64CondCode cc = iselCondCode_C(env, e->Iex.ITE.cond);
          addInstr(env, AMD64Instr_CMov64(cc ^ 1, r0, dst));
          return dst;
       }
@@ -2281,20 +2283,24 @@ static AMD64RM* iselIntExpr_RM_wrk ( ISelEnv* env, const IRExpr* e )
 }

-/* --------------------- CONDCODE --------------------- */
+/* --------------------- CONDCODE as %rflag test --------------------- */

 /* Generate code to evaluated a bit-typed expression, returning the
    condition code which would correspond when the expression would
-   notionally have returned 1. */
+   notionally have returned 1.

-static AMD64CondCode iselCondCode ( ISelEnv* env, const IRExpr* e )
+   Note that iselCondCode_C and iselCondCode_R are mutually recursive.  For
+   future changes to either of them, take care not to introduce an infinite
+   loop involving the two of them.
+*/
+static AMD64CondCode iselCondCode_C ( ISelEnv* env, const IRExpr* e )
 {
    /* Uh, there's nothing we can sanity check here, unfortunately. */
-   return iselCondCode_wrk(env,e);
+   return iselCondCode_C_wrk(env,e);
 }

 /* DO NOT CALL THIS DIRECTLY ! */
-static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
+static AMD64CondCode iselCondCode_C_wrk ( ISelEnv* env, const IRExpr* e )
 {
    vassert(e);
    vassert(typeOfIRExpr(env->type_env,e) == Ity_I1);
@@ -2321,7 +2327,7 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
    /* Not1(...) */
    if (e->tag == Iex_Unop && e->Iex.Unop.op == Iop_Not1) {
       /* Generate code for the arg, and negate the test condition */
-      return 1 ^ iselCondCode(env, e->Iex.Unop.arg);
+      return 1 ^ iselCondCode_C(env, e->Iex.Unop.arg);
    }

    /* --- patterns rooted at: 64to1 --- */
@@ -2428,7 +2434,7 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
          switch (e->Iex.Binop.op) {
            case Iop_CmpEQ8: case Iop_CasCmpEQ8: return Acc_Z;
            case Iop_CmpNE8: case Iop_CasCmpNE8: return Acc_NZ;
-           default: vpanic("iselCondCode(amd64): CmpXX8(expr,0:I8)");
+           default: vpanic("iselCondCode_C(amd64): CmpXX8(expr,0:I8)");
         }
      } else {
         HReg r1 = iselIntExpr_R(env, e->Iex.Binop.arg1);
@@ -2440,7 +2446,7 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
         switch (e->Iex.Binop.op) {
            case Iop_CmpEQ8: case Iop_CasCmpEQ8: return Acc_Z;
            case Iop_CmpNE8: case Iop_CasCmpNE8: return Acc_NZ;
-           default: vpanic("iselCondCode(amd64): CmpXX8(expr,expr)");
+           default: vpanic("iselCondCode_C(amd64): CmpXX8(expr,expr)");
         }
      }
   }
@@ -2460,7 +2466,7 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
      switch (e->Iex.Binop.op) {
         case Iop_CmpEQ16: case Iop_CasCmpEQ16: return Acc_Z;
         case Iop_CmpNE16: case Iop_CasCmpNE16: return Acc_NZ;
-        default: vpanic("iselCondCode(amd64): CmpXX16");
+        default: vpanic("iselCondCode_C(amd64): CmpXX16");
      }
   }
@@ -2514,7 +2520,7 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
         case Iop_CmpLT64U: return Acc_B;
         case Iop_CmpLE64S: return Acc_LE;
         case Iop_CmpLE64U: return Acc_BE;
-        default: vpanic("iselCondCode(amd64): CmpXX64");
+        default: vpanic("iselCondCode_C(amd64): CmpXX64");
      }
   }
@@ -2540,31 +2546,73 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
         case Iop_CmpLT32U: return Acc_B;
         case Iop_CmpLE32S: return Acc_LE;
         case Iop_CmpLE32U: return Acc_BE;
-        default: vpanic("iselCondCode(amd64): CmpXX32");
+        default: vpanic("iselCondCode_C(amd64): CmpXX32");
      }
   }

   /* And1(x,y), Or1(x,y) */
-  /* FIXME: We could (and probably should) do a lot better here.  If both args
-     are in temps already then we can just emit a reg-reg And/Or directly,
-     followed by the final Test. */
   if (e->tag == Iex_Binop
      && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) {
-     // We could probably be cleverer about this.  In the meantime ..
-     HReg x_as_64 = newVRegI(env);
-     AMD64CondCode cc_x = iselCondCode(env, e->Iex.Binop.arg1);
-     addInstr(env, AMD64Instr_Set64(cc_x, x_as_64));
-     HReg y_as_64 = newVRegI(env);
-     AMD64CondCode cc_y = iselCondCode(env, e->Iex.Binop.arg2);
-     addInstr(env, AMD64Instr_Set64(cc_y, y_as_64));
-     AMD64AluOp aop = e->Iex.Binop.op == Iop_And1 ? Aalu_AND : Aalu_OR;
-     addInstr(env, AMD64Instr_Alu64R(aop, AMD64RMI_Reg(x_as_64), y_as_64));
-     addInstr(env, AMD64Instr_Test64(1, y_as_64));
+     // Get the result in an int reg, then test the least significant bit.
+     HReg tmp = iselCondCode_R(env, e);
+     addInstr(env, AMD64Instr_Test64(1, tmp));
      return Acc_NZ;
   }

   ppIRExpr(e);
-  vpanic("iselCondCode(amd64)");
+  vpanic("iselCondCode_C(amd64)");
+}
+
+
+/* --------------------- CONDCODE as int reg --------------------- */
+
+/* Generate code to evaluated a bit-typed expression, returning the resulting
+   value in bit 0 of an integer register.  WARNING: all of the other bits in the
+   register can be arbitrary.  Callers must mask them off or otherwise ignore
+   them, as necessary.
+
+   Note that iselCondCode_C and iselCondCode_R are mutually recursive.  For
+   future changes to either of them, take care not to introduce an infinite
+   loop involving the two of them.
+*/
+static HReg iselCondCode_R ( ISelEnv* env, const IRExpr* e )
+{
+   /* Uh, there's nothing we can sanity check here, unfortunately. */
+   return iselCondCode_R_wrk(env,e);
+}
+
+/* DO NOT CALL THIS DIRECTLY ! */
+static HReg iselCondCode_R_wrk ( ISelEnv* env, const IRExpr* e )
+{
+   vassert(e);
+   vassert(typeOfIRExpr(env->type_env,e) == Ity_I1);
+
+   /* var */
+   if (e->tag == Iex_RdTmp) {
+      return lookupIRTemp(env, e->Iex.RdTmp.tmp);
+   }
+
+   /* And1(x,y), Or1(x,y) */
+   if (e->tag == Iex_Binop
+       && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) {
+      HReg x_as_64 = iselCondCode_R(env, e->Iex.Binop.arg1);
+      HReg y_as_64 = iselCondCode_R(env, e->Iex.Binop.arg2);
+      HReg res = newVRegI(env);
+      addInstr(env, mk_iMOVsd_RR(y_as_64, res));
+      AMD64AluOp aop = e->Iex.Binop.op == Iop_And1 ? Aalu_AND : Aalu_OR;
+      addInstr(env, AMD64Instr_Alu64R(aop, AMD64RMI_Reg(x_as_64), res));
+      return res;
+   }
+
+   /* Anything else, we hand off to iselCondCode_C and force the value into a
+      register. */
+   HReg res = newVRegI(env);
+   AMD64CondCode cc = iselCondCode_C(env, e);
+   addInstr(env, AMD64Instr_Set64(cc, res));
+   return res;
+
+   ppIRExpr(e);
+   vpanic("iselCondCode_R(amd64)");
 }
@@ -2833,7 +2881,7 @@ static HReg iselFltExpr_wrk ( ISelEnv* env, const IRExpr* e )
      r0  = iselFltExpr(env, e->Iex.ITE.iffalse);
      dst = newVRegV(env);
      addInstr(env, mk_vMOVsd_RR(r1,dst));
-     AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+     AMD64CondCode cc = iselCondCode_C(env, e->Iex.ITE.cond);
      addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0, dst));
      return dst;
   }
@@ -3224,7 +3272,7 @@ static HReg iselDblExpr_wrk ( ISelEnv* env, const IRExpr* e )
      r0  = iselDblExpr(env, e->Iex.ITE.iffalse);
      dst = newVRegV(env);
      addInstr(env, mk_vMOVsd_RR(r1,dst));
-     AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+     AMD64CondCode cc = iselCondCode_C(env, e->Iex.ITE.cond);
      addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0, dst));
      return dst;
   }
@@ -3927,7 +3975,7 @@ static HReg iselVecExpr_wrk ( ISelEnv* env, const IRExpr* e )
      HReg r0  = iselVecExpr(env, e->Iex.ITE.iffalse);
      HReg dst = newVRegV(env);
      addInstr(env, mk_vMOVsd_RR(r1,dst));
-     AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+     AMD64CondCode cc = iselCondCode_C(env, e->Iex.ITE.cond);
      addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0, dst));
      return dst;
   }
@@ -4567,7 +4615,7 @@ static void iselDVecExpr_wrk ( /*OUT*/HReg* rHi, /*OUT*/HReg* rLo,
      HReg dstLo = newVRegV(env);
      addInstr(env, mk_vMOVsd_RR(r1Hi,dstHi));
      addInstr(env, mk_vMOVsd_RR(r1Lo,dstLo));
-     AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+     AMD64CondCode cc = iselCondCode_C(env, e->Iex.ITE.cond);
      addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0Hi, dstHi));
      addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0Lo, dstLo));
      *rHi = dstHi;
@@ -4628,7 +4676,7 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
         } else {
            addInstr(env, mk_iMOVsd_RR(rAlt, rDst));
         }
-        AMD64CondCode cc = iselCondCode(env, lg->guard);
+        AMD64CondCode cc = iselCondCode_C(env, lg->guard);
         if (szB == 16) {
            addInstr(env, AMD64Instr_SseCLoad(cc, amAddr, rDst));
         } else {
@@ -4659,7 +4707,7 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
            = szB == 16 ? iselVecExpr(env, sg->data)
                        : iselIntExpr_R(env, sg->data);
         AMD64CondCode cc
-           = iselCondCode(env, sg->guard);
+           = iselCondCode_C(env, sg->guard);
         if (szB == 16) {
            addInstr(env, AMD64Instr_SseCStore(cc, rSrc, amAddr));
         } else {
@@ -4853,7 +4901,7 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
         return;
      }
      if (ty == Ity_I1) {
-        AMD64CondCode cond = iselCondCode(env, stmt->Ist.WrTmp.data);
+        AMD64CondCode cond = iselCondCode_C(env, stmt->Ist.WrTmp.data);
         HReg dst = lookupIRTemp(env, tmp);
         addInstr(env, AMD64Instr_Set64(cond, dst));
         return;
@@ -5069,7 +5117,7 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
      if (stmt->Ist.Exit.dst->tag != Ico_U64)
         vpanic("iselStmt(amd64): Ist_Exit: dst is not a 64-bit value");

-     AMD64CondCode cc = iselCondCode(env, stmt->Ist.Exit.guard);
+     AMD64CondCode cc = iselCondCode_C(env, stmt->Ist.Exit.guard);
      AMD64AMode* amRIP = AMD64AMode_IR(stmt->Ist.Exit.offsIP,
                                        hregAMD64_RBP());

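[The key idea in this commit, stripped of the instruction-selection machinery,
is that intermediate results of an And1/Or1 tree need only be correct in bit 0,
so junk in the upper bits can be ignored until one final test at the boundary.
The following toy C model -- illustrative only, not VEX code -- mirrors that
contract: eval_R plays the role of iselCondCode_R and never normalises
intermediate values, and eval_C is the single masking step that
iselCondCode_C's Test64(1, tmp) performs.]

#include <stdint.h>
#include <stdio.h>

/* Toy boolean expression tree.  A leaf holds a 64-bit word whose bit 0 is
   the value and whose other 63 bits are arbitrary junk, mirroring the
   iselCondCode_R contract ("all of the other bits ... can be arbitrary"). */
typedef struct Expr {
   enum { LEAF, AND1, OR1 } tag;
   uint64_t word;               /* LEAF only: value in bit 0, junk above */
   const struct Expr *l, *r;    /* AND1/OR1 only */
} Expr;

/* The _R scheme: one AND/OR per interior node, no per-node normalisation.
   Junk bits above bit 0 propagate harmlessly through & and |. */
static uint64_t eval_R(const Expr* e)
{
   if (e->tag == LEAF) return e->word;
   uint64_t x = eval_R(e->l), y = eval_R(e->r);
   return e->tag == AND1 ? (x & y) : (x | y);
}

/* The _C boundary: a single final mask recovers the clean boolean. */
static int eval_C(const Expr* e) { return (int)(eval_R(e) & 1); }

int main(void)
{
   Expr a  = { LEAF, 0xdeadbeef00000001ull, 0, 0 };  /* true, with junk  */
   Expr b  = { LEAF, 0xcafebabe00000000ull, 0, 0 };  /* false, with junk */
   Expr ab = { OR1, 0, &a, &b };
   printf("a||b = %d\n", eval_C(&ab));   /* prints 1 */
   return 0;
}
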
From: Julian S. <se...@so...> - 2020-01-02 08:26:29

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=4eaa80103df9d1d396cc4b7427ea99faac11329d

commit 4eaa80103df9d1d396cc4b7427ea99faac11329d
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 09:23:46 2020 +0100

    amd64 back end: generate 32-bit shift instructions for 32-bit IR shifts.

    Until now these have been handled by possibly widening the value to 64
    bits, if necessary, followed by a 64-bit shift.  That wastes instructions
    and code space.

Diff:
---
 VEX/priv/host_amd64_defs.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 VEX/priv/host_amd64_defs.h |  9 ++++++++-
 VEX/priv/host_amd64_isel.c | 30 +++++++++++++++++++----------
 3 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/VEX/priv/host_amd64_defs.c b/VEX/priv/host_amd64_defs.c
index 29127c1..3d237e1 100644
--- a/VEX/priv/host_amd64_defs.c
+++ b/VEX/priv/host_amd64_defs.c
@@ -626,6 +626,14 @@ AMD64Instr* AMD64Instr_Sh64 ( AMD64ShiftOp op, UInt src, HReg dst ) {
    i->Ain.Sh64.dst = dst;
    return i;
 }
+AMD64Instr* AMD64Instr_Sh32 ( AMD64ShiftOp op, UInt src, HReg dst ) {
+   AMD64Instr* i   = LibVEX_Alloc_inline(sizeof(AMD64Instr));
+   i->tag          = Ain_Sh32;
+   i->Ain.Sh32.op  = op;
+   i->Ain.Sh32.src = src;
+   i->Ain.Sh32.dst = dst;
+   return i;
+}
 AMD64Instr* AMD64Instr_Test64 ( UInt imm32, HReg dst ) {
    AMD64Instr* i       = LibVEX_Alloc_inline(sizeof(AMD64Instr));
    i->tag              = Ain_Test64;
@@ -1090,6 +1098,14 @@ void ppAMD64Instr ( const AMD64Instr* i, Bool mode64 )
            vex_printf("$%d,", (Int)i->Ain.Sh64.src);
         ppHRegAMD64(i->Ain.Sh64.dst);
         return;
+      case Ain_Sh32:
+         vex_printf("%sl ", showAMD64ShiftOp(i->Ain.Sh32.op));
+         if (i->Ain.Sh32.src == 0)
+            vex_printf("%%cl,");
+         else
+            vex_printf("$%d,", (Int)i->Ain.Sh32.src);
+         ppHRegAMD64_lo32(i->Ain.Sh32.dst);
+         return;
      case Ain_Test64:
         vex_printf("testq $%d,", (Int)i->Ain.Test64.imm32);
         ppHRegAMD64(i->Ain.Test64.dst);
@@ -1471,6 +1487,11 @@ void getRegUsage_AMD64Instr ( HRegUsage* u, const AMD64Instr* i, Bool mode64 )
         if (i->Ain.Sh64.src == 0)
            addHRegUse(u, HRmRead, hregAMD64_RCX());
         return;
+      case Ain_Sh32:
+         addHRegUse(u, HRmModify, i->Ain.Sh32.dst);
+         if (i->Ain.Sh32.src == 0)
+            addHRegUse(u, HRmRead, hregAMD64_RCX());
+         return;
      case Ain_Test64:
         addHRegUse(u, HRmRead, i->Ain.Test64.dst);
         return;
@@ -1808,6 +1829,9 @@ void mapRegs_AMD64Instr ( HRegRemap* m, AMD64Instr* i, Bool mode64 )
      case Ain_Sh64:
         mapReg(m, &i->Ain.Sh64.dst);
         return;
+      case Ain_Sh32:
+         mapReg(m, &i->Ain.Sh32.dst);
+         return;
      case Ain_Test64:
         mapReg(m, &i->Ain.Test64.dst);
         return;
@@ -2762,6 +2786,30 @@ Int emit_AMD64Instr ( /*MB_MOD*/Bool* is_profInc,
      }
      break;

+   case Ain_Sh32:
+      opc_cl = opc_imm = subopc = 0;
+      switch (i->Ain.Sh32.op) {
+         case Ash_SHR: opc_cl = 0xD3; opc_imm = 0xC1; subopc = 5; break;
+         case Ash_SAR: opc_cl = 0xD3; opc_imm = 0xC1; subopc = 7; break;
+         case Ash_SHL: opc_cl = 0xD3; opc_imm = 0xC1; subopc = 4; break;
+         default: goto bad;
+      }
+      if (i->Ain.Sh32.src == 0) {
+         rex = clearWBit( rexAMode_R_enc_reg(0, i->Ain.Sh32.dst) );
+         if (rex != 0x40) *p++ = rex;
+         *p++ = toUChar(opc_cl);
+         p = doAMode_R_enc_reg(p, subopc, i->Ain.Sh32.dst);
+         goto done;
+      } else {
+         rex = clearWBit( rexAMode_R_enc_reg(0, i->Ain.Sh32.dst) );
+         if (rex != 0x40) *p++ = rex;
+         *p++ = toUChar(opc_imm);
+         p = doAMode_R_enc_reg(p, subopc, i->Ain.Sh32.dst);
+         *p++ = (UChar)(i->Ain.Sh32.src);
+         goto done;
+      }
+      break;
+
    case Ain_Test64:
       /* testq sign-extend($imm32), %reg */
       *p++ = rexAMode_R_enc_reg(0, i->Ain.Test64.dst);
diff --git a/VEX/priv/host_amd64_defs.h b/VEX/priv/host_amd64_defs.h
index 3dfa9fb..e2ed261 100644
--- a/VEX/priv/host_amd64_defs.h
+++ b/VEX/priv/host_amd64_defs.h
@@ -359,7 +359,8 @@ typedef
       Ain_Imm64,       /* Generate 64-bit literal to register */
       Ain_Alu64R,      /* 64-bit mov/arith/logical, dst=REG */
       Ain_Alu64M,      /* 64-bit mov/arith/logical, dst=MEM */
-      Ain_Sh64,        /* 64-bit shift/rotate, dst=REG or MEM */
+      Ain_Sh64,        /* 64-bit shift, dst=REG */
+      Ain_Sh32,        /* 32-bit shift, dst=REG */
       Ain_Test64,      /* 64-bit test (AND, set flags, discard result) */
       Ain_Unary64,     /* 64-bit not and neg */
       Ain_Lea64,       /* 64-bit compute EA into a reg */
@@ -442,6 +443,11 @@ typedef
             HReg dst;
          } Sh64;
          struct {
+            AMD64ShiftOp op;
+            UInt         src;  /* shift amount, or 0 means %cl */
+            HReg         dst;
+         } Sh32;
+         struct {
             UInt imm32;
             HReg dst;
          } Test64;
@@ -744,6 +750,7 @@ extern AMD64Instr* AMD64Instr_Unary64 ( AMD64UnaryOp op, HReg dst );
 extern AMD64Instr* AMD64Instr_Lea64   ( AMD64AMode* am, HReg dst );
 extern AMD64Instr* AMD64Instr_Alu32R  ( AMD64AluOp, AMD64RMI*, HReg );
 extern AMD64Instr* AMD64Instr_Sh64    ( AMD64ShiftOp, UInt, HReg );
+extern AMD64Instr* AMD64Instr_Sh32    ( AMD64ShiftOp, UInt, HReg );
 extern AMD64Instr* AMD64Instr_Test64  ( UInt imm32, HReg dst );
 extern AMD64Instr* AMD64Instr_MulL    ( Bool syned, AMD64RM* );
 extern AMD64Instr* AMD64Instr_Div     ( Bool syned, Int sz, AMD64RM* );
diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c
index dfaabb4..6b70e54 100644
--- a/VEX/priv/host_amd64_isel.c
+++ b/VEX/priv/host_amd64_isel.c
@@ -1030,9 +1030,12 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
         HReg regL = iselIntExpr_R(env, e->Iex.Binop.arg1);
         addInstr(env, mk_iMOVsd_RR(regL,dst));

-        /* Do any necessary widening for 32/16/8 bit operands */
+        /* Do any necessary widening for 16/8 bit operands.  Also decide on
+           the final width at which the shift is to be done. */
+        Bool shift64 = False;
         switch (e->Iex.Binop.op) {
            case Iop_Shr64: case Iop_Shl64: case Iop_Sar64:
+              shift64 = True;
               break;
            case Iop_Shl32: case Iop_Shl16: case Iop_Shl8:
               break;
@@ -1045,18 +1048,16 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
                             Aalu_AND, AMD64RMI_Imm(0xFFFF), dst));
               break;
            case Iop_Shr32:
-              addInstr(env, AMD64Instr_MovxLQ(False, dst, dst));
               break;
            case Iop_Sar8:
-              addInstr(env, AMD64Instr_Sh64(Ash_SHL, 56, dst));
-              addInstr(env, AMD64Instr_Sh64(Ash_SAR, 56, dst));
+              addInstr(env, AMD64Instr_Sh32(Ash_SHL, 24, dst));
+              addInstr(env, AMD64Instr_Sh32(Ash_SAR, 24, dst));
               break;
            case Iop_Sar16:
-              addInstr(env, AMD64Instr_Sh64(Ash_SHL, 48, dst));
-              addInstr(env, AMD64Instr_Sh64(Ash_SAR, 48, dst));
+              addInstr(env, AMD64Instr_Sh32(Ash_SHL, 16, dst));
+              addInstr(env, AMD64Instr_Sh32(Ash_SAR, 16, dst));
               break;
            case Iop_Sar32:
-              addInstr(env, AMD64Instr_MovxLQ(True, dst, dst));
               break;
            default:
               ppIROp(e->Iex.Binop.op);
@@ -1071,14 +1072,23 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )
            vassert(e->Iex.Binop.arg2->Iex.Const.con->tag == Ico_U8);
            nshift = e->Iex.Binop.arg2->Iex.Const.con->Ico.U8;
            vassert(nshift >= 0);
-           if (nshift > 0)
+           if (nshift > 0) {
               /* Can't allow nshift==0 since that means %cl */
-              addInstr(env, AMD64Instr_Sh64(shOp, nshift, dst));
+              if (shift64) {
+                 addInstr(env, AMD64Instr_Sh64(shOp, nshift, dst));
+              } else {
+                 addInstr(env, AMD64Instr_Sh32(shOp, nshift, dst));
+              }
+           }
        } else {
           /* General case; we have to force the amount into %cl. */
           HReg regR = iselIntExpr_R(env, e->Iex.Binop.arg2);
           addInstr(env, mk_iMOVsd_RR(regR,hregAMD64_RCX()));
-          addInstr(env, AMD64Instr_Sh64(shOp, 0/* %cl */, dst));
+          if (shift64) {
+             addInstr(env, AMD64Instr_Sh64(shOp, 0/* %cl */, dst));
+          } else {
+             addInstr(env, AMD64Instr_Sh32(shOp, 0/* %cl */, dst));
+          }
        }
        return dst;
     }

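[For intuition about why the 32-bit forms need less widening: on amd64,
writing a 32-bit register zeroes the upper 32 bits automatically, so Shr32 and
Sar32 need no preparatory MovxLQ, and a narrow arithmetic shift right can be
synthesised with a shift-left/arithmetic-shift-right pair at 32-bit width.
The small illustrative C program below -- not Valgrind code, and it assumes
the usual two's-complement arithmetic right shift on signed values -- shows
the Sar8 shift-pair trick the new code emits with 24-bit shifts instead of
56-bit ones.]

#include <stdint.h>
#include <stdio.h>

/* Sign-extend the low 8 bits of x using a 32-bit SHL/SAR pair -- the same
   trick the Iop_Sar8 case now emits at 32-bit width (shift by 24) instead
   of at 64-bit width (shift by 56). */
static int32_t sext8_via_shifts(uint32_t x)
{
   return ((int32_t)(x << 24)) >> 24;
}

int main(void)
{
   printf("%d\n", sext8_via_shifts(0x80));  /* prints -128 */
   printf("%d\n", sext8_via_shifts(0x7f));  /* prints 127  */
   return 0;
}
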
From: Julian S. <se...@so...> - 2020-01-02 08:14:14

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=7239439e84881aeb1788bb31e59268533e3d78ff

commit 7239439e84881aeb1788bb31e59268533e3d78ff
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 09:10:06 2020 +0100

    Enable expensive handling of CmpEQ64/CmpNE64 for amd64 by default.

    This has unfortunately become necessary because optimising compilers are
    generating 64-bit equality comparisons on partially defined values on
    this target.  There will shortly be two followup commits which partially
    mitigate the resulting performance loss.

Diff:
---
 memcheck/mc_translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index bd29ea0..87b8ac6 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -8480,6 +8480,7 @@ IRSB* MC_(instrument) ( VgCallbackClosure* closure,
 #  elif defined(VGA_amd64)
    mce.dlbo.dl_Add64            = DLauto;
    mce.dlbo.dl_CmpEQ32_CmpNE32  = DLexpensive;
+   mce.dlbo.dl_CmpEQ64_CmpNE64  = DLexpensive;
 #  elif defined(VGA_ppc64le)
    // Needed by (at least) set_AV_CR6() in the front end.
    mce.dlbo.dl_CmpEQ64_CmpNE64  = DLexpensive;

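[For context, Memcheck's "expensive" scheme for equality is a more precise
definedness rule: the cheap rule marks x==y as undefined if any input bit is
undefined, whereas the expensive rule also treats the result as defined
whenever two bits that are defined in both operands already differ, since the
comparison is then certainly false regardless of the undefined bits.  Below is
a rough C model of that rule -- an assumption-laden sketch, not the actual
implementation, which operates on shadow V-bits inside mc_translate.c.]

#include <stdbool.h>
#include <stdint.h>

/* x, y: operand values; vx, vy: their shadow V-bits, where a 1 bit means
   "this bit is undefined".  Cheap rule: x==y is defined only if every input
   bit is defined.  Expensive rule: the result is also defined whenever some
   pair of bits that are defined in BOTH operands differ. */
static bool cmpeq64_result_is_defined(uint64_t x, uint64_t vx,
                                      uint64_t y, uint64_t vy)
{
   uint64_t anyUndef       = vx | vy;
   uint64_t definedInBoth  = ~anyUndef;
   uint64_t definedAndDiff = (x ^ y) & definedInBoth;
   return anyUndef == 0 || definedAndDiff != 0;
}
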
From: Julian S. <se...@so...> - 2020-01-02 07:02:56

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=79dd0bd6e88a65f435799d5d84165c260c9bbda7

commit 79dd0bd6e88a65f435799d5d84165c260c9bbda7
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 08:00:07 2020 +0100

    Fold Iop_CmpEQ32x8(x,x) to all-1s ..

    .. hence treating it as a dependency-breaking idiom.  Also handle the
    resulting IRConst_V256(0xFFFFFFFF) in the amd64 insn selector.

    (dup of 96de5118f5332ae145912ebe91b8fa143df74b8d from 'grail')

    Possibly fixes #409429.

Diff:
---
 VEX/priv/host_amd64_isel.c | 8 ++++++++
 VEX/priv/ir_opt.c          | 5 ++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c
index 8dc3068..dfaabb4 100644
--- a/VEX/priv/host_amd64_isel.c
+++ b/VEX/priv/host_amd64_isel.c
@@ -4003,6 +4003,14 @@ static void iselDVecExpr_wrk ( /*OUT*/HReg* rHi, /*OUT*/HReg* rLo,
             *rLo = vLo;
             return;
          }
+         case 0xFFFFFFFF: {
+            HReg vHi = generate_ones_V128(env);
+            HReg vLo = newVRegV(env);
+            addInstr(env, mk_vMOVsd_RR(vHi, vLo));
+            *rHi = vHi;
+            *rLo = vLo;
+            return;
+         }
          default:
             break; /* give up.   Until such time as is necessary. */
       }
diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c
index 37e39bc..c5b7a2f 100644
--- a/VEX/priv/ir_opt.c
+++ b/VEX/priv/ir_opt.c
@@ -1298,6 +1298,8 @@ static IRExpr* mkOnesOfPrimopResultType ( IROp op )
       case Iop_CmpEQ16x8:
       case Iop_CmpEQ32x4:
          return IRExpr_Const(IRConst_V128(0xFFFF));
+      case Iop_CmpEQ32x8:
+         return IRExpr_Const(IRConst_V256(0xFFFFFFFF));
       default:
          ppIROp(op);
          vpanic("mkOnesOfPrimopResultType: bad primop");
@@ -2353,7 +2355,7 @@ static IRExpr* fold_Expr_WRK ( IRExpr** env, IRExpr* e )
       case Iop_Xor64:
       case Iop_XorV128:
       case Iop_XorV256:
-         /* Xor8/16/32/64/V128(t,t) ==> 0, for some IRTemp t */
+         /* Xor8/16/32/64/V128/V256(t,t) ==> 0, for some IRTemp t */
          if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
             e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
             break;
@@ -2406,6 +2408,7 @@ static IRExpr* fold_Expr_WRK ( IRExpr** env, IRExpr* e )
       case Iop_CmpEQ8x16:
       case Iop_CmpEQ16x8:
       case Iop_CmpEQ32x4:
+      case Iop_CmpEQ32x8:
          if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
            e2 = mkOnesOfPrimopResultType(e->Iex.Binop.op);
            break;

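[A "dependency-breaking idiom" here is an instruction whose result does not
depend on its inputs: comparing a register with itself for 32-bit lane
equality always yields all-1s, so compilers emit it to materialise an
all-ones vector from a register whose previous contents may well be undefined
-- which is why the fold matters to Memcheck.  A minimal illustration using
AVX2 intrinsics follows; it assumes an AVX2-capable x86-64 host and a compile
with -mavx2, and is not Valgrind code.]

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
   /* _mm256_undefined_si256() deliberately has unspecified contents; the
      self-comparison still yields all-1s lanes, since every 32-bit lane is
      trivially equal to itself. */
   __m256i junk = _mm256_undefined_si256();
   __m256i ones = _mm256_cmpeq_epi32(junk, junk);

   unsigned out[8];
   _mm256_storeu_si256((__m256i*)out, ones);
   printf("lane0 = 0x%08x\n", out[0]);   /* prints 0xffffffff */
   return 0;
}
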
From: Julian S. <se...@so...> - 2020-01-02 06:29:40

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=61a634b6077e9f0481e6b9a9d636ad3ee24faf6f

commit 61a634b6077e9f0481e6b9a9d636ad3ee24faf6f
Author: Julian Seward <js...@ac...>
Date:   Sun Dec 15 20:14:37 2019 +0100

    'grail' fixes for MIPS:

    This isn't a good result.  It merely disables the new functionality on
    MIPS because enabling it causes segfaults, even with --tool=none, the
    cause of which are not obvious.  It is only chasing through conditional
    branches that is disabled, though.  Chasing through unconditional
    branches (jumps and calls to known destinations) is still enabled.

    * guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily,
      the key &&-recovery transformation on MIPS.

    * VEX/priv/host_mips_isel.c iselWordExpr_R_wrk(), iselCondCode_wrk():

      - add support for Iop_And1, Iop_Or1, and IRConst_U1.  This code is my
        best guess about what is correct, but is #if 0'd for now.

      - Properly guard some Iex_Binop cases that lacked a leading check that
        the expression actually was a Binop.

Diff:
---
 VEX/priv/guest_generic_bb_to_IR.c |  5 +++-
 VEX/priv/host_mips_isel.c         | 62 ++++++++++++++++++++++++++++++---------
 2 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c
index 0b8f852..507c75e 100644
--- a/VEX/priv/guest_generic_bb_to_IR.c
+++ b/VEX/priv/guest_generic_bb_to_IR.c
@@ -1451,7 +1451,10 @@ IRSB* bb_to_IR (
          Memcheck to crash, for as-yet unknown reasons.  It also exposes
          some unhandled Iex_ITE cases in the s390x instruction selector.
          For now, disable. */
-      && arch_guest != VexArchS390X)
+      && arch_guest != VexArchS390X
+      /* sewardj 2019Dec14: It also causes crashing on MIPS, even for
+         --tool=none. */
+      && arch_guest != VexArchMIPS64 && arch_guest != VexArchMIPS32)
    {
       if (debug_print) {
          vex_printf("\n-+-+ (ext# %d) Considering cbranch to"
diff --git a/VEX/priv/host_mips_isel.c b/VEX/priv/host_mips_isel.c
index c49d152..f14f654 100644
--- a/VEX/priv/host_mips_isel.c
+++ b/VEX/priv/host_mips_isel.c
@@ -2364,6 +2364,13 @@ static HReg iselWordExpr_R_wrk(ISelEnv * env, IRExpr * e)
         case Ico_U8:
            l = (Long) (Int) (Char) con->Ico.U8;
            break;
+#if 0
+        // Not needed until chasing cond branches in bb_to_IR is enabled on
+        // MIPS.  See comment on And1/Or1 below.
+        case Ico_U1:
+           l = con->Ico.U1 ? 1 : 0;
+           break;
+#endif
         default:
            vpanic("iselIntExpr_R.const(mips)");
      }
@@ -2644,18 +2651,19 @@ static MIPSCondCode iselCondCode_wrk(ISelEnv * env, IRExpr * e)
    vassert(e);
    vassert(typeOfIRExpr(env->type_env, e) == Ity_I1);
    /* Cmp*32*(x,y) ? */
-   if (e->Iex.Binop.op == Iop_CmpEQ32
-       || e->Iex.Binop.op == Iop_CmpNE32
-       || e->Iex.Binop.op == Iop_CmpNE64
-       || e->Iex.Binop.op == Iop_CmpLT32S
-       || e->Iex.Binop.op == Iop_CmpLT32U
-       || e->Iex.Binop.op == Iop_CmpLT64U
-       || e->Iex.Binop.op == Iop_CmpLE32S
-       || e->Iex.Binop.op == Iop_CmpLE64S
-       || e->Iex.Binop.op == Iop_CmpLT64S
-       || e->Iex.Binop.op == Iop_CmpEQ64
-       || e->Iex.Binop.op == Iop_CasCmpEQ32
-       || e->Iex.Binop.op == Iop_CasCmpEQ64) {
+   if (e->tag == Iex_Binop
+       && (e->Iex.Binop.op == Iop_CmpEQ32
+           || e->Iex.Binop.op == Iop_CmpNE32
+           || e->Iex.Binop.op == Iop_CmpNE64
+           || e->Iex.Binop.op == Iop_CmpLT32S
+           || e->Iex.Binop.op == Iop_CmpLT32U
+           || e->Iex.Binop.op == Iop_CmpLT64U
+           || e->Iex.Binop.op == Iop_CmpLE32S
+           || e->Iex.Binop.op == Iop_CmpLE64S
+           || e->Iex.Binop.op == Iop_CmpLT64S
+           || e->Iex.Binop.op == Iop_CmpEQ64
+           || e->Iex.Binop.op == Iop_CasCmpEQ32
+           || e->Iex.Binop.op == Iop_CasCmpEQ64)) {
       Bool syned = (e->Iex.Binop.op == Iop_CmpLT32S
                    || e->Iex.Binop.op == Iop_CmpLE32S
@@ -2726,7 +2734,7 @@ static MIPSCondCode iselCondCode_wrk(ISelEnv * env, IRExpr * e)
                                  dst, mode64));
       return cc;
    }
-   if (e->Iex.Binop.op == Iop_Not1) {
+   if (e->tag == Iex_Unop && e->Iex.Binop.op == Iop_Not1) {
       HReg r_dst = newVRegI(env);
       HReg r_srcL = iselWordExpr_R(env, e->Iex.Unop.arg);
       MIPSRH *r_srcR = MIPSRH_Reg(r_srcL);
@@ -2742,7 +2750,33 @@ static MIPSCondCode iselCondCode_wrk(ISelEnv * env, IRExpr * e)
                                  r_dst, mode64));
       return MIPScc_NE;
    }
-   if (e->tag == Iex_RdTmp || e->tag == Iex_Unop) {
+#if 0
+   // sewardj 2019Dec14: this is my best attempt at And1/Or1, but I am not
+   // sure if it is correct.  In any case it is not needed until chasing cond
+   // branches is enabled on MIPS.  Currently it is disabled, in function
+   // bb_to_IR (see comments there).
+   if (e->tag == Iex_Binop
+       && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) {
+      HReg r_argL = iselWordExpr_R(env, e->Iex.Binop.arg1);
+      HReg r_argR = iselWordExpr_R(env, e->Iex.Binop.arg2);
+      HReg r_dst = newVRegI(env);
+      addInstr(env, MIPSInstr_Alu(e->Iex.Binop.op == Iop_And1
+                                     ? Malu_AND : Malu_OR,
+                                  r_dst, r_argL, MIPSRH_Reg(r_argR)));
+      addInstr(env, MIPSInstr_Alu(Malu_AND, r_dst, r_dst,
+                                  MIPSRH_Imm(False, 1)));
+      /* Store result to guest_COND */
+      /* sewardj 2019Dec13: this seems wrong to me.  The host-side instruction
+         selector shouldn't touch the guest-side state, except in response to
+         Iex_Get and Ist_Put. */
+      MIPSAMode *am_addr = MIPSAMode_IR(0, GuestStatePointer(mode64));
+
+      addInstr(env, MIPSInstr_Store(4,
+               MIPSAMode_IR(am_addr->Mam.IR.index + COND_OFFSET(mode64),
+                            am_addr->Mam.IR.base),
+               r_dst, mode64));
+      return MIPScc_EQ;
+   }
+#endif
+   if (e->tag == Iex_RdTmp) {
       HReg r_dst = iselWordExpr_R_wrk(env, e);
       /* Store result to guest_COND */
       MIPSAMode *am_addr = MIPSAMode_IR(0, GuestStatePointer(mode64));

From: Julian S. <se...@so...> - 2020-01-02 06:29:35

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=8d510c468a20b7528e2fca870f3236b4e2cb965f

commit 8d510c468a20b7528e2fca870f3236b4e2cb965f
Author: Julian Seward <js...@ac...>
Date:   Sun Dec 1 07:01:20 2019 +0100

    'grail' fixes for s390x:

    This isn't a good result.  It merely disables the new functionality on
    s390x, for the reason stated below.

    * guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily,
      the key &&-recovery transformation on s390x, since it causes Memcheck
      to crash for reasons I couldn't figure out.  It also exposes some
      missing Iex_ITE cases in the s390x insn selector, although those
      shouldn't be a big deal to fix.  Maybe it's some strangeness to do with
      the s390x "ex" instruction.  I don't exactly understand how that
      trickery works, but from some study of it, I didn't see anything
      obviously wrong.  It is only chasing through conditional branches that
      is disabled for s390x.  Chasing through unconditional branches (jumps
      and calls to known destinations) is still enabled.

    * host_s390_isel.c s390_isel_cc(): No functional change.  Code has been
      added here to handle the new Iop_And1 and Iop_Or1, and it is somewhat
      tested, but is not needed until conditional branch chasing is enabled
      on s390x.

Diff:
---
 VEX/priv/guest_generic_bb_to_IR.c |  8 +++++++-
 VEX/priv/host_s390_isel.c         | 40 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c
index 6a1d4dc..0b8f852 100644
--- a/VEX/priv/guest_generic_bb_to_IR.c
+++ b/VEX/priv/guest_generic_bb_to_IR.c
@@ -1446,7 +1446,13 @@ IRSB* bb_to_IR (
    // Try for an extend based on a conditional branch, specifically in the
    // hope of identifying and recovering, an "A && B" condition spread across
    // two basic blocks.
-   if (irsb_be.tag == Be_Cond) {
+   if (irsb_be.tag == Be_Cond
+       /* sewardj 2019Nov30: Alas, chasing cond branches on s390 causes
+          Memcheck to crash, for as-yet unknown reasons.  It also exposes
+          some unhandled Iex_ITE cases in the s390x instruction selector.
+          For now, disable. */
+       && arch_guest != VexArchS390X)
+   {
       if (debug_print) {
         vex_printf("\n-+-+ (ext# %d) Considering cbranch to"
                    " SX=0x%llx FT=0x%llx -+-+\n\n",
diff --git a/VEX/priv/host_s390_isel.c b/VEX/priv/host_s390_isel.c
index 30e5c76..97614c8 100644
--- a/VEX/priv/host_s390_isel.c
+++ b/VEX/priv/host_s390_isel.c
@@ -3535,6 +3535,46 @@ s390_isel_cc(ISelEnv *env, IRExpr *cond)
      IRExpr *arg1 = cond->Iex.Binop.arg1;
      IRExpr *arg2 = cond->Iex.Binop.arg2;
      HReg reg1, reg2;

+     /* sewardj 2019Nov30: This will be needed when chasing through
+        conditional branches in guest_generic_bb_to_IR.c is enabled on s390x.
+        Unfortunately that is currently disabled on s390x as it causes
+        mysterious segfaults and also exposes some unhandled Iex_ITE cases in
+        this instruction selector.  The following Iop_And1/Iop_Or1 cases are
+        also needed when enabled.  The code below is *believed* to be correct,
+        and has been lightly tested, but it is #if 0'd until such time as we
+        need it. */
+#    if 0
+     /* FIXME: We could (and probably should) do a lot better here, by using
+        the iselCondCode_C/_R scheme used in the amd64 insn selector. */
+     if (cond->Iex.Binop.op == Iop_And1 || cond->Iex.Binop.op == Iop_Or1) {
+        /* In short: force both operands into registers, AND or OR them, mask
+           off all but the lowest bit, then convert the result back into a
+           condition code. */
+        const s390_opnd_RMI one = s390_opnd_imm(1);
+
+        HReg x_as_64 = newVRegI(env);
+        s390_cc_t cc_x = s390_isel_cc(env, arg1);
+        addInstr(env, s390_insn_cc2bool(x_as_64, cc_x));
+        addInstr(env, s390_insn_alu(8, S390_ALU_AND, x_as_64, one));
+
+        HReg y_as_64 = newVRegI(env);
+        s390_cc_t cc_y = s390_isel_cc(env, arg2);
+        addInstr(env, s390_insn_cc2bool(y_as_64, cc_y));
+        addInstr(env, s390_insn_alu(8, S390_ALU_AND, y_as_64, one));
+
+        s390_alu_t opkind
+           = cond->Iex.Binop.op == Iop_And1 ? S390_ALU_AND : S390_ALU_OR;
+        addInstr(env, s390_insn_alu(/*size=*/8,
+                                    opkind, x_as_64, s390_opnd_reg(y_as_64)));
+
+        addInstr(env, s390_insn_alu(/*size=*/8, S390_ALU_AND, x_as_64, one));
+        addInstr(env, s390_insn_test(/*size=*/8, s390_opnd_reg(x_as_64)));
+        return S390_CC_NE;
+     }
+#    endif /* 0 */
+
+     // |sizeofIRType| asserts on Ity_I1, so we can't do it until after we're
+     // sure that Iop_And1 and Iop_Or1 can't make it this far.
      size = sizeofIRType(typeOfIRExpr(env->type_env, arg1));

      switch (cond->Iex.Binop.op) {

From: Julian S. <se...@so...> - 2020-01-02 06:29:30

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=1fa3bc8f54b409929af2bbb9afd8916c982d70ee

commit 1fa3bc8f54b409929af2bbb9afd8916c982d70ee
Author: Julian Seward <js...@ac...>
Date:   Sun Nov 24 15:13:54 2019 +0100

    'grail' fixes for arm64:

    * guest_arm64_toIR.c: use |sigill_diag| to guard auxiliary diagnostic
      printing in case of decode failure

    * guest_generic_bb_to_IR.c expr_is_guardable(), stmt_is_guardable():
      handle a few more cases that didn't turn up so far on x86 or amd64

    * host_arm64_defs.[ch]:
      - new instruction ARM64Instr_Set64, to copy a condition code value into
        a register (the CSET instruction)
      - use this to reimplement Iop_And1 and Iop_Or1

Diff:
---
 VEX/priv/guest_arm64_toIR.c       | 45 +++++++++++++------
 VEX/priv/guest_generic_bb_to_IR.c | 19 +++++++--
 VEX/priv/host_arm64_defs.c        | 27 ++++++++++++
 VEX/priv/host_arm64_defs.h        |  7 +++
 VEX/priv/host_arm64_isel.c        | 89 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+), 19 deletions(-)

diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c
index 6eb896c..bee348a 100644
--- a/VEX/priv/guest_arm64_toIR.c
+++ b/VEX/priv/guest_arm64_toIR.c
@@ -2402,7 +2402,7 @@ Bool dbm_DecodeBitMasks ( /*OUT*/ULong* wmask, /*OUT*/ULong* tmask,
 static
 Bool dis_ARM64_data_processing_immediate(/*MB_OUT*/DisResult* dres,
-                                         UInt insn)
+                                         UInt insn, Bool sigill_diag)
 {
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
@@ -2737,7 +2737,9 @@ Bool dis_ARM64_data_processing_immediate(/*MB_OUT*/DisResult* dres,
    }

   after_extr:
-   vex_printf("ARM64 front end: data_processing_immediate\n");
+   if (sigill_diag) {
+      vex_printf("ARM64 front end: data_processing_immediate\n");
+   }
    return False;
 #  undef INSN
 }
@@ -2804,7 +2806,7 @@ static IRTemp getShiftedIRegOrZR ( Bool is64,
 static
 Bool dis_ARM64_data_processing_register(/*MB_OUT*/DisResult* dres,
-                                        UInt insn)
+                                        UInt insn, Bool sigill_diag)
 {
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
@@ -3581,7 +3583,9 @@ Bool dis_ARM64_data_processing_register(/*MB_OUT*/DisResult* dres,
      /* fall through */
    }

-   vex_printf("ARM64 front end: data_processing_register\n");
+   if (sigill_diag) {
+      vex_printf("ARM64 front end: data_processing_register\n");
+   }
    return False;
 #  undef INSN
 }
@@ -4646,7 +4650,9 @@ static IRTemp gen_indexed_EA ( /*OUT*/HChar* buf, UInt insn, Bool isInt )
    return res;

   fail:
-   vex_printf("gen_indexed_EA: unhandled case optS == 0x%x\n", optS);
+   if (0 /*really, sigill_diag, but that causes too much plumbing*/) {
+      vex_printf("gen_indexed_EA: unhandled case optS == 0x%x\n", optS);
+   }
    return IRTemp_INVALID;
 }
@@ -4717,8 +4723,7 @@ const HChar* nameArr_Q_SZ ( UInt bitQ, UInt size )
 static
 Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
-                          const VexAbiInfo* abiinfo
-)
+                          const VexAbiInfo* abiinfo, Bool sigill_diag)
 {
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
@@ -6859,7 +6864,10 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
      return True;
    }

-   vex_printf("ARM64 front end: load_store\n");
+   if (sigill_diag) {
+      vex_printf("ARM64 front end: load_store\n");
+   }
+
    return False;
 #  undef INSN
 }
@@ -6872,7 +6880,7 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
 static
 Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn,
                           const VexArchInfo* archinfo,
-                          const VexAbiInfo* abiinfo)
+                          const VexAbiInfo* abiinfo, Bool sigill_diag)
 {
 #  define INSN(_bMax,_bMin)  SLICE_UInt(insn, (_bMax), (_bMin))
@@ -7241,6 +7249,8 @@ Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn,
    /* D5 0B 7B 001 Rt  dc cvau, rT */
    if ((INSN(31,0) & 0xFFFFFFE0) == 0xD50B7B20) {
+      /* JRS 2019Nov24: should we handle DC_CIVAC the same?
+         || (INSN(31,0) & 0xFFFFFFE0) == 0xD50B7E20 */
       /* Exactly the same scheme as for IC IVAU, except we observe the
         dMinLine size, and request an Ijk_FlushDCache instead of
         Ijk_InvalICache. */
@@ -7360,7 +7370,9 @@ Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn,
      return True;
    }

-   vex_printf("ARM64 front end: branch_etc\n");
+   if (sigill_diag) {
+      vex_printf("ARM64 front end: branch_etc\n");
+   }
    return False;
 #  undef INSN
 }
@@ -14798,7 +14810,8 @@ Bool disInstr_ARM64_WRK (
             /*MB_OUT*/DisResult* dres,
             const UChar* guest_instr,
             const VexArchInfo* archinfo,
-            const VexAbiInfo*  abiinfo
+            const VexAbiInfo*  abiinfo,
+            Bool               sigill_diag
          )
 {
    // A macro to fish bits out of 'insn'.
@@ -14922,20 +14935,20 @@ Bool disInstr_ARM64_WRK (
      switch (INSN(28,25)) {
         case BITS4(1,0,0,0): case BITS4(1,0,0,1):
            // Data processing - immediate
-           ok = dis_ARM64_data_processing_immediate(dres, insn);
+           ok = dis_ARM64_data_processing_immediate(dres, insn, sigill_diag);
            break;
         case BITS4(1,0,1,0): case BITS4(1,0,1,1):
            // Branch, exception generation and system instructions
-           ok = dis_ARM64_branch_etc(dres, insn, archinfo, abiinfo);
+           ok = dis_ARM64_branch_etc(dres, insn, archinfo, abiinfo,
+                                     sigill_diag);
            break;
         case BITS4(0,1,0,0): case BITS4(0,1,1,0):
         case BITS4(1,1,0,0): case BITS4(1,1,1,0):
            // Loads and stores
-           ok = dis_ARM64_load_store(dres, insn, abiinfo);
+           ok = dis_ARM64_load_store(dres, insn, abiinfo, sigill_diag);
            break;
         case BITS4(0,1,0,1): case BITS4(1,1,0,1):
            // Data processing - register
-           ok = dis_ARM64_data_processing_register(dres, insn);
+           ok = dis_ARM64_data_processing_register(dres, insn, sigill_diag);
            break;
         case BITS4(0,1,1,1): case BITS4(1,1,1,1):
            // Data processing - SIMD and floating point
@@ -14998,7 +15011,7 @@ DisResult disInstr_ARM64 ( IRSB* irsb_IN,
    /* Try to decode */
    Bool ok = disInstr_ARM64_WRK( &dres,
                                  &guest_code_IN[delta_IN],
-                                 archinfo, abiinfo );
+                                 archinfo, abiinfo, sigill_diag_IN );
    if (ok) {
       /* All decode successes end up here. */
       vassert(dres.len == 4 || dres.len == 20);
diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c
index 81cc493..677cfca 100644
--- a/VEX/priv/guest_generic_bb_to_IR.c
+++ b/VEX/priv/guest_generic_bb_to_IR.c
@@ -420,9 +420,12 @@ static Bool expr_is_guardable ( const IRExpr* e )
         return !primopMightTrap(e->Iex.Unop.op);
      case Iex_Binop:
         return !primopMightTrap(e->Iex.Binop.op);
+     case Iex_Triop:
+        return !primopMightTrap(e->Iex.Triop.details->op);
      case Iex_ITE:
      case Iex_CCall:
      case Iex_Get:
+     case Iex_Const:
         return True;
      default:
         vex_printf("\n"); ppIRExpr(e); vex_printf("\n");
@@ -442,13 +445,23 @@ static Bool stmt_is_guardable ( const IRStmt* st )
 {
    switch (st->tag) {
+     // These are easily guarded.
      case Ist_IMark:
      case Ist_Put:
         return True;
-     case Ist_Store: // definitely not
-     case Ist_CAS: // definitely not
-     case Ist_Exit: // We could in fact spec this, if required
+     // These are definitely not guardable, or at least it's way too much
+     // hassle to do so.
+     case Ist_CAS:
+     case Ist_LLSC:
+     case Ist_MBE:
         return False;
+     // These could be guarded, with some effort, if really needed, but
+     // currently aren't guardable.
+     case Ist_Store:
+     case Ist_Exit:
+        return False;
+     // This is probably guardable, but it depends on the RHS of the
+     // assignment.
      case Ist_WrTmp:
         return expr_is_guardable(st->Ist.WrTmp.data);
      default:
diff --git a/VEX/priv/host_arm64_defs.c b/VEX/priv/host_arm64_defs.c
index dba2f18..33acae9 100644
--- a/VEX/priv/host_arm64_defs.c
+++ b/VEX/priv/host_arm64_defs.c
@@ -870,6 +870,13 @@ ARM64Instr* ARM64Instr_Unary ( HReg dst, HReg src, ARM64UnaryOp op ) {
    i->ARM64in.Unary.op  = op;
    return i;
 }
+ARM64Instr* ARM64Instr_Set64 ( HReg dst, ARM64CondCode cond ) {
+   ARM64Instr* i         = LibVEX_Alloc_inline(sizeof(ARM64Instr));
+   i->tag                = ARM64in_Set64;
+   i->ARM64in.Set64.dst  = dst;
+   i->ARM64in.Set64.cond = cond;
+   return i;
+}
 ARM64Instr* ARM64Instr_MovI ( HReg dst, HReg src ) {
    ARM64Instr* i      = LibVEX_Alloc_inline(sizeof(ARM64Instr));
    i->tag             = ARM64in_MovI;
@@ -1417,6 +1424,11 @@ void ppARM64Instr ( const ARM64Instr* i ) {
         vex_printf(", ");
         ppHRegARM64(i->ARM64in.Unary.src);
         return;
+      case ARM64in_Set64:
+         vex_printf("cset   ");
+         ppHRegARM64(i->ARM64in.Set64.dst);
+         vex_printf(", %s", showARM64CondCode(i->ARM64in.Set64.cond));
+         return;
      case ARM64in_MovI:
         vex_printf("mov    ");
         ppHRegARM64(i->ARM64in.MovI.dst);
@@ -1953,6 +1965,9 @@ void getRegUsage_ARM64Instr ( HRegUsage* u, const ARM64Instr* i, Bool mode64 )
         addHRegUse(u, HRmWrite, i->ARM64in.Unary.dst);
         addHRegUse(u, HRmRead, i->ARM64in.Unary.src);
         return;
+      case ARM64in_Set64:
+         addHRegUse(u, HRmWrite, i->ARM64in.Set64.dst);
+         return;
      case ARM64in_MovI:
         addHRegUse(u, HRmWrite, i->ARM64in.MovI.dst);
         addHRegUse(u, HRmRead,  i->ARM64in.MovI.src);
@@ -2295,6 +2310,9 @@ void mapRegs_ARM64Instr ( HRegRemap* m, ARM64Instr* i, Bool mode64 )
        i->ARM64in.Unary.dst = lookupHRegRemap(m, i->ARM64in.Unary.dst);
        i->ARM64in.Unary.src = lookupHRegRemap(m, i->ARM64in.Unary.src);
        return;
+      case ARM64in_Set64:
+         i->ARM64in.Set64.dst = lookupHRegRemap(m, i->ARM64in.Set64.dst);
+         return;
      case ARM64in_MovI:
        i->ARM64in.MovI.dst = lookupHRegRemap(m, i->ARM64in.MovI.dst);
        i->ARM64in.MovI.src = lookupHRegRemap(m, i->ARM64in.MovI.src);
@@ -3482,6 +3500,15 @@ Int emit_ARM64Instr ( /*MB_MOD*/Bool* is_profInc,
        }
        goto bad;
     }
+     case ARM64in_Set64: {
+        /* 1 00 1101 0100 11111 invert(cond) 01 11111 Rd   CSET Rd, Cond */
+        UInt rDst = iregEnc(i->ARM64in.Set64.dst);
+        UInt cc   = (UInt)i->ARM64in.Set64.cond;
+        vassert(cc < 14);
+        *p++ = X_3_8_5_6_5_5(X100, X11010100, X11111,
+                             ((cc ^ 1) << 2) | X01, X11111, rDst);
+        goto done;
+     }
     case ARM64in_MovI: {
        /* We generate the "preferred form", ORR Xd, XZR, Xm
           101 01010 00 0 m 000000 11111 d
diff --git a/VEX/priv/host_arm64_defs.h b/VEX/priv/host_arm64_defs.h
index aa4f943..db50056 100644
--- a/VEX/priv/host_arm64_defs.h
+++ b/VEX/priv/host_arm64_defs.h
@@ -463,6 +463,7 @@ typedef
      ARM64in_Test,
      ARM64in_Shift,
      ARM64in_Unary,
+     ARM64in_Set64,
      ARM64in_MovI,        /* int reg-reg move */
      ARM64in_Imm64,
      ARM64in_LdSt64,
@@ -566,6 +567,11 @@ typedef
            HReg         src;
            ARM64UnaryOp op;
         } Unary;
+        /* CSET -- Convert a condition code to a 64-bit value (0 or 1). */
+        struct {
+           HReg          dst;
+           ARM64CondCode cond;
+        } Set64;
         /* MOV dst, src -- reg-reg move for integer registers */
         struct {
           HReg dst;
@@ -915,6 +921,7 @@ extern ARM64Instr* ARM64Instr_Logic ( HReg, HReg, ARM64RIL*, ARM64LogicOp );
 extern ARM64Instr* ARM64Instr_Test  ( HReg, ARM64RIL* );
 extern ARM64Instr* ARM64Instr_Shift ( HReg, HReg, ARM64RI6*, ARM64ShiftOp );
 extern ARM64Instr* ARM64Instr_Unary ( HReg, HReg, ARM64UnaryOp );
+extern ARM64Instr* ARM64Instr_Set64 ( HReg, ARM64CondCode );
 extern ARM64Instr* ARM64Instr_MovI  ( HReg, HReg );
 extern ARM64Instr* ARM64Instr_Imm64 ( HReg, ULong );
 extern ARM64Instr* ARM64Instr_LdSt64 ( Bool isLoad, HReg, ARM64AMode* );
diff --git a/VEX/priv/host_arm64_isel.c b/VEX/priv/host_arm64_isel.c
index 0fa16e7..eb7630e 100644
--- a/VEX/priv/host_arm64_isel.c
+++ b/VEX/priv/host_arm64_isel.c
@@ -1310,6 +1310,21 @@ static ARM64CondCode iselCondCode_wrk ( ISelEnv* env, IRExpr* e )
      return ARM64cc_NE;
    }

+   /* Constant 1:Bit */
+   if (e->tag == Iex_Const) {
+      /* This is a very stupid translation.  Hopefully it doesn't occur much,
+         if ever. */
+      vassert(e->Iex.Const.con->tag == Ico_U1);
+      vassert(e->Iex.Const.con->Ico.U1 == True
+              || e->Iex.Const.con->Ico.U1 == False);
+      HReg rTmp = newVRegI(env);
+      addInstr(env, ARM64Instr_Imm64(rTmp, 0));
+      ARM64RIL* one = mb_mkARM64RIL_I(1);
+      vassert(one);
+      addInstr(env, ARM64Instr_Test(rTmp, one));
+      return e->Iex.Const.con->Ico.U1 ? ARM64cc_EQ : ARM64cc_NE;
+   }
+
    /* Not1(e) */
    if (e->tag == Iex_Unop && e->Iex.Unop.op == Iop_Not1) {
       /* Generate code for the arg, and negate the test condition */
@@ -1452,6 +1467,31 @@ static ARM64CondCode iselCondCode_wrk ( ISelEnv* env, IRExpr* e )
      }
    }

+   /* --- And1(x,y), Or1(x,y) --- */
+   /* FIXME: We could (and probably should) do a lot better here, by using
+      the iselCondCode_C/_R scheme used in the amd64 insn selector. */
+   if (e->tag == Iex_Binop
+       && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) {
+      HReg x_as_64 = newVRegI(env);
+      ARM64CondCode cc_x = iselCondCode(env, e->Iex.Binop.arg1);
+      addInstr(env, ARM64Instr_Set64(x_as_64, cc_x));
+
+      HReg y_as_64 = newVRegI(env);
+      ARM64CondCode cc_y = iselCondCode(env, e->Iex.Binop.arg2);
+      addInstr(env, ARM64Instr_Set64(y_as_64, cc_y));
+
+      HReg tmp = newVRegI(env);
+      ARM64LogicOp lop
+         = e->Iex.Binop.op == Iop_And1 ? ARM64lo_AND : ARM64lo_OR;
+      addInstr(env, ARM64Instr_Logic(tmp, x_as_64, ARM64RIL_R(y_as_64), lop));
+
+      ARM64RIL* one = mb_mkARM64RIL_I(1);
+      vassert(one);
+      addInstr(env, ARM64Instr_Test(tmp, one));
+
+      return ARM64cc_NE;
+   }
+
    ppIRExpr(e);
    vpanic("iselCondCode");
 }
@@ -2995,6 +3035,55 @@ static HReg iselV128Expr_wrk ( ISelEnv* env, IRExpr* e )
    } /* if (e->tag == Iex_Triop) */

+   if (0 && e->tag == Iex_ITE) {
+      /* JRS 2019Nov24: I think this is right, and it is somewhat tested, but
+         not as much as I'd like.  Hence disabled till it can be tested more. */
+      // This is pretty feeble.  We'd do better to generate BSL here.
+      HReg rX = newVRegI(env);
+
+      ARM64CondCode cc = iselCondCode(env, e->Iex.ITE.cond);
+      addInstr(env, ARM64Instr_Set64(rX, cc));
+      // cond: rX = 1   !cond: rX = 0
+
+      // Mask the Set64 result.  This is paranoia (should be unnecessary).
+      ARM64RIL* one = mb_mkARM64RIL_I(1);
+      vassert(one);
+      addInstr(env, ARM64Instr_Logic(rX, rX, one, ARM64lo_AND));
+      // cond: rX = 1   !cond: rX = 0
+
+      // Propagate to all bits in the 64 bit word by subtracting 1 from it.
+      // This also inverts the sense of the value.
+      addInstr(env, ARM64Instr_Arith(rX, rX, ARM64RIA_I12(1,0),
+                                     /*isAdd=*/False));
+      // cond: rX = 0-(62)-0   !cond: rX = 1-(62)-1
+
+      // Duplicate rX into a vector register
+      HReg vMask = newVRegV(env);
+      addInstr(env, ARM64Instr_VQfromXX(vMask, rX, rX));
+      // cond: vMask = 0-(126)-0   !cond: vMask = 1-(126)-1
+
+      HReg vIfTrue  = iselV128Expr(env, e->Iex.ITE.iftrue);
+      HReg vIfFalse = iselV128Expr(env, e->Iex.ITE.iffalse);
+
+      // Mask out iffalse value as needed
+      addInstr(env,
+               ARM64Instr_VBinV(ARM64vecb_AND, vIfFalse, vIfFalse, vMask));
+
+      // Invert the mask so we can use it for the iftrue value
+      addInstr(env, ARM64Instr_VUnaryV(ARM64vecu_NOT, vMask, vMask));
+      // cond: vMask = 1-(126)-1   !cond: vMask = 0-(126)-0
+
+      // Mask out iftrue value as needed
+      addInstr(env,
+               ARM64Instr_VBinV(ARM64vecb_AND, vIfTrue, vIfTrue, vMask));
+
+      // Merge the masked iftrue and iffalse results.
+      HReg res = newVRegV(env);
+      addInstr(env, ARM64Instr_VBinV(ARM64vecb_ORR, res, vIfTrue, vIfFalse));
+
+      return res;
+   }
+
   v128_expr_bad:
    ppIRExpr(e);
    vpanic("iselV128Expr_wrk");

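[The encoding comment in the emit_ARM64Instr hunk above can be checked by
hand: CSET Rd, cond is an alias of CSINC Rd, XZR, XZR, invert(cond).  The
sketch below uses a hypothetical helper, pack_3_8_5_6_5_5, standing in for
VEX's X_3_8_5_6_5_5 packer; it reproduces the same bit layout and prints
0x9A9F17E0 for CSET x0, eq, which agrees with the ARMv8 CSINC encoding.]

#include <stdint.h>
#include <stdio.h>

/* Stand-in for VEX's X_3_8_5_6_5_5 packer: fields, most significant first,
   of widths 3, 8, 5, 6, 5 and 5 bits (32 bits total). */
static uint32_t pack_3_8_5_6_5_5(uint32_t f1, uint32_t f2, uint32_t f3,
                                 uint32_t f4, uint32_t f5, uint32_t f6)
{
   return (f1 << 29) | (f2 << 21) | (f3 << 16) | (f4 << 10) | (f5 << 5) | f6;
}

/* CSET Rd, cond == CSINC Rd, XZR, XZR, invert(cond):
   1 00 1101 0100 11111 invert(cond) 01 11111 Rd */
static uint32_t encode_cset(uint32_t rd, uint32_t cond)
{
   return pack_3_8_5_6_5_5(0x4 /*100*/, 0xD4 /*11010100*/, 0x1F,
                           ((cond ^ 1) << 2) | 0x1 /*01*/, 0x1F, rd);
}

int main(void)
{
   /* CSET x0, eq (cond eq = 0b0000) should encode as 0x9A9F17E0. */
   printf("0x%08X\n", encode_cset(0, 0));
   return 0;
}
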
From: Julian S. <se...@so...> - 2020-01-02 06:29:30
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=076a79a48e251067758e1e9d8e50681450ed3889 commit 076a79a48e251067758e1e9d8e50681450ed3889 Author: Julian Seward <js...@ac...> Date: Wed Nov 27 08:52:45 2019 +0100 'grail' fixes for ppc32 and ppc64: * do_minimal_initial_iropt_BB: for ppc64, flatten rather than assert flatness. (Kludge. Sigh.) * priv/host_ppc_isel.c iselCondCode_wrk(): handle And1 and Or1, the not-particularly-optimal way * priv/host_ppc_isel.c iselCondCode_wrk(): handle Ico_U1(0). Diff: --- VEX/priv/host_ppc_isel.c | 31 ++++++++++++++++++++++++++++--- VEX/priv/ir_opt.c | 11 ++++++++++- 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/VEX/priv/host_ppc_isel.c b/VEX/priv/host_ppc_isel.c index 5e2a3b8..9c954da 100644 --- a/VEX/priv/host_ppc_isel.c +++ b/VEX/priv/host_ppc_isel.c @@ -3095,13 +3095,15 @@ static PPCCondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e, vassert(typeOfIRExpr(env->type_env,e) == Ity_I1); /* Constant 1:Bit */ - if (e->tag == Iex_Const && e->Iex.Const.con->Ico.U1 == True) { - // Make a compare that will always be true: + if (e->tag == Iex_Const) { + // Make a compare that will always be true (or always false): + vassert(e->Iex.Const.con->Ico.U1 == True || e->Iex.Const.con->Ico.U1 == False); HReg r_zero = newVRegI(env); addInstr(env, PPCInstr_LI(r_zero, 0, env->mode64)); addInstr(env, PPCInstr_Cmp(False/*unsigned*/, True/*32bit cmp*/, 7/*cr*/, r_zero, PPCRH_Reg(r_zero))); - return mk_PPCCondCode( Pct_TRUE, Pcf_7EQ ); + return mk_PPCCondCode( e->Iex.Const.con->Ico.U1 ? Pct_TRUE : Pct_FALSE, + Pcf_7EQ ); } /* Not1(...) */ @@ -3260,6 +3262,29 @@ static PPCCondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e, return mk_PPCCondCode( Pct_TRUE, Pcf_7EQ ); } + /* --- And1(x,y), Or1(x,y) --- */ + /* FIXME: We could (and probably should) do a lot better here, by using the + iselCondCode_C/_R scheme used in the amd64 insn selector. */ + if (e->tag == Iex_Binop + && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) { + HReg x_as_int = newVRegI(env); + PPCCondCode cc_x = iselCondCode(env, e->Iex.Binop.arg1, IEndianess); + addInstr(env, PPCInstr_Set(cc_x, x_as_int)); + + HReg y_as_int = newVRegI(env); + PPCCondCode cc_y = iselCondCode(env, e->Iex.Binop.arg2, IEndianess); + addInstr(env, PPCInstr_Set(cc_y, y_as_int)); + + HReg tmp = newVRegI(env); + PPCAluOp op = e->Iex.Binop.op == Iop_And1 ? Palu_AND : Palu_OR; + addInstr(env, PPCInstr_Alu(op, tmp, x_as_int, PPCRH_Reg(y_as_int))); + + addInstr(env, PPCInstr_Alu(Palu_AND, tmp, tmp, PPCRH_Imm(False,1))); + addInstr(env, PPCInstr_Cmp(False/*unsigned*/, True/*32bit cmp*/, + 7/*cr*/, tmp, PPCRH_Imm(False,1))); + return mk_PPCCondCode( Pct_TRUE, Pcf_7EQ ); + } + vex_printf("iselCondCode(ppc): No such tag(%u)\n", e->tag); ppIRExpr(e); vpanic("iselCondCode(ppc)"); diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c index cb75be8..37e39bc 100644 --- a/VEX/priv/ir_opt.c +++ b/VEX/priv/ir_opt.c @@ -6679,7 +6679,16 @@ IRSB* do_iropt_BB( processed by do_minimal_initial_iropt_BB. And that will have flattened them out. */ // FIXME Remove this assertion once the 'grail' machinery seems stable - vassert(isFlatIRSB(bb0)); + // FIXME2 The TOC-redirect-hacks generators in m_translate.c -- gen_PUSH() + // and gen_PO() -- don't generate flat IR, and so cause this assertion + // to fail. For the time being, hack around this by flattening, + // rather than asserting for flatness, on the afflicted platforms. + // This is a kludge, yes. + if (guest_arch == VexArchPPC64) { + bb0 = flatten_BB(bb0); // Kludge! 
+ } else { + vassert(isFlatIRSB(bb0)); // How it Really Should Be (tm). + } /* If at level 0, stop now. */ if (vex_control.iropt_level <= 0) return bb0;
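The And1/Or1 lowering used here (and in near-identical form in the arm32 and x86 commits later in this batch) fits in a few lines of plain C. The sketch below is illustrative only (lower_and1_or1 is not a VEX function); the comments map each step to the ppc instructions emitted:

    /* Each I1 operand is materialised as a 0/1 integer, the two integers
       are combined with an integer AND/OR, and bit 0 of the result is
       compared against 1 to regenerate a condition code. */
    static int lower_and1_or1(int is_and, int cond_x, int cond_y)
    {
        int x_as_int = cond_x ? 1 : 0;             /* PPCInstr_Set(cc_x, ...) */
        int y_as_int = cond_y ? 1 : 0;             /* PPCInstr_Set(cc_y, ...) */
        int tmp = is_and ? (x_as_int & y_as_int)   /* Palu_AND */
                         : (x_as_int | y_as_int);  /* Palu_OR  */
        return (tmp & 1) == 1;                     /* mask, then Cmp against 1 */
    }

As the FIXME in the patch says, this is deliberately unclever; the iselCondCode_C/_R scheme in the amd64 selector avoids the round trip through integer registers.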
From: Julian S. <se...@so...> - 2020-01-02 06:29:24
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=1df8c25b42b3834b65fffd34a445e2cf26179753 commit 1df8c25b42b3834b65fffd34a445e2cf26179753 Author: Julian Seward <js...@ac...> Date: Wed Nov 27 06:37:42 2019 +0100 'grail' fixes for arm32: * priv/guest_generic_bb_to_IR.c expr_is_guardable(), stmt_is_guardable(): add some missing cases * do_minimal_initial_iropt_BB: add comment (no functional change) * priv/host_arm_isel.c iselCondCode_wrk(): handle And1 and Or1, the not-particularly-optimal way Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 4 ++++ VEX/priv/host_arm_isel.c | 24 ++++++++++++++++++++++++ VEX/priv/ir_opt.c | 2 ++ 3 files changed, 30 insertions(+) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 677cfca..6a1d4dc 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -426,6 +426,7 @@ static Bool expr_is_guardable ( const IRExpr* e ) case Iex_CCall: case Iex_Get: case Iex_Const: + case Iex_RdTmp: return True; default: vex_printf("\n"); ppIRExpr(e); vex_printf("\n"); @@ -446,6 +447,7 @@ static Bool stmt_is_guardable ( const IRStmt* st ) { switch (st->tag) { // These are easily guarded. + case Ist_NoOp: case Ist_IMark: case Ist_Put: return True; @@ -458,6 +460,7 @@ static Bool stmt_is_guardable ( const IRStmt* st ) // These could be guarded, with some effort, if really needed, but // currently aren't guardable. case Ist_Store: + case Ist_StoreG: case Ist_Exit: return False; // This is probably guardable, but it depends on the RHS of the @@ -492,6 +495,7 @@ static void add_guarded_stmt_to_end_of ( /*MOD*/IRSB* bb, /*IN*/ IRStmt* st, IRTemp guard ) { switch (st->tag) { + case Ist_NoOp: case Ist_IMark: case Ist_WrTmp: addStmtToIRSB(bb, st); diff --git a/VEX/priv/host_arm_isel.c b/VEX/priv/host_arm_isel.c index 510336b..acbd39a 100644 --- a/VEX/priv/host_arm_isel.c +++ b/VEX/priv/host_arm_isel.c @@ -1293,6 +1293,30 @@ static ARMCondCode iselCondCode_wrk ( ISelEnv* env, IRExpr* e ) return e->Iex.Const.con->Ico.U1 ? ARMcc_EQ : ARMcc_NE; } + /* --- And1(x,y), Or1(x,y) --- */ + /* FIXME: We could (and probably should) do a lot better here, by using the + iselCondCode_C/_R scheme used in the amd64 insn selector. */ + if (e->tag == Iex_Binop + && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) { + HReg x_as_32 = newVRegI(env); + ARMCondCode cc_x = iselCondCode(env, e->Iex.Binop.arg1); + addInstr(env, ARMInstr_Mov(x_as_32, ARMRI84_I84(0,0))); + addInstr(env, ARMInstr_CMov(cc_x, x_as_32, ARMRI84_I84(1,0))); + + HReg y_as_32 = newVRegI(env); + ARMCondCode cc_y = iselCondCode(env, e->Iex.Binop.arg2); + addInstr(env, ARMInstr_Mov(y_as_32, ARMRI84_I84(0,0))); + addInstr(env, ARMInstr_CMov(cc_y, y_as_32, ARMRI84_I84(1,0))); + + HReg tmp = newVRegI(env); + ARMAluOp aop = e->Iex.Binop.op == Iop_And1 ? ARMalu_AND : ARMalu_OR; + addInstr(env, ARMInstr_Alu(aop, tmp, x_as_32, ARMRI84_R(y_as_32))); + + ARMRI84* one = ARMRI84_I84(1,0); + addInstr(env, ARMInstr_CmpOrTst(False/*test*/, tmp, one)); + return ARMcc_NE; + } + // JRS 2013-Jan-03: this seems completely nonsensical /* --- CasCmpEQ* --- */ /* Ist_Cas has a dummy argument to compare with, so comparison is diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c index aa259ae..cb75be8 100644 --- a/VEX/priv/ir_opt.c +++ b/VEX/priv/ir_opt.c @@ -6780,6 +6780,8 @@ IRSB* do_minimal_initial_iropt_BB(IRSB* bb0) { redundant_get_removal_BB ( bb ); // Do minimal constant prop: copy prop and constant prop only. No folding. 
+ // JRS FIXME 2019Nov25: this is too weak to be effective on arm32. For that, + // specifying doFolding=True makes a huge difference. bb = cprop_BB_WRK ( bb, /*mustRetainNoOps=*/True, /*doFolding=*/False );
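One arm32-specific wrinkle worth noting: this selector has no single set-register-from-condition instruction to lean on, so each I1 operand is materialised with an unconditional move of 0 followed by a predicated move of 1. In plain C (the function name is invented for illustration):

    static int materialise_I1(int cond)
    {
        int r = 0;         /* ARMInstr_Mov(r, #0)      */
        if (cond) r = 1;   /* ARMInstr_CMov(cc, r, #1) */
        return r;
    }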
From: Julian S. <se...@so...> - 2020-01-02 06:29:18
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=e404fe924c08c0946b836aa6c1aa85f7850029e4 commit e404fe924c08c0946b836aa6c1aa85f7850029e4 Author: Julian Seward <js...@ac...> Date: Fri Nov 22 19:27:43 2019 +0100 bb_to_IR(): Avoid causing spurious SIGILL-diagnostic messages .. .. when speculating into conditional-branch destinations. A simple change requiring a big comment explaining the rationale. Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index f890c33..81cc493 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -1358,6 +1358,34 @@ IRSB* bb_to_IR ( // Try for an extend. What kind we do depends on how the current trace // ends. + /* Regarding the use of |sigill_diag| in the extension logic below. This + is a Bool which controls whether or not the individual insn + disassemblers print an error message in the case where they don't + recognise an instruction. Generally speaking this is set to True, but + VEX's client can set it to False if it wants. + + Now that we are speculatively chasing both arms of a conditional + branch, this can lead to the following problem: one of those arms + contains an undecodable instruction. That insn is not reached at run + time, because the branch itself tests some CPU hwcaps info (or + whatever) and execution goes down the other path. However, it has the + bad side effect that the speculative disassembly will nevertheless + produce an error message when |sigill_diag| is True. + + To avoid this, in calls to |disassemble_basic_block_till_stop| for + speculative code, we pass False instead of |sigill_diag|. Note that + any (unconditional-chase) call to |disassemble_basic_block_till_stop| + that happens after a conditional chase that results in recovery of an + &&-idiom, is still really non-speculative, because the &&-idiom + translation can only happen when both paths lead to the same + continuation point. The result is that we know that the initial BB, + and BBs recovered via chasing an unconditional branch, are sure to be + executed, even if that unconditional branch follows a conditional + branch which got folded into an &&-idiom. So we don't need to change + the |sigill_diag| value used for them. It's only for the + conditional-branch SX and FT disassembly that it must be set to + |False|. + */ BlockEnd irsb_be; analyse_block_end(&irsb_be, irsb, guest_IP_sbstart, guest_word_type, chase_into_ok, callback_opaque, @@ -1423,7 +1451,8 @@ IRSB* bb_to_IR ( /*OUT*/ &sx_instrs_used, &sx_verbose_seen, &sx_base, &sx_len, /*MOD*/ emptyIRSB(), /*IN*/ irsb_be.Be.Cond.deltaSX, - instrs_avail_spec, guest_IP_sbstart, host_endness, sigill_diag, + instrs_avail_spec, guest_IP_sbstart, host_endness, + /*sigill_diag=*/False, // See comment above arch_guest, archinfo_guest, abiinfo_both, guest_word_type, debug_print, dis_instr_fn, guest_code, offB_GUEST_IP ); @@ -1445,7 +1474,8 @@ IRSB* bb_to_IR ( /*OUT*/ &ft_instrs_used, &ft_verbose_seen, &ft_base, &ft_len, /*MOD*/ emptyIRSB(), /*IN*/ irsb_be.Be.Cond.deltaFT, - instrs_avail_spec, guest_IP_sbstart, host_endness, sigill_diag, + instrs_avail_spec, guest_IP_sbstart, host_endness, + /*sigill_diag=*/False, // See comment above arch_guest, archinfo_guest, abiinfo_both, guest_word_type, debug_print, dis_instr_fn, guest_code, offB_GUEST_IP ); |
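The policy that the long comment establishes reduces to one line; the helper below is purely illustrative (no such function exists in VEX):

    /* Only non-speculative disassembly may report undecodable insns. */
    static int effective_sigill_diag(int sigill_diag, int speculative)
    {
        return speculative ? 0 : sigill_diag;
    }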
From: Julian S. <se...@so...> - 2020-01-02 06:29:09
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=0c4d8bd04a7fc773a2d01d390d2cc6a97d493bde commit 0c4d8bd04a7fc773a2d01d390d2cc6a97d493bde Author: Julian Seward <js...@ac...> Date: Thu Nov 21 08:55:43 2019 +0100 Add statistics printing for the new trace construction algorithm. Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 6 ++++++ VEX/priv/guest_generic_bb_to_IR.h | 2 ++ VEX/priv/main_main.c | 4 ++++ VEX/pub/libvex.h | 6 ++++++ coregrind/m_translate.c | 17 +++++++++++++++++ 5 files changed, 35 insertions(+) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 16d83a8..7477641 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -1209,6 +1209,8 @@ IRSB* bb_to_IR ( /*OUT*/VexGuestExtents* vge, /*OUT*/UInt* n_sc_extents, /*OUT*/UInt* n_guest_instrs, /* stats only */ + /*OUT*/UShort* n_uncond_in_trace, /* stats only */ + /*OUT*/UShort* n_cond_in_trace, /* stats only */ /*MOD*/VexRegisterUpdates* pxControl, /*IN*/ void* callback_opaque, /*IN*/ DisOneInstrFn dis_instr_fn, @@ -1252,6 +1254,8 @@ IRSB* bb_to_IR ( vge->n_used = 0; *n_sc_extents = 0; *n_guest_instrs = 0; + *n_uncond_in_trace = 0; + *n_cond_in_trace = 0; /* And a new IR superblock to dump the result into. */ IRSB* irsb = emptyIRSB(); @@ -1375,6 +1379,7 @@ IRSB* bb_to_IR ( add_extent(vge, bb_base, bb_len); update_instr_budget(&instrs_avail, &verbose_mode, bb_instrs_used, bb_verbose_seen); + *n_uncond_in_trace += 1; } // if (be.tag == Be_Uncond) // Try for an extend based on a conditional branch, specifically in the @@ -1567,6 +1572,7 @@ IRSB* bb_to_IR ( add_extent(vge, sx_base, sx_len); update_instr_budget(&instrs_avail, &verbose_mode, sx_instrs_used, sx_verbose_seen); + *n_cond_in_trace += 1; } break; } // if (be.tag == Be_Cond) diff --git a/VEX/priv/guest_generic_bb_to_IR.h b/VEX/priv/guest_generic_bb_to_IR.h index 08d33ad..cad6768 100644 --- a/VEX/priv/guest_generic_bb_to_IR.h +++ b/VEX/priv/guest_generic_bb_to_IR.h @@ -143,6 +143,8 @@ IRSB* bb_to_IR ( /*OUT*/VexGuestExtents* vge, /*OUT*/UInt* n_sc_extents, /*OUT*/UInt* n_guest_instrs, /* stats only */ + /*OUT*/UShort* n_uncond_in_trace, /* stats only */ + /*OUT*/UShort* n_cond_in_trace, /* stats only */ /*MOD*/VexRegisterUpdates* pxControl, /*IN*/ void* callback_opaque, /*IN*/ DisOneInstrFn dis_instr_fn, diff --git a/VEX/priv/main_main.c b/VEX/priv/main_main.c index 0da2b46..5acab9e 100644 --- a/VEX/priv/main_main.c +++ b/VEX/priv/main_main.c @@ -554,6 +554,8 @@ IRSB* LibVEX_FrontEnd ( /*MOD*/ VexTranslateArgs* vta, res->n_sc_extents = 0; res->offs_profInc = -1; res->n_guest_instrs = 0; + res->n_uncond_in_trace = 0; + res->n_cond_in_trace = 0; #ifndef VEXMULTIARCH /* yet more sanity checks ... */ @@ -581,6 +583,8 @@ IRSB* LibVEX_FrontEnd ( /*MOD*/ VexTranslateArgs* vta, irsb = bb_to_IR ( vta->guest_extents, &res->n_sc_extents, &res->n_guest_instrs, + &res->n_uncond_in_trace, + &res->n_cond_in_trace, pxControl, vta->callback_opaque, disInstrFn, diff --git a/VEX/pub/libvex.h b/VEX/pub/libvex.h index 5a6a0e8..5d3733d 100644 --- a/VEX/pub/libvex.h +++ b/VEX/pub/libvex.h @@ -651,6 +651,12 @@ typedef /* Stats only: the number of guest insns included in the translation. It may be zero (!). */ UInt n_guest_instrs; + /* Stats only: the number of unconditional branches incorporated into the + trace. */ + UShort n_uncond_in_trace; + /* Stats only: the number of conditional branches incorporated into the + trace. 
*/ + UShort n_cond_in_trace; } VexTranslateResult; diff --git a/coregrind/m_translate.c b/coregrind/m_translate.c index ae1cfcd..332202a 100644 --- a/coregrind/m_translate.c +++ b/coregrind/m_translate.c @@ -64,6 +64,11 @@ /*--- Stats ---*/ /*------------------------------------------------------------*/ +static ULong n_TRACE_total_constructed = 0; +static ULong n_TRACE_total_guest_insns = 0; +static ULong n_TRACE_total_uncond_branches_followed = 0; +static ULong n_TRACE_total_cond_branches_followed = 0; + static ULong n_SP_updates_new_fast = 0; static ULong n_SP_updates_new_generic_known = 0; static ULong n_SP_updates_die_fast = 0; @@ -77,6 +82,13 @@ static ULong n_PX_VexRegUpdAllregsAtEachInsn = 0; void VG_(print_translation_stats) ( void ) { + VG_(message) + (Vg_DebugMsg, + "translate: %'llu guest insns, %'llu traces, " + "%'llu uncond chased, %llu cond chased\n", + n_TRACE_total_guest_insns, n_TRACE_total_constructed, + n_TRACE_total_uncond_branches_followed, + n_TRACE_total_cond_branches_followed); UInt n_SP_updates = n_SP_updates_new_fast + n_SP_updates_new_generic_known + n_SP_updates_die_fast + n_SP_updates_die_generic_known + n_SP_updates_generic_unknown; @@ -1819,6 +1831,11 @@ Bool VG_(translate) ( ThreadId tid, vg_assert(tres.n_sc_extents >= 0 && tres.n_sc_extents <= 3); vg_assert(tmpbuf_used <= N_TMPBUF); vg_assert(tmpbuf_used > 0); + + n_TRACE_total_constructed += 1; + n_TRACE_total_guest_insns += tres.n_guest_instrs; + n_TRACE_total_uncond_branches_followed += tres.n_uncond_in_trace; + n_TRACE_total_cond_branches_followed += tres.n_cond_in_trace; } /* END new scope specially for 'seg' */ /* Tell aspacem of all segments that have had translations taken |
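For reference, the new statistics line printed by VG_(print_translation_stats) has the following shape; the counts here are invented for illustration. Note that the format string uses %'llu for the first three counters but plain %llu for the last, so only the first three get thousands separators:

    translate: 1,234,567 guest insns, 45,678 traces, 12,345 uncond chased, 6789 cond chased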
From: Julian S. <se...@so...> - 2020-01-02 06:29:04
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=20e2a5bcc9beb561911b26af50ac6e2a4cc1ee7c commit 20e2a5bcc9beb561911b26af50ac6e2a4cc1ee7c Author: Julian Seward <js...@ac...> Date: Fri Nov 22 08:32:03 2019 +0100 Implement And1 and Or1 for the x86 insn selector. Diff: --- VEX/priv/host_x86_isel.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/VEX/priv/host_x86_isel.c b/VEX/priv/host_x86_isel.c index c8fae08..50b3235 100644 --- a/VEX/priv/host_x86_isel.c +++ b/VEX/priv/host_x86_isel.c @@ -2053,6 +2053,25 @@ static X86CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e ) } } + /* And1(x,y), Or1(x,y) */ + /* FIXME: We could (and probably should) do a lot better here. If both args + are in temps already then we can just emit a reg-reg And/Or directly, + followed by the final Test. */ + if (e->tag == Iex_Binop + && (e->Iex.Binop.op == Iop_And1 || e->Iex.Binop.op == Iop_Or1)) { + // We could probably be cleverer about this. In the meantime .. + HReg x_as_32 = newVRegI(env); + X86CondCode cc_x = iselCondCode(env, e->Iex.Binop.arg1); + addInstr(env, X86Instr_Set32(cc_x, x_as_32)); + HReg y_as_32 = newVRegI(env); + X86CondCode cc_y = iselCondCode(env, e->Iex.Binop.arg2); + addInstr(env, X86Instr_Set32(cc_y, y_as_32)); + X86AluOp aop = e->Iex.Binop.op == Iop_And1 ? Xalu_AND : Xalu_OR; + addInstr(env, X86Instr_Alu32R(aop, X86RMI_Reg(x_as_32), y_as_32)); + addInstr(env, X86Instr_Test32(1, X86RM_Reg(y_as_32))); + return Xcc_NZ; + } + ppIRExpr(e); vpanic("iselCondCode"); } |
From: Julian S. <se...@so...> - 2020-01-02 06:29:04
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=202184079b7539450a99d7cdbd07b0af4874d3de commit 202184079b7539450a99d7cdbd07b0af4874d3de Author: Julian Seward <js...@ac...> Date: Thu Nov 21 20:03:47 2019 +0100 Tidy up ir_opt.c aspects relating to the 'grail' work. In particular: * Rewrite do_minimal_initial_iropt_BB so it doesn't do full constant folding; that is unnecessary expense at this point, and later passes will do it anyway * do_iropt_BB: don't flatten the incoming block, because do_minimal_initial_iropt_BB will have run earlier and done so. But at least for the moment, assert that it really is flat. * VEX/priv/guest_generic_bb_to_IR.c create_self_checks_as_needed: generate flat IR so as not to fail the abovementioned assertion. I believe this completes the target-independent aspects of this work, and also the x86_64 specifics (of which there are very few). Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 82 +++++++++++++--------- VEX/priv/ir_defs.c | 16 ++++- VEX/priv/ir_opt.c | 139 +++++++++++++++++++------------------- VEX/pub/libvex_ir.h | 3 +- 4 files changed, 136 insertions(+), 104 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 7477641..f890c33 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -128,7 +128,7 @@ static void create_self_checks_as_needed( Addr base2check; UInt len2check; HWord expectedhW; - IRTemp tistart_tmp, tilen_tmp; + IRTemp tistart_tmp, tilen_tmp, callresult_tmp, exitguard_tmp; HWord VEX_REGPARM(2) (*fn_generic)(HWord, HWord); HWord VEX_REGPARM(1) (*fn_spec)(HWord); const HChar* nm_generic; @@ -161,7 +161,7 @@ static void create_self_checks_as_needed( const Int n_extent_slots = sizeof(vge->base) / sizeof(vge->base[0]); vassert(n_extent_slots == 3); - vassert(selfcheck_idx + (n_extent_slots - 1) * 5 + 4 < irsb->stmts_used); + vassert(selfcheck_idx + (n_extent_slots - 1) * 7 + 6 < irsb->stmts_used); for (Int i = 0; i < vge->n_used; i++) { /* Do we need to generate a check for this extent? */ @@ -297,16 +297,20 @@ static void create_self_checks_as_needed( = guest_word_type==Ity_I32 ? IRConst_U32(len2check) : IRConst_U64(len2check); - IRStmt** stmt0 = &irsb->stmts[selfcheck_idx + i * 5 + 0]; - IRStmt** stmt1 = &irsb->stmts[selfcheck_idx + i * 5 + 1]; - IRStmt** stmt2 = &irsb->stmts[selfcheck_idx + i * 5 + 2]; - IRStmt** stmt3 = &irsb->stmts[selfcheck_idx + i * 5 + 3]; - IRStmt** stmt4 = &irsb->stmts[selfcheck_idx + i * 5 + 4]; + IRStmt** stmt0 = &irsb->stmts[selfcheck_idx + i * 7 + 0]; + IRStmt** stmt1 = &irsb->stmts[selfcheck_idx + i * 7 + 1]; + IRStmt** stmt2 = &irsb->stmts[selfcheck_idx + i * 7 + 2]; + IRStmt** stmt3 = &irsb->stmts[selfcheck_idx + i * 7 + 3]; + IRStmt** stmt4 = &irsb->stmts[selfcheck_idx + i * 7 + 4]; + IRStmt** stmt5 = &irsb->stmts[selfcheck_idx + i * 7 + 5]; + IRStmt** stmt6 = &irsb->stmts[selfcheck_idx + i * 7 + 6]; vassert((*stmt0)->tag == Ist_NoOp); vassert((*stmt1)->tag == Ist_NoOp); vassert((*stmt2)->tag == Ist_NoOp); vassert((*stmt3)->tag == Ist_NoOp); vassert((*stmt4)->tag == Ist_NoOp); + vassert((*stmt5)->tag == Ist_NoOp); + vassert((*stmt6)->tag == Ist_NoOp); *stmt0 = IRStmt_WrTmp(tistart_tmp, IRExpr_Const(base2check_IRConst) ); *stmt1 = IRStmt_WrTmp(tilen_tmp, IRExpr_Const(len2check_IRConst) ); @@ -332,6 +336,8 @@ static void create_self_checks_as_needed( } } + /* Generate the call to the relevant function, and the comparison of + the result against the expected value. 
*/ IRExpr* callexpr = NULL; if (fn_spec) { callexpr = mkIRExprCCall( @@ -352,36 +358,46 @@ static void create_self_checks_as_needed( ); } - *stmt4 - = IRStmt_Exit( - IRExpr_Binop( - host_word_type==Ity_I64 ? Iop_CmpNE64 : Iop_CmpNE32, - callexpr, - host_word_type==Ity_I64 - ? IRExpr_Const(IRConst_U64(expectedhW)) - : IRExpr_Const(IRConst_U32(expectedhW)) - ), - Ijk_InvalICache, - /* Where we must restart if there's a failure: at the - first extent, regardless of which extent the - failure actually happened in. */ - guest_IP_sbstart_IRConst, - offB_GUEST_IP - ); + callresult_tmp = newIRTemp(irsb->tyenv, host_word_type); + *stmt4 = IRStmt_WrTmp(callresult_tmp, callexpr); + + exitguard_tmp = newIRTemp(irsb->tyenv, Ity_I1); + *stmt5 = IRStmt_WrTmp( + exitguard_tmp, + IRExpr_Binop( + host_word_type==Ity_I64 ? Iop_CmpNE64 : Iop_CmpNE32, + IRExpr_RdTmp(callresult_tmp), + host_word_type==Ity_I64 + ? IRExpr_Const(IRConst_U64(expectedhW)) + : IRExpr_Const(IRConst_U32(expectedhW)))); + + *stmt6 = IRStmt_Exit( + IRExpr_RdTmp(exitguard_tmp), + Ijk_InvalICache, + /* Where we must restart if there's a failure: at the + first extent, regardless of which extent the failure + actually happened in. */ + guest_IP_sbstart_IRConst, + offB_GUEST_IP + ); } /* for (i = 0; i < vge->n_used; i++) */ for (Int i = vge->n_used; i < sizeof(vge->base) / sizeof(vge->base[0]); i++) { - IRStmt* stmt0 = irsb->stmts[selfcheck_idx + i * 5 + 0]; - IRStmt* stmt1 = irsb->stmts[selfcheck_idx + i * 5 + 1]; - IRStmt* stmt2 = irsb->stmts[selfcheck_idx + i * 5 + 2]; - IRStmt* stmt3 = irsb->stmts[selfcheck_idx + i * 5 + 3]; - IRStmt* stmt4 = irsb->stmts[selfcheck_idx + i * 5 + 4]; + IRStmt* stmt0 = irsb->stmts[selfcheck_idx + i * 7 + 0]; + IRStmt* stmt1 = irsb->stmts[selfcheck_idx + i * 7 + 1]; + IRStmt* stmt2 = irsb->stmts[selfcheck_idx + i * 7 + 2]; + IRStmt* stmt3 = irsb->stmts[selfcheck_idx + i * 7 + 3]; + IRStmt* stmt4 = irsb->stmts[selfcheck_idx + i * 7 + 4]; + IRStmt* stmt5 = irsb->stmts[selfcheck_idx + i * 7 + 5]; + IRStmt* stmt6 = irsb->stmts[selfcheck_idx + i * 7 + 6]; vassert(stmt0->tag == Ist_NoOp); vassert(stmt1->tag == Ist_NoOp); vassert(stmt2->tag == Ist_NoOp); vassert(stmt3->tag == Ist_NoOp); vassert(stmt4->tag == Ist_NoOp); + vassert(stmt5->tag == Ist_NoOp); + vassert(stmt6->tag == Ist_NoOp); } } } @@ -1260,13 +1276,13 @@ IRSB* bb_to_IR ( /* And a new IR superblock to dump the result into. */ IRSB* irsb = emptyIRSB(); - /* Leave 15 spaces in which to put the check statements for a self - checking translation (up to 3 extents, and 5 stmts required for + /* Leave 21 spaces in which to put the check statements for a self + checking translation (up to 3 extents, and 7 stmts required for each). We won't know until later the extents and checksums of the areas, if any, that need to be checked. */ IRStmt* nop = IRStmt_NoOp(); Int selfcheck_idx = irsb->stmts_used; - for (Int i = 0; i < 3 * 5; i++) + for (Int i = 0; i < 3 * 7; i++) addStmtToIRSB( irsb, nop ); /* If the caller supplied a function to add its own preamble, use @@ -1277,7 +1293,7 @@ IRSB* bb_to_IR ( /* The callback has completed the IR block without any guest insns being disassembled into it, so just return it at this point, even if a self-check was requested - as there - is nothing to self-check. The 15 self-check no-ops will + is nothing to self-check. The 21 self-check no-ops will still be in place, but they are harmless. */ vge->n_used = 1; vge->base[0] = guest_IP_sbstart; @@ -1586,7 +1602,7 @@ IRSB* bb_to_IR ( /* We're almost done. 
The only thing that might need attending to is that a self-checking preamble may need to be created. If so it gets placed - in the 15 slots reserved above. */ + in the 21 slots reserved above. */ create_self_checks_as_needed( irsb, n_sc_extents, pxControl, callback_opaque, needs_self_check, vge, abiinfo_both, guest_word_type, selfcheck_idx, offB_GUEST_CMSTART, diff --git a/VEX/priv/ir_defs.c b/VEX/priv/ir_defs.c index d687d8f..7b6e847 100644 --- a/VEX/priv/ir_defs.c +++ b/VEX/priv/ir_defs.c @@ -4270,7 +4270,7 @@ static inline Bool isIRAtom_or_VECRET_or_GSPTR ( const IRExpr* e ) return UNLIKELY(is_IRExpr_VECRET_or_GSPTR(e)); } -Bool isFlatIRStmt ( const IRStmt* st ) +inline Bool isFlatIRStmt ( const IRStmt* st ) { Int i; const IRExpr* e; @@ -4374,6 +4374,20 @@ Bool isFlatIRStmt ( const IRStmt* st ) } } +Bool isFlatIRSB ( const IRSB* sb ) +{ + for (Int i = 0; i < sb->stmts_used; i++) { + if (!isFlatIRStmt(sb->stmts[i])) + return False; + } + + if (!isIRAtom(sb->next)) { + return False; + } + + return True; +} + /*---------------------------------------------------------------*/ /*--- Sanity checking ---*/ diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c index 9e9c026..aa259ae 100644 --- a/VEX/priv/ir_opt.c +++ b/VEX/priv/ir_opt.c @@ -1374,7 +1374,8 @@ static IRExpr* chase1 ( IRExpr** env, IRExpr* e ) return env[(Int)e->Iex.RdTmp.tmp]; } -static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e ) +__attribute__((noinline)) +static IRExpr* fold_Expr_WRK ( IRExpr** env, IRExpr* e ) { Int shift; IRExpr* e2 = e; /* e2 is the result of folding e, if possible */ @@ -2460,7 +2461,7 @@ static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e ) && !debug_only_hack_sameIRExprs_might_assert(e->Iex.Binop.arg1, e->Iex.Binop.arg2) && sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) { - vex_printf("vex iropt: fold_Expr: no ident rule for: "); + vex_printf("vex iropt: fold_Expr_WRK: no ident rule for: "); ppIRExpr(e); vex_printf("\n"); } @@ -2481,7 +2482,7 @@ static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e ) vpanic("fold_Expr: no rule for the above"); # else if (vex_control.iropt_verbosity > 0) { - vex_printf("vex iropt: fold_Expr: no const rule for: "); + vex_printf("vex iropt: fold_Expr_WRK: no const rule for: "); ppIRExpr(e); vex_printf("\n"); } @@ -2489,6 +2490,14 @@ static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e ) # endif } +/* Fold |e| as much as possible, given the bindings in |env|. If no folding is + possible, just return |e|. Also, if |env| is NULL, don't even try to + fold; just return |e| directly. */ +inline +static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e ) +{ + return env == NULL ? e : fold_Expr_WRK(env, e); +} /* Apply the subst to a simple 1-level expression -- guaranteed to be 1-level due to previous flattening pass. */ @@ -2604,32 +2613,36 @@ static IRExpr* subst_Expr ( IRExpr** env, IRExpr* ex ) } -/* Apply the subst to stmt, then fold the result as much as possible. - Much simplified due to stmt being previously flattened. As a - result of this, the stmt may wind up being turned into a no-op. +/* Apply the subst to stmt, then, if |doFolding| is |True|, fold the result as + much as possible. Much simplified due to stmt being previously flattened. + As a result of this, the stmt may wind up being turned into a no-op. 
*/ -static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) +static IRStmt* subst_and_maybe_fold_Stmt ( Bool doFolding, + IRExpr** env, IRStmt* st ) { # if 0 - vex_printf("\nsubst and fold stmt\n"); + vex_printf("\nsubst and maybe fold stmt\n"); ppIRStmt(st); vex_printf("\n"); # endif + IRExpr** s_env = env; + IRExpr** f_env = doFolding ? env : NULL; + switch (st->tag) { case Ist_AbiHint: vassert(isIRAtom(st->Ist.AbiHint.base)); vassert(isIRAtom(st->Ist.AbiHint.nia)); return IRStmt_AbiHint( - fold_Expr(env, subst_Expr(env, st->Ist.AbiHint.base)), + fold_Expr(f_env, subst_Expr(s_env, st->Ist.AbiHint.base)), st->Ist.AbiHint.len, - fold_Expr(env, subst_Expr(env, st->Ist.AbiHint.nia)) + fold_Expr(f_env, subst_Expr(s_env, st->Ist.AbiHint.nia)) ); case Ist_Put: vassert(isIRAtom(st->Ist.Put.data)); return IRStmt_Put( st->Ist.Put.offset, - fold_Expr(env, subst_Expr(env, st->Ist.Put.data)) + fold_Expr(f_env, subst_Expr(s_env, st->Ist.Put.data)) ); case Ist_PutI: { @@ -2638,9 +2651,9 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) vassert(isIRAtom(puti->ix)); vassert(isIRAtom(puti->data)); puti2 = mkIRPutI(puti->descr, - fold_Expr(env, subst_Expr(env, puti->ix)), + fold_Expr(f_env, subst_Expr(s_env, puti->ix)), puti->bias, - fold_Expr(env, subst_Expr(env, puti->data))); + fold_Expr(f_env, subst_Expr(s_env, puti->data))); return IRStmt_PutI(puti2); } @@ -2649,7 +2662,7 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) allowed to be more than just a constant or a tmp. */ return IRStmt_WrTmp( st->Ist.WrTmp.tmp, - fold_Expr(env, subst_Expr(env, st->Ist.WrTmp.data)) + fold_Expr(f_env, subst_Expr(s_env, st->Ist.WrTmp.data)) ); case Ist_Store: @@ -2657,8 +2670,8 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) vassert(isIRAtom(st->Ist.Store.data)); return IRStmt_Store( st->Ist.Store.end, - fold_Expr(env, subst_Expr(env, st->Ist.Store.addr)), - fold_Expr(env, subst_Expr(env, st->Ist.Store.data)) + fold_Expr(f_env, subst_Expr(s_env, st->Ist.Store.addr)), + fold_Expr(f_env, subst_Expr(s_env, st->Ist.Store.data)) ); case Ist_StoreG: { @@ -2666,9 +2679,9 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) vassert(isIRAtom(sg->addr)); vassert(isIRAtom(sg->data)); vassert(isIRAtom(sg->guard)); - IRExpr* faddr = fold_Expr(env, subst_Expr(env, sg->addr)); - IRExpr* fdata = fold_Expr(env, subst_Expr(env, sg->data)); - IRExpr* fguard = fold_Expr(env, subst_Expr(env, sg->guard)); + IRExpr* faddr = fold_Expr(f_env, subst_Expr(s_env, sg->addr)); + IRExpr* fdata = fold_Expr(f_env, subst_Expr(s_env, sg->data)); + IRExpr* fguard = fold_Expr(f_env, subst_Expr(s_env, sg->guard)); if (fguard->tag == Iex_Const) { /* The condition on this store has folded down to a constant. */ vassert(fguard->Iex.Const.con->tag == Ico_U1); @@ -2693,9 +2706,9 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) vassert(isIRAtom(lg->addr)); vassert(isIRAtom(lg->alt)); vassert(isIRAtom(lg->guard)); - IRExpr* faddr = fold_Expr(env, subst_Expr(env, lg->addr)); - IRExpr* falt = fold_Expr(env, subst_Expr(env, lg->alt)); - IRExpr* fguard = fold_Expr(env, subst_Expr(env, lg->guard)); + IRExpr* faddr = fold_Expr(f_env, subst_Expr(s_env, lg->addr)); + IRExpr* falt = fold_Expr(f_env, subst_Expr(s_env, lg->alt)); + IRExpr* fguard = fold_Expr(f_env, subst_Expr(s_env, lg->guard)); if (fguard->tag == Iex_Const) { /* The condition on this load has folded down to a constant. 
*/ vassert(fguard->Iex.Const.con->tag == Ico_U1); @@ -2727,13 +2740,15 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) vassert(isIRAtom(cas->dataLo)); cas2 = mkIRCAS( cas->oldHi, cas->oldLo, cas->end, - fold_Expr(env, subst_Expr(env, cas->addr)), - cas->expdHi ? fold_Expr(env, subst_Expr(env, cas->expdHi)) + fold_Expr(f_env, subst_Expr(s_env, cas->addr)), + cas->expdHi ? fold_Expr(f_env, + subst_Expr(s_env, cas->expdHi)) : NULL, - fold_Expr(env, subst_Expr(env, cas->expdLo)), - cas->dataHi ? fold_Expr(env, subst_Expr(env, cas->dataHi)) + fold_Expr(f_env, subst_Expr(s_env, cas->expdLo)), + cas->dataHi ? fold_Expr(f_env, + subst_Expr(s_env, cas->dataHi)) : NULL, - fold_Expr(env, subst_Expr(env, cas->dataLo)) + fold_Expr(f_env, subst_Expr(s_env, cas->dataLo)) ); return IRStmt_CAS(cas2); } @@ -2745,9 +2760,10 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) return IRStmt_LLSC( st->Ist.LLSC.end, st->Ist.LLSC.result, - fold_Expr(env, subst_Expr(env, st->Ist.LLSC.addr)), + fold_Expr(f_env, subst_Expr(s_env, st->Ist.LLSC.addr)), st->Ist.LLSC.storedata - ? fold_Expr(env, subst_Expr(env, st->Ist.LLSC.storedata)) + ? fold_Expr(f_env, + subst_Expr(s_env, st->Ist.LLSC.storedata)) : NULL ); @@ -2760,15 +2776,15 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) d2->args = shallowCopyIRExprVec(d2->args); if (d2->mFx != Ifx_None) { vassert(isIRAtom(d2->mAddr)); - d2->mAddr = fold_Expr(env, subst_Expr(env, d2->mAddr)); + d2->mAddr = fold_Expr(f_env, subst_Expr(s_env, d2->mAddr)); } vassert(isIRAtom(d2->guard)); - d2->guard = fold_Expr(env, subst_Expr(env, d2->guard)); + d2->guard = fold_Expr(f_env, subst_Expr(s_env, d2->guard)); for (i = 0; d2->args[i]; i++) { IRExpr* arg = d2->args[i]; if (LIKELY(!is_IRExpr_VECRET_or_GSPTR(arg))) { vassert(isIRAtom(arg)); - d2->args[i] = fold_Expr(env, subst_Expr(env, arg)); + d2->args[i] = fold_Expr(f_env, subst_Expr(s_env, arg)); } } return IRStmt_Dirty(d2); @@ -2788,7 +2804,7 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) case Ist_Exit: { IRExpr* fcond; vassert(isIRAtom(st->Ist.Exit.guard)); - fcond = fold_Expr(env, subst_Expr(env, st->Ist.Exit.guard)); + fcond = fold_Expr(f_env, subst_Expr(s_env, st->Ist.Exit.guard)); if (fcond->tag == Iex_Const) { /* Interesting. The condition on this exit has folded down to a constant. */ @@ -2819,7 +2835,8 @@ static IRStmt* subst_and_fold_Stmt ( IRExpr** env, IRStmt* st ) } -static IRSB* cprop_BB_wrk ( IRSB* in, Bool mustRetainNoOps ) +__attribute__((noinline)) +static IRSB* cprop_BB_WRK ( IRSB* in, Bool mustRetainNoOps, Bool doFolding ) { Int i; IRSB* out; @@ -2856,7 +2873,7 @@ static IRSB* cprop_BB_wrk ( IRSB* in, Bool mustRetainNoOps ) /* perhaps st2 is already a no-op? */ if (st2->tag == Ist_NoOp && !mustRetainNoOps) continue; - st2 = subst_and_fold_Stmt( env, st2 ); + st2 = subst_and_maybe_fold_Stmt( doFolding, env, st2 ); /* Deal with some post-folding special cases. */ switch (st2->tag) { @@ -2899,7 +2916,7 @@ static IRSB* cprop_BB_wrk ( IRSB* in, Bool mustRetainNoOps ) IRExpr* guard = lg->guard; if (guard->tag == Iex_Const) { /* The guard has folded to a constant, and that - constant must be 1:I1, since subst_and_fold_Stmt + constant must be 1:I1, since subst_and_maybe_fold_Stmt folds out the case 0:I1 by itself. 
*/ vassert(guard->Iex.Const.con->tag == Ico_U1); vassert(guard->Iex.Const.con->Ico.U1 == True); @@ -2987,7 +3004,7 @@ static IRSB* cprop_BB_wrk ( IRSB* in, Bool mustRetainNoOps ) IRSB* cprop_BB ( IRSB* in ) { - return cprop_BB_wrk(in, /*mustRetainNoOps=*/False); + return cprop_BB_WRK(in, /*mustRetainNoOps=*/False, /*doFolding=*/True); } @@ -6654,23 +6671,18 @@ IRSB* do_iropt_BB( static Int n_expensive = 0; Bool hasGetIorPutI, hasVorFtemps; - IRSB *bb, *bb2; n_total++; - /* First flatten the block out, since all other - phases assume flat code. */ - // FIXME this is no longer necessary, since minimal_iropt should have - // flattened it - bb = flatten_BB ( bb0 ); - - if (iropt_verbose) { - vex_printf("\n========= FLAT\n\n" ); - ppIRSB(bb); - } + /* Flatness: this function assumes that the incoming block is already flat. + That's because all blocks that arrive here should already have been + processed by do_minimal_initial_iropt_BB. And that will have flattened + them out. */ + // FIXME Remove this assertion once the 'grail' machinery seems stable + vassert(isFlatIRSB(bb0)); /* If at level 0, stop now. */ - if (vex_control.iropt_level <= 0) return bb; + if (vex_control.iropt_level <= 0) return bb0; /* Now do a preliminary cleanup pass, and figure out if we also need to do 'expensive' optimisations. Expensive optimisations @@ -6678,7 +6690,8 @@ IRSB* do_iropt_BB( If needed, do expensive transformations and then another cheap cleanup pass. */ - bb = cheap_transformations( bb, specHelper, preciseMemExnsFn, pxControl ); + IRSB* bb = cheap_transformations( bb0, specHelper, + preciseMemExnsFn, pxControl ); if (guest_arch == VexArchARM) { /* Translating Thumb2 code produces a lot of chaff. We have to @@ -6733,7 +6746,7 @@ IRSB* do_iropt_BB( /* Now have a go at unrolling simple (single-BB) loops. If successful, clean up the results as much as possible. */ - bb2 = maybe_loop_unroll_BB( bb, guest_addr ); + IRSB* bb2 = maybe_loop_unroll_BB( bb, guest_addr ); if (bb2) { bb = cheap_transformations( bb2, specHelper, preciseMemExnsFn, pxControl ); @@ -6754,22 +6767,7 @@ IRSB* do_iropt_BB( return bb; } -//static Bool alwaysPrecise ( Int minoff, Int maxoff, -// VexRegisterUpdates pxControl ) -//{ -// return True; -//} - -// FIXME make this as cheap as possible -IRSB* do_minimal_initial_iropt_BB( - IRSB* bb0 - //IRExpr* (*specHelper) (const HChar*, IRExpr**, IRStmt**, Int), - //Bool (*preciseMemExnsFn)(Int,Int,VexRegisterUpdates), - //VexRegisterUpdates pxControl, - //Addr guest_addr, - //VexArch guest_arch - ) -{ +IRSB* do_minimal_initial_iropt_BB(IRSB* bb0) { /* First flatten the block out, since all other phases assume flat code. */ IRSB* bb = flatten_BB ( bb0 ); @@ -6778,9 +6776,12 @@ IRSB* do_minimal_initial_iropt_BB( ppIRSB(bb); } + // Remove redundant GETs redundant_get_removal_BB ( bb ); - bb = cprop_BB_wrk ( bb, /*mustRetainNoOps=*/True ); // FIXME - // This is overkill. We only really want constant prop, not folding + + // Do minimal constant prop: copy prop and constant prop only. No folding. + bb = cprop_BB_WRK ( bb, /*mustRetainNoOps=*/True, + /*doFolding=*/False ); // Minor tidying of the block end, to remove a redundant Put of the IP right // at the end: diff --git a/VEX/pub/libvex_ir.h b/VEX/pub/libvex_ir.h index 9120a49..6a854e4 100644 --- a/VEX/pub/libvex_ir.h +++ b/VEX/pub/libvex_ir.h @@ -2388,7 +2388,7 @@ IRExpr* mkIRExprCCall ( IRType retty, /* Convenience functions for atoms (IRExprs which are either Iex_Tmp or * Iex_Const). 
*/ static inline Bool isIRAtom ( const IRExpr* e ) { - return toBool(e->tag == Iex_RdTmp || e->tag == Iex_Const); + return e->tag == Iex_RdTmp || e->tag == Iex_Const; } /* Are these two IR atoms identical? Causes an assertion @@ -3195,6 +3195,7 @@ extern void sanityCheckIRSB ( const IRSB* bb, Bool require_flatness, IRType guest_word_size ); extern Bool isFlatIRStmt ( const IRStmt* ); +extern Bool isFlatIRSB ( const IRSB* ); /* Is this any value actually in the enumeration 'IRType' ? */ extern Bool isPlausibleIRType ( IRType ty ); |
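The growth of the per-extent self-check template from 5 to 7 statements is easiest to see as the IR shape each checked extent now expands to. Statements 2 and 3 are untouched and do not appear in the hunk; the point of the change is that the old single Exit, whose guard was a non-flat expression, is split into three flat statements:

    stmt0:  t_start = <base2check>                 (WrTmp, as before)
    stmt1:  t_len   = <len2check>                  (WrTmp, as before)
    stmt2:  ...                                    (unchanged, not in this hunk)
    stmt3:  ...                                    (unchanged, not in this hunk)
    stmt4:  t_res   = CCall(checksum_fn, ...)      (WrTmp, new)
    stmt5:  t_guard = CmpNE(t_res, expected)       (WrTmp, new)
    stmt6:  if (t_guard) exit -> Ijk_InvalICache   (Exit, formerly stmt4)

This keeps create_self_checks_as_needed generating flat IR, which is what lets do_iropt_BB assert isFlatIRSB rather than re-flatten.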
From: Julian S. <se...@so...> - 2020-01-02 06:29:04
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=afbebbf6abbbd78800f5577fe52f34e28bfd8ed8 commit afbebbf6abbbd78800f5577fe52f34e28bfd8ed8 Author: Julian Seward <js...@ac...> Date: Tue Nov 19 08:20:31 2019 +0100 Add a change that should have been part of 6e4db6e9172a55a983105c8e73c89987ce97308a. Diff: --- VEX/priv/main_main.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/VEX/priv/main_main.c b/VEX/priv/main_main.c index eb77bfd..0da2b46 100644 --- a/VEX/priv/main_main.c +++ b/VEX/priv/main_main.c @@ -191,8 +191,7 @@ void LibVEX_default_VexControl ( /*OUT*/ VexControl* vcon ) vcon->iropt_register_updates_default = VexRegUpdUnwindregsAtMemAccess; vcon->iropt_unroll_thresh = 120; vcon->guest_max_insns = 60; - vcon->guest_chase_thresh = 10; - vcon->guest_chase_cond = False; + vcon->guest_chase = True; vcon->regalloc_version = 3; } @@ -229,10 +228,7 @@ void LibVEX_Init ( vassert(vcon->iropt_unroll_thresh <= 400); vassert(vcon->guest_max_insns >= 1); vassert(vcon->guest_max_insns <= 100); - vassert(vcon->guest_chase_thresh >= 0); - vassert(vcon->guest_chase_thresh < vcon->guest_max_insns); - vassert(vcon->guest_chase_cond == True - || vcon->guest_chase_cond == False); + vassert(vcon->guest_chase == False || vcon->guest_chase == True); vassert(vcon->regalloc_version == 2 || vcon->regalloc_version == 3); /* Check that Vex has been built with sizes of basic types as |
From: Julian S. <se...@so...> - 2020-01-02 06:28:49
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=0d8e6482be47f29a8527e9f8da16c3689f954c2f commit 0d8e6482be47f29a8527e9f8da16c3689f954c2f Author: Julian Seward <js...@ac...> Date: Wed Nov 13 15:45:11 2019 +0100 insn_has_no_other_exits_or_PUTs_to_PC: also check Ist_PutI and Ist_Dirty for writes to the PC. Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 67e506f..3da99bc 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -622,9 +622,8 @@ static Bool any_overlap ( Int start1, Int len1, Int start2, Int len2 ) words, it scans backwards through some prefix of an instruction's IR to see if there is an exit there. - It also checks for explicit PUTs to the PC. - - FIXME: also check PutI and dirty helper calls for such PUTs. */ + It also checks for explicit PUTs to the PC, via Ist_Put, Ist_PutI or + Ist_Dirty. I suspect this is ridiculous overkill, but is here for safety. */ static Bool insn_has_no_other_exits_or_PUTs_to_PC ( IRStmt** const stmts, Int scan_start, Int offB_GUEST_IP, Int szB_GUEST_IP, @@ -654,6 +653,39 @@ static Bool insn_has_no_other_exits_or_PUTs_to_PC ( break; } } + if (st->tag == Ist_PutI) { + const IRPutI* details = st->Ist.PutI.details; + const IRRegArray* descr = details->descr; + Int offB = descr->base; + Int szB = descr->nElems * sizeofIRType(descr->elemTy); + if (any_overlap(offB, szB, offB_GUEST_IP, szB_GUEST_IP)) { + found_PUT_to_PC = True; + break; + } + } + if (st->tag == Ist_Dirty) { + vassert(!found_PUT_to_PC); + const IRDirty* details = st->Ist.Dirty.details; + for (Int j = 0; j < details->nFxState; j++) { + const IREffect fx = details->fxState[j].fx; + const Int offset = details->fxState[j].offset; + const Int size = details->fxState[j].size; + const Int nRepeats = details->fxState[j].nRepeats; + const Int repeatLen = details->fxState[j].repeatLen; + if (fx == Ifx_Write || fx == Ifx_Modify) { + for (Int k = 0; k < nRepeats; k++) { + Int offB = offset + k * repeatLen; + Int szB = size; + if (any_overlap(offB, szB, offB_GUEST_IP, szB_GUEST_IP)) { + found_PUT_to_PC = True; + } + } + } + } + if (found_PUT_to_PC) { + break; + } + } i--; } // We expect IR for all instructions to start with an IMark. |
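The overlap test at the heart of the new Ist_PutI and Ist_Dirty checks is over half-open byte ranges. A self-contained demo (all numbers invented: an IRRegArray at offset 160 with 8 elements of Ity_I64 spans bytes [160, 224), and a hypothetical 8-byte PC slot sits at offset 168):

    #include <stdio.h>

    /* Same predicate as any_overlap in the patch: do the half-open
       ranges [s1, s1+l1) and [s2, s2+l2) intersect? */
    static int ranges_overlap(int s1, int l1, int s2, int l2)
    {
        return !(s1 + l1 <= s2 || s2 + l2 <= s1);
    }

    int main(void)
    {
        /* PutI region: base 160, 8 * sizeof(Ity_I64) = 64 bytes.
           Guest PC: offset 168, size 8.  They intersect, so the scan
           must report a PUT to the PC. */
        printf("%d\n", ranges_overlap(160, 64, 168, 8));  /* prints 1 */
        return 0;
    }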
From: Julian S. <se...@so...> - 2020-01-02 06:28:49
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=56e04256a7a317d15240677e5bd322c0a039f66b commit 56e04256a7a317d15240677e5bd322c0a039f66b Author: Julian Seward <js...@ac...> Date: Mon Nov 18 19:12:49 2019 +0100 Rationalise --vex-guest* flags in the new IRSB construction framework * removes --vex-guest-chase-cond=no|yes. This was never used in practice. * rename --vex-guest-chase-thresh=<0..99> to --vex-guest-chase=no|yes. In otherwords, downgrade it from a numeric flag to a boolean one, that can simply disable all chasing if required. (Some tools, notably Callgrind, force-disable block chasing, so this functionality at least needs to be retained). Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 5 ++--- VEX/pub/libvex.h | 17 ++++++----------- callgrind/main.c | 10 +++++----- coregrind/m_main.c | 19 +++---------------- docs/xml/manual-core-adv.xml | 3 +-- exp-bbv/bbv_main.c | 4 ++-- exp-sgcheck/pc_main.c | 2 +- helgrind/hg_main.c | 9 ++++----- none/tests/cmdline2.stdout.exp | 3 +-- 9 files changed, 25 insertions(+), 47 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 3da99bc..16d83a8 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -1237,8 +1237,7 @@ IRSB* bb_to_IR ( vassert(sizeof(HWord) == sizeof(void*)); vassert(vex_control.guest_max_insns >= 1); vassert(vex_control.guest_max_insns <= 100); - vassert(vex_control.guest_chase_thresh >= 0); - vassert(vex_control.guest_chase_thresh < vex_control.guest_max_insns); + vassert(vex_control.guest_chase == False || vex_control.guest_chase == True); vassert(guest_word_type == Ity_I32 || guest_word_type == Ity_I64); if (guest_word_type == Ity_I32) { @@ -1324,7 +1323,7 @@ IRSB* bb_to_IR ( // Reasons to give up immediately: // User or tool asked us not to chase - if (vex_control.guest_chase_thresh == 0) + if (!vex_control.guest_chase) break; // Out of extent slots diff --git a/VEX/pub/libvex.h b/VEX/pub/libvex.h index 5a76066..5a6a0e8 100644 --- a/VEX/pub/libvex.h +++ b/VEX/pub/libvex.h @@ -512,17 +512,12 @@ typedef BBs longer than this are split up. Default=60 (guest insns). */ Int guest_max_insns; - /* How aggressive should front ends be in following - unconditional branches to known destinations? Default=10, - meaning that if a block contains less than 10 guest insns so - far, the front end(s) will attempt to chase into its - successor. A setting of zero disables chasing. */ - // FIXME change this to a Bool - Int guest_chase_thresh; - /* EXPERIMENTAL: chase across conditional branches? Not all - front ends honour this. Default: NO. */ - // FIXME remove this completely. - Bool guest_chase_cond; + /* Should Vex try to construct superblocks, by chasing unconditional + branches/calls to known destinations, and performing AND/OR idiom + recognition? It is recommended to set this to True as that possibly + improves performance a bit, and also is important for avoiding certain + kinds of false positives in Memcheck. Default=True. */ + Bool guest_chase; /* Register allocator version. Allowed values are: - '2': previous, good and slow implementation. - '3': current, faster implementation; perhaps producing slightly worse diff --git a/callgrind/main.c b/callgrind/main.c index 47369d1..904eb42 100644 --- a/callgrind/main.c +++ b/callgrind/main.c @@ -2062,11 +2062,11 @@ void CLG_(post_clo_init)(void) "=> resetting it back to 0\n"); VG_(clo_vex_control).iropt_unroll_thresh = 0; // cannot be overridden. 
} - if (VG_(clo_vex_control).guest_chase_thresh != 0) { + if (VG_(clo_vex_control).guest_chase) { VG_(message)(Vg_UserMsg, - "callgrind only works with --vex-guest-chase-thresh=0\n" - "=> resetting it back to 0\n"); - VG_(clo_vex_control).guest_chase_thresh = 0; // cannot be overridden. + "callgrind only works with --vex-guest-chase=no\n" + "=> resetting it back to 'no'\n"); + VG_(clo_vex_control).guest_chase = False; // cannot be overridden. } CLG_DEBUG(1, " dump threads: %s\n", CLG_(clo).separate_threads ? "Yes":"No"); @@ -2120,7 +2120,7 @@ void CLG_(pre_clo_init)(void) = VexRegUpdSpAtMemAccess; // overridable by the user. VG_(clo_vex_control).iropt_unroll_thresh = 0; // cannot be overridden. - VG_(clo_vex_control).guest_chase_thresh = 0; // cannot be overridden. + VG_(clo_vex_control).guest_chase = False; // cannot be overridden. VG_(basic_tool_funcs) (CLG_(post_clo_init), CLG_(instrument), diff --git a/coregrind/m_main.c b/coregrind/m_main.c index 6987236..c182fd9 100644 --- a/coregrind/m_main.c +++ b/coregrind/m_main.c @@ -271,8 +271,7 @@ static void usage_NORETURN ( int need_help ) " --vex-iropt-level=<0..2> [2]\n" " --vex-iropt-unroll-thresh=<0..400> [120]\n" " --vex-guest-max-insns=<1..100> [50]\n" -" --vex-guest-chase-thresh=<0..99> [10]\n" -" --vex-guest-chase-cond=no|yes [no]\n" +" --vex-guest-chase=no|yes [yes]\n" " Precise exception control. Possible values for 'mode' are as follows\n" " and specify the minimum set of registers guaranteed to be correct\n" " immediately prior to memory access instructions:\n" @@ -723,10 +722,8 @@ static void process_option (Clo_Mode mode, VG_(clo_vex_control).iropt_unroll_thresh, 0, 400) {} else if VG_BINT_CLO(arg, "--vex-guest-max-insns", VG_(clo_vex_control).guest_max_insns, 1, 100) {} - else if VG_BINT_CLO(arg, "--vex-guest-chase-thresh", - VG_(clo_vex_control).guest_chase_thresh, 0, 99) {} - else if VG_BOOL_CLO(arg, "--vex-guest-chase-cond", - VG_(clo_vex_control).guest_chase_cond) {} + else if VG_BOOL_CLO(arg, "--vex-guest-chase", + VG_(clo_vex_control).guest_chase) {} else if VG_INT_CLO(arg, "--log-fd", pos->tmp_log_fd) { pos->log_to = VgLogTo_Fd; @@ -974,16 +971,6 @@ void main_process_cmd_line_options( void ) if (VG_(clo_vgdb_prefix) == NULL) VG_(clo_vgdb_prefix) = VG_(vgdb_prefix_default)(); - /* Make VEX control parameters sane */ - - if (VG_(clo_vex_control).guest_chase_thresh - >= VG_(clo_vex_control).guest_max_insns) - VG_(clo_vex_control).guest_chase_thresh - = VG_(clo_vex_control).guest_max_insns - 1; - - if (VG_(clo_vex_control).guest_chase_thresh < 0) - VG_(clo_vex_control).guest_chase_thresh = 0; - /* Check various option values */ if (VG_(clo_verbosity) < 0) diff --git a/docs/xml/manual-core-adv.xml b/docs/xml/manual-core-adv.xml index 11afbeb..362b916 100644 --- a/docs/xml/manual-core-adv.xml +++ b/docs/xml/manual-core-adv.xml @@ -1563,8 +1563,7 @@ $3 = {lwpid = 0x4688, threadgroup = 0x4688, parent = 0x0, (gdb) p vex_control $5 = {iropt_verbosity = 0, iropt_level = 2, iropt_register_updates = VexRegUpdUnwindregsAtMemAccess, - iropt_unroll_thresh = 120, guest_max_insns = 60, guest_chase_thresh = 10, - guest_chase_cond = 0 '\000'} + iropt_unroll_thresh = 120, guest_max_insns = 60, guest_chase_thresh = 10} (gdb) ]]></screen> </listitem> diff --git a/exp-bbv/bbv_main.c b/exp-bbv/bbv_main.c index 438edbf..e632ea1 100644 --- a/exp-bbv/bbv_main.c +++ b/exp-bbv/bbv_main.c @@ -514,8 +514,8 @@ static void bbv_post_clo_init(void) /* Try a closer approximation of basic blocks */ /* This is the same as the command line option */ - /* 
--vex-guest-chase-thresh=0 */ - VG_(clo_vex_control).guest_chase_thresh = 0; + /* --vex-guest-chase=no */ + VG_(clo_vex_control).guest_chase = False; } /* Parse the command line options */ diff --git a/exp-sgcheck/pc_main.c b/exp-sgcheck/pc_main.c index 93f4b1e..fc13d6a 100644 --- a/exp-sgcheck/pc_main.c +++ b/exp-sgcheck/pc_main.c @@ -147,7 +147,7 @@ static void pc_pre_clo_init(void) sg_pre_clo_init(); VG_(clo_vex_control).iropt_unroll_thresh = 0; - VG_(clo_vex_control).guest_chase_thresh = 0; + VG_(clo_vex_control).guest_chase = False; } VG_DETERMINE_INTERFACE_VERSION(pc_pre_clo_init) diff --git a/helgrind/hg_main.c b/helgrind/hg_main.c index ddf582f..8b8dd05 100644 --- a/helgrind/hg_main.c +++ b/helgrind/hg_main.c @@ -5954,14 +5954,13 @@ static void hg_post_clo_init ( void ) { Thr* hbthr_root; - if (HG_(clo_delta_stacktrace) - && VG_(clo_vex_control).guest_chase_thresh != 0) { + if (HG_(clo_delta_stacktrace) && VG_(clo_vex_control).guest_chase) { if (VG_(clo_verbosity) >= 2) VG_(message)(Vg_UserMsg, "helgrind --delta-stacktrace=yes only works with " - "--vex-guest-chase-thresh=0\n" - "=> (re-setting it to 0)\n"); - VG_(clo_vex_control).guest_chase_thresh = 0; + "--vex-guest-chase=no\n" + "=> (re-setting it to 'no')\n"); + VG_(clo_vex_control).guest_chase = False; } diff --git a/none/tests/cmdline2.stdout.exp b/none/tests/cmdline2.stdout.exp index cfa3060..9e8e3df 100644 --- a/none/tests/cmdline2.stdout.exp +++ b/none/tests/cmdline2.stdout.exp @@ -184,8 +184,7 @@ usage: valgrind [options] prog-and-args --vex-iropt-level=<0..2> [2] --vex-iropt-unroll-thresh=<0..400> [120] --vex-guest-max-insns=<1..100> [50] - --vex-guest-chase-thresh=<0..99> [10] - --vex-guest-chase-cond=no|yes [no] + --vex-guest-chase=no|yes [yes] Precise exception control. Possible values for 'mode' are as follows and specify the minimum set of registers guaranteed to be correct immediately prior to memory access instructions: |
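After this rename, disabling chasing from the command line looks like this (invocation is illustrative):

    valgrind --tool=memcheck --vex-guest-chase=no ./a.out

though, as the hunks above show, Callgrind, exp-bbv and exp-sgcheck now force guest_chase to False themselves, and Helgrind does so when --delta-stacktrace=yes is in effect.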
From: Julian S. <se...@so...> - 2020-01-02 06:28:34
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=7ed802d0f87b7d24d0c48dfd4d57dda138cb5012 commit 7ed802d0f87b7d24d0c48dfd4d57dda138cb5012 Author: Julian Seward <js...@ac...> Date: Tue Nov 12 20:16:54 2019 +0100 analyse_block_end: tidy this up .. .. and check more carefully for unexpected control flow in the blocks being analysed. Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 133 +++++++++++++++++++++++++------------- 1 file changed, 87 insertions(+), 46 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 7782bcf..67e506f 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -509,9 +509,9 @@ static void add_guarded_stmt_to_end_of ( /*MOD*/IRSB* bb, typedef enum { - Be_Unknown=1, // Unknown end - Be_UnCond, // Unconditional branch to known destination, unassisted - Be_Cond // Conditional branch to known destinations, unassisted + Be_Other=1, // Block end isn't of interest to us + Be_Uncond, // Unconditional branch to known destination, unassisted + Be_Cond // Conditional branch to known destinations, unassisted } BlockEndTag; @@ -520,10 +520,10 @@ typedef BlockEndTag tag; union { struct { - } Unknown; + } Other; struct { Long delta; - } UnCond; + } Uncond; struct { IRTemp condSX; Long deltaSX; @@ -536,11 +536,11 @@ typedef static void ppBlockEnd ( const BlockEnd* be ) { switch (be->tag) { - case Be_Unknown: - vex_printf("!!Unknown!!"); + case Be_Other: + vex_printf("Other"); break; - case Be_UnCond: - vex_printf("UnCond{delta=%lld}", be->Be.UnCond.delta); + case Be_Uncond: + vex_printf("Uncond{delta=%lld}", be->Be.Uncond.delta); break; case Be_Cond: vex_printf("Cond{condSX="); @@ -558,11 +558,28 @@ static void ppBlockEnd ( const BlockEnd* be ) static Bool definitely_does_not_jump_to_delta ( const BlockEnd* be, Long delta ) { switch (be->tag) { - case Be_Unknown: return False; - case Be_UnCond: return be->Be.UnCond.delta != delta; - case Be_Cond: return be->Be.Cond.deltaSX != delta - && be->Be.Cond.deltaFT != delta; - default: vassert(0); + case Be_Other: + return False; + case Be_Uncond: + return be->Be.Uncond.delta != delta; + case Be_Cond: + return be->Be.Cond.deltaSX != delta && be->Be.Cond.deltaFT != delta; + default: + vassert(0); + } +} + +static Addr irconst_to_Addr ( const IRConst* con, const IRType guest_word_type ) +{ + switch (con->tag) { + case Ico_U32: + vassert(guest_word_type == Ity_I32); + return con->Ico.U32; + case Ico_U64: + vassert(guest_word_type == Ity_I64); + return con->Ico.U64; + default: + vassert(0); } } @@ -578,19 +595,7 @@ static Bool irconst_to_maybe_delta ( /*OUT*/Long* delta, *delta = 0; // Extract the destination guest address. - Addr dst_ga = 0; - switch (known_dst->tag) { - case Ico_U32: - vassert(guest_word_type == Ity_I32); - dst_ga = known_dst->Ico.U32; - break; - case Ico_U64: - vassert(guest_word_type == Ity_I64); - dst_ga = known_dst->Ico.U64; - break; - default: - vassert(0); - } + Addr dst_ga = irconst_to_Addr(known_dst, guest_word_type); // Check we're allowed to chase into it. 
if (!chase_into_ok(callback_opaque, dst_ga)) @@ -603,38 +608,67 @@ static Bool irconst_to_maybe_delta ( /*OUT*/Long* delta, return True; } +static Bool any_overlap ( Int start1, Int len1, Int start2, Int len2 ) +{ + vassert(len1 > 0 && len2 > 0); + vassert(start1 >= 0 && start2 >= 0); + if (start1 + len1 <= start2) return False; + if (start2 + len2 <= start1) return False; + return True; +} + /* Scan |stmts|, starting at |scan_start| and working backwards, to detect the case where there are no IRStmt_Exits before we find the IMark. In other words, it scans backwards through some prefix of an instruction's IR to see - if there is an exit there. */ -static Bool insn_has_no_other_exits ( IRStmt** const stmts, Int scan_start ) + if there is an exit there. + + It also checks for explicit PUTs to the PC. + + FIXME: also check PutI and dirty helper calls for such PUTs. */ +static Bool insn_has_no_other_exits_or_PUTs_to_PC ( + IRStmt** const stmts, Int scan_start, + Int offB_GUEST_IP, Int szB_GUEST_IP, + const IRTypeEnv* tyenv + ) { Bool found_exit = False; + Bool found_PUT_to_PC = False; Int i = scan_start; while (True) { if (i < 0) break; const IRStmt* st = stmts[i]; - if (st->tag == Ist_IMark) + if (st->tag == Ist_IMark) { + // We're back at the start of the insn. Stop searching. break; + } if (st->tag == Ist_Exit) { found_exit = True; break; } + if (st->tag == Ist_Put) { + Int offB = st->Ist.Put.offset; + Int szB = sizeofIRType(typeOfIRExpr(tyenv, st->Ist.Put.data)); + if (any_overlap(offB, szB, offB_GUEST_IP, szB_GUEST_IP)) { + found_PUT_to_PC = True; + break; + } + } i--; } // We expect IR for all instructions to start with an IMark. vassert(i >= 0); - return !found_exit; + return !found_exit && !found_PUT_to_PC; } -// FIXME make this able to recognise all block ends static void analyse_block_end ( /*OUT*/BlockEnd* be, const IRSB* irsb, const Addr guest_IP_sbstart, const IRType guest_word_type, Bool (*chase_into_ok)(void*,Addr), void* callback_opaque, - Bool debug_print ) + Int offB_GUEST_IP, + Int szB_GUEST_IP, + Bool debug_print ) { vex_bzero(be, sizeof(*be)); @@ -657,7 +691,9 @@ static void analyse_block_end ( /*OUT*/BlockEnd* be, const IRSB* irsb, && maybe_exit->Ist.Exit.guard->tag == Iex_RdTmp && maybe_exit->Ist.Exit.jk == Ijk_Boring && irsb->next->tag == Iex_Const - && insn_has_no_other_exits(irsb->stmts, irsb->stmts_used - 2)) { + && insn_has_no_other_exits_or_PUTs_to_PC( + irsb->stmts, irsb->stmts_used - 2, + offB_GUEST_IP, szB_GUEST_IP, irsb->tyenv)) { vassert(maybe_exit->Ist.Exit.offsIP == irsb->offsIP); IRConst* dst_SX = maybe_exit->Ist.Exit.dst; IRConst* dst_FT = irsb->next->Iex.Const.con; @@ -692,7 +728,9 @@ static void analyse_block_end ( /*OUT*/BlockEnd* be, const IRSB* irsb, */ if ((irsb->jumpkind == Ijk_Boring || irsb->jumpkind == Ijk_Call) && irsb->next->tag == Iex_Const) { - if (insn_has_no_other_exits(irsb->stmts, irsb->stmts_used - 1)) { + if (insn_has_no_other_exits_or_PUTs_to_PC( + irsb->stmts, irsb->stmts_used - 1, + offB_GUEST_IP, szB_GUEST_IP, irsb->tyenv)) { // We've got the right pattern. Check whether we can chase into the // destination, and if so convert that to a delta value. 
const IRConst* known_dst = irsb->next->Iex.Const.con; @@ -703,15 +741,15 @@ static void analyse_block_end ( /*OUT*/BlockEnd* be, const IRSB* irsb, guest_IP_sbstart, guest_word_type, chase_into_ok, callback_opaque); if (ok) { - be->tag = Be_UnCond; - be->Be.UnCond.delta = delta; + be->tag = Be_Uncond; + be->Be.Uncond.delta = delta; goto out; } } } - be->tag = Be_Unknown; - // Not identified as anything in particular. + // Not identified as anything of interest to us. + be->tag = Be_Other; out: if (debug_print) { @@ -1271,16 +1309,17 @@ IRSB* bb_to_IR ( // ends. BlockEnd irsb_be; analyse_block_end(&irsb_be, irsb, guest_IP_sbstart, guest_word_type, - chase_into_ok, callback_opaque, debug_print); + chase_into_ok, callback_opaque, + offB_GUEST_IP, szB_GUEST_IP, debug_print); // Try for an extend based on an unconditional branch or call to a known // destination. - if (irsb_be.tag == Be_UnCond) { + if (irsb_be.tag == Be_Uncond) { if (debug_print) { vex_printf("\n-+-+ Unconditional follow (ext# %d) to 0x%llx " "-+-+\n\n", (Int)vge->n_used, - (ULong)((Long)guest_IP_sbstart+ irsb_be.Be.UnCond.delta)); + (ULong)((Long)guest_IP_sbstart+ irsb_be.Be.Uncond.delta)); } Int bb_instrs_used = 0; Bool bb_verbose_seen = False; @@ -1290,7 +1329,7 @@ IRSB* bb_to_IR ( = disassemble_basic_block_till_stop( /*OUT*/ &bb_instrs_used, &bb_verbose_seen, &bb_base, &bb_len, /*MOD*/ emptyIRSB(), - /*IN*/ irsb_be.Be.UnCond.delta, + /*IN*/ irsb_be.Be.Uncond.delta, instrs_avail, guest_IP_sbstart, host_endness, sigill_diag, arch_guest, archinfo_guest, abiinfo_both, guest_word_type, debug_print, dis_instr_fn, guest_code, offB_GUEST_IP @@ -1305,7 +1344,7 @@ IRSB* bb_to_IR ( add_extent(vge, bb_base, bb_len); update_instr_budget(&instrs_avail, &verbose_mode, bb_instrs_used, bb_verbose_seen); - } // if (be.tag == Be_UnCond) + } // if (be.tag == Be_Uncond) // Try for an extend based on a conditional branch, specifically in the // hope of identifying and recovering, an "A && B" condition spread across @@ -1339,7 +1378,8 @@ IRSB* bb_to_IR ( vassert(sx_instrs_used <= instrs_avail_spec); BlockEnd sx_be; analyse_block_end(&sx_be, sx_bb, guest_IP_sbstart, guest_word_type, - chase_into_ok, callback_opaque, debug_print); + chase_into_ok, callback_opaque, + offB_GUEST_IP, szB_GUEST_IP, debug_print); if (debug_print) { vex_printf("\n-+-+ SPEC fall through -+-+\n\n"); @@ -1360,7 +1400,8 @@ IRSB* bb_to_IR ( vassert(ft_instrs_used <= instrs_avail_spec); BlockEnd ft_be; analyse_block_end(&ft_be, ft_bb, guest_IP_sbstart, guest_word_type, - chase_into_ok, callback_opaque, debug_print); + chase_into_ok, callback_opaque, + offB_GUEST_IP, szB_GUEST_IP, debug_print); /* In order for the transformation to be remotely valid, we need: - At least one of the sx_bb or ft_bb to be have a Be_Cond end. |
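A sketch of how a caller consumes the renamed summary (tag and field names are as in the patch; the dispatch function and the two helpers are invented for illustration):

    /* Invented helpers, for illustration only. */
    static void chase_uncond(long abs_dst) { (void)abs_dst; }
    static void consider_cond(long deltaSX, long deltaFT)
        { (void)deltaSX; (void)deltaFT; }

    static void dispatch_block_end(const BlockEnd* be, long guest_IP_sbstart)
    {
        switch (be->tag) {
            case Be_Uncond:  /* one known successor: extend the trace there */
                chase_uncond(guest_IP_sbstart + be->Be.Uncond.delta);
                break;
            case Be_Cond:    /* two known successors: try &&-idiom recovery */
                consider_cond(be->Be.Cond.deltaSX, be->Be.Cond.deltaFT);
                break;
            case Be_Other:   /* block end not of interest: stop extending */
                break;
        }
    }

The rename from Be_Unknown/Be_UnCond to Be_Other/Be_Uncond matters: an end that is merely "other" is not an error, just one this machinery declines to extend through.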
From: Julian S. <se...@so...> - 2020-01-02 06:28:29
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=ffac99695e2ad3bd452ab17ee94a9c9da38f173a commit ffac99695e2ad3bd452ab17ee94a9c9da38f173a Author: Julian Seward <js...@ac...> Date: Mon Nov 11 17:06:54 2019 +0100 iselFltExpr_wrk: handle Iex_ITE, presumably caused by newly-created guarding machinery. Diff: --- VEX/priv/host_amd64_isel.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c index a389e81..8dc3068 100644 --- a/VEX/priv/host_amd64_isel.c +++ b/VEX/priv/host_amd64_isel.c @@ -2815,6 +2815,19 @@ static HReg iselFltExpr_wrk ( ISelEnv* env, const IRExpr* e ) return dst; } + if (e->tag == Iex_ITE) { // VFD + HReg r1, r0, dst; + vassert(ty == Ity_F32); + vassert(typeOfIRExpr(env->type_env,e->Iex.ITE.cond) == Ity_I1); + r1 = iselFltExpr(env, e->Iex.ITE.iftrue); + r0 = iselFltExpr(env, e->Iex.ITE.iffalse); + dst = newVRegV(env); + addInstr(env, mk_vMOVsd_RR(r1,dst)); + AMD64CondCode cc = iselCondCode(env, e->Iex.ITE.cond); + addInstr(env, AMD64Instr_SseCMov(cc ^ 1, r0, dst)); + return dst; + } + ppIRExpr(e); vpanic("iselFltExpr_wrk"); } |
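The Iex_ITE case added above is the usual branchless select: evaluate both arms, copy the if-true value into the destination, then conditionally replace it with the if-false value under the inverted condition (cc ^ 1). Since IR expressions are side-effect-free, computing both arms unconditionally is safe. As a sketch of the semantics only (the real code emits an SSE conditional move, not a branch):

/* What the lowering above computes, in C terms:
   dst = cond ? iftrue : iffalse. */
float select_f32(int cond, float iftrue, float iffalse)
{
   float dst = iftrue;   /* mk_vMOVsd_RR(r1, dst): dst := iftrue      */
   if (!cond)            /* AMD64Instr_SseCMov(cc ^ 1, r0, dst):      */
      dst = iffalse;     /*   overwrite with iffalse when cond fails  */
   return dst;
}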
From: Julian S. <se...@so...> - 2020-01-02 06:28:29
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=eaf8385b5ec1f6bf0f884b07d58ddb3f25ee5245 commit eaf8385b5ec1f6bf0f884b07d58ddb3f25ee5245 Author: Julian Seward <js...@ac...> Date: Mon Nov 11 16:11:20 2019 +0100 Clean up machinery to do with conditionalising IRStmts: * document some functions * change naming and terminology from 'speculation' (which it isn't) to 'guarding' (which it is) * add a new function |primopMightTrap| so as to avoid conditionalising IRExprs involving potentially trappy IROps Diff: --- VEX/priv/guest_generic_bb_to_IR.c | 75 ++++--- VEX/priv/ir_defs.c | 426 ++++++++++++++++++++++++++++++++++++++ VEX/pub/libvex_ir.h | 5 + 3 files changed, 476 insertions(+), 30 deletions(-) diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 4cb813f..7782bcf 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -388,29 +388,42 @@ static void create_self_checks_as_needed( /*--------------------------------------------------------------*/ -/*--- To do with speculation of IRStmts ---*/ +/*--- To do with guarding (conditionalisation) of IRStmts ---*/ /*--------------------------------------------------------------*/ -static Bool expr_is_speculatable ( const IRExpr* e ) +// Is it possible to guard |e|? Meaning, is it safe (exception-free) to compute +// |e| and ignore the result? Since |e| is by definition otherwise +// side-effect-free, we don't have to ask about any other effects caused by +// first computing |e| and then ignoring the result. +static Bool expr_is_guardable ( const IRExpr* e ) { switch (e->tag) { case Iex_Load: return False; - case Iex_Unop: // FIXME BOGUS, since it might trap - case Iex_Binop: // FIXME ditto - case Iex_ITE: // this is OK - return True; + case Iex_Unop: + return !primopMightTrap(e->Iex.Unop.op); + case Iex_Binop: + return !primopMightTrap(e->Iex.Binop.op); + case Iex_ITE: case Iex_CCall: - return True; // This is probably correct case Iex_Get: return True; default: - vex_printf("\n"); ppIRExpr(e); - vpanic("expr_is_speculatable: unhandled expr"); + vex_printf("\n"); ppIRExpr(e); vex_printf("\n"); + vpanic("expr_is_guardable: unhandled expr"); } } -static Bool stmt_is_speculatable ( const IRStmt* st ) +// Is it possible to guard |st|? Meaning, is it possible to replace |st| by +// some other sequence of IRStmts which have the same effect on the architected +// state when the guard is true, but when it is false, have no effect on the +// architected state and are guaranteed not to cause any exceptions? +// +// Note that this isn't as aggressive as it could be: it sometimes returns False +// in cases where |st| is actually guardable. This routine must coordinate +// closely with add_guarded_stmt_to_end_of below, in the sense that that routine +// must be able to handle any |st| for which this routine returns True. 
+static Bool stmt_is_guardable ( const IRStmt* st ) { switch (st->tag) { case Ist_IMark: @@ -421,35 +434,37 @@ static Bool stmt_is_speculatable ( const IRStmt* st ) case Ist_Exit: // We could in fact spec this, if required return False; case Ist_WrTmp: - return expr_is_speculatable(st->Ist.WrTmp.data); + return expr_is_guardable(st->Ist.WrTmp.data); default: - vex_printf("\n"); ppIRStmt(st); - vpanic("stmt_is_speculatable: unhandled stmt"); + vex_printf("\n"); ppIRStmt(st); vex_printf("\n"); + vpanic("stmt_is_guardable: unhandled stmt"); } } -static Bool block_is_speculatable ( const IRSB* bb ) +// Are all stmts (but not the end dst value) in |bb| guardable, per +// stmt_is_guardable? +static Bool block_is_guardable ( const IRSB* bb ) { Int i = bb->stmts_used; - vassert(i >= 2); // Must have at least: IMark, final Exit + vassert(i >= 2); // Must have at least: IMark, side Exit (at the end) i--; vassert(bb->stmts[i]->tag == Ist_Exit); i--; for (; i >= 0; i--) { - if (!stmt_is_speculatable(bb->stmts[i])) + if (!stmt_is_guardable(bb->stmts[i])) return False; } return True; } -static void speculate_stmt_to_end_of ( /*MOD*/IRSB* bb, - /*IN*/ IRStmt* st, IRTemp guard ) +// Guard |st| with |guard| and add it to |bb|. This must be able to handle any +// |st| for which stmt_is_guardable returns True. +static void add_guarded_stmt_to_end_of ( /*MOD*/IRSB* bb, + /*IN*/ IRStmt* st, IRTemp guard ) { - // We assume all stmts we're presented with here have previously been OK'd by - // stmt_is_speculatable above. switch (st->tag) { case Ist_IMark: - case Ist_WrTmp: // FIXME is this ok? + case Ist_WrTmp: addStmtToIRSB(bb, st); break; case Ist_Put: { @@ -472,7 +487,7 @@ static void speculate_stmt_to_end_of ( /*MOD*/IRSB* bb, case Ist_Exit: { // Exit(xguard, dst, jk, offsIP) // ==> t1 = And1(xguard, guard) - // Exit(And1(xguard, guard), dst, jk, offsIP) + // Exit(t1, dst, jk, offsIP) IRExpr* xguard = st->Ist.Exit.guard; IRTemp t1 = newIRTemp(bb->tyenv, Ity_I1); addStmtToIRSB(bb, IRStmt_WrTmp(t1, IRExpr_Binop(Iop_And1, xguard, @@ -482,8 +497,8 @@ static void speculate_stmt_to_end_of ( /*MOD*/IRSB* bb, break; } default: - vex_printf("\n"); ppIRStmt(st); - vpanic("speculate_stmt_to_end_of: unhandled stmt"); + vex_printf("\n"); ppIRStmt(st); vex_printf("\n"); + vpanic("add_guarded_stmt_to_end_of: unhandled stmt"); } } @@ -1435,10 +1450,10 @@ IRSB* bb_to_IR ( ppBlockEnd(&sx_be); vex_printf("\n"); } - // Finally, check the sx block actually is speculatable. - ok = block_is_speculatable(sx_bb); + // Finally, check the sx block actually is guardable. + ok = block_is_guardable(sx_bb); if (!ok && debug_print) { - vex_printf("\n-+-+ SX not speculatable, giving up. -+-+\n\n"); + vex_printf("\n-+-+ SX not guardable, giving up. -+-+\n\n"); } } @@ -1450,10 +1465,10 @@ IRSB* bb_to_IR ( // 0. remove the last Exit on irsb. // 1. Add irsb->tyenv->types_used to all the tmps in sx_bb, // by calling deltaIRStmt on all stmts. - // 2. Speculate all stmts in sx_bb on irsb_be.Be.Cond.condSX, + // 2. Guard all stmts in sx_bb on irsb_be.Be.Cond.condSX, // **including** the last stmt (which must be an Exit). It's // here that the And1 is generated. - // 3. Copy all speculated stmts to the end of irsb. + // 3. Copy all guarded stmts to the end of irsb. 
vassert(irsb->stmts_used >= 2); irsb->stmts_used--; Int delta = irsb->tyenv->types_used; @@ -1466,7 +1481,7 @@ IRSB* bb_to_IR ( for (Int i = 0; i < sx_bb->stmts_used; i++) { IRStmt* st = deepCopyIRStmt(sx_bb->stmts[i]); deltaIRStmt(st, delta); - speculate_stmt_to_end_of(irsb, st, irsb_be.Be.Cond.condSX); + add_guarded_stmt_to_end_of(irsb, st, irsb_be.Be.Cond.condSX); } if (debug_print) { diff --git a/VEX/priv/ir_defs.c b/VEX/priv/ir_defs.c index 6035574..d687d8f 100644 --- a/VEX/priv/ir_defs.c +++ b/VEX/priv/ir_defs.c @@ -1345,6 +1345,432 @@ void ppIROp ( IROp op ) } } +// A very few primops might trap (eg, divide by zero). We need to be able to +// identify them. +Bool primopMightTrap ( IROp op ) +{ + switch (op) { + + // The few potentially trapping ones + case Iop_DivU32: case Iop_DivS32: case Iop_DivU64: case Iop_DivS64: + case Iop_DivU64E: case Iop_DivS64E: case Iop_DivU32E: case Iop_DivS32E: + case Iop_DivModU64to32: case Iop_DivModS64to32: case Iop_DivModU128to64: + case Iop_DivModS128to64: case Iop_DivModS64to64: case Iop_DivModU64to64: + case Iop_DivModS32to32: case Iop_DivModU32to32: + return True; + + // All the rest are non-trapping + case Iop_Add8: case Iop_Add16: case Iop_Add32: case Iop_Add64: + case Iop_Sub8: case Iop_Sub16: case Iop_Sub32: case Iop_Sub64: + case Iop_Mul8: case Iop_Mul16: case Iop_Mul32: case Iop_Mul64: + case Iop_Or8: case Iop_Or16: case Iop_Or32: case Iop_Or64: + case Iop_And8: case Iop_And16: case Iop_And32: case Iop_And64: + case Iop_Xor8: case Iop_Xor16: case Iop_Xor32: case Iop_Xor64: + case Iop_Shl8: case Iop_Shl16: case Iop_Shl32: case Iop_Shl64: + case Iop_Shr8: case Iop_Shr16: case Iop_Shr32: case Iop_Shr64: + case Iop_Sar8: case Iop_Sar16: case Iop_Sar32: case Iop_Sar64: + case Iop_CmpEQ8: case Iop_CmpEQ16: case Iop_CmpEQ32: case Iop_CmpEQ64: + case Iop_CmpNE8: case Iop_CmpNE16: case Iop_CmpNE32: case Iop_CmpNE64: + case Iop_Not8: case Iop_Not16: case Iop_Not32: case Iop_Not64: + case Iop_CasCmpEQ8: case Iop_CasCmpEQ16: case Iop_CasCmpEQ32: + case Iop_CasCmpEQ64: case Iop_CasCmpNE8: case Iop_CasCmpNE16: + case Iop_CasCmpNE32: case Iop_CasCmpNE64: case Iop_ExpCmpNE8: + case Iop_ExpCmpNE16: case Iop_ExpCmpNE32: case Iop_ExpCmpNE64: + case Iop_MullS8: case Iop_MullS16: case Iop_MullS32: case Iop_MullS64: + case Iop_MullU8: case Iop_MullU16: case Iop_MullU32: case Iop_MullU64: + case Iop_Clz64: case Iop_Clz32: case Iop_Ctz64: case Iop_Ctz32: + case Iop_ClzNat64: case Iop_ClzNat32: case Iop_CtzNat64: case Iop_CtzNat32: + case Iop_PopCount64: case Iop_PopCount32: + case Iop_CmpLT32S: case Iop_CmpLT64S: case Iop_CmpLE32S: case Iop_CmpLE64S: + case Iop_CmpLT32U: case Iop_CmpLT64U: case Iop_CmpLE32U: case Iop_CmpLE64U: + case Iop_CmpNEZ8: case Iop_CmpNEZ16: case Iop_CmpNEZ32: case Iop_CmpNEZ64: + case Iop_CmpwNEZ32: case Iop_CmpwNEZ64: + case Iop_Left8: case Iop_Left16: case Iop_Left32: case Iop_Left64: + case Iop_Max32U: case Iop_CmpORD32U: case Iop_CmpORD64U: + case Iop_CmpORD32S: case Iop_CmpORD64S: + case Iop_8Uto16: case Iop_8Uto32: case Iop_8Uto64: + case Iop_16Uto32: case Iop_16Uto64: case Iop_32Uto64: + case Iop_8Sto16: case Iop_8Sto32: case Iop_8Sto64: + case Iop_16Sto32: case Iop_16Sto64: case Iop_32Sto64: + case Iop_64to8: case Iop_32to8: case Iop_64to16: + case Iop_16to8: case Iop_16HIto8: case Iop_8HLto16: case Iop_32to16: + case Iop_32HIto16: case Iop_16HLto32: case Iop_64to32: case Iop_64HIto32: + case Iop_32HLto64: case Iop_128to64: case Iop_128HIto64: case Iop_64HLto128: + case Iop_Not1: case Iop_And1: case Iop_Or1: case Iop_32to1: case 
Iop_64to1: + case Iop_1Uto8: case Iop_1Uto32: case Iop_1Uto64: case Iop_1Sto8: + case Iop_1Sto16: case Iop_1Sto32: case Iop_1Sto64: + case Iop_AddF64: case Iop_SubF64: case Iop_MulF64: case Iop_DivF64: + case Iop_AddF32: case Iop_SubF32: case Iop_MulF32: case Iop_DivF32: + case Iop_AddF64r32: case Iop_SubF64r32: case Iop_MulF64r32: + case Iop_DivF64r32: case Iop_NegF64: case Iop_AbsF64: + case Iop_NegF32: case Iop_AbsF32: case Iop_SqrtF64: case Iop_SqrtF32: + case Iop_CmpF64: case Iop_CmpF32: case Iop_CmpF128: case Iop_F64toI16S: + case Iop_F64toI32S: case Iop_F64toI64S: case Iop_F64toI64U: + case Iop_F64toI32U: case Iop_I32StoF64: case Iop_I64StoF64: + case Iop_I64UtoF64: case Iop_I64UtoF32: case Iop_I32UtoF32: + case Iop_I32UtoF64: case Iop_F32toI32S: case Iop_F32toI64S: + case Iop_F32toI32U: case Iop_F32toI64U: case Iop_I32StoF32: + case Iop_I64StoF32: case Iop_F32toF64: case Iop_F64toF32: + case Iop_ReinterpF64asI64: case Iop_ReinterpI64asF64: + case Iop_ReinterpF32asI32: case Iop_ReinterpI32asF32: + case Iop_F64HLtoF128: case Iop_F128HItoF64: case Iop_F128LOtoF64: + case Iop_AddF128: case Iop_SubF128: case Iop_MulF128: case Iop_DivF128: + case Iop_MAddF128: case Iop_MSubF128: case Iop_NegMAddF128: + case Iop_NegMSubF128: case Iop_NegF128: case Iop_AbsF128: + case Iop_SqrtF128: case Iop_I32StoF128: case Iop_I64StoF128: + case Iop_I32UtoF128: case Iop_I64UtoF128: case Iop_F32toF128: + case Iop_F64toF128: case Iop_F128toI32S: case Iop_F128toI64S: + case Iop_F128toI32U: case Iop_F128toI64U: case Iop_F128toI128S: + case Iop_F128toF64: case Iop_F128toF32: case Iop_RndF128: + case Iop_TruncF128toI32S: case Iop_TruncF128toI32U: case Iop_TruncF128toI64U: + case Iop_TruncF128toI64S: case Iop_AtanF64: case Iop_Yl2xF64: + case Iop_Yl2xp1F64: case Iop_PRemF64: case Iop_PRemC3210F64: + case Iop_PRem1F64: case Iop_PRem1C3210F64: case Iop_ScaleF64: + case Iop_SinF64: case Iop_CosF64: case Iop_TanF64: + case Iop_2xm1F64: case Iop_RoundF128toInt: case Iop_RoundF64toInt: + case Iop_RoundF32toInt: case Iop_MAddF32: case Iop_MSubF32: + case Iop_MAddF64: case Iop_MSubF64: + case Iop_MAddF64r32: case Iop_MSubF64r32: + case Iop_RSqrtEst5GoodF64: case Iop_RoundF64toF64_NEAREST: + case Iop_RoundF64toF64_NegINF: case Iop_RoundF64toF64_PosINF: + case Iop_RoundF64toF64_ZERO: case Iop_TruncF64asF32: case Iop_RoundF64toF32: + case Iop_RecpExpF64: case Iop_RecpExpF32: case Iop_MaxNumF64: + case Iop_MinNumF64: case Iop_MaxNumF32: case Iop_MinNumF32: + case Iop_F16toF64: case Iop_F64toF16: case Iop_F16toF32: + case Iop_F32toF16: case Iop_QAdd32S: case Iop_QSub32S: + case Iop_Add16x2: case Iop_Sub16x2: + case Iop_QAdd16Sx2: case Iop_QAdd16Ux2: + case Iop_QSub16Sx2: case Iop_QSub16Ux2: + case Iop_HAdd16Ux2: case Iop_HAdd16Sx2: + case Iop_HSub16Ux2: case Iop_HSub16Sx2: + case Iop_Add8x4: case Iop_Sub8x4: + case Iop_QAdd8Sx4: case Iop_QAdd8Ux4: + case Iop_QSub8Sx4: case Iop_QSub8Ux4: + case Iop_HAdd8Ux4: case Iop_HAdd8Sx4: + case Iop_HSub8Ux4: case Iop_HSub8Sx4: case Iop_Sad8Ux4: + case Iop_CmpNEZ16x2: case Iop_CmpNEZ8x4: case Iop_Reverse8sIn32_x1: + case Iop_I32UtoF32x2_DEP: case Iop_I32StoF32x2_DEP: + case Iop_F32toI32Ux2_RZ: case Iop_F32toI32Sx2_RZ: + case Iop_F32ToFixed32Ux2_RZ: case Iop_F32ToFixed32Sx2_RZ: + case Iop_Fixed32UToF32x2_RN: case Iop_Fixed32SToF32x2_RN: + case Iop_Max32Fx2: case Iop_Min32Fx2: + case Iop_PwMax32Fx2: case Iop_PwMin32Fx2: + case Iop_CmpEQ32Fx2: case Iop_CmpGT32Fx2: case Iop_CmpGE32Fx2: + case Iop_RecipEst32Fx2: case Iop_RecipStep32Fx2: case Iop_RSqrtEst32Fx2: + case Iop_RSqrtStep32Fx2: case 
Iop_Neg32Fx2: case Iop_Abs32Fx2: + case Iop_CmpNEZ8x8: case Iop_CmpNEZ16x4: case Iop_CmpNEZ32x2: + case Iop_Add8x8: case Iop_Add16x4: case Iop_Add32x2: + case Iop_QAdd8Ux8: case Iop_QAdd16Ux4: case Iop_QAdd32Ux2: case Iop_QAdd64Ux1: + case Iop_QAdd8Sx8: case Iop_QAdd16Sx4: case Iop_QAdd32Sx2: case Iop_QAdd64Sx1: + case Iop_PwAdd8x8: case Iop_PwAdd16x4: case Iop_PwAdd32x2: + case Iop_PwMax8Sx8: case Iop_PwMax16Sx4: case Iop_PwMax32Sx2: + case Iop_PwMax8Ux8: case Iop_PwMax16Ux4: case Iop_PwMax32Ux2: + case Iop_PwMin8Sx8: case Iop_PwMin16Sx4: case Iop_PwMin32Sx2: + case Iop_PwMin8Ux8: case Iop_PwMin16Ux4: case Iop_PwMin32Ux2: + case Iop_PwAddL8Ux8: case Iop_PwAddL16Ux4: case Iop_PwAddL32Ux2: + case Iop_PwAddL8Sx8: case Iop_PwAddL16Sx4: case Iop_PwAddL32Sx2: + case Iop_Sub8x8: case Iop_Sub16x4: case Iop_Sub32x2: + case Iop_QSub8Ux8: case Iop_QSub16Ux4: case Iop_QSub32Ux2: case Iop_QSub64Ux1: + case Iop_QSub8Sx8: case Iop_QSub16Sx4: case Iop_QSub32Sx2: case Iop_QSub64Sx1: + case Iop_Abs8x8: case Iop_Abs16x4: case Iop_Abs32x2: + case Iop_Mul8x8: case Iop_Mul16x4: case Iop_Mul32x2: + case Iop_Mul32Fx2: case Iop_MulHi16Ux4: case Iop_MulHi16Sx4: + case Iop_PolynomialMul8x8: case Iop_QDMulHi16Sx4: case Iop_QDMulHi32Sx2: + case Iop_QRDMulHi16Sx4: case Iop_QRDMulHi32Sx2: case Iop_Avg8Ux8: + case Iop_Avg16Ux4: case Iop_Max8Sx8: case Iop_Max16Sx4: case Iop_Max32Sx2: + case Iop_Max8Ux8: case Iop_Max16Ux4: case Iop_Max32Ux2: + case Iop_Min8Sx8: case Iop_Min16Sx4: case Iop_Min32Sx2: + case Iop_Min8Ux8: case Iop_Min16Ux4: case Iop_Min32Ux2: + case Iop_CmpEQ8x8: case Iop_CmpEQ16x4: case Iop_CmpEQ32x2: + case Iop_CmpGT8Ux8: case Iop_CmpGT16Ux4: case Iop_CmpGT32Ux2: + case Iop_CmpGT8Sx8: case Iop_CmpGT16Sx4: case Iop_CmpGT32Sx2: + case Iop_Cnt8x8: case Iop_Clz8x8: case Iop_Clz16x4: case Iop_Clz32x2: + case Iop_Cls8x8: case Iop_Cls16x4: case Iop_Cls32x2: case Iop_Clz64x2: + case Iop_Ctz8x16: case Iop_Ctz16x8: case Iop_Ctz32x4: case Iop_Ctz64x2: + case Iop_Shl8x8: case Iop_Shl16x4: case Iop_Shl32x2: + case Iop_Shr8x8: case Iop_Shr16x4: case Iop_Shr32x2: + case Iop_Sar8x8: case Iop_Sar16x4: case Iop_Sar32x2: + case Iop_Sal8x8: case Iop_Sal16x4: case Iop_Sal32x2: case Iop_Sal64x1: + case Iop_ShlN8x8: case Iop_ShlN16x4: case Iop_ShlN32x2: + case Iop_ShrN8x8: case Iop_ShrN16x4: case Iop_ShrN32x2: + case Iop_SarN8x8: case Iop_SarN16x4: case Iop_SarN32x2: + case Iop_QShl8x8: case Iop_QShl16x4: case Iop_QShl32x2: case Iop_QShl64x1: + case Iop_QSal8x8: case Iop_QSal16x4: case Iop_QSal32x2: case Iop_QSal64x1: + case Iop_QShlNsatSU8x8: case Iop_QShlNsatSU16x4: + case Iop_QShlNsatSU32x2: case Iop_QShlNsatSU64x1: + case Iop_QShlNsatUU8x8: case Iop_QShlNsatUU16x4: + case Iop_QShlNsatUU32x2: case Iop_QShlNsatUU64x1: + case Iop_QShlNsatSS8x8: case Iop_QShlNsatSS16x4: + case Iop_QShlNsatSS32x2: case Iop_QShlNsatSS64x1: + case Iop_QNarrowBin16Sto8Ux8: + case Iop_QNarrowBin16Sto8Sx8: case Iop_QNarrowBin32Sto16Sx4: + case Iop_NarrowBin16to8x8: case Iop_NarrowBin32to16x4: + case Iop_InterleaveHI8x8: case Iop_InterleaveHI16x4: + case Iop_InterleaveHI32x2: + case Iop_InterleaveLO8x8: case Iop_InterleaveLO16x4: + case Iop_InterleaveLO32x2: + case Iop_InterleaveOddLanes8x8: case Iop_InterleaveEvenLanes8x8: + case Iop_InterleaveOddLanes16x4: case Iop_InterleaveEvenLanes16x4: + case Iop_CatOddLanes8x8: case Iop_CatOddLanes16x4: + case Iop_CatEvenLanes8x8: case Iop_CatEvenLanes16x4: + case Iop_GetElem8x8: case Iop_GetElem16x4: case Iop_GetElem32x2: + case Iop_SetElem8x8: case Iop_SetElem16x4: case Iop_SetElem32x2: + case Iop_Dup8x8: case 
Iop_Dup16x4: case Iop_Dup32x2: + case Iop_Slice64: case Iop_Reverse8sIn16_x4: + case Iop_Reverse8sIn32_x2: case Iop_Reverse16sIn32_x2: + case Iop_Reverse8sIn64_x1: case Iop_Reverse16sIn64_x1: + case Iop_Reverse32sIn64_x1: case Iop_Perm8x8: case Iop_PermOrZero8x8: + case Iop_GetMSBs8x8: case Iop_RecipEst32Ux2: case Iop_RSqrtEst32Ux2: + case Iop_AddD64: case Iop_SubD64: case Iop_MulD64: case Iop_DivD64: + case Iop_AddD128: case Iop_SubD128: case Iop_MulD128: case Iop_DivD128: + case Iop_ShlD64: case Iop_ShrD64: + case Iop_ShlD128: case Iop_ShrD128: + case Iop_D32toD64: case Iop_D64toD128: case Iop_I32StoD128: + case Iop_I32UtoD128: case Iop_I64StoD128: case Iop_I64UtoD128: + case Iop_D64toD32: case Iop_D128toD64: case Iop_I32StoD64: + case Iop_I32UtoD64: case Iop_I64StoD64: case Iop_I64UtoD64: + case Iop_D64toI32S: case Iop_D64toI32U: case Iop_D64toI64S: + case Iop_D64toI64U: case Iop_D128toI32S: case Iop_D128toI32U: + case Iop_D128toI64S: case Iop_D128toI64U: case Iop_F32toD32: + case Iop_F32toD64: case Iop_F32toD128: case Iop_F64toD32: + case Iop_F64toD64: case Iop_F64toD128: case Iop_F128toD32: + case Iop_F128toD64: case Iop_F128toD128: case Iop_D32toF32: + case Iop_D32toF64: case Iop_D32toF128: case Iop_D64toF32: case Iop_D64toF64: + case Iop_D64toF128: case Iop_D128toF32: case Iop_D128toF64: + case Iop_D128toF128: case Iop_RoundD64toInt: case Iop_RoundD128toInt: + case Iop_CmpD64: case Iop_CmpD128: case Iop_CmpExpD64: + case Iop_CmpExpD128: case Iop_QuantizeD64: case Iop_QuantizeD128: + case Iop_SignificanceRoundD64: case Iop_SignificanceRoundD128: + case Iop_ExtractExpD64: case Iop_ExtractExpD128: case Iop_ExtractSigD64: + case Iop_ExtractSigD128: case Iop_InsertExpD64: case Iop_InsertExpD128: + case Iop_D64HLtoD128: case Iop_D128HItoD64: case Iop_D128LOtoD64: + case Iop_DPBtoBCD: case Iop_BCDtoDPB: case Iop_BCDAdd: case Iop_BCDSub: + case Iop_I128StoBCD128: case Iop_BCD128toI128S: case Iop_ReinterpI64asD64: + case Iop_ReinterpD64asI64: + case Iop_Add32Fx4: case Iop_Sub32Fx4: case Iop_Mul32Fx4: case Iop_Div32Fx4: + case Iop_Max32Fx4: case Iop_Min32Fx4: + case Iop_Add32Fx2: case Iop_Sub32Fx2: + case Iop_CmpEQ32Fx4: case Iop_CmpLT32Fx4: + case Iop_CmpLE32Fx4: case Iop_CmpUN32Fx4: + case Iop_CmpGT32Fx4: case Iop_CmpGE32Fx4: + case Iop_PwMax32Fx4: case Iop_PwMin32Fx4: + case Iop_Abs32Fx4: case Iop_Neg32Fx4: case Iop_Sqrt32Fx4: + case Iop_RecipEst32Fx4: case Iop_RecipStep32Fx4: case Iop_RSqrtEst32Fx4: + case Iop_Scale2_32Fx4: case Iop_Log2_32Fx4: case Iop_Exp2_32Fx4: + case Iop_RSqrtStep32Fx4: + case Iop_I32UtoF32x4_DEP: case Iop_I32StoF32x4_DEP: case Iop_I32StoF32x4: + case Iop_F32toI32Sx4: case Iop_F32toI32Ux4_RZ: case Iop_F32toI32Sx4_RZ: + case Iop_QF32toI32Ux4_RZ: case Iop_QF32toI32Sx4_RZ: + case Iop_RoundF32x4_RM: case Iop_RoundF32x4_RP: + case Iop_RoundF32x4_RN: case Iop_RoundF32x4_RZ: + case Iop_F32ToFixed32Ux4_RZ: case Iop_F32ToFixed32Sx4_RZ: + case Iop_Fixed32UToF32x4_RN: case Iop_Fixed32SToF32x4_RN: + case Iop_F32toF16x4_DEP: case Iop_F32toF16x4: case Iop_F16toF32x4: + case Iop_F64toF16x2_DEP: case Iop_F16toF64x2: case Iop_F32x4_2toQ16x8: + case Iop_Add32F0x4: case Iop_Sub32F0x4: case Iop_Mul32F0x4: + case Iop_Div32F0x4: case Iop_Max32F0x4: case Iop_Min32F0x4: + case Iop_CmpEQ32F0x4: case Iop_CmpLT32F0x4: case Iop_CmpLE32F0x4: + case Iop_CmpUN32F0x4: + case Iop_RecipEst32F0x4: case Iop_Sqrt32F0x4: case Iop_RSqrtEst32F0x4: + case Iop_Add64Fx2: case Iop_Sub64Fx2: case Iop_Mul64Fx2: case Iop_Div64Fx2: + case Iop_Max64Fx2: case Iop_Min64Fx2: + case Iop_CmpEQ64Fx2: case Iop_CmpLT64Fx2: 
case Iop_CmpLE64Fx2: + case Iop_CmpUN64Fx2: case Iop_Abs64Fx2: case Iop_Neg64Fx2: + case Iop_Sqrt64Fx2: case Iop_Scale2_64Fx2: case Iop_Log2_64Fx2: + case Iop_RecipEst64Fx2: case Iop_RecipStep64Fx2: case Iop_RSqrtEst64Fx2: + case Iop_RSqrtStep64Fx2: case Iop_F64x2_2toQ32x4: + case Iop_Add64F0x2: case Iop_Sub64F0x2: case Iop_Mul64F0x2: + case Iop_Div64F0x2: case Iop_Max64F0x2: case Iop_Min64F0x2: + case Iop_CmpEQ64F0x2: case Iop_CmpLT64F0x2: case Iop_CmpLE64F0x2: + case Iop_CmpUN64F0x2: case Iop_Sqrt64F0x2: case Iop_V128to64: + case Iop_V128HIto64: case Iop_64HLtoV128: case Iop_64UtoV128: + case Iop_SetV128lo64: case Iop_ZeroHI64ofV128: case Iop_ZeroHI96ofV128: + case Iop_ZeroHI112ofV128: case Iop_ZeroHI120ofV128: case Iop_32UtoV128: + case Iop_V128to32: case Iop_SetV128lo32: case Iop_NotV128: + case Iop_AndV128: case Iop_OrV128: case Iop_XorV128: + case Iop_ShlV128: case Iop_ShrV128: case Iop_SarV128: + case Iop_CmpNEZ8x16: case Iop_CmpNEZ16x8: case Iop_CmpNEZ32x4: + case Iop_CmpNEZ64x2: case Iop_CmpNEZ128x1: + case Iop_Add8x16: case Iop_Add16x8: case Iop_Add32x4: + case Iop_Add64x2: case Iop_Add128x1: + case Iop_QAdd8Ux16: case Iop_QAdd16Ux8: case Iop_QAdd32Ux4: + case Iop_QAdd64Ux2: + case Iop_QAdd8Sx16: case Iop_QAdd16Sx8: case Iop_QAdd32Sx4: + case Iop_QAdd64Sx2: + case Iop_QAddExtUSsatSS8x16: case Iop_QAddExtUSsatSS16x8: + case Iop_QAddExtUSsatSS32x4: case Iop_QAddExtUSsatSS64x2: + case Iop_QAddExtSUsatUU8x16: case Iop_QAddExtSUsatUU16x8: + case Iop_QAddExtSUsatUU32x4: case Iop_QAddExtSUsatUU64x2: + case Iop_Sub8x16: case Iop_Sub16x8: case Iop_Sub32x4: + case Iop_Sub64x2: case Iop_Sub128x1: + case Iop_QSub8Ux16: case Iop_QSub16Ux8: case Iop_QSub32Ux4: + case Iop_QSub64Ux2: + case Iop_QSub8Sx16: case Iop_QSub16Sx8: case Iop_QSub32Sx4: + case Iop_QSub64Sx2: + case Iop_Mul8x16: case Iop_Mul16x8: case Iop_Mul32x4: + case Iop_MulHi8Ux16: case Iop_MulHi16Ux8: case Iop_MulHi32Ux4: + case Iop_MulHi8Sx16: case Iop_MulHi16Sx8: case Iop_MulHi32Sx4: + case Iop_MullEven8Ux16: case Iop_MullEven16Ux8: case Iop_MullEven32Ux4: + case Iop_MullEven8Sx16: case Iop_MullEven16Sx8: case Iop_MullEven32Sx4: + case Iop_Mull8Ux8: case Iop_Mull8Sx8: + case Iop_Mull16Ux4: case Iop_Mull16Sx4: + case Iop_Mull32Ux2: case Iop_Mull32Sx2: + case Iop_QDMull16Sx4: case Iop_QDMull32Sx2: + case Iop_QDMulHi16Sx8: case Iop_QDMulHi32Sx4: + case Iop_QRDMulHi16Sx8: case Iop_QRDMulHi32Sx4: + case Iop_PolynomialMul8x16: case Iop_PolynomialMull8x8: + case Iop_PolynomialMulAdd8x16: case Iop_PolynomialMulAdd16x8: + case Iop_PolynomialMulAdd32x4: case Iop_PolynomialMulAdd64x2: + case Iop_PwAdd8x16: case Iop_PwAdd16x8: case Iop_PwAdd32x4: + case Iop_PwAdd32Fx2: case Iop_PwAddL8Ux16: case Iop_PwAddL16Ux8: + case Iop_PwAddL32Ux4: case Iop_PwAddL64Ux2: + case Iop_PwAddL8Sx16: case Iop_PwAddL16Sx8: case Iop_PwAddL32Sx4: + case Iop_PwExtUSMulQAdd8x16: + case Iop_PwBitMtxXpose64x2: + case Iop_Abs8x16: case Iop_Abs16x8: case Iop_Abs32x4: case Iop_Abs64x2: + case Iop_Avg8Ux16: case Iop_Avg16Ux8: case Iop_Avg32Ux4: case Iop_Avg64Ux2: + case Iop_Avg8Sx16: case Iop_Avg16Sx8: case Iop_Avg32Sx4: case Iop_Avg64Sx2: + case Iop_Max8Sx16: case Iop_Max16Sx8: case Iop_Max32Sx4: case Iop_Max64Sx2: + case Iop_Max8Ux16: case Iop_Max16Ux8: case Iop_Max32Ux4: case Iop_Max64Ux2: + case Iop_Min8Sx16: case Iop_Min16Sx8: case Iop_Min32Sx4: case Iop_Min64Sx2: + case Iop_Min8Ux16: case Iop_Min16Ux8: case Iop_Min32Ux4: case Iop_Min64Ux2: + case Iop_CmpEQ8x16: case Iop_CmpEQ16x8: case Iop_CmpEQ32x4: + case Iop_CmpEQ64x2: + case Iop_CmpGT8Sx16: case Iop_CmpGT16Sx8: 
case Iop_CmpGT32Sx4: + case Iop_CmpGT64Sx2: + case Iop_CmpGT8Ux16: case Iop_CmpGT16Ux8: case Iop_CmpGT32Ux4: + case Iop_CmpGT64Ux2: + case Iop_Cnt8x16: + case Iop_Clz8x16: case Iop_Clz16x8: case Iop_Clz32x4: + case Iop_Cls8x16: case Iop_Cls16x8: case Iop_Cls32x4: + case Iop_ShlN8x16: case Iop_ShlN16x8: case Iop_ShlN32x4: case Iop_ShlN64x2: + case Iop_ShrN8x16: case Iop_ShrN16x8: case Iop_ShrN32x4: case Iop_ShrN64x2: + case Iop_SarN8x16: case Iop_SarN16x8: case Iop_SarN32x4: case Iop_SarN64x2: + case Iop_Shl8x16: case Iop_Shl16x8: case Iop_Shl32x4: case Iop_Shl64x2: + case Iop_Shr8x16: case Iop_Shr16x8: case Iop_Shr32x4: case Iop_Shr64x2: + case Iop_Sar8x16: case Iop_Sar16x8: case Iop_Sar32x4: case Iop_Sar64x2: + case Iop_Sal8x16: case Iop_Sal16x8: case Iop_Sal32x4: case Iop_Sal64x2: + case Iop_Rol8x16: case Iop_Rol16x8: case Iop_Rol32x4: case Iop_Rol64x2: + case Iop_QShl8x16: case Iop_QShl16x8: case Iop_QShl32x4: case Iop_QShl64x2: + case Iop_QSal8x16: case Iop_QSal16x8: case Iop_QSal32x4: case Iop_QSal64x2: + case Iop_QShlNsatSU8x16: case Iop_QShlNsatSU16x8: + case Iop_QShlNsatSU32x4: case Iop_QShlNsatSU64x2: + case Iop_QShlNsatUU8x16: case Iop_QShlNsatUU16x8: + case Iop_QShlNsatUU32x4: case Iop_QShlNsatUU64x2: + case Iop_QShlNsatSS8x16: case Iop_QShlNsatSS16x8: + case Iop_QShlNsatSS32x4: case Iop_QShlNsatSS64x2: + case Iop_QandUQsh8x16: case Iop_QandUQsh16x8: + case Iop_QandUQsh32x4: case Iop_QandUQsh64x2: + case Iop_QandSQsh8x16: case Iop_QandSQsh16x8: + case Iop_QandSQsh32x4: case Iop_QandSQsh64x2: + case Iop_QandUQRsh8x16: case Iop_QandUQRsh16x8: + case Iop_QandUQRsh32x4: case Iop_QandUQRsh64x2: + case Iop_QandSQRsh8x16: case Iop_QandSQRsh16x8: + case Iop_QandSQRsh32x4: case Iop_QandSQRsh64x2: + case Iop_Sh8Sx16: case Iop_Sh16Sx8: case Iop_Sh32Sx4: case Iop_Sh64Sx2: + case Iop_Sh8Ux16: case Iop_Sh16Ux8: case Iop_Sh32Ux4: case Iop_Sh64Ux2: + case Iop_Rsh8Sx16: case Iop_Rsh16Sx8: case Iop_Rsh32Sx4: case Iop_Rsh64Sx2: + case Iop_Rsh8Ux16: case Iop_Rsh16Ux8: case Iop_Rsh32Ux4: case Iop_Rsh64Ux2: + case Iop_QandQShrNnarrow16Uto8Ux8: + case Iop_QandQShrNnarrow32Uto16Ux4: case Iop_QandQShrNnarrow64Uto32Ux2: + case Iop_QandQSarNnarrow16Sto8Sx8: + case Iop_QandQSarNnarrow32Sto16Sx4: case Iop_QandQSarNnarrow64Sto32Sx2: + case Iop_QandQSarNnarrow16Sto8Ux8: + case Iop_QandQSarNnarrow32Sto16Ux4: case Iop_QandQSarNnarrow64Sto32Ux2: + case Iop_QandQRShrNnarrow16Uto8Ux8: + case Iop_QandQRShrNnarrow32Uto16Ux4: case Iop_QandQRShrNnarrow64Uto32Ux2: + case Iop_QandQRSarNnarrow16Sto8Sx8: + case Iop_QandQRSarNnarrow32Sto16Sx4: case Iop_QandQRSarNnarrow64Sto32Sx2: + case Iop_QandQRSarNnarrow16Sto8Ux8: + case Iop_QandQRSarNnarrow32Sto16Ux4: case Iop_QandQRSarNnarrow64Sto32Ux2: + case Iop_QNarrowBin16Sto8Ux16: case Iop_QNarrowBin32Sto16Ux8: + case Iop_QNarrowBin16Sto8Sx16: case Iop_QNarrowBin32Sto16Sx8: + case Iop_QNarrowBin16Uto8Ux16: case Iop_QNarrowBin32Uto16Ux8: + case Iop_NarrowBin16to8x16: case Iop_NarrowBin32to16x8: + case Iop_QNarrowBin64Sto32Sx4: case Iop_QNarrowBin64Uto32Ux4: + case Iop_NarrowBin64to32x4: + case Iop_NarrowUn16to8x8: case Iop_NarrowUn32to16x4: + case Iop_NarrowUn64to32x2: + case Iop_QNarrowUn16Sto8Sx8: case Iop_QNarrowUn32Sto16Sx4: + case Iop_QNarrowUn64Sto32Sx2: + case Iop_QNarrowUn16Sto8Ux8: case Iop_QNarrowUn32Sto16Ux4: + case Iop_QNarrowUn64Sto32Ux2: + case Iop_QNarrowUn16Uto8Ux8: case Iop_QNarrowUn32Uto16Ux4: + case Iop_QNarrowUn64Uto32Ux2: + case Iop_Widen8Uto16x8: case Iop_Widen16Uto32x4: case Iop_Widen32Uto64x2: + case Iop_Widen8Sto16x8: case Iop_Widen16Sto32x4: case 
Iop_Widen32Sto64x2: + case Iop_InterleaveHI8x16: case Iop_InterleaveHI16x8: + case Iop_InterleaveHI32x4: case Iop_InterleaveHI64x2: + case Iop_InterleaveLO8x16: case Iop_InterleaveLO16x8: + case Iop_InterleaveLO32x4: case Iop_InterleaveLO64x2: + case Iop_InterleaveOddLanes8x16: case Iop_InterleaveEvenLanes8x16: + case Iop_InterleaveOddLanes16x8: case Iop_InterleaveEvenLanes16x8: + case Iop_InterleaveOddLanes32x4: case Iop_InterleaveEvenLanes32x4: + case Iop_PackOddLanes8x16: case Iop_PackEvenLanes8x16: + case Iop_PackOddLanes16x8: case Iop_PackEvenLanes16x8: + case Iop_PackOddLanes32x4: case Iop_PackEvenLanes32x4: + case Iop_CatOddLanes8x16: case Iop_CatOddLanes16x8: case Iop_CatOddLanes32x4: + case Iop_CatEvenLanes8x16: case Iop_CatEvenLanes16x8: + case Iop_CatEvenLanes32x4: + case Iop_GetElem8x16: case Iop_GetElem16x8: case Iop_GetElem32x4: + case Iop_GetElem64x2: + case Iop_SetElem8x16: case Iop_SetElem16x8: case Iop_SetElem32x4: + case Iop_SetElem64x2: + case Iop_Dup8x16: case Iop_Dup16x8: case Iop_Dup32x4: + case Iop_SliceV128: case Iop_Reverse8sIn16_x8: + case Iop_Reverse8sIn32_x4: case Iop_Reverse16sIn32_x4: + case Iop_Reverse8sIn64_x2: case Iop_Reverse16sIn64_x2: + case Iop_Reverse32sIn64_x2: case Iop_Reverse1sIn8_x16: case Iop_Perm8x16: + case Iop_Perm32x4: case Iop_PermOrZero8x16: case Iop_Perm8x16x2: + case Iop_GetMSBs8x16: case Iop_RecipEst32Ux4: case Iop_RSqrtEst32Ux4: + case Iop_MulI128by10: case Iop_MulI128by10Carry: case Iop_MulI128by10E: + case Iop_MulI128by10ECarry: case Iop_V256to64_0: case Iop_V256to64_1: + case Iop_V256to64_2: case Iop_V256to64_3: case Iop_64x4toV256: + case Iop_V256toV128_0: case Iop_V256toV128_1: case Iop_V128HLtoV256: + case Iop_AndV256: case Iop_OrV256: case Iop_XorV256: + case Iop_NotV256: + case Iop_CmpNEZ8x32: case Iop_CmpNEZ16x16: case Iop_CmpNEZ32x8: + case Iop_CmpNEZ64x4: + case Iop_Add8x32: case Iop_Add16x16: case Iop_Add32x8: case Iop_Add64x4: + case Iop_Sub8x32: case Iop_Sub16x16: case Iop_Sub32x8: case Iop_Sub64x4: + case Iop_CmpEQ8x32: case Iop_CmpEQ16x16: case Iop_CmpEQ32x8: + case Iop_CmpEQ64x4: + case Iop_CmpGT8Sx32: case Iop_CmpGT16Sx16: case Iop_CmpGT32Sx8: + case Iop_CmpGT64Sx4: + case Iop_ShlN16x16: case Iop_ShlN32x8: case Iop_ShlN64x4: + case Iop_ShrN16x16: case Iop_ShrN32x8: case Iop_ShrN64x4: + case Iop_SarN16x16: case Iop_SarN32x8: + case Iop_Max8Sx32: case Iop_Max16Sx16: case Iop_Max32Sx8: + case Iop_Max8Ux32: case Iop_Max16Ux16: case Iop_Max32Ux8: + case Iop_Min8Sx32: case Iop_Min16Sx16: case Iop_Min32Sx8: + case Iop_Min8Ux32: case Iop_Min16Ux16: case Iop_Min32Ux8: + case Iop_Mul16x16: case Iop_Mul32x8: + case Iop_MulHi16Ux16: case Iop_MulHi16Sx16: + case Iop_QAdd8Ux32: case Iop_QAdd16Ux16: + case Iop_QAdd8Sx32: case Iop_QAdd16Sx16: + case Iop_QSub8Ux32: case Iop_QSub16Ux16: + case Iop_QSub8Sx32: case Iop_QSub16Sx16: + case Iop_Avg8Ux32: case Iop_Avg16Ux16: + case Iop_Perm32x8: + case Iop_CipherV128: case Iop_CipherLV128: case Iop_CipherSV128: + case Iop_NCipherV128: case Iop_NCipherLV128: + case Iop_SHA512: case Iop_SHA256: + case Iop_Add64Fx4: case Iop_Sub64Fx4: case Iop_Mul64Fx4: case Iop_Div64Fx4: + case Iop_Add32Fx8: case Iop_Sub32Fx8: case Iop_Mul32Fx8: case Iop_Div32Fx8: + case Iop_I32StoF32x8: case Iop_F32toI32Sx8: case Iop_F32toF16x8: + case Iop_F16toF32x8: case Iop_Sqrt32Fx8: case Iop_Sqrt64Fx4: + case Iop_RSqrtEst32Fx8: case Iop_RecipEst32Fx8: + case Iop_Max32Fx8: case Iop_Min32Fx8: + case Iop_Max64Fx4: case Iop_Min64Fx4: + case Iop_Rotx32: case Iop_Rotx64: + return False; + + default: + vpanic("primopMightTrap"); 
+ + } +} + void ppIRExpr ( const IRExpr* e ) { Int i; diff --git a/VEX/pub/libvex_ir.h b/VEX/pub/libvex_ir.h index 087a414..9120a49 100644 --- a/VEX/pub/libvex_ir.h +++ b/VEX/pub/libvex_ir.h @@ -2013,6 +2013,11 @@ extern void typeOfPrimop ( IROp op, /*OUTs*/ IRType* t_dst, IRType* t_arg1, IRType* t_arg2, IRType* t_arg3, IRType* t_arg4 ); +/* Might the given primop trap (eg, attempt integer division by zero)? If in + doubt returns True. However, the vast majority of primops will never + trap. */ +extern Bool primopMightTrap ( IROp op ); + /* Encoding of IEEE754-specified rounding modes. Note, various front and back ends rely on the actual numerical values of these, so do not change them. */ |
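The role primopMightTrap plays in the guarding machinery is easiest to see with a small C analogy: guarding turns "compute e only if g" into "compute e, then use it only if g", so e must be exception-free. A sketch (the function names are hypothetical, not from the patch):

/* Guardable: Iop_Add32 can never fault, so evaluating it even on
   the g == 0 path is harmless. */
int guarded_add(int g, int a, int b)
{
   int t = a + b;            /* hoisted unconditionally */
   return g ? t : 0;
}

/* Not guardable: Iop_DivS32 may trap.  Hoisting the divide above
   the test would fault when g == 0 and b == 0, which is why
   expr_is_guardable rejects any op for which primopMightTrap
   returns True. */
int unguarded_div(int g, int a, int b)
{
   return g ? (a / b) : 0;
}

Side exits get a different treatment, visible in the Ist_Exit case above: the exit is kept, but its condition is strengthened to And1(xguard, guard), so it can only fire when the enclosing guard holds.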
From: Julian S. <se...@so...> - 2020-01-02 06:28:20
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=558f5e9517a4b6acf915d5f2d8083722c327a487 commit 558f5e9517a4b6acf915d5f2d8083722c327a487 Author: Julian Seward <js...@ac...> Date: Mon Oct 21 11:19:59 2019 +0200 Initial implementation of C-source-level &&-idiom recovery This branch contains code which avoids Memcheck false positives resulting from gcc and clang creating branches on uninitialised data. For example: bool isClosed; if (src.isRect(..., &isClosed, ...) && isClosed) { clang9 -O2 compiles this as: callq 7e7cdc0 <_ZNK6SkPath6isRectEP6SkRectPbPNS_9DirectionE> cmpb $0x0,-0x60(%rbp) // "if (isClosed) { .." je 7ed9e08 // "je after" test %al,%al // "if (return value of call is nonzero) { .." je 7ed9e08 // "je after" .. after: That is, the && has been evaluated right-to-left. This is a correct transformation if the compiler can prove that the call to |isRect| returns |false| along any path on which it does not write its out-parameter |&isClosed|. In general, for the lazy-semantics (L->R) C-source-level && operator, we have |A && B| == |B && A| if you can prove that |B| is |false| whenever A is undefined. I assume that clang has some kind of interprocedural analysis that tells it that. The compiler is further obliged to show that |B| won't trap, since it is now being evaluated speculatively, but that's no big deal to prove. A similar result holds, per de Morgan, for transformations involving the C language ||. Memcheck correctly handles bitwise &&/|| in the presence of undefined inputs. It has done so since the beginning. However, it assumes that every conditional branch in the program is important -- any branch on uninitialised data is an error. However, this idiom demonstrates otherwise. It defeats Memcheck's existing &&/|| handling because the &&/|| is spread across two basic blocks, rather than being bitwise. This initial commit contains a complete initial implementation to fix that. The basic idea is to detect the && condition spread across two blocks, and transform it into a single block using bitwise &&. Then Memcheck's existing accurate instrumentation of bitwise && will correctly handle it. The transformation is <contents of basic block A> C1 = ... if (!C1) goto after .. falls through to .. <contents of basic block B> C2 = ... if (!C2) goto after .. falls through to .. after: ===> <contents of basic block A> C1 = ... <contents of basic block B, conditional on C1> C2 = ... if (!C1 && !C2) goto after .. falls through to .. after: This assumes that <contents of basic block B> can be conditionalised, at the IR level, so that the guest state is not modified if C1 is |false|. That's not possible for all IRStmt kinds, but it is possible for a large enough subset to make this transformation feasible. There is no corresponding transformation that recovers an || condition, because, per de Morgan, that merely corresponds to swapping the side exits vs fallthoughs, and inverting the sense of the tests, and the pattern-recogniser as implemented checks all possible combinations already. The analysis and block-building is performed on the IR returned by the architecture specific front ends. So they are almost not modified at all: in fact they are simplified because all logic related to chasing through unconditional and conditional branches has been removed from them, redone at the IR level, and centralised. The only file with big changes is the IRSB constructor logic, guest_generic_bb_to_IR.c (a.k.a the "trace builder"). This is a complete rewrite. 
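At the C level the recovery amounts to replacing the two-branch lazy form with a single branch on a bitwise conjunction, which Memcheck's existing bitwise-&& instrumentation already handles precisely. A sketch of the two shapes, assuming 0/1-valued conditions (illustrative only; the real transformation is performed on IR, not on C):

/* Before recovery: two blocks, two conditional exits.  Memcheck
   has to judge each branch on its own. */
int before(int c1, int c2)
{
   if (!c1) return 0;   /* block A side exit */
   if (!c2) return 0;   /* block B side exit */
   return 1;
}

/* After recovery: one block, one branch on a bitwise AND, whose
   definedness Memcheck computes exactly. */
int after(int c1, int c2)
{
   return (c1 & c2) ? 1 : 0;
}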
There is some additional work for the IR optimiser (ir_opt.c), since that needs to do a quick initial simplification pass of the basic blocks, in order to reduce the number of different IR variants that the trace-builder has to pattern match on. An important followup task is to further reduce this cost. There are two new IROps to support this: And1 and Or1, which both operate on Ity_I1. They are regarded as evaluating both arguments, consistent with AndXX and OrXX for all other sizes. It is possible to synthesise at the IR level by widening the value to Ity_I8 or above, doing bitwise And/Or, and re-narrowing it, but this gives inefficient code, so I chose to represent them directly. The transformation appears to work for amd64-linux. In principle -- because it operates entirely at the IR level -- it should work for all targets, providing the initial pre-simplification pass can normalise the block ends into the required form. That will no doubt require some tuning. And1 and Or1 will have to be implemented in all instruction selectors, but that's easy enough. Remaining FIXMEs in the code: * Rename `expr_is_speculatable` et al to `expr_is_conditionalisable`. These functions merely conditionalise code; the speculation has already been done by gcc/clang. * `expr_is_speculatable`: properly check that Iex_Unop/Binop don't contain operatins that might trap (Div, Rem, etc). * `analyse_block_end`: recognise all block ends, and abort on ones that can't be recognised. Needed to ensure we don't miss any cases. * maybe: guest_amd64_toIR.c: generate better code for And1/Or1 * ir_opt.c, do_iropt_BB: remove the initial flattening pass since presimp will already have done it * ir_opt.c, do_minimal_initial_iropt_BB (a.k.a. presimp). Make this as cheap as possible. In particular, calling `cprop_BB_wrk` is total overkill since we only need copy propagation. * ir_opt.c: once the above is done, remove boolean parameter for `cprop_BB_wrk`. * ir_opt.c: concatenate_irsbs: maybe de-dup w.r.t. maybe_unroll_loop_BB. * remove option `guest_chase_cond` from VexControl (?). It was never used. * convert option `guest_chase_thresh` from VexControl (?) into a Bool, since the revised code here only cares about the 0-vs-nonzero distinction now. Diff: --- VEX/priv/guest_amd64_defs.h | 3 - VEX/priv/guest_amd64_toIR.c | 186 +--- VEX/priv/guest_arm64_defs.h | 3 - VEX/priv/guest_arm64_toIR.c | 15 - VEX/priv/guest_arm_defs.h | 3 - VEX/priv/guest_arm_toIR.c | 108 +-- VEX/priv/guest_generic_bb_to_IR.c | 1874 +++++++++++++++++++++++++------------ VEX/priv/guest_generic_bb_to_IR.h | 49 +- VEX/priv/guest_mips_defs.h | 3 - VEX/priv/guest_mips_toIR.c | 85 +- VEX/priv/guest_nanomips_defs.h | 3 - VEX/priv/guest_nanomips_toIR.c | 5 - VEX/priv/guest_ppc_defs.h | 3 - VEX/priv/guest_ppc_toIR.c | 42 +- VEX/priv/guest_s390_defs.h | 3 - VEX/priv/guest_s390_toIR.c | 39 +- VEX/priv/guest_x86_defs.h | 3 - VEX/priv/guest_x86_toIR.c | 153 +-- VEX/priv/host_amd64_isel.c | 23 +- VEX/priv/ir_defs.c | 9 +- VEX/priv/ir_opt.c | 99 +- VEX/priv/ir_opt.h | 6 + VEX/pub/libvex.h | 2 + VEX/pub/libvex_ir.h | 2 + memcheck/mc_translate.c | 46 +- memcheck/tests/vbit-test/binary.c | 6 + memcheck/tests/vbit-test/irops.c | 4 +- memcheck/tests/vbit-test/vbits.c | 2 + memcheck/tests/vbit-test/vbits.h | 4 +- 29 files changed, 1571 insertions(+), 1212 deletions(-) diff --git a/VEX/priv/guest_amd64_defs.h b/VEX/priv/guest_amd64_defs.h index a5de527..54672dc 100644 --- a/VEX/priv/guest_amd64_defs.h +++ b/VEX/priv/guest_amd64_defs.h @@ -49,9 +49,6 @@ guest_generic_bb_to_IR.h. 
*/ extern DisResult disInstr_AMD64 ( IRSB* irbb, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code, Long delta, Addr guest_IP, diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c index 1092419..fadf47d 100644 --- a/VEX/priv/guest_amd64_toIR.c +++ b/VEX/priv/guest_amd64_toIR.c @@ -2291,7 +2291,6 @@ static void jmp_lit( /*MOD*/DisResult* dres, { vassert(dres->whatNext == Dis_Continue); vassert(dres->len == 0); - vassert(dres->continueAt == 0); vassert(dres->jk_StopHere == Ijk_INVALID); dres->whatNext = Dis_StopHere; dres->jk_StopHere = kind; @@ -2303,7 +2302,6 @@ static void jmp_treg( /*MOD*/DisResult* dres, { vassert(dres->whatNext == Dis_Continue); vassert(dres->len == 0); - vassert(dres->continueAt == 0); vassert(dres->jk_StopHere == Ijk_INVALID); dres->whatNext = Dis_StopHere; dres->jk_StopHere = kind; @@ -2318,7 +2316,6 @@ void jcc_01 ( /*MOD*/DisResult* dres, AMD64Condcode condPos; vassert(dres->whatNext == Dis_Continue); vassert(dres->len == 0); - vassert(dres->continueAt == 0); vassert(dres->jk_StopHere == Ijk_INVALID); dres->whatNext = Dis_StopHere; dres->jk_StopHere = Ijk_Boring; @@ -19846,9 +19843,6 @@ static Long dis_ESC_NONE ( /*MB_OUT*/DisResult* dres, /*MB_OUT*/Bool* expect_CAS, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -20258,53 +20252,10 @@ Long dis_ESC_NONE ( vassert(-128 <= jmpDelta && jmpDelta < 128); d64 = (guest_RIP_bbstart+delta+1) + jmpDelta; delta++; - if (resteerCisOk - && vex_control.guest_chase_cond - && (Addr64)d64 != (Addr64)guest_RIP_bbstart - && jmpDelta < 0 - && resteerOkFn( callback_opaque, (Addr64)d64) ) { - /* Speculation: assume this backward branch is taken. So we - need to emit a side-exit to the insn following this one, - on the negation of the condition, and continue at the - branch target address (d64). If we wind up back at the - first instruction of the trace, just stop; it's better to - let the IR loop unroller handle that case. */ - stmt( IRStmt_Exit( - mk_amd64g_calculate_condition( - (AMD64Condcode)(1 ^ (opc - 0x70))), - Ijk_Boring, - IRConst_U64(guest_RIP_bbstart+delta), - OFFB_RIP ) ); - dres->whatNext = Dis_ResteerC; - dres->continueAt = d64; - comment = "(assumed taken)"; - } - else - if (resteerCisOk - && vex_control.guest_chase_cond - && (Addr64)d64 != (Addr64)guest_RIP_bbstart - && jmpDelta >= 0 - && resteerOkFn( callback_opaque, guest_RIP_bbstart+delta ) ) { - /* Speculation: assume this forward branch is not taken. So - we need to emit a side-exit to d64 (the dest) and continue - disassembling at the insn immediately following this - one. */ - stmt( IRStmt_Exit( - mk_amd64g_calculate_condition((AMD64Condcode)(opc - 0x70)), - Ijk_Boring, - IRConst_U64(d64), - OFFB_RIP ) ); - dres->whatNext = Dis_ResteerC; - dres->continueAt = guest_RIP_bbstart+delta; - comment = "(assumed not taken)"; - } - else { - /* Conservative default translation - end the block at this - point. */ - jcc_01( dres, (AMD64Condcode)(opc - 0x70), - guest_RIP_bbstart+delta, d64 ); - vassert(dres->whatNext == Dis_StopHere); - } + /* End the block at this point. 
*/ + jcc_01( dres, (AMD64Condcode)(opc - 0x70), + guest_RIP_bbstart+delta, d64 ); + vassert(dres->whatNext == Dis_StopHere); DIP("j%s-8 0x%llx %s\n", name_AMD64Condcode(opc - 0x70), (ULong)d64, comment); return delta; @@ -21434,14 +21385,8 @@ Long dis_ESC_NONE ( t2 = newTemp(Ity_I64); assign(t2, mkU64((Addr64)d64)); make_redzone_AbiHint(vbi, t1, t2/*nia*/, "call-d32"); - if (resteerOkFn( callback_opaque, (Addr64)d64) ) { - /* follow into the call target. */ - dres->whatNext = Dis_ResteerU; - dres->continueAt = d64; - } else { - jmp_lit(dres, Ijk_Call, d64); - vassert(dres->whatNext == Dis_StopHere); - } + jmp_lit(dres, Ijk_Call, d64); + vassert(dres->whatNext == Dis_StopHere); DIP("call 0x%llx\n", (ULong)d64); return delta; @@ -21452,13 +21397,8 @@ Long dis_ESC_NONE ( if (haveF2(pfx)) DIP("bnd ; "); /* MPX bnd prefix. */ d64 = (guest_RIP_bbstart+delta+sz) + getSDisp(sz,delta); delta += sz; - if (resteerOkFn(callback_opaque, (Addr64)d64)) { - dres->whatNext = Dis_ResteerU; - dres->continueAt = d64; - } else { - jmp_lit(dres, Ijk_Boring, d64); - vassert(dres->whatNext == Dis_StopHere); - } + jmp_lit(dres, Ijk_Boring, d64); + vassert(dres->whatNext == Dis_StopHere); DIP("jmp 0x%llx\n", (ULong)d64); return delta; @@ -21469,13 +21409,8 @@ Long dis_ESC_NONE ( if (haveF2(pfx)) DIP("bnd ; "); /* MPX bnd prefix. */ d64 = (guest_RIP_bbstart+delta+1) + getSDisp8(delta); delta++; - if (resteerOkFn(callback_opaque, (Addr64)d64)) { - dres->whatNext = Dis_ResteerU; - dres->continueAt = d64; - } else { - jmp_lit(dres, Ijk_Boring, d64); - vassert(dres->whatNext == Dis_StopHere); - } + jmp_lit(dres, Ijk_Boring, d64); + vassert(dres->whatNext == Dis_StopHere); DIP("jmp-8 0x%llx\n", (ULong)d64); return delta; @@ -21658,9 +21593,6 @@ static Long dis_ESC_0F ( /*MB_OUT*/DisResult* dres, /*MB_OUT*/Bool* expect_CAS, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -21910,56 +21842,10 @@ Long dis_ESC_0F ( jmpDelta = getSDisp32(delta); d64 = (guest_RIP_bbstart+delta+4) + jmpDelta; delta += 4; - if (resteerCisOk - && vex_control.guest_chase_cond - && (Addr64)d64 != (Addr64)guest_RIP_bbstart - && jmpDelta < 0 - && resteerOkFn( callback_opaque, (Addr64)d64) ) { - /* Speculation: assume this backward branch is taken. So - we need to emit a side-exit to the insn following this - one, on the negation of the condition, and continue at - the branch target address (d64). If we wind up back at - the first instruction of the trace, just stop; it's - better to let the IR loop unroller handle that case. */ - stmt( IRStmt_Exit( - mk_amd64g_calculate_condition( - (AMD64Condcode)(1 ^ (opc - 0x80))), - Ijk_Boring, - IRConst_U64(guest_RIP_bbstart+delta), - OFFB_RIP - )); - dres->whatNext = Dis_ResteerC; - dres->continueAt = d64; - comment = "(assumed taken)"; - } - else - if (resteerCisOk - && vex_control.guest_chase_cond - && (Addr64)d64 != (Addr64)guest_RIP_bbstart - && jmpDelta >= 0 - && resteerOkFn( callback_opaque, guest_RIP_bbstart+delta ) ) { - /* Speculation: assume this forward branch is not taken. - So we need to emit a side-exit to d64 (the dest) and - continue disassembling at the insn immediately - following this one. 
*/ - stmt( IRStmt_Exit( - mk_amd64g_calculate_condition((AMD64Condcode) - (opc - 0x80)), - Ijk_Boring, - IRConst_U64(d64), - OFFB_RIP - )); - dres->whatNext = Dis_ResteerC; - dres->continueAt = guest_RIP_bbstart+delta; - comment = "(assumed not taken)"; - } - else { - /* Conservative default translation - end the block at - this point. */ - jcc_01( dres, (AMD64Condcode)(opc - 0x80), - guest_RIP_bbstart+delta, d64 ); - vassert(dres->whatNext == Dis_StopHere); - } + /* End the block at this point. */ + jcc_01( dres, (AMD64Condcode)(opc - 0x80), + guest_RIP_bbstart+delta, d64 ); + vassert(dres->whatNext == Dis_StopHere); DIP("j%s-32 0x%llx %s\n", name_AMD64Condcode(opc - 0x80), (ULong)d64, comment); return delta; @@ -22727,9 +22613,6 @@ __attribute__((noinline)) static Long dis_ESC_0F38 ( /*MB_OUT*/DisResult* dres, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -22845,9 +22728,6 @@ __attribute__((noinline)) static Long dis_ESC_0F3A ( /*MB_OUT*/DisResult* dres, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -24187,9 +24067,6 @@ static Long dis_ESC_0F__VEX ( /*MB_OUT*/DisResult* dres, /*OUT*/ Bool* uses_vvvv, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -28158,9 +28035,6 @@ static Long dis_ESC_0F38__VEX ( /*MB_OUT*/DisResult* dres, /*OUT*/ Bool* uses_vvvv, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -30585,9 +30459,6 @@ static Long dis_ESC_0F3A__VEX ( /*MB_OUT*/DisResult* dres, /*OUT*/ Bool* uses_vvvv, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const VexArchInfo* archinfo, const VexAbiInfo* vbi, Prefix pfx, Int sz, Long deltaIN @@ -32206,9 +32077,6 @@ Long dis_ESC_0F3A__VEX ( static DisResult disInstr_AMD64_WRK ( /*OUT*/Bool* expect_CAS, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, Long delta64, const VexArchInfo* archinfo, const VexAbiInfo* vbi, @@ -32241,7 +32109,6 @@ DisResult disInstr_AMD64_WRK ( /* Set result defaults. 
*/ dres.whatNext = Dis_Continue; dres.len = 0; - dres.continueAt = 0; dres.jk_StopHere = Ijk_INVALID; dres.hint = Dis_HintNone; *expect_CAS = False; @@ -32503,22 +32370,18 @@ DisResult disInstr_AMD64_WRK ( switch (esc) { case ESC_NONE: delta = dis_ESC_NONE( &dres, expect_CAS, - resteerOkFn, resteerCisOk, callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_0F: delta = dis_ESC_0F ( &dres, expect_CAS, - resteerOkFn, resteerCisOk, callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_0F38: delta = dis_ESC_0F38( &dres, - resteerOkFn, resteerCisOk, callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_0F3A: delta = dis_ESC_0F3A( &dres, - resteerOkFn, resteerCisOk, callback_opaque, archinfo, vbi, pfx, sz, delta ); break; default: @@ -32533,20 +32396,14 @@ DisResult disInstr_AMD64_WRK ( switch (esc) { case ESC_0F: delta = dis_ESC_0F__VEX ( &dres, &uses_vvvv, - resteerOkFn, resteerCisOk, - callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_0F38: delta = dis_ESC_0F38__VEX ( &dres, &uses_vvvv, - resteerOkFn, resteerCisOk, - callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_0F3A: delta = dis_ESC_0F3A__VEX ( &dres, &uses_vvvv, - resteerOkFn, resteerCisOk, - callback_opaque, archinfo, vbi, pfx, sz, delta ); break; case ESC_NONE: @@ -32630,10 +32487,6 @@ DisResult disInstr_AMD64_WRK ( case Dis_Continue: stmt( IRStmt_Put( OFFB_RIP, mkU64(guest_RIP_bbstart + delta) ) ); break; - case Dis_ResteerU: - case Dis_ResteerC: - stmt( IRStmt_Put( OFFB_RIP, mkU64(dres.continueAt) ) ); - break; case Dis_StopHere: break; default: @@ -32657,9 +32510,6 @@ DisResult disInstr_AMD64_WRK ( is located in host memory at &guest_code[delta]. */ DisResult disInstr_AMD64 ( IRSB* irsb_IN, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code_IN, Long delta, Addr guest_IP, @@ -32687,9 +32537,7 @@ DisResult disInstr_AMD64 ( IRSB* irsb_IN, x1 = irsb_IN->stmts_used; expect_CAS = False; - dres = disInstr_AMD64_WRK ( &expect_CAS, resteerOkFn, - resteerCisOk, - callback_opaque, + dres = disInstr_AMD64_WRK ( &expect_CAS, delta, archinfo, abiinfo, sigill_diag_IN ); x2 = irsb_IN->stmts_used; vassert(x2 >= x1); @@ -32720,9 +32568,7 @@ DisResult disInstr_AMD64 ( IRSB* irsb_IN, /* inconsistency detected. re-disassemble the instruction so as to generate a useful error message; then assert. */ vex_traceflags |= VEX_TRACE_FE; - dres = disInstr_AMD64_WRK ( &expect_CAS, resteerOkFn, - resteerCisOk, - callback_opaque, + dres = disInstr_AMD64_WRK ( &expect_CAS, delta, archinfo, abiinfo, sigill_diag_IN ); for (i = x1; i < x2; i++) { vex_printf("\t\t"); diff --git a/VEX/priv/guest_arm64_defs.h b/VEX/priv/guest_arm64_defs.h index 319d601..b2094d6 100644 --- a/VEX/priv/guest_arm64_defs.h +++ b/VEX/priv/guest_arm64_defs.h @@ -39,9 +39,6 @@ guest_generic_bb_to_IR.h. 
*/ extern DisResult disInstr_ARM64 ( IRSB* irbb, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code, Long delta, Addr guest_IP, diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c index 2589ddf..6eb896c 100644 --- a/VEX/priv/guest_arm64_toIR.c +++ b/VEX/priv/guest_arm64_toIR.c @@ -6885,7 +6885,6 @@ Bool dis_ARM64_branch_etc(/*MB_OUT*/DisResult* dres, UInt insn, Long simm64 = (Long)sx_to_64(uimm64, 21); vassert(dres->whatNext == Dis_Continue); vassert(dres->len == 4); - vassert(dres->continueAt == 0); vassert(dres->jk_StopHere == Ijk_INVALID); stmt( IRStmt_Exit(unop(Iop_64to1, mk_arm64g_calculate_condition(cond)), Ijk_Boring, @@ -14797,9 +14796,6 @@ Bool dis_ARM64_simd_and_fp(/*MB_OUT*/DisResult* dres, UInt insn) static Bool disInstr_ARM64_WRK ( /*MB_OUT*/DisResult* dres, - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_instr, const VexArchInfo* archinfo, const VexAbiInfo* abiinfo @@ -14823,7 +14819,6 @@ Bool disInstr_ARM64_WRK ( /* Set result defaults. */ dres->whatNext = Dis_Continue; dres->len = 4; - dres->continueAt = 0; dres->jk_StopHere = Ijk_INVALID; dres->hint = Dis_HintNone; @@ -14959,7 +14954,6 @@ Bool disInstr_ARM64_WRK ( if (!ok) { vassert(dres->whatNext == Dis_Continue); vassert(dres->len == 4); - vassert(dres->continueAt == 0); vassert(dres->jk_StopHere == Ijk_INVALID); } @@ -14977,9 +14971,6 @@ Bool disInstr_ARM64_WRK ( is located in host memory at &guest_code[delta]. */ DisResult disInstr_ARM64 ( IRSB* irsb_IN, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code_IN, Long delta_IN, Addr guest_IP, @@ -15006,7 +14997,6 @@ DisResult disInstr_ARM64 ( IRSB* irsb_IN, /* Try to decode */ Bool ok = disInstr_ARM64_WRK( &dres, - resteerOkFn, resteerCisOk, callback_opaque, &guest_code_IN[delta_IN], archinfo, abiinfo ); if (ok) { @@ -15016,10 +15006,6 @@ DisResult disInstr_ARM64 ( IRSB* irsb_IN, case Dis_Continue: putPC( mkU64(dres.len + guest_PC_curr_instr) ); break; - case Dis_ResteerU: - case Dis_ResteerC: - putPC(mkU64(dres.continueAt)); - break; case Dis_StopHere: break; default: @@ -15054,7 +15040,6 @@ DisResult disInstr_ARM64 ( IRSB* irsb_IN, dres.len = 0; dres.whatNext = Dis_StopHere; dres.jk_StopHere = Ijk_NoDecode; - dres.continueAt = 0; } return dres; } diff --git a/VEX/priv/guest_arm_defs.h b/VEX/priv/guest_arm_defs.h index 58bbbd0..85521e7 100644 --- a/VEX/priv/guest_arm_defs.h +++ b/VEX/priv/guest_arm_defs.h @@ -41,9 +41,6 @@ geust_generic_ bb_to_IR.h. */ extern DisResult disInstr_ARM ( IRSB* irbb, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code, Long delta, Addr guest_IP, diff --git a/VEX/priv/guest_arm_toIR.c b/VEX/priv/guest_arm_toIR.c index 50c97e9..6027d47 100644 --- a/VEX/priv/guest_arm_toIR.c +++ b/VEX/priv/guest_arm_toIR.c @@ -16077,9 +16077,6 @@ static Bool decode_NV_instruction_ARMv7_and_below static DisResult disInstr_ARM_WRK ( - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_instr, const VexArchInfo* archinfo, const VexAbiInfo* abiinfo, @@ -16099,7 +16096,6 @@ DisResult disInstr_ARM_WRK ( /* Set result defaults. 
*/ dres.whatNext = Dis_Continue; dres.len = 4; - dres.continueAt = 0; dres.jk_StopHere = Ijk_INVALID; dres.hint = Dis_HintNone; @@ -17034,75 +17030,19 @@ DisResult disInstr_ARM_WRK ( condT, Ijk_Boring); } if (condT == IRTemp_INVALID) { - /* unconditional transfer to 'dst'. See if we can simply - continue tracing at the destination. */ - if (resteerOkFn( callback_opaque, dst )) { - /* yes */ - dres.whatNext = Dis_ResteerU; - dres.continueAt = dst; - } else { - /* no; terminate the SB at this point. */ - llPutIReg(15, mkU32(dst)); - dres.jk_StopHere = jk; - dres.whatNext = Dis_StopHere; - } + /* Unconditional transfer to 'dst'. Terminate the SB at this point. */ + llPutIReg(15, mkU32(dst)); + dres.jk_StopHere = jk; + dres.whatNext = Dis_StopHere; DIP("b%s 0x%x\n", link ? "l" : "", dst); } else { - /* conditional transfer to 'dst' */ - const HChar* comment = ""; - - /* First see if we can do some speculative chasing into one - arm or the other. Be conservative and only chase if - !link, that is, this is a normal conditional branch to a - known destination. */ - if (!link - && resteerCisOk - && vex_control.guest_chase_cond - && dst < guest_R15_curr_instr_notENC - && resteerOkFn( callback_opaque, dst) ) { - /* Speculation: assume this backward branch is taken. So - we need to emit a side-exit to the insn following this - one, on the negation of the condition, and continue at - the branch target address (dst). */ - stmt( IRStmt_Exit( unop(Iop_Not1, - unop(Iop_32to1, mkexpr(condT))), - Ijk_Boring, - IRConst_U32(guest_R15_curr_instr_notENC+4), - OFFB_R15T )); - dres.whatNext = Dis_ResteerC; - dres.continueAt = (Addr32)dst; - comment = "(assumed taken)"; - } - else - if (!link - && resteerCisOk - && vex_control.guest_chase_cond - && dst >= guest_R15_curr_instr_notENC - && resteerOkFn( callback_opaque, - guest_R15_curr_instr_notENC+4) ) { - /* Speculation: assume this forward branch is not taken. - So we need to emit a side-exit to dst (the dest) and - continue disassembling at the insn immediately - following this one. */ - stmt( IRStmt_Exit( unop(Iop_32to1, mkexpr(condT)), - Ijk_Boring, - IRConst_U32(dst), - OFFB_R15T )); - dres.whatNext = Dis_ResteerC; - dres.continueAt = guest_R15_curr_instr_notENC+4; - comment = "(assumed not taken)"; - } - else { - /* Conservative default translation - end the block at - this point. */ - stmt( IRStmt_Exit( unop(Iop_32to1, mkexpr(condT)), - jk, IRConst_U32(dst), OFFB_R15T )); - llPutIReg(15, mkU32(guest_R15_curr_instr_notENC + 4)); - dres.jk_StopHere = Ijk_Boring; - dres.whatNext = Dis_StopHere; - } - DIP("b%s%s 0x%x %s\n", link ? "l" : "", nCC(INSN_COND), - dst, comment); + /* Conditional transfer to 'dst'. Terminate the SB at this point. */ + stmt( IRStmt_Exit( unop(Iop_32to1, mkexpr(condT)), + jk, IRConst_U32(dst), OFFB_R15T )); + llPutIReg(15, mkU32(guest_R15_curr_instr_notENC + 4)); + dres.jk_StopHere = Ijk_Boring; + dres.whatNext = Dis_StopHere; + DIP("b%s%s 0x%x\n", link ? 
"l" : "", nCC(INSN_COND), dst); } goto decode_success; } @@ -18896,7 +18836,6 @@ DisResult disInstr_ARM_WRK ( dres.len = 0; dres.whatNext = Dis_StopHere; dres.jk_StopHere = Ijk_NoDecode; - dres.continueAt = 0; return dres; decode_success: @@ -18953,10 +18892,6 @@ DisResult disInstr_ARM_WRK ( case Dis_Continue: llPutIReg(15, mkU32(dres.len + guest_R15_curr_instr_notENC)); break; - case Dis_ResteerU: - case Dis_ResteerC: - llPutIReg(15, mkU32(dres.continueAt)); - break; case Dis_StopHere: break; default: @@ -18989,9 +18924,6 @@ static const UChar it_length_table[256]; /* fwds */ static DisResult disInstr_THUMB_WRK ( - Bool (*resteerOkFn) ( /*opaque*/void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_instr, const VexArchInfo* archinfo, const VexAbiInfo* abiinfo, @@ -19016,7 +18948,6 @@ DisResult disInstr_THUMB_WRK ( /* Set result defaults. */ dres.whatNext = Dis_Continue; dres.len = 2; - dres.continueAt = 0; dres.jk_StopHere = Ijk_INVALID; dres.hint = Dis_HintNone; @@ -20761,7 +20692,6 @@ DisResult disInstr_THUMB_WRK ( /* Change result defaults to suit 32-bit insns. */ vassert(dres.whatNext == Dis_Continue); vassert(dres.len == 2); - vassert(dres.continueAt == 0); dres.len = 4; /* ---------------- BL/BLX simm26 ---------------- */ @@ -23531,7 +23461,6 @@ DisResult disInstr_THUMB_WRK ( dres.len = 0; dres.whatNext = Dis_StopHere; dres.jk_StopHere = Ijk_NoDecode; - dres.continueAt = 0; return dres; decode_success: @@ -23541,10 +23470,6 @@ DisResult disInstr_THUMB_WRK ( case Dis_Continue: llPutIReg(15, mkU32(dres.len + (guest_R15_curr_instr_notENC | 1))); break; - case Dis_ResteerU: - case Dis_ResteerC: - llPutIReg(15, mkU32(dres.continueAt)); - break; case Dis_StopHere: break; default: @@ -23650,9 +23575,6 @@ static const UChar it_length_table[256] is located in host memory at &guest_code[delta]. */ DisResult disInstr_ARM ( IRSB* irsb_IN, - Bool (*resteerOkFn) ( void*, Addr ), - Bool resteerCisOk, - void* callback_opaque, const UChar* guest_code_IN, Long delta_ENCODED, Addr guest_IP_ENCODED, @@ -23679,14 +23601,10 @@ DisResult disInstr_ARM ( IRSB* irsb_IN, } if (isThumb) { - dres = disInstr_THUMB_WRK ( resteerOkFn, - resteerCisOk, callback_opaque, - &guest_code_IN[delta_ENCODED - 1], + dres = disInstr_THUMB_WRK ( &guest_code_IN[delta_ENCODED - 1], archinfo, abiinfo, sigill_diag_IN ); } else { - dres = disInstr_ARM_WRK ( resteerOkFn, - resteerCisOk, callback_opaque, - &guest_code_IN[delta_ENCODED], + dres = disInstr_ARM_WRK ( &guest_code_IN[delta_ENCODED], archinfo, abiinfo, sigill_diag_IN ); } diff --git a/VEX/priv/guest_generic_bb_to_IR.c b/VEX/priv/guest_generic_bb_to_IR.c index 9596787..4cb813f 100644 --- a/VEX/priv/guest_generic_bb_to_IR.c +++ b/VEX/priv/guest_generic_bb_to_IR.c @@ -37,290 +37,738 @@ #include "main_util.h" #include "main_globals.h" #include "guest_generic_bb_to_IR.h" +#include "ir_opt.h" +/*--------------------------------------------------------------*/ +/*--- Forwards for fns called by self-checking translations ---*/ +/*--------------------------------------------------------------*/ + /* Forwards .. 
*/ -VEX_REGPARM(2) -static UInt genericg_compute_checksum_4al ( HWord first_w32, HWord n_w32s ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_1 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_2 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_3 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_4 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_5 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_6 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_7 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_8 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_9 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_10 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_11 ( HWord first_w32 ); -VEX_REGPARM(1) -static UInt genericg_compute_checksum_4al_12 ( HWord first_w32 ); +VEX_REGPARM(2) static UInt genericg_compute_checksum_4al ( HWord first_w32, + HWord n_w32s ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_1 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_2 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_3 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_4 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_5 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_6 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_7 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_8 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_9 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_10 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_11 ( HWord first_w32 ); +VEX_REGPARM(1) static UInt genericg_compute_checksum_4al_12 ( HWord first_w32 ); + +VEX_REGPARM(2) static ULong genericg_compute_checksum_8al ( HWord first_w64, + HWord n_w64s ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_1 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_2 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_3 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_4 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_5 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_6 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_7 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_8 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_9 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_10 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_11 ( HWord first_w64 ); +VEX_REGPARM(1) static ULong genericg_compute_checksum_8al_12 ( HWord first_w64 ); + + +/*--------------------------------------------------------------*/ +/*--- Creation of self-check IR ---*/ +/*--------------------------------------------------------------*/ + +static void create_self_checks_as_needed( + /*MOD*/ IRSB* irsb, + /*OUT*/ UInt* n_sc_extents, + /*MOD*/ VexRegisterUpdates* 
pxControl, + /*MOD*/ void* callback_opaque, + /*IN*/ UInt (*needs_self_check) + (void*, /*MB_MOD*/VexRegisterUpdates*, + const VexGuestExtents*), + const VexGuestExtents* vge, + const VexAbiInfo* abiinfo_both, + const IRType guest_word_type, + const Int selfcheck_idx, + /*IN*/ Int offB_GUEST_CMSTART, + /*IN*/ Int offB_GUEST_CMLEN, + /*IN*/ Int offB_GUEST_IP, + const Addr guest_IP_sbstart + ) +{ + /* The scheme is to compute a rather crude checksum of the code + we're making a translation of, and add to the IR a call to a + helper routine which recomputes the checksum every time the + translation is run, and requests a retranslation if it doesn't + match. This is obviously very expensive and considerable + efforts are made to speed it up: -VEX_REGPARM(2) -static ULong genericg_compute_checksum_8al ( HWord first_w64, HWord n_w64s ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_1 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_2 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_3 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_4 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_5 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_6 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_7 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_8 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_9 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_10 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_11 ( HWord first_w64 ); -VEX_REGPARM(1) -static ULong genericg_compute_checksum_8al_12 ( HWord first_w64 ); + * the checksum is computed from all the naturally aligned + host-sized words that overlap the translated code. That means + it could depend on up to 7 bytes before and 7 bytes after + which aren't part of the translated area, and so if those + change then we'll unnecessarily have to discard and + retranslate. This seems like a pretty remote possibility and + it seems as if the benefit of not having to deal with the ends + of the range at byte precision far outweigh any possible extra + translations needed. -/* Small helpers */ -static Bool const_False ( void* callback_opaque, Addr a ) { - return False; -} + * there's a generic routine and 12 specialised cases, which + handle the cases of 1 through 12-word lengths respectively. + They seem to cover about 90% of the cases that occur in + practice. -/* Disassemble a complete basic block, starting at guest_IP_start, - returning a new IRSB. The disassembler may chase across basic - block boundaries if it wishes and if chase_into_ok allows it. - The precise guest address ranges from which code has been taken - are written into vge. guest_IP_bbstart is taken to be the IP in - the guest's address space corresponding to the instruction at - &guest_code[0]. + We ask the caller, via needs_self_check, which of the 3 vge + extents needs a check, and only generate check code for those + that do. 
+ */ + { + Addr base2check; + UInt len2check; + HWord expectedhW; + IRTemp tistart_tmp, tilen_tmp; + HWord VEX_REGPARM(2) (*fn_generic)(HWord, HWord); + HWord VEX_REGPARM(1) (*fn_spec)(HWord); + const HChar* nm_generic; + const HChar* nm_spec; + HWord fn_generic_entry = 0; + HWord fn_spec_entry = 0; + UInt host_word_szB = sizeof(HWord); + IRType host_word_type = Ity_INVALID; - dis_instr_fn is the arch-specific fn to disassemble on function; it - is this that does the real work. + UInt extents_needing_check + = needs_self_check(callback_opaque, pxControl, vge); - needs_self_check is a callback used to ask the caller which of the - extents, if any, a self check is required for. The returned value - is a bitmask with a 1 in position i indicating that the i'th extent - needs a check. Since there can be at most 3 extents, the returned - values must be between 0 and 7. + if (host_word_szB == 4) host_word_type = Ity_I32; + if (host_word_szB == 8) host_word_type = Ity_I64; + vassert(host_word_type != Ity_INVALID); - The number of extents which did get a self check (0 to 3) is put in - n_sc_extents. The caller already knows this because it told us - which extents to add checks for, via the needs_self_check callback, - but we ship the number back out here for the caller's convenience. + vassert(vge->n_used >= 1 && vge->n_used <= 3); - preamble_function is a callback which allows the caller to add - its own IR preamble (following the self-check, if any). May be - NULL. If non-NULL, the IRSB under construction is handed to - this function, which presumably adds IR statements to it. The - callback may optionally complete the block and direct bb_to_IR - not to disassemble any instructions into it; this is indicated - by the callback returning True. + /* Caller shouldn't claim that nonexistent extents need a + check. */ + vassert((extents_needing_check >> vge->n_used) == 0); - offB_CMADDR and offB_CMLEN are the offsets of guest_CMADDR and - guest_CMLEN. Since this routine has to work for any guest state, - without knowing what it is, those offsets have to passed in. + /* Guest addresses as IRConsts. Used in self-checks to specify the + restart-after-discard point. */ + IRConst* guest_IP_sbstart_IRConst + = guest_word_type==Ity_I32 + ? IRConst_U32(toUInt(guest_IP_sbstart)) + : IRConst_U64(guest_IP_sbstart); - callback_opaque is a caller-supplied pointer to data which the - callbacks may want to see. Vex has no idea what it is. - (In fact it's a VgInstrumentClosure.) -*/ + const Int n_extent_slots = sizeof(vge->base) / sizeof(vge->base[0]); + vassert(n_extent_slots == 3); -/* Regarding IP updating. dis_instr_fn (that does the guest specific - work of disassembling an individual instruction) must finish the - resulting IR with "PUT(guest_IP) = ". Hence in all cases it must - state the next instruction address. + vassert(selfcheck_idx + (n_extent_slots - 1) * 5 + 4 < irsb->stmts_used); - If the block is to be ended at that point, then this routine - (bb_to_IR) will set up the next/jumpkind/offsIP fields so as to - make a transfer (of the right kind) to "GET(guest_IP)". Hence if - dis_instr_fn generates incorrect IP updates we will see it - immediately (due to jumping to the wrong next guest address). + for (Int i = 0; i < vge->n_used; i++) { + /* Do we need to generate a check for this extent? */ + if ((extents_needing_check & (1 << i)) == 0) + continue; - However it is also necessary to set this up so it can be optimised - nicely. 
The IRSB exit is defined to update the guest IP, so that - chaining works -- since the chain_me stubs expect the chain-to - address to be in the guest state. Hence what the IRSB next fields - will contain initially is (implicitly) + /* Tell the caller */ + (*n_sc_extents)++; - PUT(guest_IP) [implicitly] = GET(guest_IP) [explicit expr on ::next] + /* the extent we're generating a check for */ + base2check = vge->base[i]; + len2check = vge->len[i]; + + /* stay sane */ + vassert(len2check >= 0 && len2check < 2000/*arbitrary*/); + + /* Skip the check if the translation involved zero bytes */ + if (len2check == 0) + continue; + + HWord first_hW = ((HWord)base2check) + & ~(HWord)(host_word_szB-1); + HWord last_hW = (((HWord)base2check) + len2check - 1) + & ~(HWord)(host_word_szB-1); + vassert(first_hW <= last_hW); + HWord hW_diff = last_hW - first_hW; + vassert(0 == (hW_diff & (host_word_szB-1))); + HWord hWs_to_check = (hW_diff + host_word_szB) / host_word_szB; + vassert(hWs_to_check > 0 + && hWs_to_check < 2004/*arbitrary*/ / host_word_szB); + + /* vex_printf("%lx %lx %ld\n", first_hW, last_hW, hWs_to_check); */ + + if (host_word_szB == 8) { + fn_generic = (VEX_REGPARM(2) HWord(*)(HWord, HWord)) + genericg_compute_checksum_8al; + nm_generic = "genericg_compute_checksum_8al"; + } else { + fn_generic = (VEX_REGPARM(2) HWord(*)(HWord, HWord)) + genericg_compute_checksum_4al; + nm_generic = "genericg_compute_checksum_4al"; + } + + fn_spec = NULL; + nm_spec = NULL; + + if (host_word_szB == 8) { + const HChar* nm = NULL; + ULong VEX_REGPARM(1) (*fn)(HWord) = NULL; + switch (hWs_to_check) { + case 1: fn = genericg_compute_checksum_8al_1; + nm = "genericg_compute_checksum_8al_1"; break; + case 2: fn = genericg_compute_checksum_8al_2; + nm = "genericg_compute_checksum_8al_2"; break; + case 3: fn = genericg_compute_checksum_8al_3; + nm = "genericg_compute_checksum_8al_3"; break; + case 4: fn = genericg_compute_checksum_8al_4; + nm = "genericg_compute_checksum_8al_4"; break; + case 5: fn = genericg_compute_checksum_8al_5; + nm = "genericg_compute_checksum_8al_5"; break; + case 6: fn = genericg_compute_checksum_8al_6; + nm = "genericg_compute_checksum_8al_6"; break; + case 7: fn = genericg_compute_checksum_8al_7; + nm = "genericg_compute_checksum_8al_7"; break; + case 8: fn = genericg_compute_checksum_8al_8; + nm = "genericg_compute_checksum_8al_8"; break; + case 9: fn = genericg_compute_checksum_8al_9; + nm = "genericg_compute_checksum_8al_9"; break; + case 10: fn = genericg_compute_checksum_8al_10; + nm = "genericg_compute_checksum_8al_10"; break; + case 11: fn = genericg_compute_checksum_8al_11; + nm = "genericg_compute_checksum_8al_11"; break; + case 12: fn = genericg_compute_checksum_8al_12; + nm = "genericg_compute_checksum_8al_12"; break; + default: break; + } + fn_spec = (VEX_REGPARM(1) HWord(*)(HWord)) fn; + nm_spec = nm; + } else { + const HChar* nm = NULL; + UInt VEX_REGPARM(1) (*fn)(HWord) = NULL; + switch (hWs_to_check) { + case 1: fn = genericg_compute_checksum_4al_1; + nm = "genericg_compute_checksum_4al_1"; break; + case 2: fn = genericg_compute_checksum_4al_2; + nm = "genericg_compute_checksum_4al_2"; break; + case 3: fn = genericg_compute_checksum_4al_3; + nm = "genericg_compute_checksum_4al_3"; break; + case 4: fn = genericg_compute_checksum_4al_4; + nm = "genericg_compute_checksum_4al_4"; break; + case 5: fn = genericg_compute_checksum_4al_5; + nm = "genericg_compute_checksum_4al_5"; break; + case 6: fn = genericg_compute_checksum_4al_6; + nm = "genericg_compute_checksum_4al_6"; break; 
+ case 7: fn = genericg_compute_checksum_4al_7; + nm = "genericg_compute_checksum_4al_7"; break; + case 8: fn = genericg_compute_checksum_4al_8; + nm = "genericg_compute_checksum_4al_8"; break; + case 9: fn = genericg_compute_checksum_4al_9; + nm = "genericg_compute_checksum_4al_9"; break; + case 10: fn = genericg_compute_checksum_4al_10; + nm = "genericg_compute_checksum_4al_10"; break; + case 11: fn = genericg_compute_checksum_4al_11; + nm = "genericg_compute_checksum_4al_11"; break; + case 12: fn = genericg_compute_checksum_4al_12; + nm = "genericg_compute_checksum_4al_12"; break; + default: break; + } + fn_spec = (VEX_REGPARM(1) HWord(*)(HWord))fn; + nm_spec = nm; + } + + expectedhW = fn_generic( first_hW, hWs_to_check ); + /* If we got a specialised version, check it produces the same + result as the generic version! */ + if (fn_spec) { + vassert(nm_spec); + vassert(expectedhW == fn_spec( first_hW )); + } else { + vassert(!nm_spec); + } + + /* Set CMSTART and CMLEN. These will describe to the despatcher + the area of guest code to invalidate should we exit with a + self-check failure. */ + tistart_tmp = newIRTemp(irsb->tyenv, guest_word_type); + tilen_tmp = newIRTemp(irsb->tyenv, guest_word_type); + + IRConst* base2check_IRConst + = guest_word_type==Ity_I32 ? IRConst_U32(toUInt(base2check)) + : IRConst_U64(base2check); + IRConst* len2check_IRConst + = guest_word_type==Ity_I32 ? IRConst_U32(len2check) + : IRConst_U64(len2check); + + IRStmt** stmt0 = &irsb->stmts[selfcheck_idx + i * 5 + 0]; + IRStmt** stmt1 = &irsb->stmts[selfcheck_idx + i * 5 + 1]; + IRStmt** stmt2 = &irsb->stmts[selfcheck_idx + i * 5 + 2]; + IRStmt** stmt3 = &irsb->stmts[selfcheck_idx + i * 5 + 3]; + IRStmt** stmt4 = &irsb->stmts[selfcheck_idx + i * 5 + 4]; + vassert((*stmt0)->tag == Ist_NoOp); + vassert((*stmt1)->tag == Ist_NoOp); + vassert((*stmt2)->tag == Ist_NoOp); + vassert((*stmt3)->tag == Ist_NoOp); + vassert((*stmt4)->tag == Ist_NoOp); + + *stmt0 = IRStmt_WrTmp(tistart_tmp, IRExpr_Const(base2check_IRConst) ); + *stmt1 = IRStmt_WrTmp(tilen_tmp, IRExpr_Const(len2check_IRConst) ); + *stmt2 = IRStmt_Put( offB_GUEST_CMSTART, IRExpr_RdTmp(tistart_tmp) ); + *stmt3 = IRStmt_Put( offB_GUEST_CMLEN, IRExpr_RdTmp(tilen_tmp) ); + + /* Generate the entry point descriptors */ + if (abiinfo_both->host_ppc_calls_use_fndescrs) { + HWord* descr = (HWord*)fn_generic; + fn_generic_entry = descr[0]; + if (fn_spec) { + descr = (HWord*)fn_spec; + fn_spec_entry = descr[0]; + } else { + fn_spec_entry = (HWord)NULL; + } + } else { + fn_generic_entry = (HWord)fn_generic; + if (fn_spec) { + fn_spec_entry = (HWord)fn_spec; + } else { + fn_spec_entry = (HWord)NULL; + } + } + + IRExpr* callexpr = NULL; + if (fn_spec) { + callexpr = mkIRExprCCall( + host_word_type, 1/*regparms*/, + nm_spec, (void*)fn_spec_entry, + mkIRExprVec_1( + mkIRExpr_HWord( (HWord)first_hW ) + ) + ); + } else { + callexpr = mkIRExprCCall( + host_word_type, 2/*regparms*/, + nm_generic, (void*)fn_generic_entry, + mkIRExprVec_2( + mkIRExpr_HWord( (HWord)first_hW ), + mkIRExpr_HWord( (HWord)hWs_to_check ) + ) + ); + } + + *stmt4 + = IRStmt_Exit( + IRExpr_Binop( + host_word_type==Ity_I64 ? Iop_CmpNE64 : Iop_CmpNE32, + callexpr, + host_word_type==Ity_I64 + ? IRExpr_Const(IRConst_U64(expectedhW)) + : IRExpr_Const(IRConst_U32(expectedhW)) + ), + Ijk_InvalICache, + /* Where we must restart if there's a failure: at the + first extent, regardless of which extent the + failure actually happened in. 
*/ + guest_IP_sbstart_IRConst, + offB_GUEST_IP + ); + } /* for (i = 0; i < vge->n_used; i++) */ + + for (Int i = vge->n_used; + i < sizeof(vge->base) / sizeof(vge->base[0]); i++) { + IRStmt* stmt0 = irsb->stmts[selfcheck_idx + i * 5 + 0]; + IRStmt* stmt1 = irsb->stmts[selfcheck_idx + i * 5 + 1]; + IRStmt* stmt2 = irsb->stmts[selfcheck_idx + i * 5 + 2]; + IRStmt* stmt3 = irsb->stmts[selfcheck_idx + i * 5 + 3]; + IRStmt* stmt4 = irsb->stmts[selfcheck_idx + i * 5 + 4]; + vassert(stmt0->tag == Ist_NoOp); + vassert(stmt1->tag == Ist_NoOp); + vassert(stmt2->tag == Ist_NoOp); + vassert(stmt3->tag == Ist_NoOp); + vassert(stmt4->tag == Ist_NoOp); + } + } +} + + +/*--------------------------------------------------------------*/ +/*--- To do with speculation of IRStmts ---*/ +/*--------------------------------------------------------------*/ + +static Bool expr_is_speculatable ( const IRExpr* e ) +{ + switch (e->tag) { + case Iex_Load: + return False; + case Iex_Unop: // FIXME BOGUS, since it might trap + case Iex_Binop: // FIXME ditto + case Iex_ITE: // this is OK + return True; + case Iex_CCall: + return True; // This is probably correct + case Iex_Get: + return True; + default: + vex_printf("\n"); ppIRExpr(e); + vpanic("expr_is_speculatable: unhandled expr"); + } +} + +static Bool stmt_is_speculatable ( const IRStmt* st ) +{ + switch (st->tag) { + case Ist_IMark: + case Ist_Put: + return True; + case Ist_Store: // definitely not + case Ist_CAS: // definitely not + case Ist_Exit: // We could in fact spec this, if required + return False; + case Ist_WrTmp: + return expr_is_speculatable(st->Ist.WrTmp.data); + default: + vex_printf("\n"); ppIRStmt(st); + vpanic("stmt_is_speculatable: unhandled stmt"); + } +} + +static Bool block_is_speculatable ( const IRSB* bb ) +{ + Int i = bb->stmts_used; + vassert(i >= 2); // Must have at least: IMark, final Exit + i--; + vassert(bb->stmts[i]->tag == Ist_Exit); + i--; + for (; i >= 0; i--) { + if (!stmt_is_speculatable(bb->stmts[i])) + return False; + } + return True; +} + +static void speculate_stmt_to_end_of ( /*MOD*/IRSB* bb, + /*IN*/ IRStmt* st, IRTemp guard ) +{ + // We assume all stmts we're presented with here have previously been OK'd by + // stmt_is_speculatable above. + switch (st->tag) { + case Ist_IMark: + case Ist_WrTmp: // FIXME is this ok? + addStmtToIRSB(bb, st); + break; + case Ist_Put: { + // Put(offs, e) ==> ... [truncated message content] |
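
The long comment at the top of the new create_self_checks_as_needed above describes the checksum scheme: every naturally aligned host-sized word overlapping the translated range is folded into the sum, so the ends of the range never have to be handled at byte precision. Below is a minimal C sketch of just that word-rounding, assuming len > 0 and a readable range; the rotate-and-add mix is a hypothetical stand-in, not the actual genericg_compute_checksum_* algorithm, whose body this message does not show.

   #include <stddef.h>
   #include <stdint.h>

   /* Checksum all naturally aligned host words overlapping
      [base, base+len).  Up to sizeof(word)-1 bytes before and after
      the range can therefore influence the result, as the comment
      above concedes.  Requires len > 0 and a readable range. */
   static uintptr_t toy_checksum ( uintptr_t base, size_t len )
   {
      const uintptr_t W = sizeof(uintptr_t);
      uintptr_t first_w = base & ~(W - 1);             /* round down    */
      uintptr_t last_w  = (base + len - 1) & ~(W - 1); /* last word     */
      uintptr_t sum = 0;
      for (uintptr_t p = first_w; p <= last_w; p += W) {
         uintptr_t w = *(const uintptr_t*)p;           /* aligned load  */
         sum = ((sum << 13) | (sum >> (8*W - 13))) + w; /* toy mix only */
      }
      return sum;
   }

At translation time the expected value (expectedhW in the hunk above) is computed once and baked into the IRStmt_Exit stored through *stmt4; at run time the helper recomputes the sum and the Ijk_InvalICache exit fires on a mismatch, forcing a discard and retranslation of the extent described by CMSTART/CMLEN.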
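The message is cut off just as speculate_stmt_to_end_of reaches its Ist_Put case, so the committed body of that case is not visible here. Purely as a hedged sketch of how a guarded Put is conventionally expressed with the standard IR constructors (not necessarily what the commit does), the statement can be rewritten as a conditional move that writes the slot's current value back when the guard is false; this fragment would slot into that switch:

   case Ist_Put: {
      /* Put(offs, e) under 'guard'
           ==>  Put(offs, ITE(guard, e, Get(offs)))
         When the guard is false, the slot is rewritten with its own
         current value, so the Put has no visible effect. */
      IRType ty = typeOfIRExpr(bb->tyenv, st->Ist.Put.data);
      addStmtToIRSB(
         bb,
         IRStmt_Put(st->Ist.Put.offset,
                    IRExpr_ITE(IRExpr_RdTmp(guard),
                               st->Ist.Put.data,
                               IRExpr_Get(st->Ist.Put.offset, ty))));
      break;
   }

This also explains the accept/reject split in stmt_is_speculatable above: a Put into the guest state can always be neutralised this way, whereas a Store or CAS executed on the wrong path would be visible to other observers of memory, so those return False.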
From: Julian S. <se...@so...> - 2020-01-02 05:35:54
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=740381f8ac28e2878de7b068611385ed01985478

commit 740381f8ac28e2878de7b068611385ed01985478
Author: Julian Seward <js...@ac...>
Date:   Thu Jan 2 06:34:52 2020 +0100

    Update following recent bug-fix commits.

Diff:
---
 NEWS                              |  5 +++++
 docs/internals/3_15_BUGSTATUS.txt | 22 +++++++---------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 28d8027..9d24df6 100644
--- a/NEWS
+++ b/NEWS
@@ -83,11 +83,16 @@ where XXXXXX is the bug number as listed below.
 408091 Missing pkey syscalls
 408414 Add support for missing for preadv2 and pwritev2 syscalls
 409141 Valgrind hangs when SIGKILLed
+409206 Support for Linux PPS and PTP ioctls
 409367 exit_group() after signal to thread waiting in futex() causes hangs
 409780 References to non-existent configure.in
+410556 Add support for BLKIO{MIN,OPT} and BLKALIGNOFF ioctls
 410599 Non-deterministic behaviour of pth_self_kill_15_other test
+410757 discrepancy for preadv2/pwritev2 syscalls across different versions
 411134 Allow the user to change a set of command line options during execution
+411451 amd64->IR of bt/btc/bts/btr with immediate clears zero flag
 412344 Problem setting mips flags with specific paths
+413119 Ioctl wrapper for DRM_IOCTL_I915_GEM_MMAP
 413330 avx-1 test fails on AMD EPYC 7401P 24-Core Processor
 413603 callgrind_annotate/cg_annotate truncate function names at '#'
 414565 Specific use case bug found in SysRes VG_(do_sys_sigprocmask)
diff --git a/docs/internals/3_15_BUGSTATUS.txt b/docs/internals/3_15_BUGSTATUS.txt
index c0be01f..778053f 100644
--- a/docs/internals/3_15_BUGSTATUS.txt
+++ b/docs/internals/3_15_BUGSTATUS.txt
@@ -13,16 +13,11 @@ of 3.15.0. It doesn't carry over bugs from earlier versions.
 407376 Update Xen support to 4.12 and add more coverage
        ** Has patch, looks reasonable
+       ** 2019Dec30: causes implicit-fallthrough warning; author queried
 
 408858 Add new io_uring_register, setup, enter syscalls
        No patch, no test case
 
-409206 [PATCH] Support for Linux PPS and PTP ioctls
-       ** Has patches, looks reasonable
-
-410556 [PATCH] add support for BLKIO{MIN,OPT} and BLKALIGNOFF ioctls
-       ** Has patches, looks reasonable
-
 410743 shmat() calls for 32-bit programs fail when running in 64-bit valgrind
        Not sure if this is important. Ask MJW.
@@ -38,9 +33,6 @@ of 3.15.0. It doesn't carry over bugs from earlier versions.
 412408 unhandled arm-linux syscall: 124 - adjtime - on arm-linux
        * trivial patch, but need to check the handler is correct
 
-413119 ioctl wrapper for DRM_IOCTL_I915_GEM_MMAP
-       ** plausible; contains patches
-
 415621 epoll_ctl reports for uninitialized padding
        * maybe an inaccurate wrapper; may be easy to fix?
@@ -92,11 +84,12 @@ of 3.15.0. It doesn't carry over bugs from earlier versions.
 === Tools/Memcheck =====================================================
 
 407589 Add support for C11 aligned_alloc() and GNU reallocarray()
-       Missing allocation intercepts?
+       * Missing allocation intercepts?
 
 409429 False positives at unexpected location due to failure to recognize
        cmpeq as a dependency breaking idiom (fixed in grail? check this)
-       In grail: 96de5118f5332ae145912ebe91b8fa143df74b8d
+       * In grail: 96de5118f5332ae145912ebe91b8fa143df74b8d
+         (but not merged from it; needs doing separately)
 
 415141 Possible leak with calling __libc_freeres before all thread's
        tid_addresses are cleared
 
 === Uncategorised/build ================================================
 
 415516 Can't cross compile on openwrt
-       MIPS build failure
+       * MIPS build failure
 
 === Uncategorised/run ==================================================
@@ -129,9 +122,6 @@ of 3.15.0. It doesn't carry over bugs from earlier versions.
        == 414053
 
 393351 has STR
 
-411451 x86/amd64->IR of bt/btc/bts/btr with immediate clears zero flag
-       * has patch and nano-test-case
-
 === VEX/arm32 ==========================================================
 
 410102 Valgrind ir sanity check failure crash
@@ -198,3 +188,5 @@ Extras
 apply included fixes for Xen 4.6/4.7/4.8/4.9/4.10/4.11/4.12
 
 390553 ? Can we get rid of exp-sgcheck now?
+
+Very large executable support -- adjust tool load address? Status?