From: Julian S. <se...@so...> - 2021-07-13 10:53:15
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=61307ee83121aa5f0b57a12a80e90fc2f414380a

commit 61307ee83121aa5f0b57a12a80e90fc2f414380a
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 12:52:10 2021 +0200

    Un-break arm64 isel following 22bae4b1544fc5d82f131ef8fde4cea7666112c2

    22bae4b1544fc5d82f131ef8fde4cea7666112c2 introduced an iropt-level
    rewrite rule

       64to16( 32Uto64 ( x )) --> 32to16(x)

    that creates Iop_32to16 nodes.  The arm64 isel apparently has never
    seen these before and so asserts.  This is a 1-liner fix.

Diff:
---
 VEX/priv/host_arm64_isel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/VEX/priv/host_arm64_isel.c b/VEX/priv/host_arm64_isel.c
index 26b27f1f7c..4b1d8c8469 100644
--- a/VEX/priv/host_arm64_isel.c
+++ b/VEX/priv/host_arm64_isel.c
@@ -2188,6 +2188,7 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, IRExpr* e )
          case Iop_64to32: case Iop_64to16: case Iop_64to8:
+         case Iop_32to16:
             /* These are no-ops. */
             return iselIntExpr_R(env, e->Iex.Unop.arg);
From: Julian S. <se...@so...> - 2021-07-13 08:41:32
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=22bae4b1544fc5d82f131ef8fde4cea7666112c2

commit 22bae4b1544fc5d82f131ef8fde4cea7666112c2
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 10:41:04 2021 +0200

    amd64 front end: Make uses of 8- and 16-bit GPRs GET the entire
    containing register.

    Until now, a read of a 32-bit GPR (eg, %ecx) in the amd64 front end
    actually involved GETting the containing 64-bit reg (%rcx) and
    dropping off its top 32 bits, in the IR translation.  This makes IR
    optimisation work well for code that mixes 32- and 64-bit integer
    operations, which is very common.  In particular it helps guarantee
    that PUT-to-GET and redundant-GET optimisations work, hence that
    constant propagation/folding across such boundaries works, and
    indirectly helps to avoid generating code in the back end that
    suffers from store-forwarding or partial-register-read stalls.

    This commit partially extends those advantages to 8- and 16-bit GPR
    reads.  In particular, all 16-bit GPR fetches are now a GET of the
    whole 64-bit register followed by an Iop_64to16 cast.  The same
    scheme is used for 8-bit register fetches, except for the "anomalous
    four" (%ah, %bh, %ch, %dh), whose handling is left unchanged.

    With this in place, a wider write followed by a smaller read now
    plays nicely with constant folding and propagation, for example
    (somewhat artificially):

       movl $17, %ecx   // 32-bit write of %rcx
       shrl %cl, %r15   // 8-bit read of %rcx

    The 17 will be propagated, in IR, up to the shift.

    The commit also adds a couple more rewrite rules in ir_opt.c to
    remove some of the resulting pointless conversion pairings.
Diff:
---
 VEX/priv/guest_amd64_toIR.c | 187 ++++++++++++++++++++++++++++++--------------
 VEX/priv/ir_opt.c           |   8 ++
 2 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c
index 070a1c5bc5..c6296f3987 100644
--- a/VEX/priv/guest_amd64_toIR.c
+++ b/VEX/priv/guest_amd64_toIR.c
@@ -912,7 +912,8 @@ static Int integerGuestReg64Offset ( UInt reg )
 /* Produce the name of an integer register, for printing purposes.
    reg is a number in the range 0 .. 15 that has been generated from a
    3-bit reg-field number and a REX extension bit.  irregular denotes
-   the case where sz==1 and no REX byte is present. */
+   the case where sz==1 and no REX byte is present and where the denoted
+   sub-register is bits 15:8 of the containing 64-bit register. */
 
 static
 const HChar* nameIReg ( Int sz, UInt reg, Bool irregular )
@@ -929,13 +930,13 @@ const HChar* nameIReg ( Int sz, UInt reg, Bool irregular )
    static const HChar* ireg8_names[16]
      = { "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
          "%r8b", "%r9b", "%r10b","%r11b","%r12b","%r13b","%r14b","%r15b" };
-   static const HChar* ireg8_irregular[8]
-     = { "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh" };
+   static const HChar* ireg8_irregular[4]
+     = { "%ah", "%ch", "%dh", "%bh" };
 
    vassert(reg < 16);
    if (sz == 1) {
       if (irregular)
-         vassert(reg < 8);
+         vassert(reg >= 4 && reg < 8);
    } else {
       vassert(irregular == False);
    }
@@ -945,7 +946,8 @@ const HChar* nameIReg ( Int sz, UInt reg, Bool irregular )
       case 4: return ireg32_names[reg];
       case 2: return ireg16_names[reg];
       case 1: if (irregular) {
-                 return ireg8_irregular[reg];
+                 vassert(reg >= 4 && reg < 8);
+                 return ireg8_irregular[reg - 4];
               } else {
                  return ireg8_names[reg];
               }
@@ -962,7 +964,7 @@ Int offsetIReg ( Int sz, UInt reg, Bool irregular )
    vassert(reg < 16);
    if (sz == 1) {
       if (irregular)
-         vassert(reg < 8);
+         vassert(reg >= 4 && reg < 8);
    } else {
       vassert(irregular == False);
    }
@@ -988,7 +990,7 @@ Int offsetIReg ( Int sz, UInt reg, Bool irregular )
 static IRExpr* getIRegCL ( void )
 {
    vassert(host_endness == VexEndnessLE);
-   return IRExpr_Get( OFFB_RCX, Ity_I8 );
+   return unop(Iop_64to8, IRExpr_Get( OFFB_RCX, Ity_I64 ));
 }
 
@@ -1020,8 +1022,8 @@ static IRExpr* getIRegRAX ( Int sz )
 {
    vassert(host_endness == VexEndnessLE);
    switch (sz) {
-      case 1: return IRExpr_Get( OFFB_RAX, Ity_I8 );
-      case 2: return IRExpr_Get( OFFB_RAX, Ity_I16 );
+      case 1: return unop(Iop_64to8, IRExpr_Get( OFFB_RAX, Ity_I64 ));
+      case 2: return unop(Iop_64to16, IRExpr_Get( OFFB_RAX, Ity_I64 ));
       case 4: return unop(Iop_64to32, IRExpr_Get( OFFB_RAX, Ity_I64 ));
       case 8: return IRExpr_Get( OFFB_RAX, Ity_I64 );
       default: vpanic("getIRegRAX(amd64)");
@@ -1068,8 +1070,8 @@ static IRExpr* getIRegRDX ( Int sz )
 {
    vassert(host_endness == VexEndnessLE);
    switch (sz) {
-      case 1: return IRExpr_Get( OFFB_RDX, Ity_I8 );
-      case 2: return IRExpr_Get( OFFB_RDX, Ity_I16 );
+      case 1: return unop(Iop_64to8, IRExpr_Get( OFFB_RDX, Ity_I64 ));
+      case 2: return unop(Iop_64to16, IRExpr_Get( OFFB_RDX, Ity_I64 ));
       case 4: return unop(Iop_64to32, IRExpr_Get( OFFB_RDX, Ity_I64 ));
       case 8: return IRExpr_Get( OFFB_RDX, Ity_I64 );
       default: vpanic("getIRegRDX(amd64)");
@@ -1145,8 +1147,9 @@ static const HChar* nameIReg32 ( UInt regno )
 static IRExpr* getIReg16 ( UInt regno )
 {
    vassert(host_endness == VexEndnessLE);
-   return IRExpr_Get( integerGuestReg64Offset(regno),
-                      Ity_I16 );
+   return unop(Iop_64to16,
+               IRExpr_Get( integerGuestReg64Offset(regno),
+                           Ity_I64 ));
 }
 
 static void putIReg16 ( UInt regno, IRExpr* e )
@@ -1193,22 +1196,46 @@ static IRExpr* getIRegRexB ( Int sz, Prefix pfx, UInt lo3bits )
 {
    vassert(lo3bits < 8);
    vassert(IS_VALID_PFX(pfx));
-   vassert(sz == 8 || sz == 4 || sz == 2 || sz == 1);
-   if (sz == 4) {
-      sz = 8;
-      return unop(Iop_64to32,
-                  IRExpr_Get(
-                     offsetIReg( sz, lo3bits | (getRexB(pfx) << 3),
-                                 False/*!irregular*/ ),
-                     szToITy(sz)
-                  )
-             );
-   } else {
-      return IRExpr_Get(
-                offsetIReg( sz, lo3bits | (getRexB(pfx) << 3),
-                            toBool(sz==1 && !haveREX(pfx)) ),
-                szToITy(sz)
-             );
+   UInt regNo = (getRexB(pfx) << 3) | lo3bits;
+   switch (sz) {
+      case 8: {
+         return IRExpr_Get(
+                   offsetIReg( 8, regNo, False/*!irregular*/ ),
+                   Ity_I64
+                );
+      }
+      case 4: {
+         return unop(Iop_64to32,
+                     IRExpr_Get(
+                        offsetIReg( 8, regNo, False/*!irregular*/ ),
+                        Ity_I64
+                     ));
+      }
+      case 2: {
+         return unop(Iop_64to16,
+                     IRExpr_Get(
+                        offsetIReg( 8, regNo, False/*!irregular*/ ),
+                        Ity_I64
+                     ));
+      }
+      case 1: {
+         Bool irregular = !haveREX(pfx) && regNo >= 4 && regNo < 8;
+         if (irregular) {
+            return IRExpr_Get(
+                      offsetIReg( 1, regNo, True/*irregular*/ ),
+                      Ity_I8
+                   );
+         } else {
+            return unop(Iop_64to8,
+                        IRExpr_Get(
+                           offsetIReg( 8, regNo, False/*!irregular*/ ),
+                           Ity_I64
+                        ));
+         }
+      }
+      default: {
+         vpanic("getIRegRexB");
+      }
    }
 }
 
@@ -1218,9 +1245,9 @@ static void putIRegRexB ( Int sz, Prefix pfx, UInt lo3bits, IRExpr* e )
    vassert(IS_VALID_PFX(pfx));
    vassert(sz == 8 || sz == 4 || sz == 2 || sz == 1);
    vassert(typeOfIRExpr(irsb->tyenv, e) == szToITy(sz));
+   Bool irregular = sz == 1 && !haveREX(pfx) && lo3bits >= 4 && lo3bits < 8;
    stmt( IRStmt_Put(
-            offsetIReg( sz, lo3bits | (getRexB(pfx) << 3),
-                        toBool(sz==1 && !haveREX(pfx)) ),
+            offsetIReg( sz, lo3bits | (getRexB(pfx) << 3), irregular ),
            sz==4 ? unop(Iop_32Uto64,e) : e
    ));
 }
@@ -1269,20 +1296,39 @@ static UInt offsetIRegG ( Int sz, Prefix pfx, UChar mod_reg_rm )
    vassert(IS_VALID_PFX(pfx));
    vassert(sz == 8 || sz == 4 || sz == 2 || sz == 1);
    reg = gregOfRexRM( pfx, mod_reg_rm );
-   return offsetIReg( sz, reg, toBool(sz == 1 && !haveREX(pfx)) );
+   Bool irregular = sz == 1 && !haveREX(pfx) && reg >= 4 && reg < 8;
+   return offsetIReg( sz, reg, irregular );
 }
 
 static
 IRExpr* getIRegG ( Int sz, Prefix pfx, UChar mod_reg_rm )
 {
-   if (sz == 4) {
-      sz = 8;
-      return unop(Iop_64to32,
-                  IRExpr_Get( offsetIRegG( sz, pfx, mod_reg_rm ),
-                              szToITy(sz) ));
-   } else {
-      return IRExpr_Get( offsetIRegG( sz, pfx, mod_reg_rm ),
-                         szToITy(sz) );
+   switch (sz) {
+      case 8: {
+         return IRExpr_Get( offsetIRegG( 8, pfx, mod_reg_rm ), Ity_I64 );
+      }
+      case 4: {
+         return unop(Iop_64to32,
+                     IRExpr_Get( offsetIRegG( 8, pfx, mod_reg_rm ), Ity_I64 ));
+      }
+      case 2: {
+         return unop(Iop_64to16,
+                     IRExpr_Get( offsetIRegG( 8, pfx, mod_reg_rm ), Ity_I64 ));
+      }
+      case 1: {
+         UInt regNo = gregOfRexRM( pfx, mod_reg_rm );
+         Bool irregular = !haveREX(pfx) && regNo >= 4 && regNo < 8;
+         if (irregular) {
+            return IRExpr_Get( offsetIRegG( 1, pfx, mod_reg_rm ), Ity_I8 );
+         } else {
+            return unop(Iop_64to8,
+                        IRExpr_Get( offsetIRegG( 8, pfx, mod_reg_rm ),
+                                    Ity_I64 ));
+         }
+      }
+      default: {
+         vpanic("getIRegG");
+      }
    }
 }
 
@@ -1299,19 +1345,24 @@ void putIRegG ( Int sz, Prefix pfx, UChar mod_reg_rm, IRExpr* e )
 static const HChar* nameIRegG ( Int sz, Prefix pfx, UChar mod_reg_rm )
 {
-   return nameIReg( sz, gregOfRexRM(pfx,mod_reg_rm),
-                    toBool(sz==1 && !haveREX(pfx)) );
+   UInt regNo = gregOfRexRM( pfx, mod_reg_rm );
+   Bool irregular = sz == 1 && !haveREX(pfx) && regNo >= 4 && regNo < 8;
+   return nameIReg( sz, gregOfRexRM(pfx,mod_reg_rm), irregular );
 }
 
 static IRExpr* getIRegV ( Int sz, Prefix pfx )
 {
+   vassert(sz == 8 || sz == 4);
    if (sz == 4) {
-      sz = 8;
       return unop(Iop_64to32,
-                  IRExpr_Get( offsetIReg( sz, getVexNvvvv(pfx), False ),
-                              szToITy(sz) ));
+                  IRExpr_Get( offsetIReg( 8, getVexNvvvv(pfx), False ),
+                              Ity_I64 ));
+   } else if (sz == 2) {
+      return unop(Iop_64to16,
+                  IRExpr_Get( offsetIReg( 8, getVexNvvvv(pfx), False ),
+                              Ity_I64 ));
    } else {
      return IRExpr_Get( offsetIReg( sz, getVexNvvvv(pfx), False ),
                         szToITy(sz) );
@@ -1321,6 +1372,7 @@ IRExpr* getIRegV ( Int sz, Prefix pfx )
 static void putIRegV ( Int sz, Prefix pfx, IRExpr* e )
 {
+   vassert(sz == 8 || sz == 4);
    vassert(typeOfIRExpr(irsb->tyenv,e) == szToITy(sz));
    if (sz == 4) {
       e = unop(Iop_32Uto64,e);
@@ -1331,6 +1383,7 @@ void putIRegV ( Int sz, Prefix pfx, IRExpr* e )
 static const HChar* nameIRegV ( Int sz, Prefix pfx )
 {
+   vassert(sz == 8 || sz == 4);
    return nameIReg( sz, getVexNvvvv(pfx), False );
 }
 
@@ -1348,20 +1401,39 @@ static UInt offsetIRegE ( Int sz, Prefix pfx, UChar mod_reg_rm )
    vassert(IS_VALID_PFX(pfx));
    vassert(sz == 8 || sz == 4 || sz == 2 || sz == 1);
    reg = eregOfRexRM( pfx, mod_reg_rm );
-   return offsetIReg( sz, reg, toBool(sz == 1 && !haveREX(pfx)) );
+   Bool irregular = sz == 1 && !haveREX(pfx) && (reg >= 4 && reg < 8);
+   return offsetIReg( sz, reg, irregular );
 }
 
-static
+static
 IRExpr* getIRegE ( Int sz, Prefix pfx, UChar mod_reg_rm )
 {
-   if (sz == 4) {
-      sz = 8;
-      return unop(Iop_64to32,
-                  IRExpr_Get( offsetIRegE( sz, pfx, mod_reg_rm ),
-                              szToITy(sz) ));
-   } else {
-      return IRExpr_Get( offsetIRegE( sz, pfx, mod_reg_rm ),
-                         szToITy(sz) );
+   switch (sz) {
+      case 8: {
+         return IRExpr_Get( offsetIRegE( 8, pfx, mod_reg_rm ), Ity_I64 );
+      }
+      case 4: {
+         return unop(Iop_64to32,
+                     IRExpr_Get( offsetIRegE( 8, pfx, mod_reg_rm ), Ity_I64 ));
+      }
+      case 2: {
+         return unop(Iop_64to16,
+                     IRExpr_Get( offsetIRegE( 8, pfx, mod_reg_rm ), Ity_I64 ));
+      }
+      case 1: {
+         UInt regNo = eregOfRexRM( pfx, mod_reg_rm );
+         Bool irregular = !haveREX(pfx) && regNo >= 4 && regNo < 8;
+         if (irregular) {
+            return IRExpr_Get( offsetIRegE( 1, pfx, mod_reg_rm ), Ity_I8 );
+         } else {
+            return unop(Iop_64to8,
+                        IRExpr_Get( offsetIRegE( 8, pfx, mod_reg_rm ),
+                                    Ity_I64 ));
+         }
+      }
+      default: {
+         vpanic("getIRegE");
+      }
    }
 }
 
@@ -1378,8 +1450,9 @@ void putIRegE ( Int sz, Prefix pfx, UChar mod_reg_rm, IRExpr* e )
 static const HChar* nameIRegE ( Int sz, Prefix pfx, UChar mod_reg_rm )
 {
-   return nameIReg( sz, eregOfRexRM(pfx,mod_reg_rm),
-                    toBool(sz==1 && !haveREX(pfx)) );
+   UInt regNo = eregOfRexRM( pfx, mod_reg_rm );
+   Bool irregular = sz == 1 && !haveREX(pfx) && regNo >= 4 && regNo < 8;
+   return nameIReg( sz, eregOfRexRM(pfx,mod_reg_rm), irregular );
 }
 
diff --git a/VEX/priv/ir_opt.c b/VEX/priv/ir_opt.c
index 930fd49dd8..93dd6188ef 100644
--- a/VEX/priv/ir_opt.c
+++ b/VEX/priv/ir_opt.c
@@ -5480,6 +5480,14 @@ static IRExpr* fold_IRExpr_Unop ( IROp op, IRExpr* aa )
          if (is_Unop(aa, Iop_8Uto64))
            return IRExpr_Unop(Iop_8Uto32, aa->Iex.Unop.arg);
         break;
+      case Iop_64to16:
+         /* 64to16( 16Uto64 ( x )) --> x */
+         if (is_Unop(aa, Iop_16Uto64))
+            return aa->Iex.Unop.arg;
+         /* 64to16( 32Uto64 ( x )) --> 32to16(x) */
+         if (is_Unop(aa, Iop_32Uto64))
+            return IRExpr_Unop(Iop_32to16, aa->Iex.Unop.arg);
+         break;
      case Iop_32Uto64:
         /* 32Uto64( 8Uto32( x )) --> 8Uto64(x) */
From: Julian S. <se...@so...> - 2021-07-13 08:25:16
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=68f4dcface31a7aad8dc9f4186782920d73b7489

commit 68f4dcface31a7aad8dc9f4186782920d73b7489
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 10:15:39 2021 +0200

    Consistently set CC_NDEP when setting the flags thunk.

    For most settings of the flags thunk (guest_CC_{OP,DEP1,DEP2,NDEP}),
    the value of the NDEP field is irrelevant, because of the setting of
    the OP field, and so it is usually not set in such cases, which are
    the vast majority.  This saves a store (a PUT) in the final generated
    code.

    But it has the bad effect that the IR optimiser cannot know that
    preceding PUTs to the field are possibly dead and can be removed.
    Most of the time that is not important, but just occasionally it can
    cause a lot of pointless extra computation (calling of
    amd64g_calculate_rflags_all) to happen.  This was observed in a long
    basic block involved in a hash calculation, like this:

       rolq ..   // sets CC_NDEP to the previous value of the flags,
                 // as calculated by amd64g_calculate_rflags_all
       mulq ..

       (rolq/mulq repeated several times)

       addq ..   // effect is, all of the flag computation done for the
                 // rol/mul sequence is irrelevant, but iropt can't see that

    Setting CC_NDEP consistently to zero, even if it isn't needed, avoids
    the problem.
Diff:
---
 VEX/priv/guest_amd64_toIR.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/VEX/priv/guest_amd64_toIR.c b/VEX/priv/guest_amd64_toIR.c
index ad720873d4..070a1c5bc5 100644
--- a/VEX/priv/guest_amd64_toIR.c
+++ b/VEX/priv/guest_amd64_toIR.c
@@ -1814,6 +1814,7 @@ void setFlags_DEP1_DEP2 ( IROp op8, IRTemp dep1, IRTemp dep2, IRType ty )
    stmt( IRStmt_Put( OFFB_CC_OP,   mkU64(ccOp)) );
    stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dep1))) );
    stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(dep2))) );
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
 }
 
@@ -1840,6 +1841,7 @@ void setFlags_DEP1 ( IROp op8, IRTemp dep1, IRType ty )
    stmt( IRStmt_Put( OFFB_CC_OP,   mkU64(ccOp)) );
    stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dep1))) );
    stmt( IRStmt_Put( OFFB_CC_DEP2, mkU64(0)) );
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
 }
 
@@ -1891,6 +1893,8 @@ static void setFlags_DEP1_DEP2_shift ( IROp op64,
               IRExpr_ITE( mkexpr(guardB),
                           widenUto64(mkexpr(resUS)),
                           IRExpr_Get(OFFB_CC_DEP2,Ity_I64) ) ));
+   stmt( IRStmt_Put( OFFB_CC_NDEP,
+                     mkU64(0) ));
 }
 
@@ -1943,6 +1947,7 @@ void setFlags_MUL ( IRType ty, IRTemp arg1, IRTemp arg2, ULong base_op )
    }
    stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(arg1)) ));
    stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(arg2)) ));
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
 }
 
@@ -5486,6 +5491,7 @@ static void fp_do_ucomi_ST0_STi ( UInt i, Bool pop_after )
                        binop(Iop_CmpF64, get_ST(0), get_ST(i))),
                   mkU64(0x45)
        )));
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
    if (pop_after)
       fp_pop();
 }
@@ -10260,6 +10266,7 @@ static Long dis_COMISD ( const VexAbiInfo* vbi, Prefix pfx,
                        binop(Iop_CmpF64, mkexpr(argL), mkexpr(argR)) ),
                   mkU64(0x45)
        )));
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
    return delta;
 }
 
@@ -10305,6 +10312,7 @@ static Long dis_COMISS ( const VexAbiInfo* vbi, Prefix pfx,
                              unop(Iop_F32toF64,mkexpr(argR)))),
                   mkU64(0x45)
        )));
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
    return delta;
 }
 
@@ -20608,6 +20616,7 @@ Long dis_ESC_NONE (
          )
       )
    );
+   stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
    /* Also need to set the D flag, which is held in bit 10 of t1.
       If zero, put 1 in OFFB_DFLAG, else -1 in OFFB_DFLAG. */
@@ -29900,6 +29909,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_ANDN32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, mkU64(0)) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
@@ -29937,6 +29947,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_BLSI32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(src))) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
@@ -29971,6 +29982,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_BLSMSK32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(src))) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
@@ -30005,6 +30017,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_BLSR32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(src))) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
@@ -30074,6 +30087,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_BLSR32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, widenUto64(mkexpr(cond))) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
@@ -30282,6 +30296,7 @@ Long dis_ESC_0F38__VEX (
                                   : AMD64G_CC_OP_ANDN32)) );
       stmt( IRStmt_Put( OFFB_CC_DEP1, widenUto64(mkexpr(dst))) );
       stmt( IRStmt_Put( OFFB_CC_DEP2, mkU64(0)) );
+      stmt( IRStmt_Put( OFFB_CC_NDEP, mkU64(0) ));
       *uses_vvvv = True;
       goto decode_success;
    }
From: Julian S. <se...@so...> - 2021-07-13 07:35:37
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=0ff3be2f3ba275aa515b29d4aca8dc9208added2

commit 0ff3be2f3ba275aa515b29d4aca8dc9208added2
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 09:34:05 2021 +0200

    amd64 front end: more spec rules: S/NS after LOGICW, S after SHRL,
    Z after SHRW, C after SUBW.

    This adds a few more spec rules that seem useful for running Firefox
    built with gcc -O3 and clang -O3.  At least one of them removes a
    false Memcheck error.  There is also some improved debug printing,
    currently #if 0'd.

Diff:
---
 VEX/priv/guest_amd64_helpers.c | 89 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/VEX/priv/guest_amd64_helpers.c b/VEX/priv/guest_amd64_helpers.c
index ab7b64bcc1..af2ddc29c5 100644
--- a/VEX/priv/guest_amd64_helpers.c
+++ b/VEX/priv/guest_amd64_helpers.c
@@ -1681,6 +1681,21 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
                      mkU32(0)));
       }
 
+      if (isU64(cc_op, AMD64G_CC_OP_LOGICW) && isU64(cond, AMD64CondS)) {
+         /* word and/or/xor, then S --> (ULong)result[15] */
+         return binop(Iop_And64,
+                      binop(Iop_Shr64, cc_dep1, mkU8(15)),
+                      mkU64(1));
+      }
+      if (isU64(cc_op, AMD64G_CC_OP_LOGICW) && isU64(cond, AMD64CondNS)) {
+         /* word and/or/xor, then S --> (ULong) ~ result[15] */
+         return binop(Iop_Xor64,
+                      binop(Iop_And64,
+                            binop(Iop_Shr64, cc_dep1, mkU8(15)),
+                            mkU64(1)),
+                      mkU64(1));
+      }
+
       /*---------------- LOGICB ----------------*/
 
       if (isU64(cc_op, AMD64G_CC_OP_LOGICB) && isU64(cond, AMD64CondZ)) {
@@ -1798,18 +1813,31 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
                       binop(Iop_Shr64, cc_dep1, mkU8(31)),
                       mkU64(1));
       }
-      // The following looks correct to me, but never seems to happen because
-      // the front end converts jns to js by switching the fallthrough vs
-      // taken addresses.  See jcc_01().  But then why do other conditions
-      // considered by this function show up in both variants (xx and Nxx) ?
-      //if (isU64(cc_op, AMD64G_CC_OP_SHRL) && isU64(cond, AMD64CondNS)) {
-      //   /* SHRL/SARL, then NS --> (ULong) ~ result[31] */
-      //   vassert(0);
-      //   return binop(Iop_Xor64,
-      //                binop(Iop_And64,
-      //                      binop(Iop_Shr64, cc_dep1, mkU8(31)),
-      //                      mkU64(1)),
-      //                mkU64(1));
+      if (isU64(cc_op, AMD64G_CC_OP_SHRL) && isU64(cond, AMD64CondNS)) {
+         /* SHRL/SARL, then NS --> (ULong) ~ result[31] */
+         return binop(Iop_Xor64,
+                      binop(Iop_And64,
+                            binop(Iop_Shr64, cc_dep1, mkU8(31)),
+                            mkU64(1)),
+                      mkU64(1));
+      }
+
+      /*---------------- SHRW ----------------*/
+
+      if (isU64(cc_op, AMD64G_CC_OP_SHRW) && isU64(cond, AMD64CondZ)) {
+         /* SHRW, then Z --> test dep1 == 0 */
+         return unop(Iop_1Uto64,
+                     binop(Iop_CmpEQ32,
+                           unop(Iop_16Uto32, unop(Iop_64to16, cc_dep1)),
+                           mkU32(0)));
+      }
+      // No known test case for this, hence disabled:
+      //if (isU64(cc_op, AMD64G_CC_OP_SHRW) && isU64(cond, AMD64CondNZ)) {
+      //   /* SHRW, then NZ --> test dep1 == 0 */
+      //   return unop(Iop_1Uto64,
+      //               binop(Iop_CmpNE32,
+      //                     unop(Iop_16Uto32, unop(Iop_64to16, cc_dep1)),
+      //                     mkU32(0)));
       //}
 
       /*---------------- COPY ----------------*/
@@ -1902,6 +1930,18 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
          );
       }
 
+#     if 0
+      if (cond->tag == Iex_Const && cc_op->tag == Iex_Const) {
+         vex_printf("spec request failed: ");
+         vex_printf("   %s  ", function_name);
+         for (i = 0; i < 2/*arity*/; i++) {
+            vex_printf("  ");
+            ppIRExpr(args[i]);
+         }
+         vex_printf("\n");
+      }
+#     endif
+
       return NULL;
    }
@@ -1930,6 +1970,13 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
                            unop(Iop_64to32, cc_dep1),
                            unop(Iop_64to32, cc_dep2)));
       }
+      if (isU64(cc_op, AMD64G_CC_OP_SUBW)) {
+         /* C after sub denotes unsigned less than */
+         return unop(Iop_1Uto64,
+                     binop(Iop_CmpLT64U,
+                           binop(Iop_And64,cc_dep1,mkU64(0xFFFF)),
+                           binop(Iop_And64,cc_dep2,mkU64(0xFFFF))));
+      }
       if (isU64(cc_op, AMD64G_CC_OP_SUBB)) {
          /* C after sub denotes unsigned less than */
          return unop(Iop_1Uto64,
@@ -1958,8 +2005,10 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
          /* cflag after logic is zero */
          return mkU64(0);
       }
-      if (isU64(cc_op, AMD64G_CC_OP_DECL) || isU64(cc_op, AMD64G_CC_OP_INCL)
-          || isU64(cc_op, AMD64G_CC_OP_DECQ) || isU64(cc_op, AMD64G_CC_OP_INCQ)) {
+      if (isU64(cc_op, AMD64G_CC_OP_DECL)
+          || isU64(cc_op, AMD64G_CC_OP_INCL)
+          || isU64(cc_op, AMD64G_CC_OP_DECQ)
+          || isU64(cc_op, AMD64G_CC_OP_INCQ)) {
          /* If the thunk is dec or inc, the cflag is supplied as CC_NDEP. */
          return cc_ndep;
       }
@@ -1970,6 +2019,18 @@ IRExpr* guest_amd64_spechelper ( const HChar* function_name,
       }
 #     endif
 
+#     if 0
+      if (cc_op->tag == Iex_Const) {
+         vex_printf("spec request failed: ");
+         vex_printf("   %s  ", function_name);
+         for (i = 0; i < 2/*arity*/; i++) {
+            vex_printf("  ");
+            ppIRExpr(args[i]);
+         }
+         vex_printf("\n");
+      }
+#     endif
+
       return NULL;
    }
From: Julian S. <se...@so...> - 2021-07-13 07:15:13
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=a2becd59ae1d1ef6e774549b203d5c158fa20666

commit a2becd59ae1d1ef6e774549b203d5c158fa20666
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 09:12:43 2021 +0200

    Remove redundant assertions and conditionals in move_CEnt_to_top.

    move_CEnt_to_top is on the hot path when reading large amounts of
    debug info, especially Dwarf inlined-function info.  It shows up in
    'perf' profiles.  This commit removes assertions which are asserted
    elsewhere, and tries to avoid a couple of conditional branches.

Diff:
---
 coregrind/m_debuginfo/image.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/coregrind/m_debuginfo/image.c b/coregrind/m_debuginfo/image.c
index acb09523b6..ebe6dfcfe8 100644
--- a/coregrind/m_debuginfo/image.c
+++ b/coregrind/m_debuginfo/image.c
@@ -523,14 +523,24 @@ static void realloc_CEnt ( DiImage* img, UInt entNo, SizeT szB, Bool fromC )
    to make space. */
 static void move_CEnt_to_top ( DiImage* img, UInt entNo )
 {
-   vg_assert(img->ces_used <= CACHE_N_ENTRIES);
-   vg_assert(entNo > 0 && entNo < img->ces_used);
-   CEnt* tmp = img->ces[entNo];
-   while (entNo > 0) {
+   vg_assert(entNo < img->ces_used);
+   if (LIKELY(entNo == 1)) {
+      CEnt* tmp = img->ces[1];
+      img->ces[entNo] = img->ces[0];
+      img->ces[0] = tmp;
+   } else {
+      vg_assert(entNo > 1); // a.k.a. >= 2
+      CEnt* tmp = img->ces[entNo];
       img->ces[entNo] = img->ces[entNo-1];
       entNo--;
+      img->ces[entNo] = img->ces[entNo-1];
+      entNo--;
+      while (entNo > 0) {
+         img->ces[entNo] = img->ces[entNo-1];
+         entNo--;
+      }
+      img->ces[0] = tmp;
    }
-   img->ces[0] = tmp;
 }
From: Julian S. <se...@so...> - 2021-07-13 07:11:15
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=e5f66a2aa00fa88ba3e0fb004510f0a630881ef1

commit e5f66a2aa00fa88ba3e0fb004510f0a630881ef1
Author: Julian Seward <js...@ac...>
Date:   Tue Jul 13 09:07:45 2021 +0200

    Reimplement h_generic_calc_GetMSBs8x16 to be more efficient.

    h_generic_calc_GetMSBs8x16 concatenates the top bit of each 8-bit
    lane in a 128-bit value, producing a 16-bit scalar value.  (It is
    PMOVMSKB, really.)  The existing implementation is excessively
    inefficient and shows up sometimes in 'perf' profiles of V.  This
    commit replaces it with a logarithmic (4-stage) algorithm which is
    hopefully much faster.

Diff:
---
 VEX/priv/host_generic_simd128.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/VEX/priv/host_generic_simd128.c b/VEX/priv/host_generic_simd128.c
index 1c0f7cfbaf..f895de46f4 100644
--- a/VEX/priv/host_generic_simd128.c
+++ b/VEX/priv/host_generic_simd128.c
@@ -383,23 +383,20 @@ void VEX_REGPARM(3)
 UInt /*not-regparm*/
 h_generic_calc_GetMSBs8x16 ( ULong w64hi, ULong w64lo )
 {
-   UInt r = 0;
-   if (w64hi & (1ULL << (64-1))) r |= (1<<15);
-   if (w64hi & (1ULL << (56-1))) r |= (1<<14);
-   if (w64hi & (1ULL << (48-1))) r |= (1<<13);
-   if (w64hi & (1ULL << (40-1))) r |= (1<<12);
-   if (w64hi & (1ULL << (32-1))) r |= (1<<11);
-   if (w64hi & (1ULL << (24-1))) r |= (1<<10);
-   if (w64hi & (1ULL << (16-1))) r |= (1<<9);
-   if (w64hi & (1ULL << ( 8-1))) r |= (1<<8);
-   if (w64lo & (1ULL << (64-1))) r |= (1<<7);
-   if (w64lo & (1ULL << (56-1))) r |= (1<<6);
-   if (w64lo & (1ULL << (48-1))) r |= (1<<5);
-   if (w64lo & (1ULL << (40-1))) r |= (1<<4);
-   if (w64lo & (1ULL << (32-1))) r |= (1<<3);
-   if (w64lo & (1ULL << (24-1))) r |= (1<<2);
-   if (w64lo & (1ULL << (16-1))) r |= (1<<1);
-   if (w64lo & (1ULL << ( 8-1))) r |= (1<<0);
+   /* Some serious bit twiddling going on here.  Mostly we can do it in
+      parallel for the upper and lower 64 bits, assuming the processor offers
+      a suitably high level of ILP. */
+   w64hi &= 0x8080808080808080ULL;
+   w64lo &= 0x8080808080808080ULL;
+   w64hi >>= 7;
+   w64lo >>= 7;
+   w64hi |= (w64hi >> 7);
+   w64lo |= (w64lo >> 7);
+   w64hi |= (w64hi >> 14);
+   w64lo |= (w64lo >> 14);
+   w64hi |= (w64hi >> 28);
+   w64lo |= (w64lo >> 28);
+   UInt r = ((w64hi & 0xFF) << 8) | (w64lo & 0xFF);
    return r;
 }