From: John R. <jr...@bi...> - 2017-08-31 11:46:09

> My motivation is a huge binary which takes a lot of time to instrument and
> which is executed frequently during many many test suite runs without any
> human intervention.

If you have only a hammer [memcheck] then everything begins to look like a
nail. Probably you should enlarge your toolbox instead of trying to optimize
memcheck.

Please be less coy about the huge binary. How big is it? How many shared
libraries? What is the total /usr/bin/size, particularly .text? What
programming languages does it employ? How much address space does it use at
run time? How much time does memcheck take? How many machines are you
running round-the-clock for this test suite? [Yes, the numerical answers to
each question do matter.]

Probably the software has very poor quality: few unit tests, undocumented
design and implementation strategy, little or no consideration of
testability. So: Apply profiling to the subroutine call graph. Use code
coverage analysis. Look at the bugs in the last year. Look at the changes to
the source code in the last year [each change is a proxy for a bug.]
Identify the 20% of the source that is responsible for 80% of the bugs.
Attack that 20% using divide-and-conquer: there should be a three-level
hierarchy of pieces. Develop the unit tests for the lowest-level pieces and
the integration tests for each node in the hierarchy. Apply profiling +
coverage + memcheck at each node. (Expect 6 months. Hire two graduate
students and a manager [perhaps yourself: but it will take 25% of your
time].)

--

From: John R. <jr...@bi...> - 2017-08-31 11:11:16

>> None of that is the real reason I didn't pursue it, though. The real reason
>> is address space layout randomization. Because different libraries get loaded
>> at different addresses in subsequent runs,
>
> Duh. What I meant is "because *the same* library gets loaded at different .."

Apply 'prelink' to [a copy of] each shared library, specifying the
particular address at which the library was loaded when the VEX translation
was performed.

--

From: Julian S. <se...@so...> - 2017-08-31 09:17:40

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=5cc18ff30bcf951201a4e1d8058a7b215367351d

commit 5cc18ff30bcf951201a4e1d8058a7b215367351d
Author: Julian Seward <js...@ac...>
Date:   Thu Aug 31 11:11:25 2017 +0200

    Improve the implementation of expensiveCmpEQorNE.

    .. so that the code it creates runs in approximately half the time it
    did before.  This is in support of making the cost of expensive
    (exactly) integer EQ/NE as low as possible, since the day will soon
    come when we'll need to enable this by default.

Diff:
---
 memcheck/mc_translate.c | 157 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 111 insertions(+), 46 deletions(-)

diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index 980c1d7..44b6a73 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -937,6 +937,46 @@ static IRAtom* mkPCastXXtoXXlsb ( MCEnv* mce, IRAtom* varg, IRType ty )
    tl_assert(0);
 }

+/* --------- Optimistic casts. --------- */
+
+/* The function takes and returns an expression of type TY.  If any of the
+   VBITS indicate defined (value == 0) the resulting expression has all bits
+   set to 0.  Otherwise, all bits are 1.  In words, if any bits are defined
+   then all bits are made to be defined.
+
+   In short we compute (vbits - (vbits >>u 1)) >>s (bitsize(vbits)-1).
+*/
+static IRAtom* mkOCastAt( MCEnv* mce, IRType ty, IRAtom* vbits )
+{
+   IROp opSUB, opSHR, opSAR;
+   UInt sh;
+
+   switch (ty) {
+      case Ity_I64:
+         opSUB = Iop_Sub64; opSHR = Iop_Shr64; opSAR = Iop_Sar64; sh = 63;
+         break;
+      case Ity_I32:
+         opSUB = Iop_Sub32; opSHR = Iop_Shr32; opSAR = Iop_Sar32; sh = 31;
+         break;
+      case Ity_I16:
+         opSUB = Iop_Sub16; opSHR = Iop_Shr16; opSAR = Iop_Sar16; sh = 15;
+         break;
+      case Ity_I8:
+         opSUB = Iop_Sub8; opSHR = Iop_Shr8; opSAR = Iop_Sar8; sh = 7;
+         break;
+      default:
+         ppIRType(ty);
+         VG_(tool_panic)("mkOCastTo");
+   }
+
+   IRAtom *shr1, *at;
+   shr1 = assignNew('V', mce,ty, binop(opSHR, vbits, mkU8(1)));
+   at   = assignNew('V', mce,ty, binop(opSUB, vbits, shr1));
+   at   = assignNew('V', mce,ty, binop(opSAR, at, mkU8(sh)));
+   return at;
+}
+
+
 /* --------- Accurate interpretation of CmpEQ/CmpNE. --------- */

 /* Normally, we can do CmpEQ/CmpNE by doing UifU on the arguments, and
@@ -951,12 +991,12 @@ static IRAtom* mkPCastXXtoXXlsb ( MCEnv* mce, IRAtom* varg, IRType ty )
       PCastTo<1> (
          -- naive version
-         PCastTo<sz>( UifU<sz>(vxx, vyy) )
+         UifU<sz>(vxx, vyy)

          `DifD<sz>`

          -- improvement term
-         PCastTo<sz>( PCast<sz>( CmpEQ<sz> ( vec, 1...1 ) ) )
+         OCast<sz>(vec)
       )
    where
@@ -967,27 +1007,47 @@ static IRAtom* mkPCastXXtoXXlsb ( MCEnv* mce, IRAtom* varg, IRType ty )
            vyy,                           // 0 iff bit defined
            Not<sz>(Xor<sz>( xx, yy ))     // 0 iff bits different
         )
-
+
    If any bit of vec is 0, the result is defined and so the improvement
    term should produce 0...0, else it should produce 1...1.  Hence require
    for the improvement term:

-      if vec == 1...1 then 1...1 else 0...0
-      ->
-      PCast<sz>( CmpEQ<sz> ( vec, 1...1 ) )
+      OCast(vec) = if vec == 1...1 then 1...1 else 0...0
+
+   which you can think of as an "optimistic cast" (OCast), the opposite of
+   the normal "pessimistic cast" (PCast) family.  An OCast says all bits
+   are defined if any bit is defined.
+
+   It is possible to show that
+
+      if vec == 1...1 then 1...1 else 0...0
+
+   can be implemented in straight-line code as
+
+      (vec - (vec >>u 1)) >>s (word-size-in-bits - 1)
+
+   We note that vec contains the sub-term Or<sz>(vxx, vyy).  Since UifU is
+   implemented with Or (since 1 signifies undefinedness), this is a
+   duplicate of the UifU<sz>(vxx, vyy) term and so we can CSE it out,
+   giving a final version of:

-   This was extensively re-analysed and checked on 6 July 05.
+      let naive = UifU<sz>(vxx, vyy)
+          vec   = Or<sz>(naive, Not<sz>(Xor<sz>(xx, yy)))
+      in
+          PCastTo<1>( DifD<sz>(naive, OCast<sz>(vec)) )
+
+   This was extensively re-analysed and checked on 6 July 05 and again
+   in July 2017.
 */
 static IRAtom* expensiveCmpEQorNE ( MCEnv*  mce,
                                     IRType  ty,
                                     IRAtom* vxx, IRAtom* vyy,
                                     IRAtom* xx, IRAtom* yy )
 {
-   IRAtom *naive, *vec, *improvement_term;
-   IRAtom *improved, *final_cast, *top;
-   IROp   opDIFD, opUIFU, opXOR, opNOT, opCMP, opOR;
+   IRAtom *naive, *vec, *improved, *final_cast;
+   IROp   opDIFD, opUIFU, opOR, opXOR, opNOT;

    tl_assert(isShadowAtom(mce,vxx));
    tl_assert(isShadowAtom(mce,vyy));
@@ -997,57 +1057,54 @@ static IRAtom* expensiveCmpEQorNE ( MCEnv* mce,
    tl_assert(sameKindedAtoms(vyy,yy));

    switch (ty) {
+      case Ity_I8:
+         opDIFD = Iop_And8;
+         opUIFU = Iop_Or8;
+         opOR   = Iop_Or8;
+         opXOR  = Iop_Xor8;
+         opNOT  = Iop_Not8;
+         break;
       case Ity_I16:
-         opOR   = Iop_Or16;
          opDIFD = Iop_And16;
          opUIFU = Iop_Or16;
-         opNOT  = Iop_Not16;
+         opOR   = Iop_Or16;
          opXOR  = Iop_Xor16;
-         opCMP  = Iop_CmpEQ16;
-         top = mkU16(0xFFFF);
+         opNOT  = Iop_Not16;
          break;
       case Ity_I32:
-         opOR   = Iop_Or32;
          opDIFD = Iop_And32;
          opUIFU = Iop_Or32;
-         opNOT  = Iop_Not32;
+         opOR   = Iop_Or32;
          opXOR  = Iop_Xor32;
-         opCMP  = Iop_CmpEQ32;
-         top = mkU32(0xFFFFFFFF);
+         opNOT  = Iop_Not32;
          break;
       case Ity_I64:
-         opOR   = Iop_Or64;
          opDIFD = Iop_And64;
          opUIFU = Iop_Or64;
-         opNOT  = Iop_Not64;
+         opOR   = Iop_Or64;
          opXOR  = Iop_Xor64;
-         opCMP  = Iop_CmpEQ64;
-         top = mkU64(0xFFFFFFFFFFFFFFFFULL);
+         opNOT  = Iop_Not64;
          break;
       default:
         VG_(tool_panic)("expensiveCmpEQorNE");
    }

    naive
-      = mkPCastTo(mce,ty,
-                  assignNew('V', mce, ty, binop(opUIFU, vxx, vyy)));
+      = assignNew('V', mce, ty, binop(opUIFU, vxx, vyy));

    vec
       = assignNew(
            'V', mce,ty,
            binop( opOR,
-                  assignNew('V', mce,ty, binop(opOR, vxx, vyy)),
+                  naive,
                   assignNew(
-                     'V', mce,ty,
-                     unop( opNOT,
-                           assignNew('V', mce,ty, binop(opXOR, xx, yy))))));
-
-   improvement_term
-      = mkPCastTo( mce,ty,
-                   assignNew('V', mce,Ity_I1, binop(opCMP, vec, top)));
+                     'V', mce,ty,
+                     unop(opNOT,
+                          assignNew('V', mce,ty, binop(opXOR, xx, yy))))));

    improved
-      = assignNew( 'V', mce,ty, binop(opDIFD, naive, improvement_term) );
+      = assignNew( 'V', mce,ty,
+                   binop(opDIFD, naive, mkOCastAt(mce, ty, vec)));

    final_cast
       = mkPCastTo( mce, Ity_I1, improved );
@@ -4087,12 +4144,9 @@ IRAtom* expr2vbits_Binop ( MCEnv* mce,
       case Iop_Add8:
          return mkLeft8(mce, mkUifU8(mce, vatom1,vatom2));

-      case Iop_CmpEQ64:
-      case Iop_CmpNE64:
-         if (mce->bogusLiterals)
-            goto expensive_cmp64;
-         else
-            goto cheap_cmp64;
+      ////---- CmpXX64
+      case Iop_CmpEQ64: case Iop_CmpNE64:
+         if (mce->bogusLiterals) goto expensive_cmp64; else goto cheap_cmp64;

       expensive_cmp64:
       case Iop_ExpCmpNE64:
@@ -4103,12 +4157,9 @@ IRAtom* expr2vbits_Binop ( MCEnv* mce,
       case Iop_CmpLT64U: case Iop_CmpLT64S:
          return mkPCastTo(mce, Ity_I1, mkUifU64(mce, vatom1,vatom2));

-      case Iop_CmpEQ32:
-      case Iop_CmpNE32:
-         if (mce->bogusLiterals)
-            goto expensive_cmp32;
-         else
-            goto cheap_cmp32;
+      ////---- CmpXX32
+      case Iop_CmpEQ32: case Iop_CmpNE32:
+         if (mce->bogusLiterals) goto expensive_cmp32; else goto cheap_cmp32;

       expensive_cmp32:
       case Iop_ExpCmpNE32:
@@ -4119,15 +4170,29 @@ IRAtom* expr2vbits_Binop ( MCEnv* mce,
       case Iop_CmpLT32U: case Iop_CmpLT32S:
          return mkPCastTo(mce, Ity_I1, mkUifU32(mce, vatom1,vatom2));

+      ////---- CmpXX16
       case Iop_CmpEQ16: case Iop_CmpNE16:
-         return mkPCastTo(mce, Ity_I1, mkUifU16(mce, vatom1,vatom2));
+         if (mce->bogusLiterals) goto expensive_cmp16; else goto cheap_cmp16;
+
+      expensive_cmp16:
       case Iop_ExpCmpNE16:
          return expensiveCmpEQorNE(mce,Ity_I16, vatom1,vatom2, atom1,atom2 );
+
+      cheap_cmp16:
+         return mkPCastTo(mce, Ity_I1, mkUifU16(mce, vatom1,vatom2));
+
+      ////---- CmpXX8
       case Iop_CmpEQ8: case Iop_CmpNE8:
+         if (mce->bogusLiterals) goto expensive_cmp8; else goto cheap_cmp8;
+
+      expensive_cmp8:
+         return expensiveCmpEQorNE(mce,Ity_I8, vatom1,vatom2, atom1,atom2 );
+
+      cheap_cmp8:
          return mkPCastTo(mce, Ity_I1, mkUifU8(mce, vatom1,vatom2));

+      ////---- end CmpXX{64,32,16,8}
+
       case Iop_CasCmpEQ8:  case Iop_CasCmpNE8:
       case Iop_CasCmpEQ16: case Iop_CasCmpNE16:
       case Iop_CasCmpEQ32: case Iop_CasCmpNE32:

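[The arithmetic identity at the heart of the commit above can be checked exhaustively at small width. A quick illustrative sketch (plain Python, not Valgrind code) modelling the 8-bit case of (vbits - (vbits >>u 1)) >>s 7:]

```python
# Exhaustive 8-bit check of the "optimistic cast" identity used by
# mkOCastAt: (vbits - (vbits >>u 1)) >>s (n-1) yields all-ones iff
# vbits is all-ones, and all-zeros otherwise.

def ocast8(vbits):
    """Straight-line OCast for an 8-bit value, modelling the unsigned
    shift, subtract, and arithmetic shift of the generated IR."""
    diff = (vbits - (vbits >> 1)) & 0xFF   # Shr8 by 1, then Sub8
    # Sar8 by 7: replicate the sign bit across the whole byte.
    return 0xFF if diff & 0x80 else 0x00

for v in range(256):
    expected = 0xFF if v == 0xFF else 0x00  # spec: all-ones only for 1...1
    assert ocast8(v) == expected, hex(v)

print("OCast identity holds for all 8-bit vbits")
```

The check works because vbits - (vbits >>u 1) equals ceil(vbits/2), which reaches the sign bit only when vbits is the all-ones value.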
From: Julian S. <js...@ac...> - 2017-08-31 08:29:09

On 31/08/17 10:25, Julian Seward wrote:
> None of that is the real reason I didn't pursue it, though. The real reason
> is address space layout randomization. Because different libraries get loaded
> at different addresses in subsequent runs,

Duh. What I meant is "because *the same* library gets loaded at different .."

J

From: Julian S. <js...@ac...> - 2017-08-31 08:25:14

> What would be the major challenges here?
> My preliminary idea was that trans-cache could request blocks either
> from VEX or from the pre-image.

I've thought about this a couple of times in the past but never did anything
about it. One of the reasons is that I thought it would be difficult to do
and actually get a win.

Eg, for starting Firefox on Memcheck, the JIT needs to process about 500,000
blocks, giving about 300MB of instrumented code. If we say (perhaps somewhat
optimistically) that the JIT can process about 10000 blocks/sec, then that
is 50 seconds of computation. In order to get a win, we'd need to be able to
at least compute a hash of the block to be jitted (based on the instruction
bytes), find the offset of the block in our memory-mapped file, and pull in
the relevant translation, all in around 100 microseconds. I might be
persuaded that this is doable if the cache file is in the filesystem cache,
but as soon as we hit backing storage (especially if it's a rotating disk) I
think our prospects are poor.

None of that is the real reason I didn't pursue it, though. The real reason
is address space layout randomization. Because different libraries get
loaded at different addresses in subsequent runs, this will cause the hit
rate on the cache to be zero for the libraries involved. This implies that
the load address for the library somehow needs to be incorporated in the
cache keys that we're using. And that's true because the front ends
(guest_amd64_toIR.c, etc) bake into the IR values derived from the program
counter: branch target addresses, and PC-relative load/store addresses.

I can't see any way around this without major re-engineering of the JITs,
because we'd need to somehow parameterise the cache so that we could look up
a translation independent of its load address, and then, if found, patch up
the old version so it works for the "new" address.

> If you are considering translating the entire program and caching it, I
> think that would be much faster,

Mhm, but then you have the problem of finding all the code that is part of
the program, which is equivalent to solving the halting problem.

-----

For these reasons, my preference is to make the JIT faster, and ultimately
to move to having a "two speed" JIT. That is, one where code is initially
instrumented using a fast, low-quality JIT, to reduce latency and to gather
branch and block-use statistics. When we decide a particular path is hot
enough, those blocks are handed to a slower, optimising JIT, so we
ultimately get both low latency for cold paths and high performance for hot
paths. This seems to be the "modern way". Also, the optimising JIT can run
in a helper thread, so in effect we never have to wait for it, because we
can just use the unoptimised version of a (super)block until the optimised
version is ready.

J

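[The ASLR problem described above can be illustrated with a toy sketch. This is hypothetical Python, not Valgrind code: if the cache key must incorporate everything the JIT baked into the translation, including the library's load address, then two runs of the same library under ASLR produce different keys and the cross-run hit rate drops to zero:]

```python
# Illustrative sketch: why ASLR defeats a naive cross-run translation cache.
# A cached translation is only reusable if the key captures everything the
# JIT baked into the generated code -- and because branch targets and
# PC-relative addresses are materialised as absolute values, that includes
# the load address.

import hashlib

def translation_key(insn_bytes, load_addr):
    # The load address must be part of the key; under ASLR it changes
    # every run, so a key computed in run 1 never matches in run 2.
    h = hashlib.sha1()
    h.update(insn_bytes)
    h.update(load_addr.to_bytes(8, "little"))
    return h.hexdigest()

code = bytes.fromhex("4889f8c3")              # hypothetical code fragment
run1 = translation_key(code, 0x7f3a10000000)  # library base, run 1
run2 = translation_key(code, 0x7f8b54000000)  # same library, new ASLR base
assert run1 != run2   # identical code, yet the cached entry cannot be hit
```

Making the key address-independent would require exactly the parameterisation and patch-up of old translations that the post says needs major re-engineering of the JITs.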
From: Julian S. <js...@ac...> - 2017-08-31 07:46:17

https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=35037b3ba1a4a7fa30ed53713bc10b00b420a5d3

commit 35037b3ba1a4a7fa30ed53713bc10b00b420a5d3
Author: Julian Seward <js...@ac...>
Date:   Wed Aug 30 19:43:59 2017 +0200

    amd64 back end: handle CmpNEZ64(And64(x,y)) better; ditto the 32 bit case.

    Handle CmpNEZ64(And64(x,y)) by branching on flags, similarly to
    CmpNEZ64(Or64(x,y)).  Ditto the 32 bit equivalents.  Also, remove
    expensive DEFINE_PATTERN/DECLARE_PATTERN uses there and hardwire the
    matching logic.  n-i-bz.

    This is in support of reducing the cost of expensiveCmpEQorNE in
    memcheck.

Diff:
---
 VEX/priv/host_amd64_isel.c | 53 +++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/VEX/priv/host_amd64_isel.c b/VEX/priv/host_amd64_isel.c
index ecd57e7..1787e87 100644
--- a/VEX/priv/host_amd64_isel.c
+++ b/VEX/priv/host_amd64_isel.c
@@ -2205,8 +2205,6 @@ static AMD64CondCode iselCondCode ( ISelEnv* env, const IRExpr* e )
 /* DO NOT CALL THIS DIRECTLY ! */
 static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )
 {
-   MatchInfo mi;
-
    vassert(e);
    vassert(typeOfIRExpr(env->type_env,e) == Ity_I1);
@@ -2277,10 +2275,25 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )

    /* --- patterns rooted at: CmpNEZ32 --- */

-   /* CmpNEZ32(x) */
-   if (e->tag == Iex_Unop
+   if (e->tag == Iex_Unop
        && e->Iex.Unop.op == Iop_CmpNEZ32) {
-      HReg r1 = iselIntExpr_R(env, e->Iex.Unop.arg);
+      IRExpr* arg = e->Iex.Unop.arg;
+      if (arg->tag == Iex_Binop
+          && (arg->Iex.Binop.op == Iop_Or32
+              || arg->Iex.Binop.op == Iop_And32)) {
+         /* CmpNEZ32(Or32(x,y)) */
+         /* CmpNEZ32(And32(x,y)) */
+         HReg      r0   = iselIntExpr_R(env, arg->Iex.Binop.arg1);
+         AMD64RMI* rmi1 = iselIntExpr_RMI(env, arg->Iex.Binop.arg2);
+         HReg      tmp  = newVRegI(env);
+         addInstr(env, mk_iMOVsd_RR(r0, tmp));
+         addInstr(env, AMD64Instr_Alu32R(
+                          arg->Iex.Binop.op == Iop_Or32 ? Aalu_OR : Aalu_AND,
+                          rmi1, tmp));
+         return Acc_NZ;
+      }
+      /* CmpNEZ32(x) */
+      HReg r1 = iselIntExpr_R(env, arg);
       AMD64RMI* rmi2 = AMD64RMI_Imm(0);
       addInstr(env, AMD64Instr_Alu32R(Aalu_CMP,rmi2,r1));
       return Acc_NZ;
@@ -2288,25 +2301,25 @@ static AMD64CondCode iselCondCode_wrk ( ISelEnv* env, const IRExpr* e )

    /* --- patterns rooted at: CmpNEZ64 --- */

-   /* CmpNEZ64(Or64(x,y)) */
-   {
-      DECLARE_PATTERN(p_CmpNEZ64_Or64);
-      DEFINE_PATTERN(p_CmpNEZ64_Or64,
-                     unop(Iop_CmpNEZ64, binop(Iop_Or64, bind(0), bind(1))));
-      if (matchIRExpr(&mi, p_CmpNEZ64_Or64, e)) {
-         HReg      r0   = iselIntExpr_R(env, mi.bindee[0]);
-         AMD64RMI* rmi1 = iselIntExpr_RMI(env, mi.bindee[1]);
+   if (e->tag == Iex_Unop
+       && e->Iex.Unop.op == Iop_CmpNEZ64) {
+      IRExpr* arg = e->Iex.Unop.arg;
+      if (arg->tag == Iex_Binop
+          && (arg->Iex.Binop.op == Iop_Or64
+              || arg->Iex.Binop.op == Iop_And64)) {
+         /* CmpNEZ64(Or64(x,y)) */
+         /* CmpNEZ64(And64(x,y)) */
+         HReg      r0   = iselIntExpr_R(env, arg->Iex.Binop.arg1);
+         AMD64RMI* rmi1 = iselIntExpr_RMI(env, arg->Iex.Binop.arg2);
          HReg      tmp  = newVRegI(env);
          addInstr(env, mk_iMOVsd_RR(r0, tmp));
-         addInstr(env, AMD64Instr_Alu64R(Aalu_OR,rmi1,tmp));
+         addInstr(env, AMD64Instr_Alu64R(
+                          arg->Iex.Binop.op == Iop_Or64 ? Aalu_OR : Aalu_AND,
+                          rmi1, tmp));
          return Acc_NZ;
       }
-   }
-
-   /* CmpNEZ64(x) */
-   if (e->tag == Iex_Unop
-       && e->Iex.Unop.op == Iop_CmpNEZ64) {
-      HReg r1 = iselIntExpr_R(env, e->Iex.Unop.arg);
+      /* CmpNEZ64(x) */
+      HReg r1 = iselIntExpr_R(env, arg);
       AMD64RMI* rmi2 = AMD64RMI_Imm(0);
       addInstr(env, AMD64Instr_Alu64R(Aalu_CMP,rmi2,r1));
       return Acc_NZ;

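[A minimal sketch (illustrative Python, not VEX code) of the semantics the new pattern exploits: CmpNEZ64(And64(x,y)) is simply "(x & y) != 0", so the back end can read the answer straight from the flags set by a single AND instruction (the Acc_NZ condition) instead of materialising the AND result and then comparing it against zero:]

```python
# Reference semantics for the patterns the isel now matches.
MASK64 = (1 << 64) - 1

def cmpnez_and64(x, y):
    # What AND tmp, rmi1 followed by "branch if NZ" computes.
    return ((x & y) & MASK64) != 0

def cmpnez_or64(x, y):
    # The previously-handled Or64 case, for comparison.
    return ((x | y) & MASK64) != 0

assert cmpnez_and64(0xF0, 0x0F) is False   # disjoint bits: Z flag set
assert cmpnez_and64(0xFF, 0x10) is True    # shared bit: NZ
assert cmpnez_or64(0, 0) is False
```

The design point is peephole-style: because x86 logical ops already set ZF from their result, the separate CMP-against-zero that the generic CmpNEZ lowering emits is redundant whenever the operand is itself an And or Or.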