You can subscribe to this list here.
| 2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
(122) |
Nov
(152) |
Dec
(69) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |
Jan
(6) |
Feb
(25) |
Mar
(73) |
Apr
(82) |
May
(24) |
Jun
(25) |
Jul
(10) |
Aug
(11) |
Sep
(10) |
Oct
(54) |
Nov
(203) |
Dec
(182) |
| 2004 |
Jan
(307) |
Feb
(305) |
Mar
(430) |
Apr
(312) |
May
(187) |
Jun
(342) |
Jul
(487) |
Aug
(637) |
Sep
(336) |
Oct
(373) |
Nov
(441) |
Dec
(210) |
| 2005 |
Jan
(385) |
Feb
(480) |
Mar
(636) |
Apr
(544) |
May
(679) |
Jun
(625) |
Jul
(810) |
Aug
(838) |
Sep
(634) |
Oct
(521) |
Nov
(965) |
Dec
(543) |
| 2006 |
Jan
(494) |
Feb
(431) |
Mar
(546) |
Apr
(411) |
May
(406) |
Jun
(322) |
Jul
(256) |
Aug
(401) |
Sep
(345) |
Oct
(542) |
Nov
(308) |
Dec
(481) |
| 2007 |
Jan
(427) |
Feb
(326) |
Mar
(367) |
Apr
(255) |
May
(244) |
Jun
(204) |
Jul
(223) |
Aug
(231) |
Sep
(354) |
Oct
(374) |
Nov
(497) |
Dec
(362) |
| 2008 |
Jan
(322) |
Feb
(482) |
Mar
(658) |
Apr
(422) |
May
(476) |
Jun
(396) |
Jul
(455) |
Aug
(267) |
Sep
(280) |
Oct
(253) |
Nov
(232) |
Dec
(304) |
| 2009 |
Jan
(486) |
Feb
(470) |
Mar
(458) |
Apr
(423) |
May
(696) |
Jun
(461) |
Jul
(551) |
Aug
(575) |
Sep
(134) |
Oct
(110) |
Nov
(157) |
Dec
(102) |
| 2010 |
Jan
(226) |
Feb
(86) |
Mar
(147) |
Apr
(117) |
May
(107) |
Jun
(203) |
Jul
(193) |
Aug
(238) |
Sep
(300) |
Oct
(246) |
Nov
(23) |
Dec
(75) |
| 2011 |
Jan
(133) |
Feb
(195) |
Mar
(315) |
Apr
(200) |
May
(267) |
Jun
(293) |
Jul
(353) |
Aug
(237) |
Sep
(278) |
Oct
(611) |
Nov
(274) |
Dec
(260) |
| 2012 |
Jan
(303) |
Feb
(391) |
Mar
(417) |
Apr
(441) |
May
(488) |
Jun
(655) |
Jul
(590) |
Aug
(610) |
Sep
(526) |
Oct
(478) |
Nov
(359) |
Dec
(372) |
| 2013 |
Jan
(467) |
Feb
(226) |
Mar
(391) |
Apr
(281) |
May
(299) |
Jun
(252) |
Jul
(311) |
Aug
(352) |
Sep
(481) |
Oct
(571) |
Nov
(222) |
Dec
(231) |
| 2014 |
Jan
(185) |
Feb
(329) |
Mar
(245) |
Apr
(238) |
May
(281) |
Jun
(399) |
Jul
(382) |
Aug
(500) |
Sep
(579) |
Oct
(435) |
Nov
(487) |
Dec
(256) |
| 2015 |
Jan
(338) |
Feb
(357) |
Mar
(330) |
Apr
(294) |
May
(191) |
Jun
(108) |
Jul
(142) |
Aug
(261) |
Sep
(190) |
Oct
(54) |
Nov
(83) |
Dec
(22) |
| 2016 |
Jan
(49) |
Feb
(89) |
Mar
(33) |
Apr
(50) |
May
(27) |
Jun
(34) |
Jul
(53) |
Aug
(53) |
Sep
(98) |
Oct
(206) |
Nov
(93) |
Dec
(53) |
| 2017 |
Jan
(65) |
Feb
(82) |
Mar
(102) |
Apr
(86) |
May
(187) |
Jun
(67) |
Jul
(23) |
Aug
(93) |
Sep
(65) |
Oct
(45) |
Nov
(35) |
Dec
(17) |
| 2018 |
Jan
(26) |
Feb
(35) |
Mar
(38) |
Apr
(32) |
May
(8) |
Jun
(43) |
Jul
(27) |
Aug
(30) |
Sep
(43) |
Oct
(42) |
Nov
(38) |
Dec
(67) |
| 2019 |
Jan
(32) |
Feb
(37) |
Mar
(53) |
Apr
(64) |
May
(49) |
Jun
(18) |
Jul
(14) |
Aug
(53) |
Sep
(25) |
Oct
(30) |
Nov
(49) |
Dec
(31) |
| 2020 |
Jan
(87) |
Feb
(45) |
Mar
(37) |
Apr
(51) |
May
(99) |
Jun
(36) |
Jul
(11) |
Aug
(14) |
Sep
(20) |
Oct
(24) |
Nov
(40) |
Dec
(23) |
| 2021 |
Jan
(14) |
Feb
(53) |
Mar
(85) |
Apr
(15) |
May
(19) |
Jun
(3) |
Jul
(14) |
Aug
(1) |
Sep
(57) |
Oct
(73) |
Nov
(56) |
Dec
(22) |
| 2022 |
Jan
(3) |
Feb
(22) |
Mar
(6) |
Apr
(55) |
May
(46) |
Jun
(39) |
Jul
(15) |
Aug
(9) |
Sep
(11) |
Oct
(34) |
Nov
(20) |
Dec
(36) |
| 2023 |
Jan
(79) |
Feb
(41) |
Mar
(99) |
Apr
(169) |
May
(48) |
Jun
(16) |
Jul
(16) |
Aug
(57) |
Sep
(19) |
Oct
|
Nov
|
Dec
|
| S | M | T | W | T | F | S |
|---|---|---|---|---|---|---|
|
|
|
|
1
(14) |
2
(16) |
3
(13) |
4
(3) |
|
5
(18) |
6
(1) |
7
(6) |
8
(2) |
9
(16) |
10
(19) |
11
(14) |
|
12
(1) |
13
(6) |
14
(20) |
15
(26) |
16
(18) |
17
(15) |
18
(16) |
|
19
(7) |
20
(8) |
21
(19) |
22
(19) |
23
(21) |
24
(15) |
25
(15) |
|
26
(11) |
27
(17) |
28
(21) |
29
(14) |
|
|
|
|
From: Philippe W. <phi...@sk...> - 2012-02-13 21:45:40
|
I quickly modified Valgrind to have it multi-threaded.
The approach:
1. is only done with the "pipe" locking
2. is using in total 3 "pipe semaphores" to implement a reader/writer
locking schema. (using atomic increment/decrement would allow
to have only two semaphores).
3. only none tool "works" (?) (see below for the definition of "works").
4. is a real hack. ("calling this a hack is an insult to other hacks").
A.o. - I have disabled some assertions related to the dispatch_cntr
(which is a global variable now and should be per thread).
- the concept of "the" running tid being broken, several other
asserts blindly commented
- no search for any efficiency. E.g. upgrading a read lock to
a write lock is done by releasing the read lock, and obtaining
a write lock. E.g. if there is a mixture of threads blocked
in a syscall and burning, the scheduling is slower
- it is missing for sure plently of 'multi-thread related' asserts
- zillions of things are probably not working properly
e.g. debug printing, ...
- ....
In any case, here are some results:
time ./vg-in-place --tool=none ./gdbserver_tests/parallel_sleepers 50000 0 50000 B-B-B-B-
==5365== Nulgrind, the minimal Valgrind tool
....
real 0m9.920s
user 0m38.038s
sys 0m0.188s
time ../trunk_untouched/vg-in-place --tool=none
./gdbserver_tests/parallel_sleepers 50000 0 50000 B-B-B-B-
==5374== Nulgrind, the minimal Valgrind tool
....
real 0m38.569s
user 0m37.410s
sys 0m0.180s
So, we see that parallel_sleepers (which is in fact 4 threads burning CPU)
is taking about 38.5 seconds real and 37.4 cpu with the trunk, and is
taking about 9.9 seconds real and 38 seconds cpu with the hacked Valgrind.
Of course, only none tool is done :).
And of course, it is just an unvalidated hack done in 3 hours.
(185 tests failing, but only 22 none tests failing :)
But if we can make the tools somewhat multi-threaded and the rest of
the coregrind properly protected, it looks promising ....
I am attaching the patch just for information/starting a discussion.
It is for sure totally out of question to commit this.
Philippe
|
|
From: Stefan S. <en...@ho...> - 2012-02-13 13:21:14
|
On 12/11/2009 02:17 PM, Zoltan Herczeg wrote: > Hi, > > My name is Zoltan Herczeg from Hungary, and I have created a new memory > analyzer tool for valgirnd, called freya. Freya has a different purpose > compared to massif: organizing the results by custom rules. > > A description of freya can be found here: http://webkit.sed.hu/node/29 > I used it for analyizng WebKit's memory consumption: > http://webkit.sed.hu/node/31 > > I hope you find it interesting. Hi, is there a repository with the latest version up somewhere? Or would it make sense to contribute this to valgrind? Stefan > Zoltan > > PS: I really like valgrind. The code (and API) is well organized, > everything can be found in no-time, and everything works in way as you > expect it. > > > > ------------------------------------------------------------------------------ > Return on Information: > Google Enterprise Search pays you back > Get the facts. > http://p.sf.net/sfu/google-dev2dev > _______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
|
From: <sv...@va...> - 2012-02-13 08:55:06
|
Author: bart Date: 2012-02-13 08:50:32 +0000 (Mon, 13 Feb 2012) New Revision: 12382 Log: Update Subversion ignore list Modified: trunk/none/tests/ Property changes on: trunk/none/tests ___________________________________________________________________ Name: svn:ignore - *.dSYM *.so *.stderr.diff* *.stderr.out *.stdout.diff* *.stdout.out .deps allexec32 allexec64 ansi args async-sigs as_mmap as_shm bitfield1 blockfault bug129866 closeall coolo_sigaction coolo_strlen discard exec-sigmask execve faultstatus fcntl_setown fdleak_cmsg fdleak_creat fdleak_dup fdleak_dup2 fdleak_fcntl fdleak_ipv4 fdleak_open fdleak_pipe fdleak_socketpair floored fork fucomip gxx304 insn_basic insn_basic.c insn_cmov insn_cmov.c insn_fpu insn_fpu.c insn_mmx insn_mmx.c insn_mmxext insn_mmxext.c insn_sse insn_sse.c insn_sse2 insn_sse2.c Makefile Makefile.in manythreads map_unaligned map_unmap mmap_fcntl_bug mq mremap mremap2 munmap_exe nestedfns pending pluto procfs-cmdline-exe pth_atfork1 pth_blockedsig pth_cancel1 pth_cancel2 pth_cvsimple pth_detached pth_empty pth_exit pth_exit2 pth_mutexspeed pth_once pth_rwlock pth_semaphore1 pth_simple_mutex pth_simple_threads pth_specific pth_stackalign pth_yield rcrl readline1 resolv res_search require-text-symbol rlimit_nofile selfrun sem semlimit sha1_test shortpush shorts sigstackgrowth smc1 stackgrowth susphello syscall-restart1 syscall-restart2 syslog system thread-exits threaded-fork threadederrno timestamp tls valgrind_cpp_test vgcore.* vgprintf yield + *.dSYM *.so *.stderr.diff* *.stderr.out *.stdout.diff* *.stdout.out .deps allexec32 allexec64 ansi args async-sigs as_mmap as_shm bitfield1 blockfault bug129866 closeall coolo_sigaction coolo_strlen discard exec-sigmask execve faultstatus fcntl_setown fdleak_cmsg fdleak_creat fdleak_dup fdleak_dup2 fdleak_fcntl fdleak_ipv4 fdleak_open fdleak_pipe fdleak_socketpair floored fork fucomip gxx304 insn_basic insn_basic.c insn_cmov insn_cmov.c insn_fpu insn_fpu.c insn_mmx insn_mmx.c insn_mmxext insn_mmxext.c insn_sse insn_sse.c insn_sse2 insn_sse2.c Makefile Makefile.in manythreads map_unaligned map_unmap mmap_fcntl_bug mq mremap mremap2 munmap_exe nestedfns pending pluto process_vm_readv_writev procfs-cmdline-exe pth_atfork1 pth_blockedsig pth_cancel1 pth_cancel2 pth_cvsimple pth_detached pth_empty pth_exit pth_exit2 pth_mutexspeed pth_once pth_rwlock pth_semaphore1 pth_simple_mutex pth_simple_threads pth_specific pth_stackalign pth_yield rcrl readline1 require-text-symbol resolv res_search rlimit_nofile selfrun sem semlimit sha1_test shortpush shorts sigstackgrowth smc1 stackgrowth susphello syscall-restart1 syscall-restart2 syslog system thread-exits threaded-fork threadederrno timestamp tls valgrind_cpp_test vgcore.* vgprintf yield |
|
From: <sv...@va...> - 2012-02-13 08:52:31
|
Author: bart
Date: 2012-02-13 08:47:51 +0000 (Mon, 13 Feb 2012)
New Revision: 12381
Log:
nightly build: Run nightly build also if only VEX has been modified. Check out matching revisions of Valgrind and VEX instead of using latest VEX when checking out the (today - 1) source code.
Modified:
trunk/nightly/bin/nightly
Modified: trunk/nightly/bin/nightly
===================================================================
--- trunk/nightly/bin/nightly 2012-02-10 16:45:01 UTC (rev 12380)
+++ trunk/nightly/bin/nightly 2012-02-13 08:47:51 UTC (rev 12381)
@@ -9,13 +9,21 @@
# Helper functions
#----------------------------------------------------------------------------
-# Returns the revision number of the source files with date $1.
+# Returns the revision number for the source files at date $1 in Subversion
+# repo $2. Note: the "--depth" command line argument is supported from
+# Subversion version 1.5 on.
get_svn_revision() {
- (cd $DIR; rm -rf infodir;
- svn co -r "{$1}" "${valgrind_svn_repo}/nightly" infodir > /dev/null;
- revno=`svn info infodir | sed -n 's/^Revision: //p'`;
- rm -rf infodir;
- echo $revno)
+ (
+ cd $DIR
+ rm -rf infodir
+ if ! svn co -r "{$1}" --depth empty "$2" infodir > /dev/null 2>&1; then
+ # Subversion 1.4 or before.
+ rm -rf infodir
+ svn co -r "{$1}" --non-recursive "$2" infodir > /dev/null
+ fi
+ svn info infodir | sed -n 's/^Revision: //p'
+ rm -rf infodir
+ )
}
runcmd () {
@@ -52,6 +60,7 @@
#----------------------------------------------------------------------------
valgrind_svn_repo="svn://svn.valgrind.org/valgrind/trunk"
+vex_svn_repo="svn://svn.valgrind.org/vex/trunk"
# Must have exactly two arguments
if [ $# -ne 2 ] ; then
@@ -112,10 +121,13 @@
# Check out, build, test
#----------------------------------------------------------------------------
-svn_old_rev="`get_svn_revision ${svn_old_date}`"
-svn_new_rev="`get_svn_revision ${svn_new_date}`"
-if [ "${svn_old_rev}" = "${svn_new_rev}" ]; then
- echo "Both {$svn_old_date} and {$svn_new_date} correspond to r${svn_new_rev}"\
+vg_old_rev="`get_svn_revision ${svn_old_date} ${valgrind_svn_repo}`"
+vg_new_rev="`get_svn_revision ${svn_new_date} ${valgrind_svn_repo}`"
+vex_old_rev="`get_svn_revision ${svn_old_date} ${vex_svn_repo}`"
+vex_new_rev="`get_svn_revision ${svn_new_date} ${vex_svn_repo}`"
+if [ "${vg_old_rev}" = "${vg_new_rev}" -a "${vex_old_rev}" = "${vex_new_rev}" ]
+then
+ echo "Both {$svn_old_date} and {$svn_new_date} correspond to Valgrind r${vg_new_rev} / VEX r${vex_new_rev}"\
"-- skipping nightly build." >unchanged.log
exit 0
fi
@@ -141,7 +153,8 @@
# Check out, build, run tests
runcmd $logfile \
"Checking out valgrind source tree" \
- "svn co ${valgrind_svn_repo} -r {$svn_date} valgrind-$logfile" && \
+ "svn co ${valgrind_svn_repo} -r {$svn_date} valgrind-$logfile\
+ && svn update -r {$svn_date} valgrind-$logfile/VEX" && \
\
runcmd $logfile \
"Configuring valgrind " \
|
|
From: Florian K. <br...@ac...> - 2012-02-13 00:25:38
|
On 02/12/2012 04:05 PM, Julian Seward wrote: > > Great stuff, especially the numbers. > >> Putting an upper bound on the number of nodes we allow sameIREXprs >> to recurse into makes sense. But since proving equality was key >> to fixing #287260 and recursing is not the norm I suggest to use >> a large limit. The patch sets it to 100. > > Having a natural aversion to global state, I'd have maybe preferred > a depth limit, but no matter. If we ever come to multithread the JIT > we'll have way bigger problems to deal with. > The depth is not as good a measure as is the number of nodes. A binary tree of depth 4 can have anywhere between 4 nodes and 15 nodes. That being said, the global state could still be avoided. But I left it as is for now. > Re the 100, this is my only concern. I think that we could still get > hit pretty hard in the worst case. > I changed it down to 30 and committed it. The numbers I had reported are deceiving. Although the number of traversed nodes was <= 25 in all cases it can be much higher. I just ran some proprietary application which showed max_nodes = 62 !! Which only confirms what we already know: improving the perf bucket with a larger variety of real apps would be a very good thing... Florian |
|
From: <sv...@va...> - 2012-02-13 00:11:05
|
Author: florian
Date: 2012-02-13 00:06:29 +0000 (Mon, 13 Feb 2012)
New Revision: 2246
Log:
This patch is a follow-up to r2244 which fixed bugzilla #287260 on
some platforms but not on all that we test.
The issue was that cprop_BB did not see that in Add32(t2,t3) the
driving expressions for t2 and t3 were the same. Therefore, the
Add was not replaced with a shift (which is necessary for proper
memcheck operation).
So in this patch:
(1) In cprop_BB, when setting up the "env", record *any* assignment
to a temporary (and not just those that are subject to copy
propagation).
(2) Pass this env down to fold_Expr and then sameIRExprs.
(3) Replace sameIRTemps with sameIRExprs and enhance it. Upon
encountering an RdTmp, check "env" and recurse into the
expression assigned to the temporary.
As a side, the functions sameIcoU32s and sameIRTempsOrIcoU32s
and replaced with sameIRExprs.
(4) Add some machinery to monitor frequency and effectiveness of
sameIRExprs (can be enabled by setting STATS_IROPT).
Hopefully, fixes #287260 for good.
Modified:
trunk/priv/ir_opt.c
Modified: trunk/priv/ir_opt.c
===================================================================
--- trunk/priv/ir_opt.c 2012-02-04 17:07:07 UTC (rev 2245)
+++ trunk/priv/ir_opt.c 2012-02-13 00:06:29 UTC (rev 2246)
@@ -46,7 +46,10 @@
/* Set to 1 for lots of debugging output. */
#define DEBUG_IROPT 0
+/* Set to 1 to gather some statistics. Currently only for sameIRExprs. */
+#define STATS_IROPT 0
+
/* What iropt does, 29 Dec 04.
It takes an IRSB and produces a new one with the same meaning,
@@ -883,41 +886,139 @@
/*--- Constant propagation and folding ---*/
/*---------------------------------------------------------------*/
+#if STATS_IROPT
+/* How often sameIRExprs was invoked */
+static UInt invocation_count;
+/* How often sameIRExprs recursed through IRTemp assignments */
+static UInt recursion_count;
+/* How often sameIRExprs found identical IRExprs */
+static UInt success_count;
+/* How often recursing through assignments to IRTemps helped
+ establishing equality. */
+static UInt recursion_success_count;
+/* Whether or not recursing through an IRTemp assignment helped
+ establishing IRExpr equality for a given sameIRExprs invocation. */
+static Bool recursion_helped;
+/* Whether or not a given sameIRExprs invocation recursed through an
+ IRTemp assignment */
+static Bool recursed;
+/* Maximum number of nodes ever visited when comparing two IRExprs. */
+static UInt max_nodes_visited;
+#endif /* STATS_IROPT */
+
+/* Count the number of nodes visited for a given sameIRExprs invocation. */
+static UInt num_nodes_visited;
+
+/* Do not visit more than NODE_LIMIT nodes when comparing two IRExprs.
+ This is to guard against performance degradation by visiting large
+ trees without success. */
+#define NODE_LIMIT 30
+
+
/* The env in this section is a map from IRTemp to IRExpr*,
that is, an array indexed by IRTemp. */
-/* Are both expressions simply the same IRTemp ? */
-static Bool sameIRTemps ( IRExpr* e1, IRExpr* e2 )
+/* Do both expressions compute the same value? The answer is generally
+ conservative, i.e. it will report that the expressions do not compute
+ the same value when in fact they do. The reason is that we do not
+ keep track of changes in the guest state and memory. Thusly, two
+ Get's, GetI's or Load's, even when accessing the same location, will be
+ assumed to compute different values. After all the accesses may happen
+ at different times and the guest state / memory can have changed in
+ the meantime. */
+static Bool sameIRExprs_aux ( IRExpr** env, IRExpr* e1, IRExpr* e2 )
{
- return toBool( e1->tag == Iex_RdTmp
- && e2->tag == Iex_RdTmp
- && e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp );
-}
+ if (e1->tag != e2->tag) return False;
-static Bool sameIcoU32s ( IRExpr* e1, IRExpr* e2 )
-{
- return toBool( e1->tag == Iex_Const
- && e2->tag == Iex_Const
- && e1->Iex.Const.con->tag == Ico_U32
- && e2->Iex.Const.con->tag == Ico_U32
- && e1->Iex.Const.con->Ico.U32
- == e2->Iex.Const.con->Ico.U32 );
-}
+ if (num_nodes_visited++ > NODE_LIMIT) return False;
-/* Are both expressions either the same IRTemp or IRConst-U32s ? If
- in doubt, say No. */
-static Bool sameIRTempsOrIcoU32s ( IRExpr* e1, IRExpr* e2 )
-{
switch (e1->tag) {
case Iex_RdTmp:
- return sameIRTemps(e1, e2);
- case Iex_Const:
- return sameIcoU32s(e1, e2);
+ if (e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp) return True;
+
+ if (env[e1->Iex.RdTmp.tmp] && env[e2->Iex.RdTmp.tmp]) {
+ Bool same = sameIRExprs_aux(env, env[e1->Iex.RdTmp.tmp],
+ env[e2->Iex.RdTmp.tmp]);
+#if STATS_IROPT
+ recursed = True;
+ if (same) recursion_helped = True;
+#endif
+ return same;
+ }
+ return False;
+
+ case Iex_Get:
+ case Iex_GetI:
+ case Iex_Load:
+ /* Guest state / memory could have changed in the meantime. */
+ return False;
+
+ case Iex_Binop:
+ return toBool( e1->Iex.Binop.op == e2->Iex.Binop.op
+ && sameIRExprs_aux( env, e1->Iex.Binop.arg1, e2->Iex.Binop.arg1 )
+ && sameIRExprs_aux( env, e1->Iex.Binop.arg2, e2->Iex.Binop.arg2 ));
+
+ case Iex_Unop:
+ return toBool( e1->Iex.Unop.op == e2->Iex.Unop.op
+ && sameIRExprs_aux( env, e1->Iex.Unop.arg, e2->Iex.Unop.arg ));
+
+ case Iex_Const: {
+ IRConst *c1 = e1->Iex.Const.con;
+ IRConst *c2 = e2->Iex.Const.con;
+ vassert(c1->tag == c2->tag);
+ switch (c1->tag) {
+ case Ico_U1: return toBool( c1->Ico.U1 == c2->Ico.U1 );
+ case Ico_U8: return toBool( c1->Ico.U8 == c2->Ico.U8 );
+ case Ico_U16: return toBool( c1->Ico.U16 == c2->Ico.U16 );
+ case Ico_U32: return toBool( c1->Ico.U32 == c2->Ico.U32 );
+ case Ico_U64: return toBool( c1->Ico.U64 == c2->Ico.U64 );
+ default: break;
+ }
+ return False;
+ }
+
+ case Iex_Triop:
+ return toBool( e1->Iex.Triop.op == e2->Iex.Triop.op
+ && sameIRExprs_aux( env, e1->Iex.Triop.arg1, e2->Iex.Triop.arg1 )
+ && sameIRExprs_aux( env, e1->Iex.Triop.arg2, e2->Iex.Triop.arg2 )
+ && sameIRExprs_aux( env, e1->Iex.Triop.arg3, e2->Iex.Triop.arg3 ));
+
+ case Iex_Mux0X:
+ return toBool( sameIRExprs_aux( env, e1->Iex.Mux0X.cond, e2->Iex.Mux0X.cond )
+ && sameIRExprs_aux( env, e1->Iex.Mux0X.expr0, e2->Iex.Mux0X.expr0 )
+ && sameIRExprs_aux( env, e1->Iex.Mux0X.exprX, e2->Iex.Mux0X.exprX ));
+
default:
- return False;
+ /* Not very likely to be "same". */
+ break;
}
+
+ return False;
}
+static Bool sameIRExprs ( IRExpr** env, IRExpr* e1, IRExpr* e2 )
+{
+ Bool same;
+
+ num_nodes_visited = 0;
+ same = sameIRExprs_aux(env, e1, e2);
+
+#if STATS_IROPT
+ ++invocation_count;
+ if (recursed) ++recursion_count;
+ success_count += same;
+ if (same && recursion_helped)
+ ++recursion_success_count;
+ if (num_nodes_visited > max_nodes_visited)
+ max_nodes_visited = num_nodes_visited;
+ recursed = False; /* reset */
+ recursion_helped = False; /* reset */
+#endif /* STATS_IROPT */
+
+ return same;
+}
+
+
/* Is this literally IRExpr_Const(IRConst_U32(0)) ? */
static Bool isZeroU32 ( IRExpr* e )
{
@@ -1014,7 +1115,7 @@
}
-static IRExpr* fold_Expr ( IRExpr* e )
+static IRExpr* fold_Expr ( IRExpr** env, IRExpr* e )
{
Int shift;
IRExpr* e2 = e; /* e2 is the result of folding e, if possible */
@@ -1638,7 +1739,7 @@
break;
}
/* Or8/Or16/Or32/Max32U(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1650,7 +1751,7 @@
x+x produces a defined least significant bit, and it seems
simplest just to get rid of the problem by rewriting it
out, since the opportunity to do so exists. */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = IRExpr_Binop(Iop_Shl8, e->Iex.Binop.arg1,
IRExpr_Const(IRConst_U8(1)));
break;
@@ -1672,7 +1773,7 @@
break;
}
/* Add32/Add64(t,t) ==> t << 1. Same rationale as for Add8. */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = IRExpr_Binop(e->Iex.Binop.op == Iop_Add32 ? Iop_Shl32 : Iop_Shl64,
e->Iex.Binop.arg1, IRExpr_Const(IRConst_U8(1)));
break;
@@ -1686,7 +1787,7 @@
break;
}
/* Sub64(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1710,7 +1811,7 @@
break;
}
/* And32(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1720,7 +1821,7 @@
case Iop_And16:
case Iop_And64:
/* And8/And16/And64(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1728,7 +1829,7 @@
case Iop_OrV128:
/* V128(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1740,7 +1841,7 @@
case Iop_Xor64:
case Iop_XorV128:
/* Xor8/16/32/64/V128(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1748,7 +1849,7 @@
case Iop_Sub32:
/* Sub32(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1757,7 +1858,7 @@
case Iop_CmpEQ64:
case Iop_CmpEQ8x8:
case Iop_CmpEQ8x16:
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkOnesOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1782,15 +1883,15 @@
}
else
/* are the arms identical? (pretty weedy test) */
- if (sameIRTempsOrIcoU32s(e->Iex.Mux0X.expr0,
- e->Iex.Mux0X.exprX)) {
+ if (sameIRExprs(env, e->Iex.Mux0X.expr0,
+ e->Iex.Mux0X.exprX)) {
e2 = e->Iex.Mux0X.expr0;
}
}
/* Show cases where we've found but not folded 'op(t,t)'. */
if (0 && e == e2 && e->tag == Iex_Binop
- && sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ && sameIRExprs(env, e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
vex_printf("IDENT: ");
ppIRExpr(e); vex_printf("\n");
}
@@ -1828,11 +1929,15 @@
switch (ex->tag) {
case Iex_RdTmp:
if (env[(Int)ex->Iex.RdTmp.tmp] != NULL) {
- return env[(Int)ex->Iex.RdTmp.tmp];
- } else {
- /* not bound in env */
- return ex;
+ IRExpr *rhs = env[(Int)ex->Iex.RdTmp.tmp];
+ if (rhs->tag == Iex_RdTmp)
+ return rhs;
+ if (rhs->tag == Iex_Const
+ && rhs->Iex.Const.con->tag != Ico_F64i)
+ return rhs;
}
+ /* not bound in env */
+ return ex;
case Iex_Const:
case Iex_Get:
@@ -1943,15 +2048,15 @@
vassert(isIRAtom(st->Ist.AbiHint.base));
vassert(isIRAtom(st->Ist.AbiHint.nia));
return IRStmt_AbiHint(
- fold_Expr(subst_Expr(env, st->Ist.AbiHint.base)),
+ fold_Expr(env, subst_Expr(env, st->Ist.AbiHint.base)),
st->Ist.AbiHint.len,
- fold_Expr(subst_Expr(env, st->Ist.AbiHint.nia))
+ fold_Expr(env, subst_Expr(env, st->Ist.AbiHint.nia))
);
case Ist_Put:
vassert(isIRAtom(st->Ist.Put.data));
return IRStmt_Put(
st->Ist.Put.offset,
- fold_Expr(subst_Expr(env, st->Ist.Put.data))
+ fold_Expr(env, subst_Expr(env, st->Ist.Put.data))
);
case Ist_PutI:
@@ -1959,9 +2064,9 @@
vassert(isIRAtom(st->Ist.PutI.data));
return IRStmt_PutI(
st->Ist.PutI.descr,
- fold_Expr(subst_Expr(env, st->Ist.PutI.ix)),
+ fold_Expr(env, subst_Expr(env, st->Ist.PutI.ix)),
st->Ist.PutI.bias,
- fold_Expr(subst_Expr(env, st->Ist.PutI.data))
+ fold_Expr(env, subst_Expr(env, st->Ist.PutI.data))
);
case Ist_WrTmp:
@@ -1969,7 +2074,7 @@
allowed to be more than just a constant or a tmp. */
return IRStmt_WrTmp(
st->Ist.WrTmp.tmp,
- fold_Expr(subst_Expr(env, st->Ist.WrTmp.data))
+ fold_Expr(env, subst_Expr(env, st->Ist.WrTmp.data))
);
case Ist_Store:
@@ -1977,8 +2082,8 @@
vassert(isIRAtom(st->Ist.Store.data));
return IRStmt_Store(
st->Ist.Store.end,
- fold_Expr(subst_Expr(env, st->Ist.Store.addr)),
- fold_Expr(subst_Expr(env, st->Ist.Store.data))
+ fold_Expr(env, subst_Expr(env, st->Ist.Store.addr)),
+ fold_Expr(env, subst_Expr(env, st->Ist.Store.data))
);
case Ist_CAS: {
@@ -1991,11 +2096,11 @@
vassert(isIRAtom(cas->dataLo));
cas2 = mkIRCAS(
cas->oldHi, cas->oldLo, cas->end,
- fold_Expr(subst_Expr(env, cas->addr)),
- cas->expdHi ? fold_Expr(subst_Expr(env, cas->expdHi)) : NULL,
- fold_Expr(subst_Expr(env, cas->expdLo)),
- cas->dataHi ? fold_Expr(subst_Expr(env, cas->dataHi)) : NULL,
- fold_Expr(subst_Expr(env, cas->dataLo))
+ fold_Expr(env, subst_Expr(env, cas->addr)),
+ cas->expdHi ? fold_Expr(env, subst_Expr(env, cas->expdHi)) : NULL,
+ fold_Expr(env, subst_Expr(env, cas->expdLo)),
+ cas->dataHi ? fold_Expr(env, subst_Expr(env, cas->dataHi)) : NULL,
+ fold_Expr(env, subst_Expr(env, cas->dataLo))
);
return IRStmt_CAS(cas2);
}
@@ -2007,9 +2112,9 @@
return IRStmt_LLSC(
st->Ist.LLSC.end,
st->Ist.LLSC.result,
- fold_Expr(subst_Expr(env, st->Ist.LLSC.addr)),
+ fold_Expr(env, subst_Expr(env, st->Ist.LLSC.addr)),
st->Ist.LLSC.storedata
- ? fold_Expr(subst_Expr(env, st->Ist.LLSC.storedata))
+ ? fold_Expr(env, subst_Expr(env, st->Ist.LLSC.storedata))
: NULL
);
@@ -2022,13 +2127,13 @@
d2->args = shallowCopyIRExprVec(d2->args);
if (d2->mFx != Ifx_None) {
vassert(isIRAtom(d2->mAddr));
- d2->mAddr = fold_Expr(subst_Expr(env, d2->mAddr));
+ d2->mAddr = fold_Expr(env, subst_Expr(env, d2->mAddr));
}
vassert(isIRAtom(d2->guard));
- d2->guard = fold_Expr(subst_Expr(env, d2->guard));
+ d2->guard = fold_Expr(env, subst_Expr(env, d2->guard));
for (i = 0; d2->args[i]; i++) {
vassert(isIRAtom(d2->args[i]));
- d2->args[i] = fold_Expr(subst_Expr(env, d2->args[i]));
+ d2->args[i] = fold_Expr(env, subst_Expr(env, d2->args[i]));
}
return IRStmt_Dirty(d2);
}
@@ -2047,7 +2152,7 @@
case Ist_Exit: {
IRExpr* fcond;
vassert(isIRAtom(st->Ist.Exit.guard));
- fcond = fold_Expr(subst_Expr(env, st->Ist.Exit.guard));
+ fcond = fold_Expr(env, subst_Expr(env, st->Ist.Exit.guard));
if (fcond->tag == Iex_Const) {
/* Interesting. The condition on this exit has folded down to
a constant. */
@@ -2091,10 +2196,9 @@
out->tyenv = deepCopyIRTypeEnv( in->tyenv );
/* Set up the env with which travels forward. This holds a
- substitution, mapping IRTemps to atoms, that is, IRExprs which
- are either IRTemps or IRConsts. Thus, copy and constant
- propagation is done. The environment is to be applied as we
- move along. Keys are IRTemps. Values are IRExpr*s.
+ substitution, mapping IRTemps to IRExprs. The environment
+ is to be applied as we move along. Keys are IRTemps.
+ Values are IRExpr*s.
*/
for (i = 0; i < n_tmps; i++)
env[i] = NULL;
@@ -2117,32 +2221,34 @@
/* If the statement has been folded into a no-op, forget it. */
if (st2->tag == Ist_NoOp) continue;
- /* Now consider what the stmt looks like. If it's of the form
- 't = const' or 't1 = t2', add it to the running environment
- and not to the output BB. Otherwise, add it to the output
- BB. Note, we choose not to propagate const when const is an
- F64i, so that F64i literals can be CSE'd later. This helps
- x86 floating point code generation. */
+ /* If the statement assigns to an IRTemp add it to the running
+ environment. This is for the benefit of copy propagation
+ and to allow sameIRExpr look through IRTemps. */
+ if (st2->tag == Ist_WrTmp) {
+ vassert(env[(Int)(st2->Ist.WrTmp.tmp)] == NULL);
+ env[(Int)(st2->Ist.WrTmp.tmp)] = st2->Ist.WrTmp.data;
- if (st2->tag == Ist_WrTmp
- && st2->Ist.WrTmp.data->tag == Iex_Const
- && st2->Ist.WrTmp.data->Iex.Const.con->tag != Ico_F64i) {
- /* 't = const' -- add to env.
- The pair (IRTemp, IRExpr*) is added. */
- env[(Int)(st2->Ist.WrTmp.tmp)] = st2->Ist.WrTmp.data;
+ /* 't1 = t2' -- don't add to BB; will be optimized out */
+ if (st2->Ist.WrTmp.data->tag == Iex_RdTmp) continue;
+
+ /* 't = const' && 'const != F64i' -- don't add to BB
+ Note, we choose not to propagate const when const is an
+ F64i, so that F64i literals can be CSE'd later. This helps
+ x86 floating point code generation. */
+ if (st2->Ist.WrTmp.data->tag == Iex_Const
+ && st2->Ist.WrTmp.data->Iex.Const.con->tag != Ico_F64i) continue;
}
- else
- if (st2->tag == Ist_WrTmp && st2->Ist.WrTmp.data->tag == Iex_RdTmp) {
- /* 't1 = t2' -- add to env.
- The pair (IRTemp, IRExpr*) is added. */
- env[(Int)(st2->Ist.WrTmp.tmp)] = st2->Ist.WrTmp.data;
- }
- else {
- /* Not interesting, copy st2 into the output block. */
- addStmtToIRSB( out, st2 );
- }
+
+ /* Not interesting, copy st2 into the output block. */
+ addStmtToIRSB( out, st2 );
}
+#if STATS_IROPT
+ vex_printf("sameIRExpr: invoked = %u/%u equal = %u/%u max_nodes = %u\n",
+ invocation_count, recursion_count, success_count,
+ recursion_success_count, max_nodes_visited);
+#endif
+
out->next = subst_Expr( env, in->next );
out->jumpkind = in->jumpkind;
return out;
|