|
From: <sv...@va...> - 2012-01-29 02:27:43
|
Author: florian
Date: 2012-01-29 02:23:07 +0000 (Sun, 29 Jan 2012)
New Revision: 12359
Log:
Add testcase for bugzilla #287260. Testcase by M.Welinder
(mwe...@gm...). Announce bug fix.
Added:
trunk/memcheck/tests/bug287260.c
trunk/memcheck/tests/bug287260.stderr.exp
trunk/memcheck/tests/bug287260.vgtest
Modified:
trunk/NEWS
trunk/memcheck/tests/
trunk/memcheck/tests/Makefile.am
Modified: trunk/NEWS
===================================================================
--- trunk/NEWS 2012-01-28 14:20:30 UTC (rev 12358)
+++ trunk/NEWS 2012-01-29 02:23:07 UTC (rev 12359)
@@ -46,6 +46,7 @@
283413 Fix wrong sanity check
286270 vgpreload is not friendly to 64->32 bit execs, gives ld.so warnings
286374 Running cachegrind with --branch-sim=yes on 64-bit PowerPC program fails
+287260 Incorrect conditional jump or move depends on uninitialised value(s)
287858 VG_(strerror): unknown error
289699 vgdb connection in relay mode erroneously closed due to buffer overrun
289939 wish: complete monitor cmd 'leak_check' with details about leaked or reachable blocks
Property changes on: trunk/memcheck/tests
___________________________________________________________________
Name: svn:ignore
- *.dSYM
*.stderr.diff*
*.stderr.out
*.stdout.diff*
*.stdout.out
.deps
accounting
addressable
atomic_incs
badaddrvalue
badfree
badjump
badjump2
badloop
badpoll
badrw
big_blocks_freed_list
brk
brk2
buflen_check
calloc-overflow
clientperm
clientstackperm
custom-overlap
custom_alloc
deep_templates
describe-block
dir
doublefree
erringfds
error_counts
errs1
err_disable1
err_disable2
err_disable3
err_disable4
execve1
execve2
exitprog
file_locking
filter_leak_check_size
filter_stderr
fprw
fwrite
hello
holey_buffer_too_small
inits
inline
leak-0
leak-cases
leak-cycle
leak-delta
leak-pool
leak-regroot
leak-tree
leakotron
linux-capget
linux-syscalls-2007
linux-syslog-syscall
linux-timerfd-syscall
long-supps
long_namespace_xml
lsframe1
lsframe2
Makefile
Makefile.in
mallinfo
malloc1
malloc2
malloc3
malloc_free_fill
malloc_usable
manuel1
manuel2
manuel3
match-overrun
memalign2
memalign_test
memcmptest
mempool
mempool2
metadata
mismatches
mmaptest
nanoleak
nanoleak2
nanoleak_supp
new_nothrow
new_override
noisy_child
null_socket
origin1-yes
origin2-not-quite
origin3-no
origin4-many
origin5-bz2
origin6-fp
oset_test
overlap
partiallydefinedeq
partial_load
pdb-realloc
pdb-realloc2
pipe
pointer-trace
post-syscall
realloc1
realloc2
realloc3
sbfragment
scalar
scalar_exit_group
scalar_fork
scalar_supp
scalar_vfork
sh-mem
sh-mem-random
sigaltstack
sigkill
signal2
sigprocmask
stack_changes
stack_switch
strchr
str_tester
supp1
supp2
suppfree
supp_unknown
threadederrno
trivialleak
unit_libcbase
unit_oset
varinfo1
varinfo2
varinfo3
varinfo4
varinfo5
varinfo5so.so
varinfo6
vcpu_bz2
vcpu_fbench
vcpu_fnfns
vgtest_ume
weirdioctl
with space
wrap1
wrap2
wrap3
wrap4
wrap5
wrap6
wrap7
wrap7so.so
wrap8
writev1
xml1
zeropage
+ *.dSYM
*.stderr.diff*
*.stderr.out
*.stdout.diff*
*.stdout.out
.deps
accounting
addressable
atomic_incs
badaddrvalue
badfree
badjump
badjump2
badloop
badpoll
badrw
big_blocks_freed_list
brk
brk2
buflen_check
bug287260
calloc-overflow
clientperm
clientstackperm
custom-overlap
custom_alloc
deep_templates
describe-block
dir
doublefree
erringfds
error_counts
errs1
err_disable1
err_disable2
err_disable3
err_disable4
execve1
execve2
exitprog
file_locking
filter_leak_check_size
filter_stderr
fprw
fwrite
hello
holey_buffer_too_small
inits
inline
leak-0
leak-cases
leak-cycle
leak-delta
leak-pool
leak-regroot
leak-tree
leakotron
linux-capget
linux-syscalls-2007
linux-syslog-syscall
linux-timerfd-syscall
long-supps
long_namespace_xml
lsframe1
lsframe2
Makefile
Makefile.in
mallinfo
malloc1
malloc2
malloc3
malloc_free_fill
malloc_usable
manuel1
manuel2
manuel3
match-overrun
memalign2
memalign_test
memcmptest
mempool
mempool2
metadata
mismatches
mmaptest
nanoleak
nanoleak2
nanoleak_supp
new_nothrow
new_override
noisy_child
null_socket
origin1-yes
origin2-not-quite
origin3-no
origin4-many
origin5-bz2
origin6-fp
oset_test
overlap
partiallydefinedeq
partial_load
pdb-realloc
pdb-realloc2
pipe
pointer-trace
post-syscall
realloc1
realloc2
realloc3
sbfragment
scalar
scalar_exit_group
scalar_fork
scalar_supp
scalar_vfork
sh-mem
sh-mem-random
sigaltstack
sigkill
signal2
sigprocmask
stack_changes
stack_switch
strchr
str_tester
supp1
supp2
suppfree
supp_unknown
threadederrno
trivialleak
unit_libcbase
unit_oset
varinfo1
varinfo2
varinfo3
varinfo4
varinfo5
varinfo5so.so
varinfo6
vcpu_bz2
vcpu_fbench
vcpu_fnfns
vgtest_ume
weirdioctl
with space
wrap1
wrap2
wrap3
wrap4
wrap5
wrap6
wrap7
wrap7so.so
wrap8
writev1
xml1
zeropage
Modified: trunk/memcheck/tests/Makefile.am
===================================================================
--- trunk/memcheck/tests/Makefile.am 2012-01-28 14:20:30 UTC (rev 12358)
+++ trunk/memcheck/tests/Makefile.am 2012-01-29 02:23:07 UTC (rev 12359)
@@ -66,6 +66,7 @@
big_blocks_freed_list.stderr.exp big_blocks_freed_list.vgtest \
brk2.stderr.exp brk2.vgtest \
buflen_check.stderr.exp buflen_check.vgtest buflen_check.stderr.exp-kfail \
+ bug287260.stderr.exp bug287260.vgtest \
calloc-overflow.stderr.exp calloc-overflow.vgtest\
clientperm.stderr.exp \
clientperm.stdout.exp clientperm.vgtest \
@@ -222,6 +223,7 @@
big_blocks_freed_list \
brk2 \
buflen_check \
+ bug287260 \
calloc-overflow \
clientperm \
custom_alloc \
Added: trunk/memcheck/tests/bug287260.c
===================================================================
--- trunk/memcheck/tests/bug287260.c (rev 0)
+++ trunk/memcheck/tests/bug287260.c 2012-01-29 02:23:07 UTC (rev 12359)
@@ -0,0 +1,16 @@
+#include <stdio.h>
+
+typedef struct {
+ unsigned int stuff : 21;
+ signed int rotation : 10;
+} Oink;
+
+int
+main (int argc, char **argv)
+{
+ volatile Oink r;
+
+ r.rotation = 45;
+ fprintf (stderr, "%d\n", r.rotation);
+ return 0;
+}
Added: trunk/memcheck/tests/bug287260.stderr.exp
===================================================================
--- trunk/memcheck/tests/bug287260.stderr.exp (rev 0)
+++ trunk/memcheck/tests/bug287260.stderr.exp 2012-01-29 02:23:07 UTC (rev 12359)
@@ -0,0 +1,11 @@
+
+45
+
+HEAP SUMMARY:
+ in use at exit: 0 bytes in 0 blocks
+ total heap usage: 0 allocs, 0 frees, 0 bytes allocated
+
+For a detailed leak analysis, rerun with: --leak-check=full
+
+For counts of detected and suppressed errors, rerun with: -v
+ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Added: trunk/memcheck/tests/bug287260.vgtest
===================================================================
--- trunk/memcheck/tests/bug287260.vgtest (rev 0)
+++ trunk/memcheck/tests/bug287260.vgtest 2012-01-29 02:23:07 UTC (rev 12359)
@@ -0,0 +1 @@
+prog: bug287260
|
|
From: Bart V. A. <bva...@ac...> - 2012-01-29 10:30:32
|
On Sun, Jan 29, 2012 at 2:23 AM, <sv...@va...> wrote: > +287260 Incorrect conditional jump or move depends on uninitialised value(s) Hello Florian, Are you sure this bug is now fixed ? According to the nightly build output memcheck/tests/bug287260 fails now on several platforms it didn't fail before. Bart. |
|
From: Florian K. <br...@ac...> - 2012-01-29 15:15:36
|
Hi Bart, I checked in that testcase yesterday together with a change to ir_opt.c that fixed the named bug on the platforms I have access to. But something else is going on. Might be worth looking at. Unfortunately I cannot do it. Florian On 01/29/2012 05:30 AM, Bart Van Assche wrote: > On Sun, Jan 29, 2012 at 2:23 AM, <sv...@va...> wrote: >> +287260 Incorrect conditional jump or move depends on uninitialised value(s) > > Hello Florian, > > Are you sure this bug is now fixed ? According to the nightly build > output memcheck/tests/bug287260 fails now on several platforms it > didn't fail before. > > Bart. > > ------------------------------------------------------------------------------ > Try before you buy = See our experts in action! > The most comprehensive online learning library for Microsoft developers > is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, > Metro Style Apps, more. Free future releases when you subscribe now! > http://p.sf.net/sfu/learndevnow-dev2 > _______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers > |
|
From: Bart V. A. <bva...@ac...> - 2012-01-30 19:54:43
|
On Sun, Jan 29, 2012 at 3:15 PM, Florian Krohm <br...@ac...> wrote: > I checked in that testcase yesterday together with a change to ir_opt.c > that fixed the named bug on the platforms I have access to. But > something else is going on. Might be worth looking at. Unfortunately I > cannot do it. IMHO it is because on 64-bit platforms the following Add32 expression isn't converted into a shift: t66 = 32Sto64(Add32(64to32(t56),64to32(t56))) The optimizations in ir_opt.c are only applied if sameIRTemp() returns true. For the above case that test fails because the arguments of Add32 are expressions instead of just temporary variables. Bart. |
|
From: Florian K. <br...@ac...> - 2012-01-31 01:40:16
Attachments:
patch-for-bart
|
On 01/30/2012 02:54 PM, Bart Van Assche wrote: > On Sun, Jan 29, 2012 at 3:15 PM, Florian Krohm <br...@ac...> wrote: >> I checked in that testcase yesterday together with a change to ir_opt.c >> that fixed the named bug on the platforms I have access to. But >> something else is going on. Might be worth looking at. Unfortunately I >> cannot do it. > > IMHO it is because on 64-bit platforms the following Add32 expression > isn't converted into a shift: > > t66 = 32Sto64(Add32(64to32(t56),64to32(t56))) > > The optimizations in ir_opt.c are only applied if sameIRTemp() returns > true. For the above case that test fails because the arguments of > Add32 are expressions instead of just temporary variables. > Yep, could be. Try the attached patch. Does it help ? It should help with the expression you gave. Florian |
|
From: Julian S. <js...@ac...> - 2012-01-31 09:05:23
|
> Yep, could be. Try the attached patch. Does it help ? It should help > with the expression you gave. Yeah, I was just going to suggest that. The SSAness of the IR makes it easy to write an equivalence function for this. J |
|
From: Bart V. A. <bva...@ac...> - 2012-01-31 17:23:45
|
On Tue, Jan 31, 2012 at 1:40 AM, Florian Krohm <br...@ac...> wrote:
> On 01/30/2012 02:54 PM, Bart Van Assche wrote:
> > IMHO it is because on 64-bit platforms the following Add32 expression
> > isn't converted into a shift:
> >
> > t66 = 32Sto64(Add32(64to32(t56),64to32(t56)))
> >
> > The optimizations in ir_opt.c are only applied if sameIRTemp() returns
> > true. For the above case that test fails because the arguments of
> > Add32 are expressions instead of just temporary variables.
> >
>
> Yep, could be. Try the attached patch. Does it help ? It should help
> with the expression you gave.
Sorry but I had posted the wrong VEX code. It's probably this fragment
for which the add is not being converted into a shift that is
triggering the uninitialized value error:
0x400587: addl %eax,%eax
------ IMark(0x400587, 2, 0) ------
PUT(168) = 0x400587:I64
t18 = 64to32(GET:I64(0))
t17 = 64to32(GET:I64(0))
t16 = Add32(t18,t17)
PUT(128) = 0x3:I64
PUT(136) = 32Uto64(t18)
PUT(144) = 32Uto64(t17)
PUT(0) = 32Uto64(t16)
I'm afraid that a more complex approach is necessary to recognize in
the above code that the Add32() can be converted into a shift left ?
Or is there an easy way to avoid that t17 and t18 are different
temporaries ?
Bart.
|
|
From: Julian S. <js...@ac...> - 2012-01-31 17:37:40
|
I think you posted the right IR code (and also that Florian's fix is right). All this .. > ------ IMark(0x400587, 2, 0) ------ > PUT(168) = 0x400587:I64 > t18 = 64to32(GET:I64(0)) > t17 = 64to32(GET:I64(0)) > t16 = Add32(t18,t17) > PUT(128) = 0x3:I64 > PUT(136) = 32Uto64(t18) > PUT(144) = 32Uto64(t17) > PUT(0) = 32Uto64(t16) .. after it is subject to IR optimisation (basic compiler stuff -- constant prop/folding, CSE etc), especially in the context of the the IR of the immediately preceding and following instructions, does indeed reduce to > > > t66 = 32Sto64(Add32(64to32(t56),64to32(t56))) J |
|
From: Bart V. A. <bva...@ac...> - 2012-01-31 17:44:25
|
On Tue, Jan 31, 2012 at 5:37 PM, Julian Seward <js...@ac...> wrote: > > I think you posted the right IR code (and also that Florian's fix is > right). All this .. > > > ------ IMark(0x400587, 2, 0) ------ > > PUT(168) = 0x400587:I64 > > t18 = 64to32(GET:I64(0)) > > t17 = 64to32(GET:I64(0)) > > t16 = Add32(t18,t17) > > PUT(128) = 0x3:I64 > > PUT(136) = 32Uto64(t18) > > PUT(144) = 32Uto64(t17) > > PUT(0) = 32Uto64(t16) > > .. after it is subject to IR optimisation (basic compiler stuff -- > constant prop/folding, CSE etc), especially in the context of the > the IR of the immediately preceding and following instructions, > does indeed reduce to > > > > > t66 = 32Sto64(Add32(64to32(t56),64to32(t56))) Florians' patch doesn't help on my setup - I still see the uninitialized value errors. The above statement is from the "After tree-building" stage. As far as I understand VEX memcheck instrumentation happens before that stage and hence optimizing statements that only occur in the after-tree-building state and not before wouldn't make sense, isn't it ? Bart. |
|
From: Bart V. A. <bva...@ac...> - 2012-01-31 19:33:11
|
On Tue, Jan 31, 2012 at 5:44 PM, Bart Van Assche <bva...@ac...> wrote:
> Florians' patch doesn't help on my setup - I still see the
> uninitialized value errors. The above statement is from the "After
> tree-building" stage. As far as I understand VEX memcheck
> instrumentation happens before that stage and hence optimizing
> statements that only occur in the after-tree-building state and not
> before wouldn't make sense, isn't it ?
Maybe it's easier to discuss a patch than IR fragments. Below there is
an (ugly) VEX patch that fixes the uninitialized value complaints on
my setup. I'm not sure though how to convert this in a proper VEX
patch.
diff --git a/priv/guest_amd64_toIR.c b/priv/guest_amd64_toIR.c
index 0e845b0..a329714 100644
--- a/priv/guest_amd64_toIR.c
+++ b/priv/guest_amd64_toIR.c
@@ -2693,7 +2693,9 @@ ULong dis_op2_E_G ( VexAbiInfo* vbi,
IRType ty = szToITy(size);
IRTemp dst1 = newTemp(ty);
IRTemp src = newTemp(ty);
+ IRExpr *src_expr;
IRTemp dst0 = newTemp(ty);
+ IRExpr *dst0_expr;
UChar rm = getUChar(delta0);
IRTemp addr = IRTemp_INVALID;
@@ -2716,8 +2718,14 @@ ULong dis_op2_E_G ( VexAbiInfo* vbi,
putIRegG(size,pfx,rm, mkU(ty,0));
}
- assign( dst0, getIRegG(size,pfx,rm) );
- assign( src, getIRegE(size,pfx,rm) );
+ dst0_expr = getIRegG(size,pfx,rm);
+ assign(dst0, dst0_expr);
+ src_expr = getIRegE(size,pfx,rm);
+ if (sameIRExprs(dst0_expr, src_expr))
+ src = dst0;
+ else
+ assign(src, src_expr);
+
if (addSubCarry && op8 == Iop_Add8) {
helper_ADC( size, dst1, dst0, src,
@@ -2809,7 +2817,9 @@ ULong dis_op2_G_E ( VexAbiInfo* vbi,
IRType ty = szToITy(size);
IRTemp dst1 = newTemp(ty);
IRTemp src = newTemp(ty);
+ IRExpr *src_expr;
IRTemp dst0 = newTemp(ty);
+ IRExpr *dst0_expr;
UChar rm = getUChar(delta0);
IRTemp addr = IRTemp_INVALID;
@@ -2830,8 +2840,13 @@ ULong dis_op2_G_E ( VexAbiInfo* vbi,
putIRegE(size,pfx,rm, mkU(ty,0));
}
- assign(dst0, getIRegE(size,pfx,rm));
- assign(src, getIRegG(size,pfx,rm));
+ dst0_expr = getIRegE(size,pfx,rm);
+ assign(dst0, dst0_expr);
+ src_expr = getIRegG(size,pfx,rm);
+ if (sameIRExprs(dst0_expr, src_expr))
+ src = dst0;
+ else
+ assign(src, src_expr);
if (addSubCarry && op8 == Iop_Add8) {
helper_ADC( size, dst1, dst0, src,
diff --git a/priv/ir_opt.c b/priv/ir_opt.c
index 0698cf7..f77db76 100644
--- a/priv/ir_opt.c
+++ b/priv/ir_opt.c
@@ -904,18 +904,23 @@ static Bool sameIcoU32s ( IRExpr* e1, IRExpr* e2 )
== e2->Iex.Const.con->Ico.U32 );
}
-/* Are both expressions either the same IRTemp or IRConst-U32s ? If
- in doubt, say No. */
-static Bool sameIRTempsOrIcoU32s ( IRExpr* e1, IRExpr* e2 )
+Bool sameIRExprs(IRExpr* e1, IRExpr* e2)
{
- switch (e1->tag) {
- case Iex_RdTmp:
- return sameIRTemps(e1, e2);
- case Iex_Const:
- return sameIcoU32s(e1, e2);
- default:
- return False;
- }
+ return toBool(sameIcoU32s(e1, e2)
+ || sameIRTemps(e1, e2)
+ || (e1->tag == Iex_Get
+ && e2->tag == Iex_Get
+ && e1->Iex.Get.offset == e2->Iex.Get.offset
+ && e1->Iex.Get.ty == e2->Iex.Get.ty)
+ || (e1->tag == Iex_Unop
+ && e2->tag == Iex_Unop
+ && e1->Iex.Unop.op == e2->Iex.Unop.op
+ && sameIRExprs(e1->Iex.Unop.arg, e2->Iex.Unop.arg))
+ || (e1->tag == Iex_Binop
+ && e2->tag == Iex_Binop
+ && e1->Iex.Binop.op == e2->Iex.Binop.op
+ && sameIRExprs(e1->Iex.Binop.arg1, e2->Iex.Binop.arg1)
+ && sameIRExprs(e1->Iex.Binop.arg2, e2->Iex.Binop.arg2)));
}
/* Is this literally IRExpr_Const(IRConst_U32(0)) ? */
@@ -1638,7 +1643,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
break;
}
/* Or8/Or16/Or32/Max32U(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1650,7 +1655,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
x+x produces a defined least significant bit, and it seems
simplest just to get rid of the problem by rewriting it
out, since the opportunity to do so exists. */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = IRExpr_Binop(Iop_Shl8, e->Iex.Binop.arg1,
IRExpr_Const(IRConst_U8(1)));
break;
@@ -1672,7 +1677,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
break;
}
/* Add32/Add64(t,t) ==> t << 1. Same rationale as for Add8. */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = IRExpr_Binop(e->Iex.Binop.op == Iop_Add32 ?
Iop_Shl32 : Iop_Shl64,
e->Iex.Binop.arg1,
IRExpr_Const(IRConst_U8(1)));
break;
@@ -1686,7 +1691,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
break;
}
/* Sub64(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1710,7 +1715,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
break;
}
/* And32(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1720,7 +1725,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
case Iop_And16:
case Iop_And64:
/* And8/And16/And64(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1728,7 +1733,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
case Iop_OrV128:
/* V128(t,t) ==> t, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = e->Iex.Binop.arg1;
break;
}
@@ -1740,7 +1745,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
case Iop_Xor64:
case Iop_XorV128:
/* Xor8/16/32/64/V128(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1748,7 +1753,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
case Iop_Sub32:
/* Sub32(t,t) ==> 0, for some IRTemp t */
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkZeroOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1757,7 +1762,7 @@ static IRExpr* fold_Expr ( IRExpr* e )
case Iop_CmpEQ64:
case Iop_CmpEQ8x8:
case Iop_CmpEQ8x16:
- if (sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ if (sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
e2 = mkOnesOfPrimopResultType(e->Iex.Binop.op);
break;
}
@@ -1782,15 +1787,14 @@ static IRExpr* fold_Expr ( IRExpr* e )
}
else
/* are the arms identical? (pretty weedy test) */
- if (sameIRTempsOrIcoU32s(e->Iex.Mux0X.expr0,
- e->Iex.Mux0X.exprX)) {
+ if (sameIRExprs(e->Iex.Mux0X.expr0, e->Iex.Mux0X.exprX)) {
e2 = e->Iex.Mux0X.expr0;
}
}
/* Show cases where we've found but not folded 'op(t,t)'. */
if (0 && e == e2 && e->tag == Iex_Binop
- && sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+ && sameIRExprs(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
vex_printf("IDENT: ");
ppIRExpr(e); vex_printf("\n");
}
diff --git a/pub/libvex_ir.h b/pub/libvex_ir.h
index 04c9fa9..3e68f84 100644
--- a/pub/libvex_ir.h
+++ b/pub/libvex_ir.h
@@ -1553,6 +1553,8 @@ extern IRExpr* IRExpr_Mux0X ( IRExpr* cond,
IRExpr* expr0, IRExpr* exprX );
/* Deep-copy an IRExpr. */
extern IRExpr* deepCopyIRExpr ( IRExpr* );
+extern Bool sameIRExprs(IRExpr* e1, IRExpr* e2);
+
/* Pretty-print an IRExpr. */
extern void ppIRExpr ( IRExpr* );
|
|
From: Julian S. <js...@ac...> - 2012-02-01 10:08:55
|
> > > > > t66 = 32Sto64(Add32(64to32(t56),64to32(t56)))
>
> Florians' patch doesn't help on my setup - I still see the
> uninitialized value errors. The above statement is from the "After
> tree-building" stage. As far as I understand VEX memcheck
> instrumentation happens before that stage and hence optimizing
> statements that only occur in the after-tree-building state and not
> before wouldn't make sense, isn't it ?
Yes, you're right. I didn't think about this enough. The original
insn is
0x40057F: addl %eax,%eax
------ IMark(0x40057F, 2, 0) ------
PUT(168) = 0x40057F:I64
t18 = 64to32(GET:I64(0))
t17 = 64to32(GET:I64(0))
t16 = Add32(t18,t17)
PUT(128) = 0x3:I64
PUT(136) = 32Uto64(t18)
PUT(144) = 32Uto64(t17)
PUT(0) = 32Uto64(t16)
and the post-optimised, pre-instrumented code (--trace-flags=01000000) that
goes into memcheck instrumentation is
------ IMark(0x40057F, 2, 0) ------
t57 = 64to32(t56)
t59 = 64to32(t56)
t16 = Add32(t57,t59)
IR-NoOp
IR-NoOp
t63 = 32Uto64(t16)
This is a flattened version of the expression that Florian's patch aims to
find. But the entire compilation pipeline operates on flattened IR -- the
tree building stage is (almost) the last stage, and is only there so that
the instruction selectors have the opportunity to convert multiple expression
nodes into single instructions if they want.
So Florian's patch won't help, as it stands, because it assumes the IR is
un-flattened. What we need is something that can do the equivalence check
for flattened IR, but that requires laboriously scanning backwards through
the block to connect up IRTemp defs and uses.
This should be fixed in the IR optimiser, though, not in the front ends,
since doing it in the front ends means duplicating functionality.
J
|
|
From: Josef W. <Jos...@gm...> - 2012-01-31 22:58:07
|
On 31.01.2012 02:40, Florian Krohm wrote:
> ...
> + if ( e1->tag == Iex_Binop&& e2->tag == Iex_Binop) {
> + return toBool( sameIRValues( e1->Iex.Binop.arg1, e2->Iex.Binop.arg1)
> +&& sameIRValues( e1->Iex.Binop.arg2, e2->Iex.Binop.arg2));
> ...
Don't you also need to check if the two binops are the same?
Josef
|
|
From: Julian S. <js...@ac...> - 2012-02-01 09:57:42
|
On Tuesday, January 31, 2012 11:57:59 pm Josef Weidendorfer wrote:
> On 31.01.2012 02:40, Florian Krohm wrote:
> > ...
> >
> > + if ( e1->tag == Iex_Binop&& e2->tag == Iex_Binop) {
> > + return toBool( sameIRValues( e1->Iex.Binop.arg1,
> > e2->Iex.Binop.arg1) +&& sameIRValues( e1->Iex.Binop.arg2,
> > e2->Iex.Binop.arg2));
> >
> > ...
>
> Don't you also need to check if the two binops are the same?
Ah yes, good point :)
J
|
|
From: Florian K. <br...@ac...> - 2012-02-05 16:33:43
Attachments:
iropt-wip-patch
|
On 02/01/2012 05:08 AM, Julian Seward wrote:
>
>
> This is a flattened version of the expression that Florian's patch aims to
> find. But the entire compilation pipeline operates on flattened IR -- the
> tree building stage is (almost) the last stage, and is only there so that
> the instruction selectors have the opportunity to convert multiple expression
> nodes into single instructions if they want.
>
[snip]
> This should be fixed in the IR optimiser, though, not in the front ends,
> since doing it in the front ends means duplicating functionality.
>
So here is a work-in-progress patch for the IR optimizer.
Since the IR is flat at the time of this optimization we need
to be able to see through an IRTemp. Basically a mapping from
IRTemp to the rhs that was assigned to the temporary.
So this patch does three things:
(1) In cprop_BB, when setting up the "env", record *any* assignment
to a temporary (and not just those that are subject to copy
propagation).
(2) Pass this env down to fold_Expr and then sameIRExprs. This
is pure mechanics.
(3) Replace sameIRTemps with sameIRExprs. Upon encountering an
RdTmp, check "env" and recurse into the expression assigned
to the temporary.
As a side, the functions sameIcoU32s and sameIRTempsOrIcoU32s
and replaced with sameIRExprs.
The good news is that the memcheck errors for bug287260.c are no
longer reported. The bad news is that there are 8 segfaults when
running make regtest on x86_64. I've been chasing the bug for a
while and stared at the patch. But it's elusive and I haven't found
it. Time for a 2nd opinion.
The symptom for the bug is a segfault in the jitted code when
attempting to load from some address. I looked at the IRSB where it
happens and made sure that the optimization of that IRSB is correct.
So it happens somewhere earlier. Following this path looks like a
tremendous time sink...
Is there something fundamental that I'm missing? Perhaps some
dependency between optimization phases? For instance I
was wondering about cprop_BB. It is careful to not call addStmtToIRSB
for a statement that will be eliminated by copy propagation. That
seems unnecessary because cprop_BB is usually followed by dead code
removal which would take care of it.
The bug disappears if we do not recurse through temporaries (i.e.
replace the #if 1 in the patch with #if 0). Which is an indication
that I got sameIRExpr right, this time :)
On my x86_64 it can be reproduced with none/tests/empty_exe on
s390x with memcheck/tests/badfree and --tool=none.
Florian
|
|
From: Bart V. A. <bva...@ac...> - 2012-02-05 16:57:02
|
On Sun, Feb 5, 2012 at 4:33 PM, Florian Krohm <br...@ac...> wrote:
> So here is a work-in-progress patch for the IR optimizer.
Maybe the follow-up patch below makes sense ?
diff --git a/priv/ir_opt.c b/priv/ir_opt.c
index 9a0d071..d0ffd71 100644
--- a/priv/ir_opt.c
+++ b/priv/ir_opt.c
@@ -893,6 +893,7 @@ static Bool sameIRExprs ( IRExpr** env, IRExpr*
e1, IRExpr* e2 )
switch (e1->tag) {
case Iex_RdTmp:
+ if (e2->tag != Iex_RdTmp) return False;
if (e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp) return True;
#if 1
if (env[e1->Iex.RdTmp.tmp] && env[e2->Iex.RdTmp.tmp]) {
Bart.
|
|
From: Julian S. <js...@ac...> - 2012-02-07 10:43:17
|
> The good news is that the memcheck errors for bug287260.c are no > longer reported. The bad news is that there are 8 segfaults when > running make regtest on x86_64. I've been chasing the bug for a > while and stared at the patch. But it's elusive and I haven't found > it. Time for a 2nd opinion. > > The symptom for the bug is a segfault in the jitted code when > attempting to load from some address. I looked at the IRSB where it > happens and made sure that the optimization of that IRSB is correct. > So it happens somewhere earlier. Following this path looks like a > tremendous time sink... Yes .. if the state of the simulated machine got corrupted in some earlier block, your chances of finding it directly are close to zero. Without having thought about this much .. are you 110% sure there is no way your patche can move loads past stores or dirty helpers, or move gets (including GetI) past puts (including PutI), in either direction ? That kind of thing has caught me out in the past. J |
|
From: Florian K. <br...@ac...> - 2012-02-05 17:24:42
|
On 02/05/2012 11:56 AM, Bart Van Assche wrote:
> On Sun, Feb 5, 2012 at 4:33 PM, Florian Krohm <br...@ac...> wrote:
>> So here is a work-in-progress patch for the IR optimizer.
>
> Maybe the follow-up patch below makes sense ?
>
> diff --git a/priv/ir_opt.c b/priv/ir_opt.c
> index 9a0d071..d0ffd71 100644
> --- a/priv/ir_opt.c
> +++ b/priv/ir_opt.c
> @@ -893,6 +893,7 @@ static Bool sameIRExprs ( IRExpr** env, IRExpr*
> e1, IRExpr* e2 )
>
> switch (e1->tag) {
> case Iex_RdTmp:
> + if (e2->tag != Iex_RdTmp) return False;
> if (e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp) return True;
> #if 1
> if (env[e1->Iex.RdTmp.tmp] && env[e2->Iex.RdTmp.tmp]) {
>
But we already check at the very beginning that the expr tags are the same:
static Bool sameIRExprs ( IRExpr** env, IRExpr* e1, IRExpr* e2 )
{
if (e1->tag != e2->tag) return False; // <----<<
switch (e1->tag) {
case Iex_RdTmp:
if (e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp) return True;
#if 1
Florian
|
|
From: Bart V. A. <bva...@ac...> - 2012-02-05 17:29:05
|
On Sun, Feb 5, 2012 at 5:24 PM, Florian Krohm <br...@ac...> wrote:
> But we already check at the very beginning that the expr tags are the same:
>
> static Bool sameIRExprs ( IRExpr** env, IRExpr* e1, IRExpr* e2 )
> {
> if (e1->tag != e2->tag) return False; // <----<<
>
> switch (e1->tag) {
> case Iex_RdTmp:
> if (e1->Iex.RdTmp.tmp == e2->Iex.RdTmp.tmp) return True;
Yes, I just noticed that myself too ...
The good news is if that if I run the Valgrind regression tests on an
x86_64 system with your patch applied that not only
memcheck/tests/bug287260 passes but also that the output of all other
regression tests is unaffected. The output of "make regtest" is the
same as what I got before on openSUSE 12.1:
== 606 tests, 2 stderr failures, 0 stdout failures, 0 stderrB
failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/origin5-bz2 (stderr)
memcheck/tests/overlap (stderr)
Bart.
|
|
From: Florian K. <br...@ac...> - 2012-02-07 04:26:37
|
On 02/05/2012 11:33 AM, Florian Krohm wrote:
> On 02/01/2012 05:08 AM, Julian Seward wrote:
>>
>>
>> This is a flattened version of the expression that Florian's patch aims to
>> find. But the entire compilation pipeline operates on flattened IR -- the
>> tree building stage is (almost) the last stage, and is only there so that
>> the instruction selectors have the opportunity to convert multiple expression
>> nodes into single instructions if they want.
>>
>
> [snip]
>
>> This should be fixed in the IR optimiser, though, not in the front ends,
>> since doing it in the front ends means duplicating functionality.
>>
>
> So here is a work-in-progress patch for the IR optimizer.
>
Consider this:
PUT(0) = 0xF:I64
t1 = GET:I64(0)
PUT(4) = 0x1E:I32
t2 = GET:I64(0)
t3 = And64(t2,t1)
sameIRExpr(t2,t1) will return true. But the computed values are
different. Salvaging this will be more difficult, as guest state access
is not in SSA.
I looked at a failing example and the above scenario is indeed
happening.
Florian
|
|
From: Julian S. <js...@ac...> - 2012-02-07 11:32:55
|
> Consider this: > > PUT(0) = 0xF:I64 > t1 = GET:I64(0) > PUT(4) = 0x1E:I32 > t2 = GET:I64(0) > t3 = And64(t2,t1) > > sameIRExpr(t2,t1) will return true. But the computed values are > different. Salvaging this will be more difficult, as guest state access > is not in SSA. Yeah, you have a partial overlap on the Puts and Gets. I'd suggest that you limit sameIRExpr to recursing only over Un/Bin/Tri/QuadOps, constants and IRTemps, and simply give up on anything else. That will solve the original false positive problem and should be safe for the example above, since it will never conclude that two Gets produce the same value. Similarly it avoids us having to deal with complexities of equal-value proofs in the presence of loads, stores and dirty helpers. IR transformation passes that happen before constant folding include Put-to-Get forwarding (that handles the above overlap cases safely), so such a limitation for sameIRExpr is not as bad as it might at first sound. Come to think of it, iropt also contains a CSE pass (completely unrelated to the constant folder) which must also deal with this same equivalance-detection problem. You might like to have a look at it. I'm sure it uses the same give-up-if-we're-not-sure hac^H^H^Hstrategy. It would also possibly have solved this entire problem a different way (if we CSEd before folding) but is mostly not used, because it is expensive. If you want even more fun IR optimisation hacking you could rewrite it to be much faster :-) J |
|
From: Florian K. <br...@ac...> - 2012-02-07 15:24:03
Attachments:
vex-iropt-final-patch
|
On 02/07/2012 06:26 AM, Julian Seward wrote: > >> Consider this: >> >> PUT(0) = 0xF:I64 >> t1 = GET:I64(0) >> PUT(4) = 0x1E:I32 >> t2 = GET:I64(0) >> t3 = And64(t2,t1) >> >> sameIRExpr(t2,t1) will return true. But the computed values are >> different. Salvaging this will be more difficult, as guest state access >> is not in SSA. > > Yeah, you have a partial overlap on the Puts and Gets. > > I'd suggest that you limit sameIRExpr to recursing only over > Un/Bin/Tri/QuadOps, constants and IRTemps, and simply give up > on anything else. Yep. That is what the attached patch does. Regtested on x86_64 and s390x. > Come to think of it, iropt also contains a CSE pass (completely > unrelated to the constant folder) which must also deal with this > same equivalance-detection problem. You might like to have a look > at it. I'm sure it uses the same give-up-if-we're-not-sure > hac^H^H^Hstrategy. It would also possibly have solved this > entire problem a different way (if we CSEd before folding) > but is mostly not used, because it is expensive. If you > want even more fun IR optimisation hacking you could rewrite > it to be much faster :-) Well... I've stared a lot at before/after optimization IR and it looks impressive. Beating that won't be easy. Florian |
|
From: Julian S. <js...@ac...> - 2012-02-07 22:33:34
|
> > I'd suggest that you limit sameIRExpr to recursing only over
> > Un/Bin/Tri/QuadOps, constants and IRTemps, and simply give up
> > on anything else.
>
> Yep. That is what the attached patch does. Regtested on x86_64 and s390x.
Looks good. A couple of comments:
+ case Iex_Get:
+ case Iex_Load:
+ /* Guest state / memory could have changed in the meantime. */
+ return False;
vs
default:
}
+
+ return False;
case Iex_GetI also fetches guest state. It will be handled by the
default case, but to make the commenty bit more complete, could you
pls add it alongside case Iex_Get and Iex_Load.
+ case Iex_Const: {
+ IRConst *c1 = e1->Iex.Const.con;
+ IRConst *c2 = e2->Iex.Const.con;
+ vassert(c1->tag == c2->tag);
What makes this assertion true? I think it must be that e1 and e2
always have the same type. But is that always true? Could we ever
end up comparing constants of different types? /me thinks not, but
is not sure.
Final q is the effect on performance of the jit, which is already
way to slow for its own good. Am somewhat concerned about the possibility
of spending a lot of time in some pathological case, chasing very deep
expression trees. I guess most trees lead pretty quickly to a Load
or Get, in which case it stops, so maybe it's not so bad. Still, maybe
we ought to put in a simple recursion depth check that stops it going
nuts in the worst case.
J
|
|
From: Florian K. <br...@ac...> - 2012-02-07 23:13:06
|
On 02/07/2012 05:26 PM, Julian Seward wrote:
>
> Looks good. A couple of comments:
>
> + case Iex_Get:
> + case Iex_Load:
> + /* Guest state / memory could have changed in the meantime. */
> + return False;
> vs
> default:
>
> }
> +
> + return False;
>
> case Iex_GetI also fetches guest state. It will be handled by the
> default case, but to make the commenty bit more complete, could you
> pls add it alongside case Iex_Get and Iex_Load.
>
Yes, good point. Will do.
>
> + case Iex_Const: {
> + IRConst *c1 = e1->Iex.Const.con;
> + IRConst *c2 = e2->Iex.Const.con;
> + vassert(c1->tag == c2->tag);
>
> What makes this assertion true? I think it must be that e1 and e2
> always have the same type. But is that always true? Could we ever
> end up comparing constants of different types? /me thinks not, but
> is not sure.
>
I don't think that assertion can ever be true. Hey, but if you're not
sure, then it should definitely be there :)
> Final q is the effect on performance of the jit, which is already
> way to slow for its own good. Am somewhat concerned about the possibility
> of spending a lot of time in some pathological case, chasing very deep
> expression trees. I guess most trees lead pretty quickly to a Load
> or Get, in which case it stops, so maybe it's not so bad.
When I was debugging this I saw trees with 5 levels and the leaves were
all IRTemps. That does suggest it could get worse.
I want to do some more runs for performance and also to see what this
buys us in terms of saved instructions. Clearly, it won't be earth
shattering, but I'm a still curious...
> Still, maybe
> we ought to put in a simple recursion depth check that stops it going
> nuts in the worst case.
Yeah, we should do that.. I was planning to use a fixed depth across all
iropt levels. If you'd like something more fancy (command line flag to
specify the depth, different settings for different optim levels), let
me know.
Florian
|
|
From: Julian S. <js...@ac...> - 2012-02-08 08:12:08
|
> I don't think that assertion can ever be true. Hey, but if you're not > sure, then it should definitely be there :) True :) > > we ought to put in a simple recursion depth check that stops it going > > nuts in the worst case. > > Yeah, we should do that.. I was planning to use a fixed depth across all > iropt levels. Fine. > I want to do some more runs for performance and also to see what this > buys us in terms of saved instructions. Clearly, it won't be earth > shattering, but I'm a still curious... I think it's likely to be a net loss -- the extra jitting cost is almost never outweighed by the improvements to generated code. For me, the win in this is that it makes it possible to get rid of some Memcheck false positives and hence make that tool more reliable. J |
|
From: Florian K. <br...@ac...> - 2012-02-11 16:46:48
Attachments:
patch-with-limit
|
On 02/08/2012 03:05 AM, Julian Seward wrote:
>
>> I don't think that assertion can ever be true. Hey, but if you're not
>> sure, then it should definitely be there :)
>
> True :)
>
>>> we ought to put in a simple recursion depth check that stops it going
>>> nuts in the worst case.
>>
>> Yeah, we should do that.. I was planning to use a fixed depth across all
>> iropt levels.
>
I ended up using the number of nodes instead of the recursion depth.
So I did some measurements with the attached patch on an x86_64
system. Runtimes measured with vgperf --reps=10
before:
bigcode1: no: 6.4s (13.1x, -----) me:11.8s (24.0x, -----)
bigcode2: no:15.2s (29.8x, -----) me:29.3s (57.4x, -----)
bz2: no: 6.3s ( 6.0x, -----) me:27.2s (25.7x, -----)
fbench: no: 3.1s ( 3.8x, -----) me:11.6s (14.5x, -----)
ffbench: no: 1.7s ( 3.3x, -----) me: 6.4s (12.3x, -----)
sarp: no: 0.7s (17.5x, -----) me: 6.2s (156.0x, -----)
heap: no: 2.0s ( 6.0x, -----) me:14.8s (43.4x, -----)
tinycc: no: 5.3s (15.6x, -----) me:29.3s (86.2x, -----)
after:
bigcode1: no: 6.4s (12.8x, -----) me:11.8s (23.6x, -----)
bigcode2: no:15.2s (31.1x, -----) me:29.4s (60.0x, -----)
bz2: no: 6.4s ( 6.1x, -----) me:27.2s (26.2x, -----)
fbench: no: 3.1s ( 3.8x, -----) me:11.7s (14.6x, -----)
ffbench: no: 1.7s ( 3.3x, -----) me: 6.4s (12.3x, -----)
sarp: no: 0.7s (18.0x, -----) me: 6.2s (156.2x, -----)
heap: no: 2.1s ( 5.9x, -----) me:14.7s (42.0x, -----)
tinycc: no: 5.3s (14.8x, -----) me:29.3s (81.3x, -----)
Runtimes are basically the same. In these runs the number of nodes
that sameIRExprs was allowed to traverse was unrestricted.
This result is not a big surprise if we look a bit closer at
sameIRExprs. The tables below show some statistics about the
frequency and effectiveness of recursing through IRTemp assignments
for none and memcheck, respectively.
#invocations column
- left number: the total number of invocations of sameIRExprs
- right number: the number of invocations that recursed through
an IRTemp assignment
#equal column
- left number: how often two IRExprs were compared equal
- right number: how often out of those recursion through
IRTemps was necessary to prove equality
max #nodes: the size of the largest IRExpr tree ever traversed
saved insns: number of saved instructions according to --stats=yes
--tool=none
max saved
| #invocations | #equal | #nodes | insns
----------+--------------+-----------+--------+-------
bigcode1: | 110378 26003 | 2375 2011 | 16 | 2807
bigcode2: | 406629 99130 | 8000 7636 | 16 | 10307
bz2: | 32506 6591 | 798 337 | 14 | 786
fbench: | 15435 2229 | 682 206 | 14 | 427
ffbench: | 13488 1937 | 568 154 | 14 | 332
sarp: | 10015 1443 | 441 111 | 14 | 244
heap: | 12096 1711 | 529 134 | 14 | 294
tinycc: | 42252 3258 | 1100 347 | 16 | 798
----------+--------------+-----------+--------+-------
Saved insns were in the 0.1% range.
--tool=memcheck
max saved
| #invocations | #equal | #nodes | insns
----------+---------------+-----------+--------+-------
bigcode1: | 139979 48772 | 2429 2017 | 18 | 15260
bigcode2: | 514976 185647 | 8054 7642 | 18 | 49010
bz2: | 39016 11100 | 801 329 | 20 | 8924
fbench: | 20032 4756 | 741 215 | 19 | 5794
ffbench: | 17509 4628 | 609 153 | 25 | 4242
sarp: | 13298 2788 | 497 118 | 18 | 3335
heap: | 14194 2938 | 525 125 | 18 | 3549
tinycc: | 45907 6358 | 1101 340 | 20 | 4388
----------+---------------+-----------+--------+-------
Saved insns were in the 0.25% range.
Clearly, in the large majority of cases sameIRExprs can tell
that the trees are not the same without examining subtrees.
Equal trees are rare but recursing is often instrumental in
proving equality. But result vary a lot. For bigcode2 95%
of equality proofs required recursing. For ffbench that number
is only 25%.
Putting an upper bound on the number of nodes we allow sameIREXprs
to recurse into makes sense. But since proving equality was key
to fixing #287260 and recursing is not the norm I suggest to use
a large limit. The patch sets it to 100.
I left the code for gathering those stats in the patch. Might become
useful again. It can be enabled by setting STATS_IROPT to 1.
Florian
|