|
From: Florian K. <fl...@ei...> - 2013-09-06 16:11:25
Attachments:
vex-patch
|
This patch introduces an enhancement in VEX's tree builder. Consider this code snippet: t0 = GET:I64(960) PUT(24) = 0x0:I64 PUT(200) = t0 Assuming that t0 is used once, we could substitute it to obtain: PUT(24) = 0x0:I64 PUT(200) = GET:I64(960) But today we don't do that, because we don't figure out that PUT(24) does not invalidate GET:I64(960). This is what attached patch is all about. Basically, we record the interval of bytes in the guest state that are modified by a PUT/I or read by one (or more) GET/Is and then check whether those overlap. I've regtested this on x86-64 and s390. There is no difference in runtime (all within noise margin). Looking at insn counts, we can see that s390 benefits a bit more than x86-64 due to its memory-to-memory insns. s390 trunk bigcode1 transtab: new 35,455 (430,066 -> 7,445,720; ratio 173:10) bigcode2 transtab: new 136,705 (1,585,062 -> 27,181,970; ratio 171:10) bz2 transtab: new 3,696 (116,886 -> 1,870,486; ratio 160:10) fbench transtab: new 2,162 (60,884 -> 1,125,454; ratio 184:10) ffbench transtab: new 1,862 (52,044 -> 975,670; ratio 187:10) heap transtab: new 1,641 (43,280 -> 833,664; ratio 192:10) heap-pdb4 transtab: new 1,673 (43,968 -> 846,842; ratio 192:10) many-loss transtab: new 1,754 (46,712 -> 893,104; ratio 191:10) many-xpts transtab: new 1,587 (42,186 -> 804,578; ratio 190:10) sarp transtab: new 1,497 (40,026 -> 770,008; ratio 192:10) tinycc transtab: new 2,095 (57,946 -> 1,091,104; ratio 188:10) s390 with patch bigcode1 transtab: new 35,455 (430,066 -> 7,406,236; ratio 172:10) bigcode2 transtab: new 136,705 (1,585,062 -> 27,037,486; ratio 170:10) bz2 transtab: new 3,696 (116,886 -> 1,861,750; ratio 159:10) fbench transtab: new 2,162 (60,884 -> 1,119,844; ratio 183:10) ffbench transtab: new 1,862 (52,044 -> 970,574; ratio 186:10) heap transtab: new 1,641 (43,280 -> 829,254; ratio 191:10) heap-pdb4 transtab: new 1,673 (43,968 -> 842,354; ratio 191:10) many-loss transtab: new 1,754 (46,712 -> 888,470; ratio 190:10) many-xpts transtab: new 1,587 (42,186 -> 800,284; ratio 189:10) sarp transtab: new 1,497 (40,026 -> 765,920; ratio 191:10) tinycc transtab: new 2,095 (57,946 -> 1,085,690; ratio 187:10) x86-64 trunk bigcode1 transtab: new 29,964 (300,054 -> 6,210,444; ratio 206:10) bigcode2 transtab: new 114,338 (1,076,216 -> 22,867,340; ratio 212:10) bz2 transtab: new 3,616 (97,783 -> 1,514,046; ratio 154:10) fbench transtab: new 2,233 (53,045 -> 835,890; ratio 157:10) ffbench transtab: new 2,032 (50,401 -> 790,210; ratio 156:10) heap transtab: new 1,753 (39,058 -> 627,094; ratio 160:10) heap-pdb4 transtab: new 1,782 (39,571 -> 635,983; ratio 160:10) many-loss transtab: new 1,876 (42,100 -> 674,835; ratio 160:10) many-xpts transtab: new 1,720 (37,992 -> 612,868; ratio 161:10) sarp transtab: new 1,631 (36,809 -> 586,997; ratio 159:10) tinycc transtab: new 2,237 (52,382 -> 833,449; ratio 159:10) x86-64 with patch bigcode1 transtab: new 29,964 (300,049 -> 6,207,650; ratio 206:10) bigcode2 transtab: new 114,338 (1,076,226 -> 22,864,539; ratio 212:10) bz2 transtab: new 3,616 (97,783 -> 1,509,375; ratio 154:10) fbench transtab: new 2,233 (53,035 -> 832,296; ratio 156:10) ffbench transtab: new 2,032 (50,406 -> 786,406; ratio 156:10) heap transtab: new 1,753 (39,053 -> 624,290; ratio 159:10) heap-pdb4 transtab: new 1,782 (39,566 -> 633,142; ratio 160:10) many-loss transtab: new 1,876 (42,095 -> 670,636; ratio 159:10) many-xpts transtab: new 1,720 (37,992 -> 610,108; ratio 160:10) sarp transtab: new 1,631 (36,809 -> 584,247; ratio 158:10) tinycc transtab: new 2,237 (52,382 -> 829,494; ratio 158:10) Florian |
|
From: Julian S. <js...@ac...> - 2013-09-10 10:57:21
|
On 09/06/2013 06:11 PM, Florian Krohm wrote:
> This patch introduces an enhancement in VEX's tree builder.
Seems plausible.
My main concern with this is that it doesn't change the
carefully-balanced and completely-untested (AFAIK) precise-exception
semantics, as controlled by
--vex-iropt-register-updates=sp-at-mem-access
|unwindregs-at-mem-access [default]
|allregs-at-mem-access
|allregs-at-each-insn
In particular, need to verify that (1) the patch does not remove any
PUTs, loads or stores, and (2) it does not reorder any PUTs with
respect to loads or stores, and (3) it doesn not reorder any stores
with respect to loads or other stores.
IOW .. imagine we chop up the post-treebuild IRStmts into groups,
in which each group contains a single Store, a single Put, or
arbitrary Loads, Gets and arithmetic, but no Stores or Puts.
(ignoring IRDirty and other complexities). Then, the question is:
does the patch change the sequence?
I haven't studied the patch + surrounding code enough to know
the answer to that. +1 to land if you can convince yourself
that PX semantics won't be changed. +2 if you can also figure
out how to test the behaviour of the --vex-iropt-register-updates
flag.
See https://bugzilla.mozilla.org/show_bug.cgi?id=913876 for
the kinds of dirty tricks that people do, requiring precise
exceptions.
re comments on the patch itself:
+typedef
+ struct {
+ Bool present;
+ Int low;
+ Int high;
+ }
+ Interval;
That has to be at least 12 bytes, or 16 on a 64 bit target.
struct { UShort low; UShort high; Bool present; } would make
it 8 bytes on all targets. re UShort vs Short, is there any
need to have the offsets signed?
+ /* Assume all guest state */
A bit cryptic, but from reading the lines after, you mean something
like "Assume all guest state is written" (or some such).
J
|
|
From: Florian K. <fl...@ei...> - 2013-09-10 18:29:51
|
On 09/10/2013 12:56 PM, Julian Seward wrote: > > My main concern with this is that it doesn't change the > carefully-balanced and completely-untested (AFAIK) precise-exception > semantics, as controlled by > > --vex-iropt-register-updates=sp-at-mem-access > |unwindregs-at-mem-access [default] > |allregs-at-mem-access > |allregs-at-each-insn > > In particular, need to verify that (1) the patch does not remove any > PUTs, loads or stores, and (2) it does not reorder any PUTs with > respect to loads or stores, and (3) it doesn not reorder any stores > with respect to loads or other stores. The patch does none of those things. What is changed is the logic to decide whether or not an expression containing one or more GET[I]s and preceding a PUT statement can be substituted into a statement following the PUT. Doing such a substitution eliminates a WrTmp statement but otherwise does not remove or reorder any statement. > > IOW .. imagine we chop up the post-treebuild IRStmts into groups, > in which each group contains a single Store, a single Put, or > arbitrary Loads, Gets and arithmetic, but no Stores or Puts. > (ignoring IRDirty and other complexities). Then, the question is: > does the patch change the sequence? I cannot imagine how it could. > > +2 if you can also figure > out how to test the behaviour of the --vex-iropt-register-updates > flag. We need some way to feed an insn sequence into VEX, run it through the pipeline with various options and observe the output. https://bugs.kde.org/show_bug.cgi?id=272405 has some thoughts on that. See the run-insn.c attachment there. But one would also need a much more flexible testing harness for that. vg-regtest is not flexible enough. The BZ discusses that also to some extent. > > See https://bugzilla.mozilla.org/show_bug.cgi?id=913876 for > the kinds of dirty tricks that people do, requiring precise > exceptions. Yikes > > re comments on the patch itself: > > +typedef > + struct { > + Bool present; > + Int low; > + Int high; > + } > + Interval; > > That has to be at least 12 bytes, or 16 on a 64 bit target. Did you test your hypothesis ? :) That struct uses 12 bytes on ILP32 and LP64. > struct { UShort low; UShort high; Bool present; } would make > it 8 bytes on all targets. .. would make it 6 bytes on all targets. > re UShort vs Short, is there any > need to have the offsets signed? I use an "Int" because that is what is used for "offset into guest state" and I was attempting to be consistent. Cf. libvex_ir.h - definition of the Get and Put variants and many other places elsewhere. IMNSHO those offsets should all be unsigned entities. And, yes, a "short" would be more than enough. > > + /* Assume all guest state */ > > A bit cryptic, but from reading the lines after, you mean something > like "Assume all guest state is written" (or some such). Yes, I'll change that. Florian |
|
From: Julian S. <js...@ac...> - 2013-09-11 08:36:47
|
On 09/10/2013 08:29 PM, Florian Krohm wrote: > The patch does none of those things. Then fine, pls commit. >> That has to be at least 12 bytes, or 16 on a 64 bit target. > > Did you test your hypothesis ? :) Err. No. It was late at night (or other lame excuse :) > would make it 8 bytes on all targets. > .. would make it 6 bytes on all targets. See above. Your original formulation sounds fine :) > IMNSHO those offsets should all be unsigned entities. I agree, really. Negative offsets are nonsensical. Changing them to UShort would be no bad thing. But that's a job for another day. J |
|
From: Philippe W. <phi...@sk...> - 2013-09-10 19:36:21
|
On Tue, 2013-09-10 at 12:56 +0200, Julian Seward wrote: > On 09/06/2013 06:11 PM, Florian Krohm wrote: > > This patch introduces an enhancement in VEX's tree builder. > > Seems plausible. > > My main concern with this is that it doesn't change the > carefully-balanced and completely-untested (AFAIK) precise-exception > semantics, as controlled by The command line arg value -vex-iropt-register-updates=allregs-at-mem-access is used in 3 tests, but there is no explicit verification of the behaviour. I believe the 3 other values are used (implicitely): cachegrind and callgrind activates by default sp-at-mem-access memcheck uses the default unwindregs-at-mem-access --vgdb=full activates allregs-at-each-insn BTW, looking at this, it means that the --help-debug default is not ok, as the default depends on the tool (but I do not see how to make the default tool dependent) Apart of having all values used, I do not think there is somewhere a test which verifies e.g. that trapping SEGV and restarting the instruction works with a certain value and fails with e.g. sp-at-mem-access. > I haven't studied the patch + surrounding code enough to know > the answer to that. +1 to land if you can convince yourself > that PX semantics won't be changed. +2 if you can also figure > out how to test the behaviour of the --vex-iropt-register-updates > flag. > > See https://bugzilla.mozilla.org/show_bug.cgi?id=913876 for > the kinds of dirty tricks that people do, requiring precise > exceptions. > > re comments on the patch itself: > > +typedef > + struct { > + Bool present; > + Int low; > + Int high; > + } > + Interval; > > That has to be at least 12 bytes, or 16 on a 64 bit target. > struct { UShort low; UShort high; Bool present; } would make > it 8 bytes on all targets. re UShort vs Short, is there any > need to have the offsets signed? If memory is important for this type, the "present" can I guess be avoided by switching to semi-open interval semantic and use low=high to indicate not present. In other words, a write at 100, length 8 would be represented by low = 100, high = 108. A not present interval will be e.g. low = 0, high = 0. Also, the ATmpInfo components might be re-ordered maybe ? Did not look much at the rest of the patch. Philippe |