From: <sv...@va...> - 2006-04-09 01:23:37
Author: njn
Date: 2006-04-09 02:23:29 +0100 (Sun, 09 Apr 2006)
New Revision: 5840
Log:
Redid the --trace-mem=yes option of Lackey properly. Updated some related
stuff along with it, such as the NEWS file.
Modified:
trunk/NEWS
trunk/lackey/docs/lk-manual.xml
trunk/lackey/lk_main.c
Modified: trunk/NEWS
===================================================================
--- trunk/NEWS 2006-04-08 16:52:42 UTC (rev 5839)
+++ trunk/NEWS 2006-04-09 01:23:29 UTC (rev 5840)
@@ -1,22 +1,24 @@
Release 3.2.0 (?? April 2006)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.2.0 is a feature release with a number of significant improvements:
-Performance (especially of Memcheck) is much improved, XXX...
-In detail:
+Performance (especially of Memcheck) is much improved, Addrcheck has been
+removed, Callgrind has been added, PPC64/Linux support has been added,
+Lackey has been improved, and MPI support has been added. In detail:

- Performance is much improved: programs typically run 1.20--1.40 times
faster under Memcheck (much more for some unusual programs) with an
average of about 1.30 for the programs we tested it on. The improvements
for Nulgrind are similar. We haven't measured Cachegrind and Massif; they
should also be faster, but with smaller improvements. We are
- interested to hear what speed-ups users get.
+ interested to hear what improvements users get.

-- Memcheck uses much less memory. The amount of shadow memory used -- which
- accounts for a large percentage of all of Memcheck's memory overhead --
- has been reduced by a factor of more than 4 on most programs. This means
- you should be able to run programs that use more memory than before
- without hitting problems. This memory size reduction also contributes to
- the speed improvements.
+ Also, Memcheck uses much less memory, due to the introduction of a
+ "compressed V bits" representation for Memcheck's shadow memory. The
+ amount of shadow memory used -- which accounts for a large percentage of
+ Memcheck's memory overhead -- has been reduced by a factor of more than 4
+ on most programs. This means you should be able to run programs that use
+ more memory than before without hitting problems. This change in
+ representation also contributes to the speed improvements.

- Addrcheck has been removed. It has not worked since version 2.4.0, and
with the speed and memory improvements to Memcheck it is no longer worth
@@ -24,15 +26,27 @@
undefined value errors, you can use the new Memcheck option
--undef-value-errors=no to obtain this behaviour.

+- Josef Weidendorfer's popular Callgrind tool has been added. [XXX:
+ more details] [XXX: say something about KCachegrind and why it has not
+ been folded in... I guess because its development is quite independent]
+
- Valgrind now works on PPC64/Linux. [XXX: more details...]

-- XXX: others...
+- Lackey, the example tool, has been improved:

-Other user-visible changes:
+ * It has a new option --detailed-counts (off by default) which causes
+ it to print out a count of loads, stores and ALU operations done, and
+ their sizes.

-- Callgrind has been folded in. [XXX: more details]
+ * It has a new option --trace-mem (off by default) which causes it to
+ print out a trace of all memory accesses performed by a program. It's a
+ good starting point for building Valgrind tools that need to track
+ memory accesses. Read the comments at the top of the file
+ lackey/lk_main.c for details.

-- Valgrind now has the ability to intercept and wrap arbitrary functions.
+ * The original instrumentation (counting numbers of instructions, jumps,
+ etc) is now controlled by a new option --basic-counts. It is on by
+ default.

- MPI support: partial support for debugging distributed applications
using the MPI library specification has been added. Valgrind is
@@ -40,6 +54,18 @@
functions, and will carefully check data passed to the (P)MPI_
interface.

+- XXX: others...
+
+Please note that Helgrind is still not working. We have made an important
+step towards making it work again, however, with the addition of function
+wrapping (see below).
+
+Other user-visible changes:
+
+- Valgrind now has the ability to intercept and wrap arbitrary functions.
+ This is a preliminary step towards making Helgrind work again, and
+ was required for MPI support.
+
- There are some changes to Memcheck's client requests. Some of them have
changed names:

@@ -62,6 +88,7 @@
which is like MAKE_MEM_DEFINED but only affects a byte if the byte is
already addressable.

+
BUGS FIXED:

XXX
Modified: trunk/lackey/docs/lk-manual.xml
===================================================================
--- trunk/lackey/docs/lk-manual.xml 2006-04-08 16:52:42 UTC (rev 5839)
+++ trunk/lackey/docs/lk-manual.xml 2006-04-09 01:23:29 UTC (rev 5840)
@@ -19,31 +19,32 @@
program's code. It is primarily intended to be of use as an example
tool.</para>
-<para>It measures and reports:</para>
+<para>Depending on the command line options given, it measures and reports the following.</para>

<orderedlist>

<listitem>
- <para>The number of calls to
- <computeroutput>_dl_runtime_resolve()</computeroutput>, the
- function in glibc's dynamic linker that resolves function
- references to shared objects.</para>
- <para>You can change the name of the function with command line
- option <computeroutput>--fnname=<name></computeroutput>.</para>
- </listitem>
+ <para>When command line option
+ <computeroutput>--basic-counts=yes</computeroutput> is specified,
+ it prints the following statistics and information about the execution of
+ the client program:

- <listitem>
- <para>The number of conditional branches encountered and the
- number and proportion of those taken.</para>
- </listitem>
+ <orderedlist>

- <listitem>
+ <listitem>
+ <para>The number of calls to
+ <computeroutput>_dl_runtime_resolve()</computeroutput>, the
+ function in glibc's dynamic linker that resolves function
+ references to shared objects.</para>
+ <para>You can change the name of the function traced with command line
+ option <computeroutput>--fnname=<name></computeroutput>.</para>
+ </listitem>

- <para>Statistics about the amount of work done during the execution
- of the client program:</para>
+ <listitem>
+ <para>The number of conditional branches encountered and the
+ number and proportion of those taken.</para>
+ </listitem>
- <orderedlist>
-
<listitem>
<para>The number of basic blocks entered and completed by the
program. Note that due to optimisations done by the JIT, this
@@ -62,31 +63,29 @@
</listitem>

<listitem>
- <para>When command line option
- <computeroutput>--detailed-counts=yes</computeroutput> is
- specified, a table is printed with counts of loads, stores and ALU
- operations for various types of operands.</para>
-
- <para>The types are identified by their IR name ("I1" ... "I128",
- "F32", "F64", and "V128").</para>
+ <para>The exit code of the client program.</para>
</listitem>

- <listitem>
- <para>When command line option
- <computeroutput>--trace-mem=yes</computeroutput> is
- specified, it prints out the size and address of almost every load and
- store made by the program. See Section 3.3.7 of Nicholas Nethercote's
- PhD dissertation "Dynamic Binary Analysis and Instrumentation", 2004,
- for details about the few loads and stores that it misses, and other
- caveats about the accuracy of the address trace.</para>
- </listitem>
-
</orderedlist>

+ <listitem>
+ <para>When command line option
+ <computeroutput>--detailed-counts=yes</computeroutput> is
+ specified, a table is printed with counts of loads, stores and ALU
+ operations for various types of operands.</para>
+
+ <para>The types are identified by their IR name ("I1" ... "I128",
+ "F32", "F64", and "V128").</para>
</listitem>

<listitem>
- <para>The exit code of the client program.</para>
+ <para>When command line option
+ <computeroutput>--trace-mem=yes</computeroutput> is
+ specified, it prints out the size and address of almost every load and
+ store made by the program. See the comments at the top of the file
+ <computeroutput>lackey/lk_main.c</computeroutput> for details about
+ the output format, how it works, and inaccuracies in the address trace.
+ </para>
</listitem>

</orderedlist>
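The manual entry above, and the `lk_main.c` comments below, note that the `--trace-mem` output could be fed to a post-mortem processing step. As a minimal sketch of such a step, assuming the trace lines look like the sample in the `lk_main.c` header comment (`instr : 0x0023C790, 2`, `load : 0xBE801950, 4`, `modify: 0x0025747C, 1`, with any leading whitespace), the following C code tallies events and data bytes. `TraceSummary`, `summarize_line()` and `summarize_stream()` are hypothetical names, not part of Valgrind; lines that do not match (such as Valgrind's own `==pid==` messages) are only counted as unparsed. Following the event semantics in the comments, a modify is counted as both a load and a store of the same bytes.

```c
/* Hypothetical post-mortem summarizer for Lackey --trace-mem output.
 * Assumes lines shaped like "instr : 0x0023C790, 2" or
 * "  modify: 0x0025747C, 1", as in the lk_main.c header comment. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long n_instr, n_load, n_store, n_modify;
    unsigned long bytes_loaded, bytes_stored;  /* data bytes only */
    unsigned long n_unparsed;                  /* non-trace lines */
} TraceSummary;

/* Parse one line and update *s.  Returns 1 for a recognised event,
 * 0 otherwise (e.g. Valgrind's own "==pid==" messages). */
static int summarize_line(const char *line, TraceSummary *s)
{
    char kind[16];
    unsigned long addr, size;

    assert(line != NULL && s != NULL);
    /* "%lx" also accepts the leading "0x" of each address. */
    if (sscanf(line, " %15[a-z] : %lx , %lu", kind, &addr, &size) != 3) {
        s->n_unparsed++;
        return 0;
    }
    (void)addr;  /* unused here; a locality analysis would bucket it */

    if (strcmp(kind, "instr") == 0) {
        s->n_instr++;
    } else if (strcmp(kind, "load") == 0) {
        s->n_load++;
        s->bytes_loaded += size;
    } else if (strcmp(kind, "store") == 0) {
        s->n_store++;
        s->bytes_stored += size;
    } else if (strcmp(kind, "modify") == 0) {
        /* A modify reads and then writes the same bytes. */
        s->n_modify++;
        s->bytes_loaded += size;
        s->bytes_stored += size;
    } else {
        s->n_unparsed++;
        return 0;
    }
    return 1;
}

/* Reset *s, then summarize every line read from fp. */
static void summarize_stream(FILE *fp, TraceSummary *s)
{
    char line[256];

    memset(s, 0, sizeof *s);
    while (fgets(line, sizeof line, fp) != NULL)
        summarize_line(line, s);
}
```

To use it, capture Lackey's trace to a file, open it with `fopen()` and pass the `FILE*` to `summarize_stream()`. Because traces can be very large, the `lk_main.c` comments suggest that doing the analysis online, inside the tool, is generally better than this kind of post-processing.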
Modified: trunk/lackey/lk_main.c
===================================================================
--- trunk/lackey/lk_main.c 2006-04-08 16:52:42 UTC (rev 5839)
+++ trunk/lackey/lk_main.c 2006-04-09 01:23:29 UTC (rev 5840)
@@ -51,19 +51,86 @@
//
// Specific Details about --trace-mem=yes
// --------------------------------------
-// The address trace produced by --trace-mem=yes is good, but not perfect;
-// see Section 3.3.7 of Nicholas Nethercote's PhD dissertation "Dynamic
-// Binary Analysis and Instrumentation", 2004, for details about the few
-// loads and stores that it misses, and other caveats about the accuracy of
-// the address trace.
+// Lackey's --trace-mem code is a good starting point for building Valgrind
+// tools that act on memory loads and stores. It also could be used as is,
+// with its output used as input to a post-mortem processing step. However,
+// because memory traces can be very large, online analysis is generally
+// better.
//
-// [Actually, the traces aren't quite right because instructions that modify
-// a memory location are treated like a load followed by a store.]
+// It prints memory data access traces that look like this:
//
+// instr : 0x0023C790, 2 # instruction read at 0x0023C790 of size 2
+// instr : 0x0023C792, 5
+// store : 0xBE80199C, 4 # data store at 0xBE80199C of size 4
+// instr : 0x0025242B, 3
+// load : 0xBE801950, 4 # data load at 0xBE801950 of size 4
+// instr : 0x0023D476, 7
+// modify: 0x0025747C, 1 # data modify at 0x0025747C of size 1
+// instr : 0x0023DC20, 2
+// load : 0x00254962, 1
+// load : 0xBE801FB3, 1
+// instr : 0x00252305, 1
+// load : 0x00254AEB, 1
+// store : 0x00257998, 1
+//
+// Every instruction executed has an "instr" event representing it.
+// Instructions that do memory accesses are followed by one or more "load",
+// "store" or "modify" events. Some instructions do more than one load or
+// store, as in the last two examples in the above trace.
+//
+// Here are some examples of x86 instructions that do different combinations
+// of loads, stores, and modifies.
+//
+// Instruction          Memory accesses                  Event sequence
+// -----------          ---------------                  --------------
+// add %eax, %ebx       No loads or stores               instr
+//
+// movl (%eax), %ebx    loads (%eax)                     instr, load
+//
+// movl %eax, (%ebx)    stores (%ebx)                    instr, store
+//
+// incl (%ecx)          modifies (%ecx)                  instr, modify
+//
+// cmpsb                loads (%esi), loads (%edi)       instr, load, load
+//
+// call*l (%edx)        loads (%edx), stores -4(%esp)    instr, load, store
+// pushl (%edx)         loads (%edx), stores -4(%esp)    instr, load, store
+// movsw                loads (%esi), stores (%edi)      instr, load, store
+//
+// Instructions using x86 "rep" prefixes are traced as if they are repeated
+// N times.
+//
+// Lackey with --trace-mem gives good traces, but they are not perfect, for
+// the following reasons:
+//
+// - It does not trace into the OS kernel, so system calls and other kernel
+// operations (eg. some scheduling and signal handling code) are ignored.
+//
+// - Valgrind replaces some code with its own, notably parts of code for
+// scheduling operations and signal handling. This code is not traced.
+//
+// - There is no consideration of virtual-to-physical address mapping.
+// This may not matter for many purposes.
+//
+// - Valgrind modifies the instruction stream in some very minor ways. For
+// example, on x86 the bts, btc, btr instructions are incorrectly
+// considered to always touch memory (this is a consequence of these
+// instructions being very difficult to simulate).
+//
+// - Valgrind tools lay out memory differently from normal programs, so the
+// addresses you get will not be typical. Thus Lackey (and all Valgrind
+// tools) is suitable for getting relative memory traces -- eg. if you
+// want to analyse locality of memory accesses -- but is not good if
+// absolute addresses are important.
+//
+// Despite all these warnings, Lackey's results should be good enough for a
+// wide range of purposes. For example, Cachegrind shares all the above
+// shortcomings and it is still useful.
+//
// For further inspiration, you should look at cachegrind/cg_main.c which
-// handles memory accesses in a more sophisticated way -- it groups them
-// together for processing into twos and threes so that fewer C calls are
-// made and things run faster.
+// uses the same basic technique for tracing memory accesses, but also groups
+// events together for processing into twos and threes so that fewer C calls
+// are made and things run faster.
#include "pub_tool_basics.h"
#include "pub_tool_tooliface.h"
@@ -111,23 +178,25 @@
" --trace-mem=no|yes trace all loads and stores [no]\n"
" --fnname=<name> count calls to <name> (only used if\n"
" --basic-counts=yes) [_dl_runtime_resolve]\n"
-
);
}

static void lk_print_debug_usage(void)
{
+ VG_(printf)(
+" (none)\n"
+ );
}

/*------------------------------------------------------------*/
-/*--- Data and helpers for --basic-counts ---*/
+/*--- Stuff for --basic-counts ---*/
/*------------------------------------------------------------*/

/* Nb: use ULongs because the numbers can get very big */
static ULong n_func_calls = 0;
static ULong n_BBs_entered = 0;
static ULong n_BBs_completed = 0;
-static ULong n_IRStmts = 0;
+static ULong n_IRStmts = 0;
static ULong n_guest_instrs = 0;
static ULong n_Jccs = 0;
static ULong n_Jccs_untaken = 0;
@@ -168,7 +237,7 @@
}

/*------------------------------------------------------------*/
-/*--- Data and helpers for --detailed-counts ---*/
+/*--- Stuff for --detailed-counts ---*/
/*------------------------------------------------------------*/
/* --- Operations --- */
@@ -244,7 +313,6 @@
}
=20
/* Summarize and print the details. */
-
static void print_details ( void )
{
Int typeIx;
@@ -265,20 +333,192 @@


/*------------------------------------------------------------*/
-/*--- Data and helpers for --trace-mem ---*/
+/*--- Stuff for --trace-mem ---*/
/*------------------------------------------------------------*/

+#define MAX_DSIZE 512
+
+typedef
+ IRExpr
+ IRAtom;
+
+typedef
+ enum { Event_Ir, Event_Dr, Event_Dw, Event_Dm }
+ EventKind;
+
+typedef
+ struct {
+ EventKind ekind;
+ IRAtom* addr;
+ Int size;
+ }
+ Event;
+
+/* Up to this many unnotified events are allowed. Must be at least two,
+ so that reads and writes to the same address can be merged into a modify.
+ Beyond that, larger numbers just potentially induce more spilling due to
+ extending live ranges of address temporaries. */
+#define N_EVENTS 4
+
+/* Maintain an ordered list of memory events which are outstanding, in
+ the sense that no IR has yet been generated to do the relevant
+ helper calls. The BB is scanned top to bottom and memory events
+ are added to the end of the list, merging with the most recent
+ notified event where possible (Dw immediately following Dr and
+ having the same size and EA can be merged).
+
+ This merging is done so that for architectures which have
+ load-op-store instructions (x86, amd64), the instr is treated as if
+ it makes just one memory reference (a modify), rather than two (a
+ read followed by a write at the same address).
+
+ At various points the list will need to be flushed, that is, IR
+ generated from it. That must happen before any possible exit from
+ the block (the end, or an IRStmt_Exit). Flushing also takes place
+ when there is no space to add a new event.
+
+ If we require the simulation statistics to be up to date with
+ respect to possible memory exceptions, then the list would have to
+ be flushed before each memory reference. That's a pain so we don't
+ bother.
+
+ Flushing the list consists of walking it start to end and emitting
+ instrumentation IR for each event, in the order in which they
+ appear. */
+
+static Event events[N_EVENTS];
+static Int events_used = 0;
+
+
+static VG_REGPARM(2) void trace_instr(Addr addr, SizeT size)
+{
+ VG_(printf)("instr : %08p, %d\n", addr, size);
+}
+
static VG_REGPARM(2) void trace_load(Addr addr, SizeT size)
{
- VG_(printf)("load : %p, %d\n", addr, size);
+ VG_(printf)(" load : %08p, %d\n", addr, size);
}

static VG_REGPARM(2) void trace_store(Addr addr, SizeT size)
{
- VG_(printf)("store: %p, %d\n", addr, size);
+ VG_(printf)(" store : %08p, %d\n", addr, size);
}

+static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
+{
+ VG_(printf)(" modify: %08p, %d\n", addr, size);
+}

+
+static void flushEvents(IRBB* bb)
+{
+ Int i;
+ Char* helperName;
+ void* helperAddr;
+ IRExpr** argv;
+ IRDirty* di;
+ Event* ev;
+
+ for (i = 0; i < events_used; i++) {
+
+ ev = &events[i];
+
+ // Decide on helper fn to call and args to pass it.
+ switch (ev->ekind) {
+ case Event_Ir: helperName = "trace_instr";
+ helperAddr = trace_instr; break;
+
+ case Event_Dr: helperName = "trace_load";
+ helperAddr = trace_load; break;
+
+ case Event_Dw: helperName = "trace_store";
+ helperAddr = trace_store; break;
+
+ case Event_Dm: helperName = "trace_modify";
+ helperAddr = trace_modify; break;
+ default:
+ tl_assert(0);
+ }
+
+ // Add the helper.
+ argv = mkIRExprVec_2( ev->addr, mkIRExpr_HWord( ev->size ) );
+ di = unsafeIRDirty_0_N( /*regparms*/2,
+ helperName, VG_(fnptr_to_fnentry)( helperAddr ),
+ argv );
+ addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ }
+
+ events_used = 0;
+}
+
+// WARNING: If you aren't interested in instruction reads, you can omit the
+// code that adds calls to trace_instr() in flushEvents(). However, you
+// must still call this function, addEvent_Ir() -- it is necessary to add
+// the Ir events to the events list so that merging of paired load/store
+// events into modify events works correctly.
+static void addEvent_Ir ( IRBB* bb, IRAtom* iaddr, UInt isize )
+{
+ Event* evt;
+ tl_assert( (VG_MIN_INSTR_SZB <= isize && isize <= VG_MAX_INSTR_SZB)
+ || VG_CLREQ_SZB == isize );
+ if (events_used == N_EVENTS)
+ flushEvents(bb);
+ tl_assert(events_used >= 0 && events_used < N_EVENTS);
+ evt = &events[events_used];
+ evt->ekind = Event_Ir;
+ evt->addr = iaddr;
+ evt->size = isize;
+ events_used++;
+}
+
+static
+void addEvent_Dr ( IRBB* bb, IRAtom* daddr, Int dsize )
+{
+ Event* evt;
+ tl_assert(isIRAtom(daddr));
+ tl_assert(dsize >= 1 && dsize <= MAX_DSIZE);
+ if (events_used == N_EVENTS)
+ flushEvents(bb);
+ tl_assert(events_used >= 0 && events_used < N_EVENTS);
+ evt = &events[events_used];
+ evt->ekind = Event_Dr;
+ evt->addr = daddr;
+ evt->size = dsize;
+ events_used++;
+}
+
+static
+void addEvent_Dw ( IRBB* bb, IRAtom* daddr, Int dsize )
+{
+ Event* lastEvt;
+ Event* evt;
+ tl_assert(isIRAtom(daddr));
+ tl_assert(dsize >= 1 && dsize <= MAX_DSIZE);
+
+ // Is it possible to merge this write with the preceding read?
+ // Only look at the last event if there is one.
+ lastEvt = events_used > 0 ? &events[events_used-1] : NULL;
+ if (lastEvt != NULL
+ && lastEvt->ekind == Event_Dr
+ && lastEvt->size == dsize
+ && eqIRAtom(lastEvt->addr, daddr))
+ {
+ lastEvt->ekind = Event_Dm;
+ return;
+ }
+
+ // No. Add as normal.
+ if (events_used == N_EVENTS)
+ flushEvents(bb);
+ tl_assert(events_used >= 0 && events_used < N_EVENTS);
+ evt = &events[events_used];
+ evt->ekind = Event_Dw;
+ evt->size = dsize;
+ evt->addr = daddr;
+ events_used++;
+}
+
+
/*------------------------------------------------------------*/
/*--- Basic tool functions ---*/
/*------------------------------------------------------------*/
@@ -296,19 +536,17 @@

static
IRBB* lk_instrument ( VgCallbackClosure* closure,
- IRBB* bb_in,
+ IRBB* bbIn,
VexGuestLayout* layout,
VexGuestExtents* vge,
IRType gWordTy, IRType hWordTy )
{
- IRDirty* di;
- Int i;
- IRBB* bb;
- Char fnname[100];
- IRType type;
- IRExpr** argv;
- IRExpr* addr_expr;
- IRExpr* size_expr;
+ IRDirty* di;
+ Int i;
+ IRBB* bbOut;
+ Char fnname[100];
+ IRType type;
+ IRTypeEnv* tyenv = bbIn->tyenv;

if (gWordTy != hWordTy) {
/* We don't currently support this case. */
@@ -316,15 +554,15 @@
}

/* Set up BB */
- bb = emptyIRBB();
- bb->tyenv = dopyIRTypeEnv(bb_in->tyenv);
- bb->next = dopyIRExpr(bb_in->next);
- bb->jumpkind = bb_in->jumpkind;
+ bbOut = emptyIRBB();
+ bbOut->tyenv = dopyIRTypeEnv(bbIn->tyenv);
+ bbOut->next = dopyIRExpr(bbIn->next);
+ bbOut->jumpkind = bbIn->jumpkind;

// Copy verbatim any IR preamble preceding the first IMark
i = 0;
- while (i < bb_in->stmts_used && bb_in->stmts[i]->tag != Ist_IMark) {
- addStmtToIRBB( bb, bb_in->stmts[i] );
+ while (i < bbIn->stmts_used && bbIn->stmts[i]->tag != Ist_IMark) {
+ addStmtToIRBB( bbOut, bbIn->stmts[i] );
i++;
}

@@ -333,11 +571,15 @@
di = unsafeIRDirty_0_N( 0, "add_one_BB_entered",
VG_(fnptr_to_fnentry)( &add_one_BB_entered ),
mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
}

- for (/*use current i*/; i < bb_in->stmts_used; i++) {
- IRStmt* st = bb_in->stmts[i];
+ if (clo_trace_mem) {
+ events_used = 0;
+ }
+
+ for (/*use current i*/; i < bbIn->stmts_used; i++) {
+ IRStmt* st = bbIn->stmts[i];
if (!st || st->tag == Ist_NoOp) continue;

if (clo_basic_counts) {
@@ -345,17 +587,25 @@
di = unsafeIRDirty_0_N( 0, "add_one_IRStmt",
VG_(fnptr_to_fnentry)( &add_one_IRStmt ),
mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
}

switch (st->tag) {
+ case Ist_NoOp:
+ case Ist_AbiHint:
+ case Ist_Put:
+ case Ist_PutI:
+ case Ist_MFence:
+ addStmtToIRBB( bbOut, st );
+ break;
+
case Ist_IMark:
if (clo_basic_counts) {
/* Count guest instruction. */
di = unsafeIRDirty_0_N( 0, "add_one_guest_instr",
VG_(fnptr_to_fnentry)( &add_one_guest_instr ),
mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );

/* An unconditional branch to a known destination in the
* guest's instructions can be represented, in the IRBB to
@@ -378,53 +628,17 @@
0, "add_one_func_call",
VG_(fnptr_to_fnentry)( &add_one_func_call ),
mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
}
}
- addStmtToIRBB( bb, st );
- break;
-
- case Ist_Exit:
- if (clo_basic_counts) {
- /* Count Jcc */
- di = unsafeIRDirty_0_N( 0, "add_one_Jcc",
- VG_(fnptr_to_fnentry)( &add_one_Jcc ),
- mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
- }
-
- addStmtToIRBB( bb, st );
-
- if (clo_basic_counts) {
- /* Count non-taken Jcc */
- di = unsafeIRDirty_0_N( 0, "add_one_Jcc_untaken",
- VG_(fnptr_to_fnentry)(
- &add_one_Jcc_untaken ),
- mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
- }
- break;
-
- case Ist_Store:
- // Add a call to trace_store() if --trace-mem=yes.
if (clo_trace_mem) {
- addr_expr = st->Ist.Store.addr;
- size_expr = mkIRExpr_HWord(
- sizeofIRType(
- typeOfIRExpr(bb->tyenv, st->Ist.Store.data)));
- argv = mkIRExprVec_2( addr_expr, size_expr );
- di = unsafeIRDirty_0_N( /*regparms*/2,
- "trace_store",
- VG_(fnptr_to_fnentry)( trace_store ),
- argv );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ // WARNING: do not remove this function call, even if you
+ // aren't interested in instruction reads. See the comment
+ // above the function itself for more detail.
+ addEvent_Ir( bbOut, mkIRExpr_HWord( (HWord)st->Ist.IMark.addr ),
+ st->Ist.IMark.len );
}
- if (clo_detailed_counts) {
- type = typeOfIRExpr(bb->tyenv, st->Ist.Store.data);
- tl_assert(type != Ity_INVALID);
- instrument_detail( bb, OpStore, type );
- }
- addStmtToIRBB( bb, st );
+ addStmtToIRBB( bbOut, st );
break;

case Ist_Tmp:
@@ -432,40 +646,92 @@
if (clo_trace_mem) {
IRExpr* data = st->Ist.Tmp.data;
if (data->tag == Iex_Load) {
- addr_expr = data->Iex.Load.addr;
- size_expr = mkIRExpr_HWord( sizeofIRType(data->Iex.Load.ty) );
- argv = mkIRExprVec_2( addr_expr, size_expr );
- di = unsafeIRDirty_0_N( /*regparms*/2,
- "trace_load",
- VG_(fnptr_to_fnentry)( trace_load ),
- argv );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addEvent_Dr( bbOut, data->Iex.Load.addr,
+ sizeofIRType(data->Iex.Load.ty) );
}
}
if (clo_detailed_counts) {
IRExpr* expr = st->Ist.Tmp.data;
- type = typeOfIRExpr(bb->tyenv, expr);
+ type = typeOfIRExpr(bbOut->tyenv, expr);
tl_assert(type !=3D Ity_INVALID);
switch (expr->tag) {
case Iex_Load:
- instrument_detail( bb, OpLoad, type );
+ instrument_detail( bbOut, OpLoad, type );
break;
case Iex_Unop:
case Iex_Binop:
case Iex_Triop:
case Iex_Qop:
case Iex_Mux0X:
- instrument_detail( bb, OpAlu, type );
+ instrument_detail( bbOut, OpAlu, type );
break;
default:
break;
}
}
- addStmtToIRBB( bb, st );
+ addStmtToIRBB( bbOut, st );
break;

+ case Ist_Store:
+ if (clo_trace_mem) {
+ IRExpr* data = st->Ist.Store.data;
+ addEvent_Dw( bbOut, st->Ist.Store.addr,
+ sizeofIRType(typeOfIRExpr(tyenv, data)) );
+ }
+ if (clo_detailed_counts) {
+ type = typeOfIRExpr(bbOut->tyenv, st->Ist.Store.data);
+ tl_assert(type != Ity_INVALID);
+ instrument_detail( bbOut, OpStore, type );
+ }
+ addStmtToIRBB( bbOut, st );
+ break;
+
+ case Ist_Dirty: {
+ Int dsize;
+ IRDirty* d = st->Ist.Dirty.details;
+ if (d->mFx != Ifx_None) {
+ // This dirty helper accesses memory. Collect the details.
+ tl_assert(d->mAddr != NULL);
+ tl_assert(d->mSize != 0);
+ dsize = d->mSize;
+ if (d->mFx == Ifx_Read || d->mFx == Ifx_Modify)
+ addEvent_Dr( bbOut, d->mAddr, dsize );
+ if (d->mFx == Ifx_Write || d->mFx == Ifx_Modify)
+ addEvent_Dw( bbOut, d->mAddr, dsize );
+ } else {
+ tl_assert(d->mAddr == NULL);
+ tl_assert(d->mSize == 0);
+ }
+ addStmtToIRBB( bbOut, st );
+ break;
+ }
+
+ case Ist_Exit:
+ if (clo_basic_counts) {
+ /* Count Jcc */
+ di = unsafeIRDirty_0_N( 0, "add_one_Jcc",
+ VG_(fnptr_to_fnentry)( &add_one_Jcc ),
+ mkIRExprVec_0() );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
+ }
+ if (clo_trace_mem) {
+ flushEvents(bbOut);
+ }
+
+ addStmtToIRBB( bbOut, st ); // Original statement
+
+ if (clo_basic_counts) {
+ /* Count non-taken Jcc */
+ di = unsafeIRDirty_0_N( 0, "add_one_Jcc_untaken",
+ VG_(fnptr_to_fnentry)(
+ &add_one_Jcc_untaken ),
+ mkIRExprVec_0() );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
+ }
+ break;
+
default:
- addStmtToIRBB( bb, st );
+ tl_assert(0);
}
}

@@ -474,10 +740,15 @@
di = unsafeIRDirty_0_N( 0, "add_one_BB_completed",
VG_(fnptr_to_fnentry)( &add_one_BB_completed ),
mkIRExprVec_0() );
- addStmtToIRBB( bb, IRStmt_Dirty(di) );
+ addStmtToIRBB( bbOut, IRStmt_Dirty(di) );
}

- return bb;
+ if (clo_trace_mem) {
+ /* At the end of the bbIn. Flush outstanding events. */
+ flushEvents(bbOut);
+ }
+
+ return bbOut;
}

static void lk_fini(Int exitcode)
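The outstanding-event list that `lk_main.c` uses for `--trace-mem` can be easier to follow outside Valgrind. Below is a minimal standalone model of it, assuming addresses can be compared as plain integers (the real code stores `IRAtom*`s and compares them with `eqIRAtom()`), and with a hypothetical `flushed[]` log standing in for the dirty helper calls that `flushEvents()` generates. It is a sketch of the behaviour described in the comments, not Valgrind code: a write that immediately follows a read of the same size and address turns that read into a single modify, and the list is flushed, oldest event first, when it is full.

```c
/* Standalone model of Lackey's outstanding-event list (not Valgrind code).
 * Assumption: addresses are plain integers, so "same address" is ==,
 * whereas lk_main.c stores IRAtom*s and compares them with eqIRAtom(). */
#include <assert.h>

typedef enum { Event_Ir, Event_Dr, Event_Dw, Event_Dm } EventKind;

typedef struct {
    EventKind     ekind;
    unsigned long addr;
    int           size;
} Event;

#define N_EVENTS 4   /* same limit as lk_main.c */
#define MAX_LOG  64  /* hypothetical capacity of the flushed-event log */

static Event events[N_EVENTS];
static int   events_used = 0;

/* Hypothetical stand-in for the helper calls flushEvents() would emit. */
static Event flushed[MAX_LOG];
static int   n_flushed = 0;

/* Emit all outstanding events, oldest first, and empty the list. */
static void flushEvents(void)
{
    for (int i = 0; i < events_used; i++) {
        assert(n_flushed < MAX_LOG);
        flushed[n_flushed++] = events[i];
    }
    events_used = 0;
}

/* Append an event, flushing first if the list is full. */
static void addEvent(EventKind ekind, unsigned long addr, int size)
{
    if (events_used == N_EVENTS)
        flushEvents();
    assert(events_used >= 0 && events_used < N_EVENTS);
    events[events_used].ekind = ekind;
    events[events_used].addr  = addr;
    events[events_used].size  = size;
    events_used++;
}

static void addEvent_Ir(unsigned long iaddr, int isize) { addEvent(Event_Ir, iaddr, isize); }
static void addEvent_Dr(unsigned long daddr, int dsize) { addEvent(Event_Dr, daddr, dsize); }

/* A write of the same size to the same address as the immediately
 * preceding read turns that read into a modify instead of adding a Dw. */
static void addEvent_Dw(unsigned long daddr, int dsize)
{
    if (events_used > 0) {
        Event *last = &events[events_used - 1];
        if (last->ekind == Event_Dr && last->size == dsize && last->addr == daddr) {
            last->ekind = Event_Dm;
            return;
        }
    }
    addEvent(Event_Dw, daddr, dsize);
}
```

Feeding it the events for `incl (%ecx)` (Ir, then Dr and Dw at one address) yields Ir, Dm, matching the table in the comments, while `movsw` (Dr and Dw at different addresses) stays Ir, Dr, Dw. Because the merge looks only at the immediately preceding event, the Ir events must still be added even if they are never printed; otherwise a load at the end of one instruction could be merged with a store at the start of the next, which is what the WARNING comment on `addEvent_Ir()` guards against.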