From: <sv...@va...> - 2007-03-27 23:40:49
Author: njn
Date: 2007-03-28 00:40:46 +0100 (Wed, 28 Mar 2007)
New Revision: 6679
Log:
- Cleaned up get_XCon and related code. Now easier to understand (broken
into smaller pieces, better variable names, more and better comments, etc)
and should be more robust. No confusing 0xFFFFFFFE entries either.
Correctly removes --alloc-fn functions (and everything above them) and
everything below 'main'. Aborts gracefully if every entry in the trace is
an --alloc-fn function.
- Removed some more dead code.
- Improved some other comments.
- Reinstated stats printing (with --verbose)
Modified:
branches/MASSIF2/massif/ms_main.c
Modified: branches/MASSIF2/massif/ms_main.c
===================================================================
--- branches/MASSIF2/massif/ms_main.c 2007-03-27 08:08:16 UTC (rev 6678)
+++ branches/MASSIF2/massif/ms_main.c 2007-03-27 23:40:46 UTC (rev 6679)
@@ -77,6 +77,12 @@
// - "show me the extra allocations from last-snapshot"
// - "start/stop logging" (eg. quickly skip boring bits)
//
+// Docs:
+// - need to explain that --alloc-fn changed slightly -- now if an entry
+// matches an alloc-fn, that entry *and all above it* are removed. So you
+// can cut out allc-fn chains at the bottom, rather than having to name
+// all of them, which is better.
+//
//---------------------------------------------------------------------------
// Memory profiler. Produces a graph, gives lots of information about
@@ -231,21 +237,27 @@
/*------------------------------------------------------------*/
// An XPt represents an "execution point", ie. a code address. Each XPt is
-// part of a tree of XPts (an "execution tree", or "XTree"). Each
-// top-to-bottom path through an XTree gives an execution context ("XCon"),
-// and is equivalent to a traditional Valgrind ExeContext.
+// part of a tree of XPts (an "execution tree", or "XTree").
//
-// The XPt at the top of an XTree (but below "alloc_xpt") is called a
-// "top-XPt". The XPts are the bottom of an XTree (leaf nodes) are
-// "bottom-XPTs". The number of XCons in an XTree is equal to the number of
-// bottom-XPTs in that XTree.
+// The root of the tree is 'alloc_xpt', which represents all allocation
+// functions, eg:
+// - malloc/calloc/realloc/memalign/new/new[];
+// - user-specified allocation functions (using --alloc-fn);
+// - custom allocation (MALLOCLIKE) points
+// It's a bit of a fake XPt (ie. its 'ip' is zero), and is only used because
+// it makes the code simpler.
//
-// All XCons have the same top-XPt, "alloc_xpt", which represents all
-// allocation functions like malloc(). It's a bit of a fake XPt, though,
-// and is only used because it makes some of the code simpler.
+// Any child of 'alloc_xpt' is called a "top-XPt". The XPts are the bottom
+// of an XTree (leaf nodes) are "bottom-XPTs". The number of XCons in an
+// XTree is equal to the number of bottom-XPTs in that XTree.
//
-// XTrees are bi-directional.
+// Each path from a top-XPt to a bottom-XPt through an XTree gives an
+// execution context ("XCon"), ie. a stack trace. (And sub-paths represent
+// stack sub-traces.)
//
+// alloc_xpt XTrees are bi-directional.
+// | ^
+// v |
// > parent < Example: if child1() calls parent() and child2()
// / | \ also calls parent(), and parent() calls malloc(),
// | / \ | the XTree will look like this.
@@ -262,11 +274,12 @@
// Nb: this value goes up and down as the program executes.
UInt curr_szB;
- // n_children and max_children are 32-bit integers, not 16-bit, because
- // a very big program might have more than 65536 allocation points
- // (Konqueror startup has 1800).
XPt* parent; // pointer to parent XPt
+ // Children.
+ // n_children and max_children are 32-bit integers, not 16-bit, because
+ // a very big program might have more than 65536 allocation points (ie.
+ // top-XPts) -- Konqueror starting up has 1800.
UInt n_children; // number of children
UInt max_children; // capacity of children array
XPt** children; // pointers to children XPts
@@ -302,20 +315,8 @@
// calculation that just sums the totals; ie. it assumes all samples are
// the same distance apart).
-#define MAX_SNAPSHOTS 32
-
typedef
struct {
- XPt* xpt;
- UInt space;
- }
- XPtSnapshot;
-
-// An XTree snapshot is stored as an array of of XPt snapshots.
-typedef XPtSnapshot* XTreeSnapshot;
-
-typedef
- struct {
Int ms_time; // Int: must allow -1
SizeT total_szB; // Size of all allocations at that census time
}
@@ -326,7 +327,8 @@
// HP_Chunks, XPt 'space' fields are incremented (at allocation) and
// decremented (at deallocation).
//
-// Nb: first two fields must match core's VgHashNode.
+// Nb: first two fields must match core's VgHashNode. [XXX: is that still
+// true?]
typedef
struct _HP_Chunk {
struct _HP_Chunk* next;
@@ -349,15 +351,14 @@
// - 15,000 XPts 800,000 XPts
// - 1,800 top-XPts
-// XXX: check if we still need all these...
static UInt n_xpts = 0;
-static UInt n_bot_xpts = 0;
static UInt n_allocs = 0;
static UInt n_zero_allocs = 0;
static UInt n_frees = 0;
static UInt n_children_reallocs = 0;
-//static UInt n_snapshot_frees = 0;
+static UInt n_getXCon_redo = 0;
+
static UInt n_halvings = 0;
static UInt n_real_censi = 0;
static UInt n_fake_censi = 0;
@@ -376,7 +377,6 @@
#define BUF_LEN 1024 // general purpose
static Char buf [BUF_LEN];
static Char buf2[BUF_LEN];
-//static Char buf3[BUF_LEN];
// Make these signed so things are more obvious if they go negative.
static SSizeT sigstacks_szB = 0; // Current signal stacks space sum
@@ -397,6 +397,8 @@
static UInt n_alloc_fns = 10;
static Char* alloc_fns[MAX_ALLOC_FNS] = {
"malloc",
+ // XXX: maybe these four shouldn't be in here? Someone might want to see
+ // inside them...
"operator new(unsigned)",
"operator new[](unsigned)",
"operator new(unsigned, std::nothrow_t const&)",
@@ -470,7 +472,7 @@
}
/*------------------------------------------------------------*/
-/*--- Execution contexts ---*/
+/*--- XPts ---*/
/*------------------------------------------------------------*/
// Fake XPt representing all allocation functions like malloc(). Acts as
@@ -499,180 +501,234 @@
return (void*)(hp - n_bytes);
}
-
-
-static XPt* new_XPt(Addr ip, XPt* parent, Bool is_bottom)
+static XPt* new_XPt(Addr ip, XPt* parent)
{
+ // XPts are never freed, so we can use perm_malloc to allocate them.
+ // Note that we cannot use perm_malloc for the 'children' array, because
+ // that needs to be resizable.
XPt* xpt = perm_malloc(sizeof(XPt));
xpt->ip = ip;
xpt->curr_szB = 0;
xpt->parent = parent;
- // Check parent is not a bottom-XPt
- tl_assert(parent == NULL || 0 != parent->max_children);
-
+ // We don't initially allocate any space for children. We let that
+ // happen on demand. Many XPts (ie. all the bottom-XPts) don't have any
+ // children anyway.
xpt->n_children = 0;
+ xpt->max_children = 0;
+ xpt->children = NULL;
- // If a bottom-XPt, don't allocate space for children. This can be 50%
- // or more, although it tends to drop as --depth increases (eg. 10% for
- // konqueror with --depth=20).
- if ( is_bottom ) {
- xpt->max_children = 0;
- xpt->children = NULL;
- n_bot_xpts++;
- } else {
- xpt->max_children = 4;
- xpt->children = VG_(malloc)( xpt->max_children * sizeof(XPt*) );
- }
-
// Update statistics
n_xpts++;
return xpt;
}
-static Bool is_alloc_fn(Addr ip)
+static void add_child_xpt(XPt* parent, XPt* child)
{
+ // Expand 'children' if necessary.
+ tl_assert(parent->n_children <= parent->max_children);
+ if (parent->n_children == parent->max_children) {
+ if (parent->max_children == 0) {
+ parent->max_children = 4;
+ parent->children = VG_(malloc)( parent->max_children * sizeof(XPt*) );
+ } else {
+ parent->max_children *= 2; // Double size
+ parent->children = VG_(realloc)( parent->children,
+ parent->max_children * sizeof(XPt*) );
+ }
+ n_children_reallocs++;
+ }
+
+ // Insert new child XPt in parent's children list.
+ parent->children[ parent->n_children++ ] = child;
+}
+
+// Reverse comparison for a reverse sort -- biggest to smallest.
+static Int XPt_revcmp_curr_szB(void* n1, void* n2)
+{
+ XPt* xpt1 = *(XPt**)n1;
+ XPt* xpt2 = *(XPt**)n2;
+ return ( xpt1->curr_szB < xpt2->curr_szB ? 1
+ : xpt1->curr_szB > xpt2->curr_szB ? -1
+ : 0);
+}
+
+/*------------------------------------------------------------*/
+/*--- XCons ---*/
+/*------------------------------------------------------------*/
+
+// This is the limit on the number of removed alloc-fns that can be in a
+// single XCon.
+#define MAX_OVERESTIMATE 50
+#define MAX_IPS (MAX_DEPTH + MAX_OVERESTIMATE)
+
+static Bool is_alloc_fn(Char* fnname)
+{
Int i;
+ for (i = 0; i < n_alloc_fns; i++) {
+ if (VG_STREQ(fnname, alloc_fns[i]))
+ return True;
+ }
+ return False;
+}
- if ( VG_(get_fnname)(ip, buf, BUF_LEN) ) {
- for (i = 0; i < n_alloc_fns; i++) {
- if (VG_STREQ(buf, alloc_fns[i]))
- return True;
- }
+// XXX: look at the "(below main)"/"__libc_start_main" mess (m_stacktrace.c
+// and m_demangle.c). Don't hard-code "(below main)" in here.
+static Bool is_main_or_below_main(Char* fnname)
+{
+ Int i;
+
+ for (i = 0; i < n_alloc_fns; i++) {
+ if (VG_STREQ(fnname, "main")) return True;
+ if (VG_STREQ(fnname, "(below main)")) return True;
}
return False;
}
-// XXX: check, improve this!
-// Returns an XCon, from the bottom-XPt. Nb: the XPt returned must be a
-// bottom-XPt now and must always remain a bottom-XPt. We go to some effort
-// to ensure this in certain cases. See comments below.
-static XPt* get_XCon( ThreadId tid, Bool custom_malloc )
+// Get the stack trace for an XCon, filtering out uninteresting entries:
+// alloc-fns and entries above alloc-fns, and entries below
+// main-or-below-main.
+// Eg: alloc-fn1 / alloc-fn2 / a / b / main / (below main) / c
+// becomes: a / b / main
+static
+Int get_IPs( ThreadId tid, Bool is_custom_malloc, Addr ips[], Int max_ips)
{
- // Static to minimise stack size. +1 for added ~0 IP
- // XXX: MAX_ALLOC_FNS isn't the right number to use here -- that's the
- // total number of them, we want the number that might occur in a
- // stacktrace (if there were repeats...)
- static Addr ips[MAX_DEPTH + MAX_ALLOC_FNS + 1];
+ Int n_ips, i, n_alloc_fns_removed = 0;
+ Int overestimate;
+ Bool fewer_IPs_than_asked_for = False;
+ Bool removed_below_main = False;
+ Bool enough_IPs_after_filtering = False;
- XPt* xpt = alloc_xpt;
- UInt n_ips, L, A, B, nC;
- UInt overestimate;
- Bool reached_bottom;
+ // XXX: get this properly
+ Bool should_hide_below_main = /*!VG_(clo_show_below_main)*/True;
+ // We ask for a few more IPs than clo_depth suggests we need. Then we
+ // remove every entry that is an alloc-fns or above an alloc-fn, and
+ // remove anything below main-or-below-main functions. Depending on the
+ // circumstances, we may need to redo it all, asking for more IPs.
+ // Details:
+ // - If the original stack trace is smaller than asked-for, redo=False
+ // - Else if we see main-or-below-main in the stack trace, redo=False
+ // - Else if after filtering we have more than clo_depth IPs, redo=False
+ // - Else redo=True
+ // In other words, to redo, we'd have to get a stack trace as big as we
+ // asked for, remove more than 'overestimate' alloc-fns, and not hit
+ // main-or-below-main.
-//---------------------------------------------------------------------------
-// simplified Algorithm
-// - get the biggest stack-trace possible: ips[n]
-// - filter out alloc-fns: --> ips[n2], n2<=n
-// - curr_xpt = alloc_xpt
-// - foreach ip in ips[]:
-// - if ip is in curr_xpt->children[]
-// - then: curr_xpt = the matching child
-// - else: add new child (with ip) to curr_xpt->children[],
-// curr_xpt = the new child
-// - return curr_xpt as the bottom-XPt
-//
-// Notes:
-// - a bottom-XPt should never become a non-bottom-XPt, because its curr_szB
-// would get mucked up. Eg. if we have an XCon A/B/C, we should never see
-// a later XCon A/B/C/D, because C would no longer be a bottom-XPt. It
-// doesn't seem like this should ever happen, but it's hard to know for
-// sure.
-// [XXX: if main is recursive, you could imagine getting main/A,
-// then main/main/A...]
-// [XXX: actually, not true -- the curr_szB wouldn't be mucked up.
-//
-//---------------------------------------------------------------------------
+ // Main loop
+ for (overestimate = 3; True; overestimate += 6) {
+ // This should never happen -- would require MAX_OVERESTIMATE
+ // alloc-fns to be removed from the stack trace.
+ if (overestimate > MAX_OVERESTIMATE)
+ VG_(tool_panic)("get_IPs: ips[] too small, inc. MAX_OVERESTIMATE?");
- // Want at least clo_depth non-alloc-fn entries in the snapshot.
- // However, because we have 1 or more (an unknown number, at this point)
- // alloc-fns ignored, we overestimate the size needed for the stack
- // snapshot. Then, if necessary, we repeatedly increase the size until
- // it is enough.
- overestimate = 2;
- while (True) {
+ // Ask for more than clo_depth suggests we need.
n_ips = VG_(get_StackTrace)( tid, ips, clo_depth + overestimate );
+ tl_assert(n_ips > 0);
- // Now we add a dummy "unknown" IP at the end. This is only used if we
- // run out of IPs before hitting clo_depth. It's done to ensure the
- // XPt we return is (now and forever) a bottom-XPt. If the returned XPt
- // wasn't a bottom-XPt (now or later) it would cause problems later (eg.
- // the parent's approx_ST wouldn't be equal [or almost equal] to the
- // total of the childrens' approx_STs).
- ips[ n_ips++ ] = ~((Addr)0);
+ // If we got fewer IPs than we asked for, redo=False
+ if (n_ips < clo_depth + overestimate)
+ fewer_IPs_than_asked_for = True;
- // Skip over alloc functions in ips[].
- for (L = 0; is_alloc_fn(ips[L]) && L < n_ips; L++) { }
+ // Filter uninteresting entries out of the stack trace. n_ips is
+ // updated accordingly.
+ for (i = n_ips-1; i >= 0; i--) {
+ if (VG_(get_fnname)(ips[i], buf, BUF_LEN)) {
+ // If it's a main-or-below-main function, we (may) want to
+ // ignore everything after it.
+ // If we see one of these functions, redo=False.
+ if (should_hide_below_main && is_main_or_below_main(buf)) {
+ n_ips = i+1; // Ignore everything below here.
+ removed_below_main = True;
+ }
+
+ // If it's an alloc-fn, we want to delete it and everything
+ // before it.
+ if (is_alloc_fn(buf)) {
+ Int j;
+ if (i+1 >= n_ips) {
+ // This occurs if removing an alloc-fn and entries above
+ // it results in an empty stack trace.
+ VG_(message)(Vg_UserMsg,
+ "User error: nothing but alloc-fns in stack trace");
+ VG_(message)(Vg_UserMsg,
+ "Try removing --alloc-fn=%s option and try again.", buf);
+ VG_(message)(Vg_UserMsg,
+ "Exiting.");
+ VG_(exit)(1);
+ }
+ n_alloc_fns_removed = i+1;
+
+ for (j = 0; j < n_ips; j++) { // Shuffle the rest down.
+ ips[j] = ips[j + n_alloc_fns_removed];
+ }
+ n_ips -= n_alloc_fns_removed;
+ break;
+ }
+ }
+ }
+
// Must be at least one alloc function, unless client used
- // MALLOCLIKE_BLOCK
- if (!custom_malloc) tl_assert(L > 0);
+ // MALLOCLIKE_BLOCK.
+ if (!is_custom_malloc) tl_assert(n_alloc_fns_removed > 0);
- // Should be at least one non-alloc function. If not, try again.
- if (L == n_ips) {
- overestimate += 2;
- if (overestimate > MAX_ALLOC_FNS)
- VG_(tool_panic)("No stk snapshot big enough to find non-alloc fns");
+ // Did we get enough IPs after filtering? If so, redo=False.
+ if (n_ips >= clo_depth) {
+ n_ips = clo_depth; // Ignore any IPs below --depth.
+ enough_IPs_after_filtering = True;
+ }
+
+ if (fewer_IPs_than_asked_for ||
+ removed_below_main ||
+ enough_IPs_after_filtering)
+ {
+ return n_ips;
+
} else {
- break;
+ n_getXCon_redo++;
}
}
- A = L;
- B = n_ips - 1;
- reached_bottom = False;
+}
- // By this point, the IPs we care about are in ips[A]..ips[B]
+// Gets an XCon and puts it in the tree. Returns the XCon's bottom-XPt.
+static XPt* get_XCon( ThreadId tid, Bool is_custom_malloc )
+{
+ static Addr ips[MAX_IPS]; // Static to minimise stack size.
+ Int i;
+ XPt* xpt = alloc_xpt;
+ // After this call, the IPs we want are in ips[0]..ips[n_ips-1].
+ Int n_ips = get_IPs(tid, is_custom_malloc, ips, MAX_IPS);
+
// Now do the search/insertion of the XCon. 'L' is the loop counter,
// being the index into ips[].
- while (True) {
+ for (i = 0; i < n_ips; i++) {
+ Addr ip = ips[i];
+ Int ch;
// Look for IP in xpt's children.
// XXX: linear search, ugh -- about 10% of time for konqueror startup
- // XXX: tried cacheing last result, only hit about 4% for konqueror
+ // XXX: tried caching last result, only hit about 4% for konqueror
// Nb: this search hits about 98% of the time for konqueror
+ for (ch = 0; True; ch++) {
+ if (ch == xpt->n_children) {
+ // IP not found in the children.
+ // Create and add new child XPt, then stop.
+ XPt* new_child_xpt = new_XPt(ip, xpt);
+ add_child_xpt(xpt, new_child_xpt);
+ xpt = new_child_xpt;
+ break;
- // If we've searched/added deep enough, or run out of EIPs, this is
- // the bottom XPt.
- if (L - A + 1 == clo_depth || L == B)
- reached_bottom = True;
-
- nC = 0;
- while (True) {
- if (nC == xpt->n_children) {
- // not found, insert new XPt
- // XXX: assertion can fail (eg. bug 89061). Apparently caused
- // by getting an IP in the stack trace that is ~0 (eg.
- // 0xffffffff).
- tl_assert(xpt->max_children != 0);
- tl_assert(xpt->n_children <= xpt->max_children);
- // Expand 'children' if necessary
- if (xpt->n_children == xpt->max_children) {
- xpt->max_children *= 2;
- xpt->children = VG_(realloc)( xpt->children,
- xpt->max_children * sizeof(XPt*) );
- n_children_reallocs++;
- }
- // Make new XPt for IP, insert in list
- xpt->children[ xpt->n_children++ ] =
- new_XPt(ips[L], xpt, reached_bottom);
+ } else if (ip == xpt->children[ch]->ip) {
+ // Found the IP in the children, stop.
+ xpt = xpt->children[ch];
break;
}
- if (ips[L] == xpt->children[nC]->ip) break; // found the IP
- nC++; // keep looking
}
-
- // Return found/built bottom-XPt.
- if (reached_bottom) {
- tl_assert(0 == xpt->children[nC]->n_children); // Must be bottom-XPt
- return xpt->children[nC];
- }
-
- // Descend to next level in XTree, the newly found/built non-bottom-XPt
- xpt = xpt->children[nC];
- L++;
}
+ tl_assert(0 == xpt->n_children); // Must be bottom-XPt XXX: really?
+ return xpt;
}
// Update 'curr_szB' of every XPt in the XCon, by percolating upwards.
@@ -694,16 +750,6 @@
alloc_xpt->curr_szB += space_delta;
}
-// Reverse comparison for a reverse sort -- biggest to smallest.
-static Int XPt_revcmp_curr_szB(void* n1, void* n2)
-{
- XPt* xpt1 = *(XPt**)n1;
- XPt* xpt2 = *(XPt**)n2;
- return ( xpt1->curr_szB < xpt2->curr_szB ? 1
- : xpt1->curr_szB > xpt2->curr_szB ? -1
- : 0);
-}
-
/*------------------------------------------------------------*/
/*--- Heap management ---*/
/*------------------------------------------------------------*/
@@ -1430,17 +1476,7 @@
depth_str[depth*2+1] = ' ';
depth_str[depth*2+2] = '\0';
}
- if (child->n_children > 0 &&
- // XXX: horrible -- need to totally overhaul below-main checking,
- // do it in m_stacktrace.c. [Ah, but we don't know the function
- // names at that point, just the IPs...]
- !VG_(strstr)(ip_desc, " main (")
-# if defined(VGO_linux)
- && !VG_(strstr)(ip_desc, "__libc_start_main") // glibc glibness
- && !VG_(strstr)(ip_desc, "generic_start_main") // Yellow Dog doggedness
-# endif
- )
- {
+ if (child->n_children > 0) {
pp_snapshot_child_XPts(child, depth+1, depth_str, depth_str_len,
curr_heap_szB, curr_total_szB);
} else {
@@ -1488,7 +1524,8 @@
P("(No heap memory currently allocated)\n");
} else {
P("Heap tree:\n");
- P("%6s: (heap allocation functions) malloc, new, new[], etc.\n",
+ P("%6s: (heap allocation functions) malloc/new/new[],"
+ " --alloc-fn functions, etc.\n",
make_perc(curr_heap_szB, curr_total_szB));
pp_snapshot_child_XPts(alloc_xpt, 0, depth_str, depth_str_len,
@@ -1510,6 +1547,24 @@
// Output.
write_text_graph();
+
+ // Stats
+ if (VG_(clo_verbosity) > 1) {
+ tl_assert(n_xpts > 0); // always have alloc_xpt
+ VG_(message)(Vg_DebugMsg, " allocs: %u", n_allocs);
+ VG_(message)(Vg_DebugMsg, "zeroallocs: %u (%d%%)", n_zero_allocs,
+ n_zero_allocs * 100 / n_allocs );
+ VG_(message)(Vg_DebugMsg, " frees: %u", n_frees);
+ VG_(message)(Vg_DebugMsg, " XPts: %u (%d B)", n_xpts,
+ n_xpts*sizeof(XPt));
+ VG_(message)(Vg_DebugMsg, " top-XPts: %u (%d%%)", alloc_xpt->n_children,
+ alloc_xpt->n_children * 100 / n_xpts);
+ VG_(message)(Vg_DebugMsg, "c-reallocs: %u", n_children_reallocs);
+ VG_(message)(Vg_DebugMsg, "fake censi: %u", n_fake_censi);
+ VG_(message)(Vg_DebugMsg, "real censi: %u", n_real_censi);
+ VG_(message)(Vg_DebugMsg, " halvings: %u", n_halvings);
+ VG_(message)(Vg_DebugMsg, "XCon_redos: %u", n_getXCon_redo);
+ }
}
/*------------------------------------------------------------*/
@@ -1572,7 +1627,7 @@
malloc_list = VG_(HT_construct)( 80021 ); // prime, big
// Dummy node at top of the context structure.
- alloc_xpt = new_XPt(0, NULL, /*is_bottom*/False);
+ alloc_xpt = new_XPt(/*ip*/0, /*parent*/NULL);
tl_assert( VG_(getcwd)(base_dir, VKI_PATH_MAX) );
}
From: Julian S. <js...@ac...> - 2007-03-27 22:55:23
Josef

> > If I understand your problem correctly, what you describe is the same
> > problem that people have when writing bytecode interpreters.
>
> Hmm.. Yes, should be quite similar.
>
> > This is a quite-well-studied problem. See
> > http://www.csc.uvic.ca/~csc586a/slides/BranchPredict.pdf
> > for a good discussion and suggestions.
>
> Thanks for the pointer!

Welcome. If you get any speedups as a result of these tricks I'd be
interested to hear about them, because the main valgrind dispatcher
(m_dispatch) suffers from exactly the same problem.

J
From: Julian S. <js...@ac...> - 2007-03-27 22:50:07
> Or you could just disallow return values in conditional calls?

Yes, that sounds simpler. Good.

> More generally, how do you represent a conditional move in SSA?

Maybe with a ternary ?: style operator? In Vex it's done using Mux0X.

J
From: Nicholas N. <nj...@cs...> - 2007-03-27 21:57:03
On Tue, 27 Mar 2007, Julian Seward wrote:

>> (*2*) One question about guards for dirty helper calls: Is it correct
>> that using a guard together with a return value is useless?
>> As a temporary register can be only written once (SSA constraint), what
>> is the value of the temporary register for the return value if there
>> is no call happening? It would be useful to provide an IRExpr to
>> assign to the temporary when the call was not done.
>
> Looks like a genuine semantic hole in IR afaics. (!)
>
> I have no good answer. I never thought of this before.
>
> I looked at the x86 backend to see what it would do in that situation.
> It generates code to do the call (or not), and then copies %eax into the
> register assigned to the temporary, regardless of whether the call
> happened. Which means it will be garbage if the call did not happen.
> (VEX/priv/host-x86/isel.c:3566)
>
> Maybe a dirty helper should specify a default value for the temporary if
> the call is not taken. That would make it safe. If the backend can prove
> the call is always taken then it can omit computation/assignment of the
> default and so avoid performance loss in the common cases. Hmm, this is
> all a bit messy. Need to think about it more.

Or you could just disallow return values in conditional calls? That would
be fine with all the current uses, and with Josef's proposed usage too, if
I understand correctly.

More generally, how do you represent a conditional move in SSA? I don't
think you can; conditional moves don't really make sense when you don't
have specific locations (eg. registers) for holding values.

N
From: Nicholas N. <nj...@cs...> - 2007-03-27 21:53:06
On Tue, 27 Mar 2007, Josef Weidendorfer wrote:

> recently I was playing with the idea to speed up cache simulation
> by using 2 cores in these (not that new) dual-core processors.
>
> The idea is to run the cache simulation in another thread (*1*),
> fed by memory access events written into a buffer by cachegrind,
> generated as usual.

There's a paper "SuperPin: Parallelizing Dynamic Instrumentation for
Real-Time Performance" by Wallace and Hazelwood that discusses this topic.

> In a first step, separating the event producer and the consumer via
> a simple buffer works quite nicely, still in sequential mode;
> the patch even is quite minimal, as I have a 1:1 relation between
> log_* simulation calls and event types, producing the same result.
> Cachegrind's instrumentation does not call the simulation routines
> directly, but writes the data into a buffer. At the beginning of an SB,
> it is checked whether there is enough space available for all events
> which could be produced by this SB.
> If the buffer is full, a helper is run which does simulation for
> all events sitting in the buffer, clearing it afterwards. The nice
> thing is that this helper call is the only C call needed in the
> instrumentation, and it is guarded by the "full" condition (*2*).
>
> Using a buffer of 1 KB, I thought that this scheme should run
> roughly at the same speed as original cachegrind.
>
> However, quite surprisingly, the small event dispatcher loop in the
> helper takes around 30% of the time with some cache-friendly client
> code (I used "rpm -q <package>"). The measurement was done with
> OProfile, and should be quite accurate.
>
> It looks like branch mispredictions are responsible for this, and
> I checked this against measurements with the corresponding hardware
> performance counters. My example did around 30 million events per
> second, so a branch misprediction at every event can explain this
> 30% figure.
>
> Cachegrind's instrumentation itself never sees a misprediction, as the
> generated instrumentation knows which simulation functions to call.
> However, when dispatching events from the buffer, there is no
> possibility to predict the next event.

So what was the overall slow-down of this producer/consumer version vs
the original? I tried the same thing a few years ago, and found that the
number of instructions was greatly reduced (by roughly 25% IIRC) but that
the increase in branch mispredictions neutralized that, so speed was
similar to the original, while the producer/consumer version was more
complex.

> Currently I am searching for a way to transfer the knowledge
> of the event producer (cachegrind's instrumentation) to the consumer
> (the dispatch loop) to get fewer mispredictions there. Perhaps someone
> has better ideas?
>
> Idea (1):
> Coding multiple events into bigger ones would reduce the event count,
> and thus the number of mispredictions. However, the number of dispatch
> cases explodes, and makes the code bigger.
>
> Idea (2):
> Use VEX to generate code pieces for the consumer which match one client
> SB each. The generated code has exactly the knowledge of which memory
> access events happen in a given client SB.
> This would totally get rid of mispredictions, and has the nice effect
> of reducing the amount of data which has to be sent to the event
> consumer, as all fixed values (instruction addresses, sizes, data
> sizes) are already incorporated in the generated code.
>
> However, I have no idea how to make this work. Generating a further
> IRSB while instrumenting a client SB is easy, but how to run VEX from
> there to generate real code? The generated code has to be put into
> the translation cache, but the tool should have some way to control
> its life cycle. At least, there has to be some hook for the tool
> before such code would be discarded by Valgrind.
> Or would it be useful to have a tool-controlled separate translation
> cache?
>
> Perhaps this idea is crazy, and in the end not worthwhile.
> For the visioned parallelization to be useful, the communication
> overhead to the other core needs to be really low, which
> means minimizing MESI transactions.

It, like a lot of parallel programming tasks, sounds difficult and
fragile.

N
From: Josef W. <Jos...@gm...> - 2007-03-27 15:40:50
On Tuesday 27 March 2007, Julian Seward wrote:

> > It looks like branch mispredictions are responsible for this, and
> > I checked this against measurement with according hardware performance
> > counters. My example did around 30 million events per second, so a
> > branch misprediction at every event can explain this 30% figure.
>
> If I understand your problem correctly, what you describe is the same
> problem that people have when writing bytecode interpreters.

Hmm.. Yes, should be quite similar.

> This is a quite-well-studied problem. See
> http://www.csc.uvic.ca/~csc586a/slides/BranchPredict.pdf
> for a good discussion and suggestions.

Thanks for the pointer!

Josef
|
From: Julian S. <js...@ac...> - 2007-03-27 15:12:55
|
> It looks like branch mispredictions are responsible for this, and
> I checked this against measurements with the corresponding hardware
> performance counters. My example did around 30 million events per
> second, so a branch misprediction at every event can explain this
> 30% figure.

If I understand your problem correctly, what you describe is the same problem that people have when writing bytecode interpreters. This is a quite-well-studied problem. See http://www.csc.uvic.ca/~csc586a/slides/BranchPredict.pdf for a good discussion and suggestions.

J |
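One standard remedy from that interpreter literature is "threaded" dispatch: give each handler its own indirect branch, so the hardware predictor can track per-handler statistics instead of sharing one unpredictable branch. A minimal sketch using GCC's computed-goto extension follows; the event set and counters are made up for illustration, not Cachegrind's actual events.

```c
#include <string.h>

/* Sketch of threaded event dispatch using GCC's computed-goto
 * extension (a standard trick from the bytecode-interpreter
 * literature). Each handler ends with its own indirect jump, so the
 * branch predictor gets a separate history per handler instead of one
 * shared, unpredictable switch branch. EV_* is a hypothetical event
 * set. Requires GCC or Clang (labels-as-values is an extension). */

enum { EV_LOAD, EV_STORE, EV_END };

static void dispatch(const unsigned char *ev, long *loads, long *stores)
{
    static void *labels[] = { &&do_load, &&do_store, &&do_end };
#define NEXT goto *labels[*ev++]   /* per-handler dispatch jump */
    NEXT;
do_load:  (*loads)++;  NEXT;
do_store: (*stores)++; NEXT;
do_end:   ;
#undef NEXT
}
```

For example, dispatching the event stream { EV_LOAD, EV_LOAD, EV_STORE, EV_END } leaves loads == 2 and stores == 1.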
|
From: Julian S. <js...@ac...> - 2007-03-27 15:05:17
|
> It looks like branch mispredictions are responsible for this, and
> I checked this against measurements with the corresponding hardware
> performance counters. My example did around 30 million events per
> second, so a branch misprediction at every event can explain this 30% figure.
Maybe you need to give the branch predictors more code to correlate the
dispatcher loop branch(es) against. One possibility is to unroll by hand
the dispatcher loop a few times, so that each switch can be correlated
to some extent with branches in earlier switches:
before:

   for (/* iterate over events */) {
      switch (event[i]) {
         ...
      }
   }

-->

   for (/* iterate over events */) {
      switch (event[i]) {
         ...
      }
      switch (event[i+1]) {
         ...
      }
      switch (event[i+2]) {
         ...
      }
      switch (event[i+3]) {
         ...
      }
   }
I would add only that I have tried games like this before (in m_dispatch/*.S)
and although it is possible sometimes to get large wins, I found it difficult
to do that consistently. I found it to be like chasing ghosts.
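For reference, a compilable version of the unrolling idea, with hypothetical event codes; note that the sketch above glosses over the remainder when the event count is not a multiple of 4:

```c
#include <stddef.h>

enum { EV_RD, EV_WR, EV_NTYPES };   /* hypothetical event types */

/* One dispatch step, written as a macro so each expansion is a
 * distinct switch, and thus a distinct branch site the predictor can
 * correlate with the neighbouring switches. */
#define STEP(e, count)                     \
    switch (e) {                           \
    case EV_RD: (count)[EV_RD]++; break;   \
    case EV_WR: (count)[EV_WR]++; break;   \
    }

static void dispatch4(const unsigned char *ev, size_t n,
                      long count[EV_NTYPES])
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    /* main loop, unrolled by 4 */
        STEP(ev[i],   count);
        STEP(ev[i+1], count);
        STEP(ev[i+2], count);
        STEP(ev[i+3], count);
    }
    for (; i < n; i++)              /* scalar remainder loop */
        STEP(ev[i], count);
}
```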
> (*2*) One question about guards for dirty helper calls: Is it correct
> that using a guard together with a return value is useless?
> As a temporary register can only be written once (SSA constraint), what
> is the value of the temporary register for the return value if there
> is no call happening? It would be useful to provide an IRExpr to
> assign to the temporary when the call is not done.
Looks like a genuine semantic hole in the IR, afaics. (!)
I have no good answer. I never thought of this before.
I looked at the x86 backend to see what it would do in that situation. It
generates code to do the call (or not), and then copies %eax into the register
assigned to the temporary, regardless of whether the call happened.
Which means it will be garbage if the call did not happen.
(VEX/priv/host-x86/isel.c:3566)
Maybe a dirty helper should specify a default value for the temporary if
the call is not taken. That would make it safe. If the backend can prove
the call is always taken then it can omit computation/assignment of the
default and so avoid performance loss in the common cases. Hmm, this is
all a bit messy. Need to think about it more.
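The situation can be illustrated in plain C; this is only an analogy for the IR semantics, not VEX code or its real API. Without a default, the destination is indeterminate when the guard is false; assigning a default up front, as proposed, makes it well-defined.

```c
/* Plain-C analogy of the guarded dirty call. All names are made up. */

static int helper_calls;                  /* counts real helper calls */

static int helper(void) { helper_calls++; return 42; }

/* Proposed fix: the "temporary" always gets a defined value, either
 * the helper's result or an explicitly specified default. A backend
 * that can prove the guard is always true may omit the default. */
static int guarded_call(int guard, int dflt)
{
    int tmp = dflt;        /* default assigned when call is suppressed */
    if (guard)
        tmp = helper();    /* call happens only when the guard holds */
    return tmp;            /* never garbage, unlike the current codegen */
}
```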
J
|
|
From: Josef W. <Jos...@gm...> - 2007-03-27 14:07:28
|
Hi,

recently I was playing with the idea of speeding up cache simulation by using the 2 cores in these (not that new) dual-core processors. The idea is to run the cache simulation in another thread (*1*), fed by memory access events written into a buffer by Cachegrind's instrumentation, generated as usual.

In a first step, separating the event producer and the consumer via a simple buffer works quite nicely, still in sequential mode; the patch is even quite minimal, as I have a 1:1 relation between log_* simulation calls and event types, producing the same results.

Cachegrind's instrumentation does not call the simulation routines directly, but writes the data into a buffer. At the beginning of an SB, it checks whether there is enough space available for all the events which could be produced by this SB. If the buffer is full, a helper is run which does the simulation for all events sitting in the buffer, clearing it afterwards. The nice thing is that this helper call is the only C call needed in the instrumentation, and it is guarded by the "full" condition (*2*).

Using a buffer of 1 KB, I thought that this scheme should run at roughly the same speed as the original Cachegrind. However, quite surprisingly, the small event dispatcher loop in the helper takes around 30% of the time with some cache-friendly client code (I used "rpm -q <package>"). The measurement was done with OProfile, and should be quite accurate.

It looks like branch mispredictions are responsible for this, and I checked this against measurements with the corresponding hardware performance counters. My example did around 30 million events per second, so a branch misprediction at every event can explain this 30% figure. Cachegrind's instrumentation itself never sees a misprediction, as the generated instrumentation knows which simulation functions to call. However, when dispatching events from the buffer, there is no possibility to predict the next event.
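The buffering scheme described above might look roughly like this; all names and sizes are hypothetical sketches, not the actual patch:

```c
#include <stddef.h>

/* Sketch of the described scheme: instrumentation appends fixed-size
 * events to a buffer, each SB first checks for space, and one flush
 * helper drains the buffer through the simulation. Hypothetical
 * names, not Cachegrind's real code. */

typedef struct { int type; unsigned long addr; int size; } Event;

enum { EV_IREF, EV_DREAD, EV_DWRITE };

#define BUF_EVENTS 128            /* ~1 KB of events in the real setup */
static Event  buf[BUF_EVENTS];
static size_t buf_used;
static long   n_simulated;        /* stand-in for the cache simulation */

static void simulate(const Event *e) { (void)e; n_simulated++; }

/* The only C call in the instrumented code: drain and clear. */
static void flush_events(void)
{
    for (size_t i = 0; i < buf_used; i++)
        simulate(&buf[i]);
    buf_used = 0;
}

/* Called at the start of each SB with the SB's worst-case event
 * count; guards the expensive flush on the "buffer full" condition. */
static void ensure_space(size_t max_events_in_sb)
{
    if (buf_used + max_events_in_sb > BUF_EVENTS)
        flush_events();
}

/* Emitted inline per memory access, instead of a direct log_* call. */
static void put_event(int type, unsigned long addr, int size)
{
    buf[buf_used].type = type;
    buf[buf_used].addr = addr;
    buf[buf_used].size = size;
    buf_used++;
}
```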
Currently I am searching for a way to transfer the knowledge of the event producer (Cachegrind's instrumentation) to the consumer (the dispatch loop) to get fewer mispredictions there. Perhaps someone has better ideas?

Idea (1):
Coding multiple events into bigger ones would reduce the event count, and thus the number of mispredictions. However, the number of dispatch cases explodes, and makes the code bigger.

Idea (2):
Use VEX to generate code pieces for the consumer which match one client SB each. The generated code has exactly the knowledge of which memory access events happen in a given client SB. This would totally get rid of mispredictions, and has the nice effect of reducing the amount of data which has to be sent to the event consumer, as all fixed values (instruction addresses, sizes, data sizes) are already incorporated in the generated code.

However, I have no idea how to make this work. Generating a further IRSB while instrumenting a client SB is easy, but how do I run VEX from there to generate real code? The generated code has to be put into the translation cache, but the tool should have some way to control its life cycle. At least, there has to be some hook for the tool before such code would be discarded by Valgrind. Or would it be useful to have a tool-controlled separate translation cache?

Perhaps this idea is crazy, and in the end not worthwhile. For the envisioned parallelization to be useful, the communication overhead to the other core needs to be really low, which means minimizing MESI transactions.

Josef

(*1*) Actually, I started adding a new VG_(createThread) to coregrind, which uses clone() to spawn a helper thread for a tool. However, I do not really know how to do this properly; it works somehow, but not really well: I get an assertion failure from Valgrind when pressing Ctrl-C. Should there be signal handlers for such a helper thread which simply pass any signals back to VG?
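Idea (1), fusing adjacent events into bigger ones, can be sketched as follows. With N base event types, pairing already needs N*N dispatch cases, which is the case explosion mentioned; all names here are hypothetical.

```c
#include <stddef.h>

enum { EV_RD, EV_WR, EV_N };           /* hypothetical base event types */

#define FUSE(a, b) ((a) * EV_N + (b))  /* encode a pair as one opcode */

/* One switch, and thus one (better-predicted) branch, per TWO events.
 * With EV_N base types the fused switch needs EV_N * EV_N cases:
 * here 4, but the count grows quadratically with the event set. */
static void dispatch_pairs(const unsigned char *ev, size_t n_pairs,
                           long count[EV_N])
{
    for (size_t i = 0; i < n_pairs; i++) {
        switch (ev[i]) {
        case FUSE(EV_RD, EV_RD): count[EV_RD] += 2;              break;
        case FUSE(EV_RD, EV_WR): count[EV_RD]++; count[EV_WR]++; break;
        case FUSE(EV_WR, EV_RD): count[EV_WR]++; count[EV_RD]++; break;
        case FUSE(EV_WR, EV_WR): count[EV_WR] += 2;              break;
        }
    }
}
```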
(*2*) One question about guards for dirty helper calls: Is it correct that using a guard together with a return value is useless? As a temporary register can only be written once (SSA constraint), what is the value of the temporary register for the return value if there is no call happening? It would be useful to provide an IRExpr to assign to the temporary when the call is not done. |
|
From: <js...@ac...> - 2007-03-27 11:02:52
|
Nightly build on minnie ( SuSE 10.0, ppc32 ) started at 2007-03-27 09:00:02 BST

Results unchanged from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 219 tests, 10 stderr failures, 6 stdout failures, 0 posttest failures ==
memcheck/tests/leak-tree (stderr)
memcheck/tests/leakotron (stdout)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_changes (stderr)
memcheck/tests/xml1 (stderr)
none/tests/faultstatus (stderr)
none/tests/fdleak_cmsg (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)
none/tests/ppc32/jm-fp (stdout)
none/tests/ppc32/jm-fp (stderr)
none/tests/ppc32/round (stdout)
none/tests/ppc32/round (stderr)
none/tests/ppc32/test_fx (stdout)
none/tests/ppc32/test_fx (stderr)
none/tests/ppc32/test_gx (stdout) |
|
From: <sv...@va...> - 2007-03-27 08:08:18
|
Author: njn
Date: 2007-03-27 09:08:16 +0100 (Tue, 27 Mar 2007)
New Revision: 6678
Log:
Add some notes, remove some dead code.
Modified:
branches/MASSIF2/massif/ms_main.c
Modified: branches/MASSIF2/massif/ms_main.c
===================================================================
--- branches/MASSIF2/massif/ms_main.c 2007-03-27 07:43:51 UTC (rev 6677)
+++ branches/MASSIF2/massif/ms_main.c 2007-03-27 08:08:16 UTC (rev 6678)
@@ -266,6 +266,7 @@
// a very big program might have more than 65536 allocation points
// (Konqueror startup has 1800).
XPt* parent; // pointer to parent XPt
+
UInt n_children; // number of children
UInt max_children; // capacity of children array
XPt** children; // pointers to children XPts
@@ -550,6 +551,9 @@
static XPt* get_XCon( ThreadId tid, Bool custom_malloc )
{
// Static to minimise stack size. +1 for added ~0 IP
+ // XXX: MAX_ALLOC_FNS isn't the right number to use here -- that's the
+ // total number of them, we want the number that might occur in a
+ // stacktrace (if there were repeats...)
static Addr ips[MAX_DEPTH + MAX_ALLOC_FNS + 1];
XPt* xpt = alloc_xpt;
@@ -557,6 +561,31 @@
UInt overestimate;
Bool reached_bottom;
+
+//---------------------------------------------------------------------------
+// simplified Algorithm
+// - get the biggest stack-trace possible: ips[n]
+// - filter out alloc-fns: --> ips[n2], n2<=n
+// - curr_xpt = alloc_xpt
+// - foreach ip in ips[]:
+// - if ip is in curr_xpt->children[]
+// - then: curr_xpt = the matching child
+// - else: add new child (with ip) to curr_xpt->children[],
+// curr_xpt = the new child
+// - return curr_xpt as the bottom-XPt
+//
+// Notes:
+// - a bottom-XPt should never become a non-bottom-XPt, because its curr_szB
+// would get mucked up. Eg. if we have an XCon A/B/C, we should never see
+// a later XCon A/B/C/D, because C would no longer be a bottom-XPt. It
+// doesn't seem like this should ever happen, but it's hard to know for
+// sure.
+// [XXX: if main is recursive, you could imagine getting main/A,
+// then main/main/A...]
+// [XXX: actually, not true -- the curr_szB wouldn't be mucked up.
+//
+//---------------------------------------------------------------------------
+
// Want at least clo_depth non-alloc-fn entries in the snapshot.
// However, because we have 1 or more (an unknown number, at this point)
// alloc-fns ignored, we overestimate the size needed for the stack
@@ -883,45 +912,6 @@
static Census censi[MAX_N_CENSI];
static UInt curr_census = 0; // Points to where next census will go.
-static UInt get_xtree_size(XPt* xpt, UInt ix)
-{
- UInt i;
-
- // If no memory allocated at all, nothing interesting to record.
- if (alloc_xpt->curr_szB == 0) return 0;
-
- // Ignore sub-XTrees that account for a miniscule fraction of current
- // allocated space.
- if (xpt->curr_szB / (double)alloc_xpt->curr_szB > 0.002) {
- ix++;
-
- // Count all (non-zero) descendent XPts
- for (i = 0; i < xpt->n_children; i++)
- ix = get_xtree_size(xpt->children[i], ix);
- }
- return ix;
-}
-
-static
-UInt do_space_snapshot(XPt xpt[], XTreeSnapshot xtree_snapshot, UInt ix)
-{
- UInt i;
-
- // Structure of this function mirrors that of get_xtree_size().
-
- if (alloc_xpt->curr_szB == 0) return 0;
-
- if (xpt->curr_szB / (double)alloc_xpt->curr_szB > 0.002) {
- xtree_snapshot[ix].xpt = xpt;
- xtree_snapshot[ix].space = xpt->curr_szB;
- ix++;
-
- for (i = 0; i < xpt->n_children; i++)
- ix = do_space_snapshot(xpt->children[i], xtree_snapshot, ix);
- }
- return ix;
-}
-
static UInt ms_interval;
static UInt do_every_nth_census = 30;
@@ -1593,3 +1583,4 @@
/*--- end ---*/
/*--------------------------------------------------------------------*/
+
|
|
From: <sv...@va...> - 2007-03-27 07:43:54
|
Author: njn
Date: 2007-03-27 08:43:51 +0100 (Tue, 27 Mar 2007)
New Revision: 6677
Log:
update
Modified:
trunk/docs/internals/roadmap.txt
Modified: trunk/docs/internals/roadmap.txt
===================================================================
--- trunk/docs/internals/roadmap.txt 2007-03-27 07:41:33 UTC (rev 6676)
+++ trunk/docs/internals/roadmap.txt 2007-03-27 07:43:51 UTC (rev 6677)
@@ -9,17 +9,34 @@
-----------------------------------------------------------------------------
3.3.0
-----------------------------------------------------------------------------
-Scheduled for 2007.
+Scheduled for mid-to-late 2007?
* Add ppc{32,64}/AIX5 support [Done by Julian]
* Add some more experimental tools?
+* Rework Massif [Nick]
+
-----------------------------------------------------------------------------
+3.2.4
+-----------------------------------------------------------------------------
+Scheduled for mid-2007?
+
+-----------------------------------------------------------------------------
+3.2.3
+-----------------------------------------------------------------------------
+[Was released on Jan 29, 2007]
+
+-----------------------------------------------------------------------------
+3.2.2
+-----------------------------------------------------------------------------
+[Was released on Jan 22, 2007]
+
+-----------------------------------------------------------------------------
3.2.1
-----------------------------------------------------------------------------
Scheduled for Dec 06?
-Fix bugs in 3.2.0.
+[Was released on 16 Sep, 2006]
-----------------------------------------------------------------------------
3.2.0
|
|
From: <sv...@va...> - 2007-03-27 07:41:33
|
Author: njn
Date: 2007-03-27 08:41:33 +0100 (Tue, 27 Mar 2007)
New Revision: 6676
Log:
Augment a comment.
Modified:
trunk/include/pub_tool_mallocfree.h
Modified: trunk/include/pub_tool_mallocfree.h
===================================================================
--- trunk/include/pub_tool_mallocfree.h 2007-03-27 07:05:31 UTC (rev 6675)
+++ trunk/include/pub_tool_mallocfree.h 2007-03-27 07:41:33 UTC (rev 6676)
@@ -33,6 +33,8 @@
#define __PUB_TOOL_MALLOCFREE_H
// These can be for allocating memory used by tools.
+// Nb: the allocators *always succeed* -- they never return NULL (Valgrind
+// will abort if they can't allocate the memory).
extern void* VG_(malloc) ( SizeT nbytes );
extern void VG_(free) ( void* p );
extern void* VG_(calloc) ( SizeT n, SizeT bytes_per_elem ); |
|
From: <sv...@va...> - 2007-03-27 07:05:34
|
Author: njn
Date: 2007-03-27 08:05:31 +0100 (Tue, 27 Mar 2007)
New Revision: 6675
Log:
- Added some notes at the top, and in other places.
- Increased the max number of alloc-fns, and some imperfections in how they
are recorded (bug 142491). Also printing them with --verbose.
- Changed default --depth from 3 to 8
Modified:
branches/MASSIF2/massif/ms_main.c
Modified: branches/MASSIF2/massif/ms_main.c
===================================================================
--- branches/MASSIF2/massif/ms_main.c 2007-03-27 06:46:03 UTC (rev 6674)
+++ branches/MASSIF2/massif/ms_main.c 2007-03-27 07:05:31 UTC (rev 6675)
@@ -28,9 +28,56 @@
The GNU General Public License is contained in the file COPYING.
*/
+//---------------------------------------------------------------------------
// XXX:
-// - separate content from presentation by dumping all results to a file and
-// then post-processing with a separate program, a la Cachegrind?
+//---------------------------------------------------------------------------
+// Separate content from presentation by dumping all results to a file and
+// then post-processing with a separate program, a la Cachegrind?
+// - work out the file format
+// - allow two decimal places in percentages (Kirk Johnson says people want
+// it)
+// - allow truncation of long fnnames if the exact line number is
+// identified?
+//
+// Examine and fix bugs on bugzilla:
+// IGNORE:
+// 112163 nor MASSIF crashed with signal 7 (SIGBUS) after running 2 days
+// - weird, crashes in VEX, ignore
+// 82871 nor Massif output function names too short
+// - on .ps graph, now irrelevant, ignore
+// 129576 nor Massif loses track of memory, incorrect graphs
+// - dunno, hard to reproduce, ignore
+// 132132 nor massif --format=html output does not do html entity escaping
+// - only for HTML output, irrelevant, ignore
+//
+// FIXED:
+// 142197 nor massif tool ignores --massif:alloc-fn parameters in .valg...
+// - fixed in trunk
+// 142491 nor Maximise use of alloc_fns array
+// - addressed, using the patch (with minor changes) from the bug report
+//
+// TODO:
+// 89061 cra Massif: ms_main.c:485 (get_XCon): Assertion `xpt->max_chi...
+// 141631 nor Massif: percentages don't add up correctly
+// 142706 nor massif numbers don't seem to add up
+// 143062 cra massif crashes on app exit with signal 8 SIGFPE
+// - occurs with no allocations -- ensure that case works
+//
+// Work out when to take periodic snapshots.
+// - If I separate content from presentation I don't have to thin out the
+// old ones (but not doing so takes space...)
+//
+// Work out how to take the peak.
+// - exact peak, or within a certain percentage?
+// - include the stack? makes it harder
+//
+// Michael Meeks:
+// - wants an interactive way to request a dump (callgrind_control-style)
+// - "profile now"
+// - "show me the extra allocations from last-snapshot"
+// - "start/stop logging" (eg. quickly skip boring bits)
+//
+//---------------------------------------------------------------------------
// Memory profiler. Produces a graph, gives lots of information about
// allocation contexts, in terms of space.time values (ie. area under the
@@ -343,10 +390,10 @@
// Current directory at startup.
static Char base_dir[VKI_PATH_MAX];
-#define MAX_ALLOC_FNS 32 // includes the builtin ones
+#define MAX_ALLOC_FNS 128 // includes the builtin ones
// First few filled in, rest should be zeroed. Zero-terminated vector.
-static UInt n_alloc_fns = 11;
+static UInt n_alloc_fns = 10;
static Char* alloc_fns[MAX_ALLOC_FNS] = {
"malloc",
"operator new(unsigned)",
@@ -370,7 +417,7 @@
static Bool clo_heap = True;
static UInt clo_heap_admin = 8;
static Bool clo_stacks = True;
-static Bool clo_depth = 3;
+static Bool clo_depth = 8;
static Bool ms_process_cmd_line_option(Char* arg)
{
@@ -381,12 +428,21 @@
else VG_BNUM_CLO(arg, "--depth", clo_depth, 1, MAX_DEPTH)
else if (VG_CLO_STREQN(11, arg, "--alloc-fn=")) {
- alloc_fns[n_alloc_fns] = & arg[11];
- n_alloc_fns++;
+ int i;
+
+ // Check first if the function is already present.
+ for (i = 0; i < n_alloc_fns; i++) {
+ if ( VG_STREQ(alloc_fns[i], & arg[11]) )
+ return True;
+ }
+ // Abort if we reached the limit.
if (n_alloc_fns >= MAX_ALLOC_FNS) {
VG_(printf)("Too many alloc functions specified, sorry");
VG_(err_bad_option)(arg);
}
+ // Ok, add the function.
+ alloc_fns[n_alloc_fns] = & arg[11];
+ n_alloc_fns++;
}
else
@@ -401,7 +457,7 @@
" --heap=no|yes profile heap blocks [yes]\n"
" --heap-admin=<number> average admin bytes per heap block [8]\n"
" --stacks=no|yes profile stack(s) [yes]\n"
-" --depth=<number> depth of contexts [3]\n"
+" --depth=<number> depth of contexts [8]\n"
" --alloc-fn=<name> specify <fn> as an alloc function [empty]\n"
);
VG_(replacement_malloc_print_usage)();
@@ -557,7 +613,10 @@
while (True) {
if (nC == xpt->n_children) {
// not found, insert new XPt
- tl_assert(xpt->max_children != 0);
+ // XXX: assertion can fail (eg. bug 89061). Apparently caused
+ // by getting an IP in the stack trace that is ~0 (eg.
+ // 0xffffffff).
+ tl_assert(xpt->max_children != 0);
tl_assert(xpt->n_children <= xpt->max_children);
// Expand 'children' if necessary
if (xpt->n_children == xpt->max_children) {
@@ -1281,15 +1340,14 @@
#endif
// Nb: uses a static buffer, each call trashes the last string returned.
-static Char* make_perc(ULong spacetime, ULong total_spacetime)
+static Char* make_perc(ULong x, ULong y)
{
static Char mbuf[32];
-// UInt p = 10;
- tl_assert(0 != total_spacetime);
-// percentify(spacetime * 100 * p / total_spacetime, p, 5, mbuf);
// XXX: I'm not confident that VG_(percentify) works as it should...
- VG_(percentify)(spacetime, total_spacetime, 1, 5, mbuf);
+ VG_(percentify)(x, y, 1, 5, mbuf);
+ // XXX: this is bogus if the denominator was zero -- resulting string is
+ // something like "0 --%")
if (' ' == mbuf[0]) mbuf[0] = '0';
return mbuf;
}
@@ -1470,6 +1528,14 @@
static void ms_post_clo_init(void)
{
+ Int i;
+ if (VG_(clo_verbosity) > 1) {
+ VG_(message)(Vg_DebugMsg, "alloc-fns:");
+ for (i = 0; i < n_alloc_fns; i++) {
+ VG_(message)(Vg_DebugMsg, " %d: %s", i, alloc_fns[i]);
+ }
+ }
+
ms_interval = 1;
// We don't take a census now, because there's still some core
|
|
From: <sv...@va...> - 2007-03-27 06:46:06
|
Author: njn
Date: 2007-03-27 07:46:03 +0100 (Tue, 27 Mar 2007)
New Revision: 6674
Log:
clarify comment
Modified:
trunk/include/pub_tool_tooliface.h
Modified: trunk/include/pub_tool_tooliface.h
===================================================================
--- trunk/include/pub_tool_tooliface.h 2007-03-26 23:53:25 UTC (rev 6673)
+++ trunk/include/pub_tool_tooliface.h 2007-03-27 06:46:03 UTC (rev 6674)
@@ -368,7 +368,9 @@
/* Tool defines its own command line options? */
extern void VG_(needs_command_line_options) (
// Return True if option was recognised. Presumably sets some state to
- // record the option as well.
+ // record the option as well. Nb: tools can assume that the argv will
+ // never disappear. So they can, for example, store a pointer to a string
+ // within an option, rather than having to make a copy.
Bool (*process_cmd_line_option)(Char* argv),
// Print out command line usage for options for normal tool operation.
|
|
From: Kumar R. <kum...@gm...> - 2007-03-27 05:27:16
|
Hi,

I am a newbie to Valgrind, so my knowledge there is limited, but I have handled similar issues in other tools.

One way to handle delay slots is by unraveling the instruction, i.e. in the translated set, move the delay-slot instruction above the branch and insert a NOP in the delay slot. Of course, on trickier platforms like PA-RISC (where the delay slot is conditionally executed, based on how the branch instruction evaluates), unraveling is more complex than just moving the instruction up. I am not sure if MIPS has such conditional evaluation of delay slots.

Hope this helps. :-)

(alias) Kumar

Kumar Rangarajan
http://www.s7solutions.com
Porting and Migration Solutions for the World. From India.
--
The problem with winning the rat race is that, even if u win, u r still a rat!

On 3/26/07, ghita aurelian <aur...@ya...> wrote:
> Hello!
>
> I have a project at the university and I have to adapt libvex to work
> with MIPS. My job is writing the guest part. I don't know exactly how
> to translate MIPS branch and jump instructions (because of their
> tricky "delay slot"). I want to know if it is OK to treat both the
> jump/branch instruction and the instruction in the delay slot as a
> single big instruction (and report in DisResult a single instruction
> with length = 8 bytes). What if I have a jump to the instruction in
> the delay slot?
>
> Aur
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
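The unraveling described above can be illustrated with a toy translator step. Instructions are plain strings here, purely for illustration; a real translator works on decoded IR, and conditionally-executed slots (as on PA-RISC) need more care.

```c
#include <string.h>

/* Toy illustration of "unraveling" a MIPS-style delay slot during
 * translation: the delay-slot instruction is hoisted above the branch
 * so the translated sequence has ordinary, in-order semantics, and a
 * NOP fills the position where the slot used to be. */

/* out[0..2] receives: delay-slot insn, the branch, then a NOP filler. */
static void unravel(const char *branch, const char *delay_slot,
                    const char *out[3])
{
    out[0] = delay_slot;   /* execute the slot instruction first ... */
    out[1] = branch;       /* ... then the branch itself */
    out[2] = "nop";        /* the old slot position becomes a NOP */
}
```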
|
From: Tom H. <th...@cy...> - 2007-03-27 02:31:35
|
Nightly build on alvis ( i686, Red Hat 7.3 ) started at 2007-03-27 03:15:03 BST

Results unchanged from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 256 tests, 27 stderr failures, 1 stdout failure, 0 posttest failures ==
memcheck/tests/addressable (stderr)
memcheck/tests/badjump (stderr)
memcheck/tests/describe-block (stderr)
memcheck/tests/erringfds (stderr)
memcheck/tests/leak-0 (stderr)
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-pool-0 (stderr)
memcheck/tests/leak-pool-1 (stderr)
memcheck/tests/leak-pool-2 (stderr)
memcheck/tests/leak-pool-3 (stderr)
memcheck/tests/leak-pool-4 (stderr)
memcheck/tests/leak-pool-5 (stderr)
memcheck/tests/leak-regroot (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/long_namespace_xml (stderr)
memcheck/tests/match-overrun (stderr)
memcheck/tests/partial_load_dflt (stderr)
memcheck/tests/partial_load_ok (stderr)
memcheck/tests/partiallydefinedeq (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/sigkill (stderr)
memcheck/tests/stack_changes (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/xor-undef-x86 (stderr)
memcheck/tests/xml1 (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout) |
|
From: Tom H. <th...@cy...> - 2007-03-27 02:30:33
|
Nightly build on lloyd ( x86_64, Fedora Core 3 ) started at 2007-03-27 03:05:06 BST

Results differ from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 291 tests, 6 stderr failures, 2 stdout failures, 0 posttest failures ==
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/xml1 (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)
none/tests/tls (stdout)

=================================================
== Results from 24 hours ago ==
=================================================

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 291 tests, 6 stderr failures, 3 stdout failures, 0 posttest failures ==
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/xml1 (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)
none/tests/pth_detached (stdout)
none/tests/tls (stdout)

=================================================
== Difference between 24 hours ago and now ==
=================================================

*** old.short Tue Mar 27 03:18:09 2007
--- new.short Tue Mar 27 03:30:27 2007
***************
*** 8,10 ****
! == 291 tests, 6 stderr failures, 3 stdout failures, 0 posttest failures ==
  memcheck/tests/pointer-trace (stderr)
--- 8,10 ----
! == 291 tests, 6 stderr failures, 2 stdout failures, 0 posttest failures ==
  memcheck/tests/pointer-trace (stderr)
***************
*** 16,18 ****
  none/tests/mremap2 (stdout)
- none/tests/pth_detached (stdout)
  none/tests/tls (stdout)
--- 16,17 ---- |
|
From: Tom H. <th...@cy...> - 2007-03-27 02:23:23
|
Nightly build on dellow ( x86_64, Fedora Core 6 ) started at 2007-03-27 03:10:04 BST

Results differ from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 291 tests, 4 stderr failures, 1 stdout failure, 0 posttest failures ==
memcheck/tests/pointer-trace (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/xml1 (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)

=================================================
== Results from 24 hours ago ==
=================================================

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 291 tests, 4 stderr failures, 2 stdout failures, 0 posttest failures ==
memcheck/tests/pointer-trace (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/xml1 (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)
none/tests/pth_detached (stdout)

=================================================
== Difference between 24 hours ago and now ==
=================================================

*** old.short Tue Mar 27 03:16:47 2007
--- new.short Tue Mar 27 03:23:16 2007
***************
*** 8,10 ****
! == 291 tests, 4 stderr failures, 2 stdout failures, 0 posttest failures ==
  memcheck/tests/pointer-trace (stderr)
--- 8,10 ----
! == 291 tests, 4 stderr failures, 1 stdout failure, 0 posttest failures ==
  memcheck/tests/pointer-trace (stderr)
***************
*** 14,16 ****
  none/tests/mremap2 (stdout)
- none/tests/pth_detached (stdout)
--- 14,15 ---- |
|
From: Tom H. <th...@cy...> - 2007-03-27 02:14:49
|
Nightly build on gill ( x86_64, Fedora Core 2 ) started at 2007-03-27 03:00:03 BST

Results unchanged from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 293 tests, 6 stderr failures, 1 stdout failure, 0 posttest failures ==
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
none/tests/fdleak_fcntl (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout) |
|
From: <js...@ac...> - 2007-03-27 00:09:04
|
Nightly build on g5 ( SuSE 10.1, ppc970 ) started at 2007-03-27 02:00:01 CEST

Results differ from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow

== 226 tests, 6 stderr failures, 2 stdout failures, 0 posttest failures ==
memcheck/tests/deep_templates (stdout)
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/pointer-trace (stderr)
none/tests/faultstatus (stderr)
none/tests/fdleak_cmsg (stderr)
none/tests/mremap (stderr)
none/tests/mremap2 (stdout)

=================================================
== Results from 24 hours ago ==
=================================================

Checking out valgrind source tree ... failed
Last 20 lines of verbose log follow

echo Checking out valgrind source tree ...
svn co svn://svn.valgrind.org/valgrind/trunk -r {2007-03-26T02:00:01} valgrind
svn: Unknown hostname 'svn.valgrind.org'

=================================================
== Difference between 24 hours ago and now ==
=================================================

*** old.short Tue Mar 27 02:00:47 2007
--- new.short Tue Mar 27 02:09:01 2007
***************
*** 1,7 ****
! Checking out valgrind source tree ... failed
! Last 20 lines of verbose log follow
  echo
- Checking out valgrind source tree ... svn co svn://svn.valgrind.org/valgrind/trunk -r {2007-03-26T02:00:01} valgrind
- svn: Unknown hostname 'svn.valgrind.org'
--- 1,18 ----
! Checking out valgrind source tree ... done
! Configuring valgrind ... done
! Building valgrind ... done
! Running regression tests ... failed
! Regression test results follow
!
! == 226 tests, 6 stderr failures, 2 stdout failures, 0 posttest failures ==
! memcheck/tests/deep_templates (stdout)
! memcheck/tests/leak-cycle (stderr)
! memcheck/tests/leak-tree (stderr)
! memcheck/tests/pointer-trace (stderr)
! none/tests/faultstatus (stderr)
! none/tests/fdleak_cmsg (stderr)
! none/tests/mremap (stderr)
! none/tests/mremap2 (stdout) |