|
From: Julian S. <js...@ac...> - 2006-10-24 20:27:25
|
> > hash = (hash << 61) | (hash >> 3);
> >
> > leads to a better distribution of the values and better runtime for the
> > testcase. However I do not know whether rotation by 3 really leads to a
> > good hash function for 64 bit.
>
> Yeh, rotating by 3 is stupid and doesn't mix the bits well for short
> stack traces. I just tried this, which mixes much better:
Below is my final patch. It works well (it seems) for both 32-bit
and 64-bit versions of your test program and contains a second
heuristic hack to guard against bad-case behaviour. Can you let me
know if it solves the problem convincingly? If so I will commit it.
J
Index: coregrind/m_execontext.c
===================================================================
--- coregrind/m_execontext.c (revision 6340)
+++ coregrind/m_execontext.c (working copy)
@@ -185,6 +185,13 @@
on the returned ExeContext* values themselves. Inspired by Hugs's
Text type.
*/
+static inline UWord ROLW ( UWord w, Int n )
+{
+ Int bpw = 8 * sizeof(void*);
+ w = (w << n) | (w >> (bpw-n));
+ return w;
+}
+
ExeContext* VG_(record_ExeContext) ( ThreadId tid )
{
Int i;
@@ -194,7 +201,10 @@
ExeContext* new_ec;
ExeContext* list;
UInt n_ips;
+ ExeContext *prev2, *prev;
+ static UInt ctr = 0;
+
init_ExeContext_storage();
vg_assert(VG_(clo_backtrace_size) >= 1 &&
VG_(clo_backtrace_size) <= VG_DEEPEST_BACKTRACE);
@@ -208,15 +218,19 @@
hash = 0;
for (i = 0; i < n_ips; i++) {
hash ^= ips[i];
- hash = (hash << 29) | (hash >> 3);
+ hash = ROLW(hash, 19);
}
+ if (0) VG_(printf)("hash 0x%lx ", hash);
hash = hash % N_EC_LISTS;
+ if (0) VG_(printf)("%lu\n", hash);
/* And (the expensive bit) look a matching entry in the list. */
ec_searchreqs++;
- list = ec_list[hash];
+ prev2 = NULL;
+ prev = NULL;
+ list = ec_list[hash];
while (True) {
if (list == NULL) break;
@@ -229,11 +243,22 @@
}
}
if (same) break;
- list = list->next;
+ prev2 = prev;
+ prev = list;
+ list = list->next;
}
if (list != NULL) {
- /* Yay! We found it. */
+ /* Yay! We found it. Once every 8 searches, move it one step
+ closer to the start of the list to make future searches
+ cheaper. */
+ if (prev2 && prev && 0 == ((ctr++) & 7)) {
+ vg_assert(prev2->next == prev);
+ vg_assert(prev->next == list);
+ prev2->next = list;
+ prev->next = list->next;
+ list->next = prev;
+ }
return list;
}
|
|
From: Christoph B. <bar...@or...> - 2006-10-24 21:08:13
|
Am Dienstag, 24. Oktober 2006 22:27 schrieb Julian Seward: > > > hash = (hash << 61) | (hash >> 3); > > > > > > leads to a better distribution of the values and better runtime for the > > > testcase. However I do not know whether rotation by 3 really leads to a > > > good hash function for 64 bit. > > > > Yeh, rotating by 3 is stupid and doesn't mix the bits well for short > > stack traces. I just tried this, which mixes much better: > > Below is my final patch. It works well (it seems) for both 32-bit > and 64-bit versions of your test program and contains a second > heuristic hack to guard against bad-case behaviour. Can you let me > know if it solves the problem convincingly? If so I will commit it. The patch works well for the testprogramm. Tomorrow I will be able to report whether it works for the real application. BTW, I see that valgrind is only compiled with -O -g and not -O2 -g. Is there a reason? Christoph |
|
From: Nicholas N. <nj...@cs...> - 2006-10-24 21:10:02
|
On Tue, 24 Oct 2006, Christoph Bartoschek wrote: > BTW, I see that valgrind is only compiled with -O -g and not -O2 -g. Is there > a reason? -O2 doesn't give improvements, IIRC. A lot of Valgrind time is in Valgrind-generated code. Although some parts of Valgrind are compiled at -O2 (memcheck? Vex?) Nick |
|
From: Christoph B. <bar...@or...> - 2006-10-25 08:12:44
|
Am Dienstag, 24. Oktober 2006 22:27 schrieb Julian Seward: > > > hash = (hash << 61) | (hash >> 3); > > > > > > leads to a better distribution of the values and better runtime for the > > > testcase. However I do not know whether rotation by 3 really leads to a > > > good hash function for 64 bit. > > > > Yeh, rotating by 3 is stupid and doesn't mix the bits well for short > > stack traces. I just tried this, which mixes much better: > > Below is my final patch. It works well (it seems) for both 32-bit > and 64-bit versions of your test program and contains a second > heuristic hack to guard against bad-case behaviour. Can you let me > know if it solves the problem convincingly? If so I will commit it. The main application runs faster now. One run of the algorithm: Old valgrind: 144765 seconds New valgrind: 17232 seconds Without valgrind: 251 seconds Although this is nearly ten times faster than the old version it is still quite slow for me with a slowdown factor of 68x. But I guess that memcheck cannot run faster. Thanks Christoph Bartoschek |
|
From: Julian S. <js...@ac...> - 2006-10-25 11:11:58
|
> Old valgrind: 144765 seconds > New valgrind: 17232 seconds > Without valgrind: 251 seconds > > Although this is nearly ten times faster than the old version it is still > quite slow for me with a slowdown factor of 68x. But I guess that memcheck > cannot run faster. 68:1 is right on the outer limits of what we expect. It would be typical of a program that did malloc/free continuously, but no useful work (eg perf/heap.c). More typically on amd64 we would expect 20-30x slowdown. So, send OProfile data for a run of the new valgrind and let's see if there is any other obvious bad thing visible. J |
|
From: Christoph B. <bar...@or...> - 2006-10-25 13:38:33
Attachments:
profile.bz2
|
Am Mittwoch, 25. Oktober 2006 13:11 schrieb Julian Seward: > > So, send OProfile data for a run of the new valgrind and let's see > if there is any other obvious bad thing visible. > I have attached a profiling session of about 2 hours. If you would like to have a profile for another event I will provide it. I guess the first two entries are from code generated by valgrind. They take about 51% of the runtime. Does this mean that the generated code is 34 times slower than the native one? Greetings Christoph |
|
From: Dallman, J. <jg...@ug...> - 2006-10-25 14:14:10
|
Christoph Bartoschek [bar...@or...] wrote: > I have attached a profiling session of about 2 hours. Please don't send profile sessions to the whole list!=20 We don't all need it. thanks, --=20 John Dallman, Parasolid Porting Engineer, +44-1223-371554=20 |
|
From: Dallman, J. <jg...@ug...> - 2006-10-25 14:51:34
|
John Dallman wrote:=20 > Christoph Bartoschek [bar...@or...] wrote: > > I have attached a profiling session of about 2 hours. > Please don't send profile sessions to the whole list!=20 > We don't all need it. Ignore me - well, mostly. The mailserver here stripped the=20 attachments, claiming they were ludicrously huge. Sending=20 huge attachments would be anti-social, but the mailserver was, in fact, wrong on this point.=20 --=20 John Dallman, Parasolid Porting Engineer, +44-1223-371554=20 |
|
From: Nicholas N. <nj...@cs...> - 2006-10-25 17:27:22
|
On Wed, 25 Oct 2006, Christoph Bartoschek wrote: > I have attached a profiling session of about 2 hours. If you would like to > have a profile for another event I will provide it. > > I guess the first two entries are from code generated by valgrind. They take > about 51% of the runtime. Does this mean that the generated code is 34 times > slower than the native one? I guess so. All the C calls are expensive, because they require register saving/restoring and argument setup. The profile looks a lot more like the ones we're used to seeing, in terms of what is taking up most of the time. Nick |
|
From: Srivatsan R. <jug...@ya...> - 2006-10-25 18:30:24
|
Hi David,
Thanks for responding.
"The source to ABCParser::parseString() would be
useful to get a better
idea of what is going on."
It indeed would, but this runs to over 2000 lines,
hence it would make it extremely laborious for
someone
else to figure out whats happening.
A 2000 line method? Maybe that's a problem in its own
right.
I tried adding the CXXFlags that u had suggested and
compiled the whole package anew....but valgrind still
reports
-------------------
by 0x1B97AB4E: ABCParser::parseSmileString()
(stl_vector.h:501)
==12744== by 0x1B975D34: ABCParser::execute()
(ABCParser.cpp:101)
--------------------
as you can see, it still says that the function
ABCParser caused the invalid read, which is called in
line no 101 in ABCParser.cpp.
However it doesn't give me the exact line no in the
file where the error occured.
Is there anything else that i can try?
Thanks again for ur help
Regs
vatsan.
--- David Eriksson <da...@2g...> wrote:
> > On Fri, 2006-10-20 at 06:53 -0700, Srivatsan
> > Ramanujam wrote:
> > > Hi People,
> > >
> > > I have been trying to run valgrind to detect
leaks
> > in
> > > my backend C++ algorithm.
> > >
> > > The algorithm is created as a shared library
> > object, I
> > > have writtten a small program testAlgo.cpp as an
> > > interface to this algorithm and I am running
> > valgring
> > > on this program using
> > >
> > > make runval <filename>
> > >
> > > where runval is defined as
> > >
> > > --------------------------------------
> > > runval: testAlgo
> > > $(ENVIRON) $(MOREPATHS) valgrind
> > --leak-check=full
> > > --error-limit=no --show-reachable=no ./testAlgo
> > $(FN)
> > > --------------------------------------
> > >
> > > [snip]
> >
> > The source to ABCParser::parseString() would be
> > useful to get a better
> > idea of what is going on.
> >
> > Maybe you should try to add the below defines to
the
> > CXXFLAGS for the
> > whole project, and rebuild everything. That should
> > add validations to
> > certain STL operations and maybe that will
pinpoint
> > the error better
> > than valgrind:
> >
> > -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC
> >
> >
> > \David
> >
> >
>
"MAN'S EGO IS THE FOUNTAINHEAD OF HUMAN PROGRESS"
Ayn Rand.
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
|